Cloud storage (II) — Google (cont.), Microsoft Azure

Recap: GFS: Why?

Conventional file systems do not fit the demands of data centers. Workloads in data centers are different from conventional workloads:
• Storage based on inexpensive disks that fail frequently
• Many large files in contrast to small files for personal data
• Primarily reading streams of data
• Sequential writes appending to the end of existing files
• Must support multiple concurrent operations
• Bandwidth is more critical than latency

Recap: Data-center workloads for GFS

MapReduce (MapReduce: Simplified Data Processing on Large Clusters, OSDI 2004)
• Large-scale machine learning problems
• Extraction of user data for popular queries
• Extraction of properties of web pages for new experiments and products
• Large-scale graph computations

BigTable (Bigtable: A Distributed Storage System for Structured Data, OSDI 2006)
• Google Analytics
• Google Earth
• Personalized Search

Google Search (Web Search for a Planet: The Google Cluster Architecture, IEEE Micro, vol. 23, 2003)

Recap: What GFS proposes

• Maintaining the same interface
  • The same function calls
  • The same hierarchical directory/files
• Flat structure — hierarchical namespace implemented with a flat structure
• Architecture — master/chunkservers/clients
• Large chunks — files are decomposed into large chunks (e.g., 64 MB) with replicas
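As a back-of-the-envelope illustration (my own sketch, not code from the paper), the following shows how a byte offset maps to a chunk index with 64 MB chunks, and why large chunks keep the master's metadata small:

```python
# Toy illustration of GFS-style 64 MB chunking (not the real implementation).
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB

def chunk_index(offset: int) -> int:
    """Which chunk holds a given byte offset of a file."""
    return offset // CHUNK_SIZE

file_size = 1 << 40                        # a 1 TB file...
print(file_size // CHUNK_SIZE)             # ...needs only 16384 chunk entries
print(chunk_index(200 * 1024 * 1024))      # byte 200 MB lives in chunk 3
```

With 64 KB chunks, the same file would need 16 million entries, which is why large chunks shrink both master metadata and client-master traffic.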

Recap: Large Chunks

How many of the following datacenter characteristics can large chunks help address?
1. Storage based on inexpensive disks that fail frequently
2. Many large files in contrast to small files for personal data
3. Primarily reading streams of data
4. Sequential writes appending to the end of existing files
5. Must support multiple concurrent operations
6. Bandwidth is more critical than latency

Google File System (cont.)
Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency
f4: Facebook's Warm BLOB Storage System

Flat file system structure

• Directories are illusions
• Namespace maintained like a hash table

Flat file system structure

How many of the following datacenter characteristics can the flat file system structure help address?
1. Storage based on inexpensive disks that fail frequently
2. Many large files in contrast to small files for personal data
3. Primarily reading streams of data
4. Sequential writes appending to the end of existing files
5. Must support multiple concurrent operations
6. Bandwidth is more critical than latency
A. 1 B. 2 C. 3 D. 4 E. 5

https://www.pollev.com/hungweitseng

Flat file system structure

How many of the following datacenter characteristics can the flat file system structure help address?
1. Storage based on inexpensive disks that fail frequently
2. Many large files in contrast to small files for personal data
3. Primarily reading streams of data
4. Sequential writes appending to the end of existing files
5. Must support multiple concurrent operations — fine-grained locking reduces the chance of conflicts: you don't have to lock the whole path when accessing a file
6. Bandwidth is more critical than latency — actually improves both bandwidth and latency
A. 1 B. 2 C. 3 D. 4 E. 5
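A minimal sketch of the idea (my own illustration, not GFS source code): the namespace is one hash table keyed by full pathnames, with a lock per path, so two files under the same directory can be created concurrently without locking the shared path prefix.

```python
import threading
from collections import defaultdict

class FlatNamespace:
    """Toy flat namespace: full pathname -> metadata, one lock per path."""
    def __init__(self):
        self.table = {}                            # "/home/a/foo.c" -> metadata
        self.locks = defaultdict(threading.Lock)   # fine-grained, per-path locks
        # (A real implementation would also guard the lock table itself.)

    def create(self, path, meta):
        with self.locks[path]:       # lock only this path, not /, /home, /home/a
            self.table[path] = meta

    def lookup(self, path):
        return self.table.get(path)  # one hash probe; no directory walk

ns = FlatNamespace()
ns.create("/home/hungwei/foo.c", {"chunks": ["2ef0"]})
print(ns.lookup("/home/hungwei/foo.c"))   # {'chunks': ['2ef0']}
```

GFS itself takes read locks on the path prefixes and a write lock on the leaf name; the sketch above only shows the per-leaf case.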

How open works with NFS

client: open("/mnt/nfs/home/hungwei/foo.c", O_RDONLY);
client → server: lookup for home
server → client: return the inode of home
client → server: read for home
server → client: return the data of home
client → server: lookup for hungwei
server → client: return the inode of hungwei
client → server: read for hungwei
server → client: return the data of hungwei
client → server: lookup for foo.c
server → client: return the inode of foo.c
client → server: read for foo.c
server → client: return the data of foo.c

You only need these in GFS:
client → master: lookup for /home/hungwei/foo.c
master → client: return the list of locations of /home/hungwei/foo.c
client → chunkserver: read from one data location of /home/hungwei/foo.c
chunkserver → client: return the data of /home/hungwei/foo.c
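To make the difference concrete, a toy round-trip count (illustrative arithmetic, not a benchmark): NFS resolves one path component per lookup/read pair, while a GFS client resolves the whole pathname in a single master round trip.

```python
# Toy RPC count for the trace above (illustrative only).
def nfs_rpc_count(path: str) -> int:
    components = path.strip("/").split("/")
    return 2 * len(components)   # lookup + read for every path component

def gfs_rpc_count(path: str) -> int:
    return 2                     # one master lookup + one chunkserver read

print(nfs_rpc_count("home/hungwei/foo.c"))  # 6 round trips
print(gfs_rpc_count("home/hungwei/foo.c"))  # 2 round trips
```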

GFS architecture

Regarding the GFS architecture, how many of the following statements are true?
1. The GFS cluster in the paper only has one active server to store and manipulate metadata
2. The chunkserver in GFS may contain data that can also be found on another chunkserver
3. The chunkserver is dedicated to data storage and may not be used for other purposes
4. The client can cache file data to improve performance

https://www.pollev.com/hungweitseng

GFS architecture

1. The GFS cluster in the paper only has one active server to store and manipulate metadata — a single point of failure; they have shadow masters
2. The chunkserver in GFS may contain data that can also be found on another chunkserver — 3 replicas by default
3. The chunkserver is dedicated to data storage and may not be used for other purposes — clients can run on chunkservers, improving machine utilization: saving money!
4. The client can cache file data to improve performance — clients do not cache file data, which simplifies the design

GFS Architecture: decoupled data and control paths — only the control path goes through the master.

[Figure: the application passes (filename, size) to the GFS client and gets data back. Control path: the client sends (filename, chunk index) to the master and receives (chunk handle, chunk locations); the master holds the file namespace (e.g., /foo/bar → chunk handle 2ef0) and chunk locations, and exchanges instructions and status with the chunkservers for load balancing and replica placement. Data path: the client sends (chunk handle, offset) directly to a chunkserver; each chunkserver manages a set of local disks.]

Distributed architecture

Single master
• maintains file system metadata, including namespace, mapping, access control and chunk locations
• controls system-wide activities, including garbage collection and chunk migration

Chunkserver
• stores data chunks
• chunks are replicated to improve reliability (3 replicas)

Client
• APIs to interact with applications
• interacts with the master for control operations
• interacts with chunkservers for accessing data
• can run on chunkservers

Reading data in GFS

[Figure: the application asks the GFS client for (filename, size); the client sends (filename, chunk index) to the master and receives (chunk handle, chunk locations); the client then sends (chunk handle, byte range) to one of the chunkservers and receives the data from the file.]
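A client-side sketch of this read path (the object and method names are my stand-ins, not the real GFS RPC interface):

```python
CHUNK_SIZE = 64 * 1024 * 1024

class StubMaster:
    def lookup(self, filename, chunk_index):
        # control path: (filename, chunk index) -> (chunk handle, chunk locations)
        return "2ef0", [StubChunkserver()]

class StubChunkserver:
    def read(self, handle, chunk_offset, length):
        # data path: (chunk handle, byte range) -> data
        return b"x" * length

def gfs_read(master, filename, offset, length):
    """Toy GFS read: ask the master where the chunk is, then read a replica."""
    handle, locations = master.lookup(filename, offset // CHUNK_SIZE)
    replica = locations[0]                  # any replica can serve a read
    return replica.read(handle, offset % CHUNK_SIZE, length)

print(gfs_read(StubMaster(), "/home/hungwei/foo.c", 0, 5))  # b'xxxxx'
```

In the real system the client also caches the (chunk handle, locations) answer, so repeated reads of the same chunk skip the master entirely.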

Writing data in GFS

[Figure: the application hands (filename, data) to the GFS client; the client sends (filename, chunk index) to the master and receives the chunk handle plus the primary and secondary replicas. The client pushes the data to the chunkservers, then issues the write command to the primary; the primary defines the order of updates among the chunkservers and forwards it to the secondaries.]
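A toy model of the two-phase flow (my own sketch, with invented names): data is first pushed to every replica in any order, then the primary picks a serial order and the secondaries apply mutations in that same order.

```python
class Replica:
    """Toy chunk replica: staged data buffer plus an ordered chunk log."""
    def __init__(self):
        self.staged = {}   # data pushed but not yet ordered
        self.chunk = []    # (sequence number, data), in primary-defined order

    def push_data(self, data_id, data):
        self.staged[data_id] = data          # phase 1: data path, any order

    def apply(self, seq, data_id):
        self.chunk.append((seq, self.staged.pop(data_id)))

def gfs_write(primary, secondaries, data_id, data):
    for r in [primary] + secondaries:        # phase 1: push data everywhere
        r.push_data(data_id, data)
    seq = len(primary.chunk)                 # phase 2: primary defines order...
    primary.apply(seq, data_id)
    for r in secondaries:                    # ...and secondaries follow it
        r.apply(seq, data_id)

p, s1, s2 = Replica(), Replica(), Replica()
gfs_write(p, [s1, s2], "d1", b"hello")
print(p.chunk == s1.chunk == s2.chunk)       # True: identical order everywhere
```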

GFS: Relaxed consistency model

Distributed, simple, efficient. Filename/metadata updates/creates are atomic.

Consistency modes:
• Write — write to a specific offset
• Append — write to the end of a file

                     Write                      Append
Serial success       defined                    defined interspersed with inconsistent
Concurrent success   consistent but undefined   defined interspersed with inconsistent
Failure              inconsistent               inconsistent

Consistent: all replicas have the same value.
Defined: the replica reflects the mutation, and is consistent.

Applications need to deal with inconsistent cases themselves.
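One way applications cope with "defined interspersed with inconsistent" regions, as the GFS paper suggests, is to write self-validating, self-identifying records so readers can skip padding and drop duplicates. A minimal sketch with an invented record format (length + checksum + unique id + payload):

```python
import hashlib
import struct

HDR = 20  # 4-byte length + 16-byte MD5 digest

def make_record(uid: bytes, payload: bytes) -> bytes:
    """Self-validating record: length, checksum, 8-byte unique id, payload."""
    body = uid + payload
    return struct.pack(">I", len(body)) + hashlib.md5(body).digest() + body

def scan(chunk: bytes):
    """Yield each valid payload once: skip garbage, drop duplicate records."""
    seen, pos = set(), 0
    while pos + HDR <= len(chunk):
        (length,) = struct.unpack_from(">I", chunk, pos)
        body = chunk[pos + HDR : pos + HDR + length]
        if len(body) == length and hashlib.md5(body).digest() == chunk[pos + 4 : pos + HDR]:
            if body[:8] not in seen:         # duplicate append: keep first copy
                seen.add(body[:8])
                yield body[8:]
            pos += HDR + length
        else:
            pos += 1                         # inconsistent bytes: resynchronize

rec = make_record(b"id000001", b"hello")
region = rec + b"\x00" * 7 + rec             # padding plus a duplicated record
print(list(scan(region)))                    # [b'hello']
```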

Real world, industry experience

Linux problems (section 7):
• Linux driver issues — disks do not report their capabilities honestly
• The cost of fsync — proportional to the file size rather than the size of the updated portion
• A single reader-writer lock for mmap

Due to the open-source nature of Linux, they could fix these issues and contribute the fixes back to the rest of the community. GFS itself is not open-sourced.

Single master design

GFS claims this will not be a bottleneck:
• in-memory data structures for fast access
• only involved in metadata operations — decoupled data/control paths
• clients cache metadata

But what if the master server fails?

The evolution of GFS

• Single master → Colossus. Mentioned in "Spanner: Google's Globally-Distributed Database", OSDI 2012 — "tablet's state is stored in a set of B-tree-like files and a write-ahead log, all on a distributed file system called Colossus (the successor to the Google File System)"
• Support for smaller chunk sizes — e.g., Gmail

Lots of other interesting topics
• namespace locking
• replica placement
• creation, re-replication, re-balancing
• garbage collection
• stale replica detection
• data integrity
• diagnostic tools: logs are your friends

Do they achieve their goals?

• Storage based on inexpensive disks that fail frequently — replication, distributed storage
• Many large files in contrast to small files for personal data — large chunk size
• Primarily reading streams of data — large chunk size
• Sequential writes appending to the end of existing files — large chunk size
• Must support multiple concurrent operations — flat structure
• Bandwidth is more critical than latency — large chunk size

What's missing in GFS?

• Only a relaxed consistency model
• Scalability — single master
• Only efficient in dealing with large data
• No geo-redundancy

Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency
Brad Calder et al., Microsoft (SOSP 2011)

Why Windows Azure Storage?

Data-center workloads for WAS: a cloud service platform for social network search, video streaming, XBOX gaming, records management, etc. at Microsoft.
• Must tolerate many different data abstractions: blobs, tables and queues
• Data types:
  • Blob (Binary Large OBject) storage: pictures, Excel files, HTML files, virtual hard disks (VHDs), big data such as logs, database backups — pretty much anything
  • Table: database tables
  • Queue: store and retrieve messages. Queue messages can be up to 64 KB in size, and a queue can contain millions of messages. Queues are generally used to store lists of messages to be processed asynchronously.

Why Windows Azure Storage? (cont.)

Learning from feedback on existing cloud storage:
• Strong consistency
• Global and scalable namespace/storage
• Disaster recovery
• Multi-tenancy and cost of storage

All problems in computer science can be solved by another level of indirection

What WAS proposes

[Figure: applications and clients reach WAS through DNS (Domain Name Service), which resolves an account to the virtual IP of a storage stamp; a Location Service manages all storage stamps. Each storage stamp stacks front-ends on a partition layer on a stream layer; replication is intra-stamp within the stream layer and inter-stamp between the partition layers of different stamps.]

• A stamp is the basic granularity of provisioning, fault domains, and geo-replication
• A stamp can contain 10-20 racks with 18 disk-heavy storage nodes per rack
• You may consider each stamp to be similar to a GFS cluster
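A toy illustration of the account-to-stamp indirection (class and method names are my own, not from the paper): the Location Service assigns an account to a stamp and points DNS at that stamp's virtual IP.

```python
class LocationService:
    """Toy location service: assigns accounts to stamps, updates DNS."""
    def __init__(self, stamps):
        self.stamps = stamps            # stamp name -> virtual IP
        self.dns = {}                   # account hostname -> virtual IP

    def create_account(self, account):
        stamp = sorted(self.stamps)[0]  # toy policy; real WAS weighs load etc.
        self.dns[f"{account}.core.windows.net"] = self.stamps[stamp]
        return stamp

ls = LocationService({"stamp-us-1": "10.0.0.1", "stamp-eu-1": "10.0.1.1"})
print(ls.create_account("myaccount"))   # stamp-eu-1
print(ls.dns)                           # {'myaccount.core.windows.net': '10.0.1.1'}
```

Because clients only ever see the virtual IP behind the DNS name, the Location Service can later move or geo-replicate an account by updating DNS, without changing the client-visible name.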

What WAS proposes (cont.)

URI: http(s)://AccountName.<service>.core.windows.net/PartitionName/ObjectName

• AccountName — an account's data can be distributed across multiple geographic locations
• DNS (Domain Name Service) — translates the AccountName into the virtual IP of a stamp
• Location Service — manages the account namespace across all storage stamps; manages all storage stamps
• PartitionName/ObjectName — locate the data within a stamp (front-ends, then the partition layer, then the stream layer)

[Figure: the same storage-stamp diagram as before, annotated with where the DNS lookup and the Location Service fit.]
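A small sketch of how the two-level name maps onto the URI (the explicit service label, blob/table/queue, is spelled out here; the slide's figure elides it):

```python
def was_uri(account, service, partition, obj, https=True):
    """Build a WAS object URI.
    AccountName -> DNS -> a stamp's virtual IP;
    PartitionName/ObjectName -> located by the partition layer in the stamp."""
    scheme = "https" if https else "http"
    return f"{scheme}://{account}.{service}.core.windows.net/{partition}/{obj}"

print(was_uri("myaccount", "blob", "photos", "cat.jpg"))
# https://myaccount.blob.core.windows.net/photos/cat.jpg
```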

GFS v.s. stamp in WAS

[Figure: side by side. In GFS, the client sends (filename, chunk index) to the master, receives (chunk handle, primary and secondary replicas), then sends (chunk handle, byte range) to chunkservers that replicate among themselves. In a WAS stamp, front-ends and the partition layer sit above a stream layer made of stream managers and extent nodes.]

What is a stream?

Regarding a stream in WAS, please identify how many of the following statements is/are true:
1. A stream is a list of extents, in which an extent consists of consecutive blocks
2. Each block in the stream contains a checksum to ensure data integrity
3. An update to a stream can only be appended to the end of the stream
4. Two streams can share the same set of extents
A. 0 B. 1 C. 2 D. 3 E. 4

https://www.pollev.com/hungweitseng

What is a stream?

1. A stream is a list of extents, in which an extent consists of consecutive blocks — similar to an extent-based file system, and shares the same benefits as extent-based systems
2. Each block in the stream contains a checksum to ensure data integrity — as a result, we need to read a whole block every time... but that is not a big issue because the workload reads streams anyway; improved bandwidth, data locality
3. An update to a stream can only be appended to the end of the stream — append only, copy-on-write... (doesn't this sound familiar? LogFS)
4. Two streams can share the same set of extents — de-duplication to save disk space; minimizes the time when creating a new file
A. 0 B. 1 C. 2 D. 3 E. 4
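A toy model of the abstraction (my own sketch): a stream is an ordered list of extents, only the last extent is unsealed and appendable, and sealed extents are immutable, which is what makes sharing them between streams safe and cheap.

```python
class Extent:
    """Toy extent: consecutive blocks; immutable once sealed."""
    def __init__(self):
        self.blocks = []     # real WAS also stores a checksum per block
        self.sealed = False

    def append(self, block: bytes):
        assert not self.sealed, "sealed extents are immutable"
        self.blocks.append(block)

class Stream:
    """Toy stream: ordered extents; appends only go to the unsealed tail."""
    def __init__(self, extents=None):
        self.extents = list(extents) if extents else [Extent()]

    def append(self, block: bytes):
        self.extents[-1].append(block)

    def seal_tail(self):
        self.extents[-1].sealed = True
        self.extents.append(Extent())

s1 = Stream()
s1.append(b"block0")
s1.seal_tail()
# A second stream can reference s1's sealed extents without copying data:
s2 = Stream(extents=s1.extents[:-1] + [Extent()])
print(s2.extents[0] is s1.extents[0])   # True: the extent is shared
```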

GFS v.s. stamp in WAS (cont.)

[Figure: in a WAS stamp, the partition layer asks the stream layer to create an extent; a stream manager allocates the extent replica set (a primary and two secondaries) among the extent nodes, and replication then runs between the extent nodes, analogous to the GFS master handing the client (chunk handle, primary and secondary replicas).]

Announcement

• Project due this Thursday
  • Please submit your current progress
  • You will have another week of "revision period" after the first round of grading (firm deadline 3/11)
  • Allows you to revise your project with a 30% penalty on the unsatisfactory parts/test cases
  • Say you got only 60% in the first round and you fixed everything before 3/11 — you can still get 60% + 70%*40% = 88%
• "Very last" reading quiz due next Tuesday
• iEVAL — counts as an extra, full-credit reading quiz
• Final — contains two parts (each accounts for 50%)
  • Part 1: 80-minute multiple choice/answer questions + two problem sets of comprehensive exam questions
  • Part 2: unlimited time between 3/11-3/17, open-ended questions


