Cloud storage (II) Google (cont.), Microsoft Azure
Conventional file systems do not fit the demands of data centers
Recap: GFS: Why?
Workloads in data centers are different from conventional workloads:
- Storage based on inexpensive disks that fail frequently
- Many large files, in contrast to small files for personal data
- Primarily reading streams of data
- Sequential writes appending to the end of existing files
- Must support multiple concurrent operations
- Bandwidth is more critical than latency
Recap: Data-center workloads for GFS
- MapReduce ("MapReduce: Simplified Data Processing on Large Clusters", OSDI 2004)
  - Large-scale machine learning problems
  - Extraction of user data for popular queries
  - Extraction of properties of web pages for new experiments and products
  - Large-scale graph computations
- BigTable ("Bigtable: A Distributed Storage System for Structured Data", OSDI 2006)
  - Google Analytics
  - Google Earth
  - Personalized Search
- Google Search ("Web Search for a Planet: The Google Cluster Architecture", IEEE Micro, vol. 23, 2003)
Recap: What does GFS propose?
- Maintaining the same interface
  - The same function calls
  - The same hierarchical directory/files
- Flat structure
  - Hierarchical namespace implemented with a flat structure
- Architecture
  - Master / chunkservers / clients
- Large chunks
  - Files are decomposed into large chunks (e.g., 64 MB) with replicas
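To make the chunk decomposition concrete, here is a minimal sketch (not GFS's actual client API) of how a file byte offset maps to a chunk index, assuming the 64 MB chunk size mentioned above; the helper name is illustrative.

```python
# Minimal sketch, assuming 64 MB chunks as on the slide; chunk_location
# is an illustrative helper, not part of GFS's real client library.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB

def chunk_location(offset: int) -> tuple[int, int]:
    """Map a file byte offset to (chunk index, offset within that chunk)."""
    return offset // CHUNK_SIZE, offset % CHUNK_SIZE

# A 1 GB file spans only 16 chunks, so the master tracks far less
# metadata than it would with small (e.g., 4 KB) blocks.
print(chunk_location(200 * 1024 * 1024))  # (3, 8388608)
```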
Recap: Large Chunks
How many of the following datacenter characteristics can large chunks help address?
1. Storage based on inexpensive disks that fail frequently
2. Many large files, in contrast to small files for personal data
3. Primarily reading streams of data
4. Sequential writes appending to the end of existing files
5. Must support multiple concurrent operations
6. Bandwidth is more critical than latency
Google File System (cont.)
Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency
f4: Facebook's Warm BLOB Storage System
Flat file system structure
- Directories are illusions
- Namespace maintained like a hash table
How many of the following statements can the flat file system structure help address?
1. Storage based on inexpensive disks that fail frequently
2. Many large files, in contrast to small files for personal data
3. Primarily reading streams of data
4. Sequential writes appending to the end of existing files
5. Must support multiple concurrent operations
6. Bandwidth is more critical than latency
A. 1 B. 2 C. 3 D. 4 E. 5
Flat file system structure (answers)
- Must support multiple concurrent operations: fine-grained locking reduces the chance of conflicts, since you don't have to lock the whole path when accessing a file
- Bandwidth is more critical than latency: the flat structure actually improves both bandwidth and latency
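Below is a minimal sketch of the fine-grained locking idea from the answer above: in a flat namespace, an operation can lock just the full pathname it touches instead of every directory along the path. The namespace dictionary and path_locks table are hypothetical illustrations, not GFS's real locking scheme.

```python
# Hypothetical illustration of per-pathname locking in a flat namespace;
# this is not GFS's actual locking code.
import threading
from collections import defaultdict

namespace = {}                              # full pathname -> metadata
path_locks = defaultdict(threading.Lock)    # one lock per full pathname

def create_file(path: str, metadata) -> None:
    # Lock only this pathname; /home and /home/hungwei are never locked.
    with path_locks[path]:
        namespace[path] = metadata

# Concurrent creates of /home/hungwei/a.c and /home/hungwei/b.c do not
# contend with each other, reducing the chance of conflicts.
```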
How open works with NFS
client: open("/mnt/nfs/home/hungwei/foo.c", O_RDONLY);
The client walks the path one component at a time, with one server round trip per step:
- lookup for home, return the inode of home; read for home, return the data of home
- lookup for hungwei, return the inode of hungwei; read for hungwei, return the data of hungwei
- lookup for foo.c, return the inode of foo.c; read for foo.c, return the data of foo.c

You only need these in GFS:
- lookup for /home/hungwei/foo.c, return the list of locations of /home/hungwei/foo.c
- read from one data location of /home/hungwei/foo.c, return the data of /home/hungwei/foo.c
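The contrast above can be sketched in a few lines: a hierarchical namespace needs one lookup (and, in NFS, one round trip) per path component, while a flat namespace maintained like a hash table resolves the full pathname in a single lookup. The dictionaries and return values below are made up for illustration.

```python
# Illustrative only: neither dictionary reflects real NFS or GFS data structures.
hierarchical = {"home": {"hungwei": {"foo.c": "inode#42"}}}
flat = {"/home/hungwei/foo.c": ("chunk-handle-2ef0", ["cs1", "cs2", "cs3"])}

def nfs_style_lookup(path: str):
    """One lookup per path component: home, then hungwei, then foo.c."""
    node = hierarchical
    for component in path.strip("/").split("/"):
        node = node[component]          # each step is a server round trip in NFS
    return node

def gfs_style_lookup(path: str):
    """A single hash-table lookup on the full pathname."""
    return flat[path]

print(nfs_style_lookup("/home/hungwei/foo.c"))   # 3 lookups
print(gfs_style_lookup("/home/hungwei/foo.c"))   # 1 lookup
```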
GFS architecture
Regarding the GFS architecture, how many of the following statements are true?
1. The GFS cluster in the paper only has one active server to store and manipulate metadata
2. The chunkserver in GFS may contain data that can also be found on another chunkserver
3. The chunkserver is dedicated for data storage and may not be used for other purposes
4. The client can cache file data to improve performance
GFS architecture (answers)
1. The GFS cluster in the paper only has one active server to store and manipulate metadata (not a single point of failure: they have shadow masters)
2. The chunkserver in GFS may contain data that can also be found on another chunkserver (3 replicas by default)
3. The chunkserver is dedicated for data storage and may not be used for other purposes (running clients on chunkservers improves machine utilization, saving money)
4. The client can cache file data to improve performance (clients do not cache file data, which simplifies the design)
GFS architecture: decoupled data and control paths
- Only the control path goes through the master
[Figure: the application passes (filename, size) to the GFS client and gets data back; the client sends (filename, chunk index) to the master, which keeps the file namespace (e.g., /foo/bar -> chunk handle 2ef0) and chunk locations, and replies with (chunk handle, chunk locations); the client then exchanges (chunk handle, offset) and data directly with the chunk servers; the master sends instructions to and receives status from the chunk servers, each of which manages several disks; chunks are replicated and load-balanced among chunkservers.]
Distributed architecture
- Chunks are replicated to improve reliability (3 replicas)
Client
- APIs to interact with applications
- Interacts with masters for control operations
- Interacts with chunkservers for accessing data
- Can run on chunkservers
Single master
- Maintains file system metadata, including namespace, mapping, access control, and chunk locations
- Controls system-wide activities, including garbage collection and chunk migration
Chunkserver
- Stores data chunks
Reading data in GFS
[Figure: the application passes (filename, size) to the GFS client; the client sends (filename, chunk index) to the master and receives (chunk handle, chunk locations); the client then sends (chunk handle, byte range) to one chunk server and receives the data from the file.]
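A rough sketch of the read flow in the figure above, assuming hypothetical master_rpc and chunkserver_rpc stubs (the real GFS RPC interfaces are not shown on the slides):

```python
# Sketch of the GFS read path; master_rpc and chunkserver_rpc are
# hypothetical stand-ins for the real RPC interfaces.
import random

CHUNK_SIZE = 64 * 1024 * 1024

def gfs_read(master_rpc, chunkserver_rpc, filename, offset, length):
    # Control path: ask the master which chunk holds this offset and
    # where its replicas live.
    chunk_index = offset // CHUNK_SIZE
    chunk_handle, locations = master_rpc.lookup(filename, chunk_index)
    # Data path: read the byte range directly from one replica; the
    # master never touches the file data.
    replica = random.choice(locations)
    return chunkserver_rpc.read(replica, chunk_handle, offset % CHUNK_SIZE, length)
```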
Writing data in GFS
[Figure: the application passes (filename, data) to the GFS client and gets a response; the client sends (filename, chunk index) to the master and receives the chunk handle plus the primary and secondary replicas; the client pushes the data to the chunk servers and then sends the write command to the primary; the primary defines the order of updates among the chunk servers.]
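And a matching sketch of the write flow, again with hypothetical RPC stubs: the data first flows to all replicas, and only then does the client ask the primary to apply the mutation, letting the primary define the update order for the secondaries.

```python
# Sketch of the GFS write path; the RPC names are hypothetical.
CHUNK_SIZE = 64 * 1024 * 1024

def gfs_write(master_rpc, chunkserver_rpc, filename, offset, data):
    chunk_index = offset // CHUNK_SIZE
    chunk_handle, primary, secondaries = master_rpc.lookup_for_write(filename, chunk_index)
    # 1. Push the data to every replica (the data path).
    for replica in [primary, *secondaries]:
        chunkserver_rpc.push_data(replica, chunk_handle, data)
    # 2. Send the write command to the primary; the primary picks the
    #    mutation order and forwards the command to the secondaries.
    return chunkserver_rpc.write(primary, chunk_handle, offset % CHUNK_SIZE, secondaries)
```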
GFS: relaxed consistency model
- Distributed, simple, efficient
- Filename/metadata updates/creates are atomic
- Consistency of file regions after data mutations:

                      Write (to a specific offset)   Append (to the end of a file)
  Serial success      defined                        defined, interspersed with inconsistent
  Concurrent success  consistent but undefined       defined, interspersed with inconsistent
  Failure             inconsistent                   inconsistent

- Consistent: all replicas have the same value
- Defined: consistent, and the replicas reflect the mutation
- Applications need to deal with inconsistent cases themselves
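Since applications must deal with inconsistent regions themselves, one common pattern (sketched below with a made-up record format) is to make records self-validating: each record carries a checksum and a unique ID so readers can skip padding or partial writes and drop duplicates from retried appends.

```python
# Illustrative sketch of an application-level defense against GFS's
# relaxed consistency; the (record_id, checksum, payload) format is made up.
import hashlib

def read_valid_records(raw_records):
    seen_ids = set()
    for record_id, checksum, payload in raw_records:
        if hashlib.sha1(payload).hexdigest() != checksum:
            continue                 # skip padding, garbage, or partial writes
        if record_id in seen_ids:
            continue                 # drop duplicates caused by retried appends
        seen_ids.add(record_id)
        yield payload
```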
Real world, industry experience
- Linux problems (Section 7):
  - Linux driver issues: disks do not report their capabilities honestly
  - The cost of fsync is proportional to the file size rather than the updated chunk size
  - A single reader-writer lock for mmap
- Due to the open-source nature of Linux, they can fix these issues and contribute the fixes back to the rest of the community
- GFS itself is not open-sourced
Single master design
- GFS claims this will not be a bottleneck:
  - In-memory data structures for fast access
  - Only involved in metadata operations: decoupled data/control paths
  - Client cache (of metadata)
- What if the master server fails?
  - Mentioned in "Spanner: Google's Globally-Distributed Database" (OSDI 2012): a tablet's state is stored in a set of B-tree-like files and a write-ahead log, all on a distributed file system called Colossus (the successor to the Google File System)
The evolution of GFS
- Support for smaller chunk sizes (e.g., Gmail)

Lots of other interesting topics
- Namespace locking
- Replica placement
- Chunk creation, re-replication, re-balancing
- Garbage collection
- Stale replica detection
- Data integrity
- Diagnostic tools: logs are your friends
Do they achieve their goals?
- Storage based on inexpensive disks that fail frequently: replication, distributed storage
- Many large files, in contrast to small files for personal data: large chunk size
- Primarily reading streams of data: large chunk size
- Sequential writes appending to the end of existing files: large chunk size
- Must support multiple concurrent operations: flat structure
- Bandwidth is more critical than latency: large chunk size
What's missing in GFS?
- GFS only supports relaxed consistency models
- Scalability: single master
- Only efficient in dealing with large data
- No geo-redundancy
Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency
Brad Calder et al., Microsoft (SOSP 2011)
Data center workloads for WAS: data types
- Blob (Binary Large OBject) storage: pictures, Excel files, HTML files, virtual hard disks (VHDs), big data such as logs, database backups; pretty much anything
- Table: database tables
- Queue: store and retrieve messages. Queue messages can be up to 64 KB in size, and a queue can contain millions of messages. Queues are generally used to store lists of messages to be processed asynchronously.

Why Windows Azure Storage?
- A cloud service platform for social network search, video streaming, XBOX gaming, records management, etc., at Microsoft
- Must tolerate many different data abstractions: blobs, tables, and queues
Why Windows Azure Storage? (cont.)
- Learning from feedback on existing cloud storage:
  - Strong consistency
  - Global and scalable namespace/storage
  - Disaster recovery
  - Multi-tenancy and cost of storage
- "All problems in computer science can be solved by another level of indirection"
What does WAS propose?
[Figure: an application/client resolves a name through DNS to the virtual IP of a storage stamp; each storage stamp consists of front-ends, a partition layer, and a stream layer, with intra-stamp replication inside the stream layer and inter-stamp replication between the partition layers of different stamps; a Location Service manages the stamps.]
- A stamp is the basic granularity of provisioning, fault domains, and geo-replication
- A stamp can contain 10 to 20 racks, with 18 disk-heavy storage nodes per rack
- You may consider each stamp to be similar to a GFS cluster
What does WAS propose? (cont.)
[Figure: the same architecture, annotated with the naming scheme and the roles of DNS and the Location Service.]
- URI: http(s)://AccountName.<service>.core.windows.net/PartitionName/ObjectName
- DNS maps the AccountName to the virtual IP of a stamp
- The Location Service manages all storage stamps and the account namespace across all storage stamps
- Data is distributed across multiple geographic locations
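A small sketch of how the naming scheme above routes a request, assuming a toy DNS table; the function and table are illustrative, not WAS's real code.

```python
# Illustrative sketch: split a WAS-style URI into AccountName /
# PartitionName / ObjectName and resolve the account to a stamp's VIP.
from urllib.parse import urlparse

def route_request(url: str, dns: dict):
    parsed = urlparse(url)
    account = parsed.hostname.split(".")[0]               # AccountName
    partition, _, obj = parsed.path.lstrip("/").partition("/")
    stamp_vip = dns[parsed.hostname]                       # DNS -> virtual IP of a stamp
    return stamp_vip, account, partition, obj

print(route_request(
    "https://myaccount.blob.core.windows.net/photos/cat.jpg",
    dns={"myaccount.blob.core.windows.net": "10.0.0.7"}))
# ('10.0.0.7', 'myaccount', 'photos', 'cat.jpg')
```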
GFS vs. a stamp in WAS
[Figure: the GFS write path (client, master, chunk servers with replication) shown side by side with a WAS storage stamp (front-ends, partition layer, and a stream layer consisting of Stream Managers and Extent Nodes).]
What is a stream?
Regarding a stream in WAS, please identify how many of the following statements is/are true:
1. A stream is a list of extents, in which an extent consists of consecutive blocks
2. Each block in the stream contains a checksum to ensure data integrity
3. An update to a stream can only be appended to the end of the stream
4. Two streams can share the same set of extents
A. 0 B. 1 C. 2 D. 3 E. 4
What is a stream? (answers)
1. A stream is a list of extents, in which an extent consists of consecutive blocks: similar to an extent-based file system, and shares the same benefits as EXT-based systems
2. Each block in the stream contains a checksum to ensure data integrity: as a result, we need to read a whole block every time, but that is not a big issue because bandwidth matters more than latency in these workloads
3. An update to a stream can only be appended to the end of the stream: append-only, copy-on-write (doesn't this sound familiar? LogFS); improved bandwidth and data locality
4. Two streams can share the same set of extents: de-duplication to save disk space; minimizes the time when creating a new file
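To tie the four statements together, here is a minimal model of a stream, assuming illustrative class names: an append-only list of extents, where each extent is a run of checksummed blocks and sealed extents can be shared between streams.

```python
# Minimal sketch of the stream abstraction; class names and fields are
# illustrative, not the WAS stream layer's real data structures.
import zlib
from dataclasses import dataclass, field

@dataclass
class Block:
    data: bytes
    checksum: int = 0
    def __post_init__(self):
        self.checksum = zlib.crc32(self.data)       # checked on every read

@dataclass
class Extent:
    blocks: list = field(default_factory=list)
    sealed: bool = False                            # sealed extents are immutable

@dataclass
class Stream:
    extents: list = field(default_factory=list)
    def append(self, data: bytes) -> None:
        # The only allowed mutation: append a block to the last, unsealed extent.
        if not self.extents or self.extents[-1].sealed:
            self.extents.append(Extent())
        self.extents[-1].blocks.append(Block(data))

# Sealed extents can appear in the extent lists of two different streams,
# which enables de-duplication and cheap creation of new streams.
```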
GFS vs. a stamp in WAS (cont.)
[Figure: the GFS write path side by side with the WAS stream layer; the partition layer asks a Stream Manager to create an extent, the Stream Manager allocates an extent replica set across Extent Nodes (a primary and secondaries), and appends are replicated from the primary to the secondary Extent Nodes.]
Announcement
- Project due this Thursday
  - Please submit your current progress
- Very last reading quiz due next Tuesday
- iEVAL counts as an extra, full-credit reading quiz
- You will have another week of revision period
  - Allows you to revise your project, with a 30% penalty on the unsatisfactory parts/test cases, after the first round of grading (firm deadline 3/11)
  - Say you got only 60% in the first round and you fixed everything before 3/11: you can still get 60% + 70% * 40% = 88%
- Final contains two parts (each accounts for 50%)
  - Part 1: 80-minute multiple-choice/multiple-answer questions + two problem sets of comprehensive exam questions
  - Part 2: unlimited time between 3/11 and 3/17, open-ended questions