[SOLVED] CS algorithm distributed system cache file system data structure Consistent Hashing

$25

File Name: CS_algorithm_distributed_system_cache_file_system_data_structure_Consistent_Hashing.zip
File Size: 781.86 KB

5/5 - (1 vote)

Consistent Hashing
OrganizationBased Analysis of WebObject Sharing and Caching Alec Wolma et al.

Five cache nodes
Client determines node
Cache hit: Serve from cache
to contact
Client
Server
Caching
Distributed Systems (HansArno Jacobsen) 2
Cache miss: Serve from server &
populate cache

Problem: Mapping objects to caches
Given a number of caches (e.g., cooperative caching, CDNs, etc.)
Each cache should carry an equal share of objects
Clients need to know what cache to query for a given
object
Horizontally partition (shard) object ID space
Doesnt work with skewed distributions: e.g., 10 servers, each handles 100 IDs, but all objects have IDs between 1100 or 900 1000
Caches should be able to come and go without disrupting the whole operation (i.e., noneffected caches)
Distributed Systems (HansArno Jacobsen) 3

Solution attempt: Use hashing
Map object ID (e.g., URL u) into one of the caches
Use a hash function that maps u to node h(u)
Forexample,h(x)=(ax+b)modp,wherepisrangeofh(x),i.e.,the number of caches
InterpretuasanumberbasedonbitpatternofobjectID(orURL)
Hashing tends to distribute input uniformly across range of hash function
Objects(URLs)areequallybalancedacrosscaches,evenifobjectIDs are skewed (i.e., highly clustered in ID space)
No one cache responsible for an uneven share of objects/URLs
No disproportionately loaded node (potential bottleneck)
Distributed Systems (HansArno Jacobsen) 4

h(u) = (7u + 4) mod 5 Assume, we have five
caches, numbered 0, , 4.
C0
Objects
Client
h(4) C h(2)
Server
h(6) C1 6
h(5)
24 C3 2
Distributed Systems (HansArno Jacobsen)
5
C4
5

Removing a cache changes location
Old New
Client
4
1 2
Server
h(u) = (7u + 4) mod 4 (now have to map across 4 caches)
of almost every objects! C0
Now: h(6)=2 C1 Now: h(4)=0
Now: h(2)=2
Now: h(5)=3
6
Distributed Systems (HansArno Jacobsen)
6
C2
2
6, 2
C3
55
mapping
mapping 4
3
4

Client
1
h(u) = 7u + 4 mod 4 (mapped across 4 nodes)
h(4)=0 0 4
h(2)=2 h(6)=2
2
6, 2 5
h(5)=3
3
Objects
Distributed Systems (HansArno Jacobsen) 7

Client
1
6 6, 2 4 52 5
Adding a cache changes the location
4
of almost every object!
h(u) = (7u + 4) mod 5 (adding a cache again)
Old New h(4)=2 0 4
h(2)=3 h(6)=1
2
h(5)=4
3
Distributed Systems (HansArno Jacobsen)
8

Goals
Consistent hashing
Uniform distribution of objects across nodes
Easily find objects
Let any client perform a local computation mapping a URL to node that contains referenced object
Allow for nodes to be added/removed without much disruption remap only n/m objects (n objects, m slots)
D. Karger et al., MIT, 1997
Basis for Akamai
CDN company (content distribution network) Web cache as a service
Distributed Systems (HansArno Jacobsen) 9

slot via h(..)
Each cache is mapped to a
m2
m1 0 1
2
slot via h(..)
Assign each object to the closest
m3
3
Consistent hashing Key idea intuition
Select a base hash function that maps input identifier to the number range [0, , m1]
E.g.,h(x)=(ax+b)modm
Interpret range of h(..) as array that wraps around (i.e., a circle)
h(..) gives slot in array (circle) and wraps around at m1 to 0
Each object is mapped to a
cache slot in clockwise direction on the circle
Clockwise
Distributed Systems (HansArno Jacobsen)
10

point on unit circle via h(..) Each cache is mapped to a
6
point on unit circle via h(..) Assign each URL to closest
Cache Object
1
Consistent hashing Original interpretation
Select a base hash function that maps input identifier to the number range [0, , M]
Divide by M, remapping [0,,M] to [0, 1]
Interpret this interval as the unit circle: Here, circle with
circumference 1 (normally radius 1) Each object is mapped to a
C
7
A
cache point in clockwise direction on the circle
4
B
2 3
Distributed Systems (HansArno Jacobsen)
11
5

6 5
A
4
2 3
Cache Object
Mapping items to caches
C
7
1
Items Items Items
2, 3
4, 5, 6 7, 1
mapped to B mapped to C mapped to A
B
Distributed Systems (HansArno Jacobsen)
12

6 5
Items
4, 5, 6 7, 1
mapped to C mapped to A
4
2 3
C
7
1
Items Items
2, 3, 7, 1 mapped to B
B
Removing a cache
A
Distributed Systems (HansArno Jacobsen)
13

6 5
7, 1, 2
C
4, 5, 6
mapped to C mapped to A
4
B3
A
Adding a cache
7
1
Items Items Items
37, 1, 2, 3 mapped to B
2
Distributed Systems (HansArno Jacobsen)
14

Node
Lookup(key)
C
Retrieve object with key from A
Processing a Lookup(key)
A B
Information about
node addition & removal (e.g., via gossiping or via a coordination service)
Distributed Systems (HansArno Jacobsen) 15

Cache lookup data structure at each node
Store cache points in a binary tree
Find clockwise successor of a URL point by single search in
tree (takes O(log n) time)
For a constant time technique, cf. Karger et al., 1997
key[left(x)] key[x] key[right(x)] F BH
ADK
Distributed Systems (HansArno Jacobsen) 16

Client wants to read key k1 (from table t1)
4
Cassandra global readpath
Cassandra
Coordinator 2
Cassandra Disks Zone A
Client sends request to any node
Disks
Zone C
Coordinator determines
responsible replica & forwards request
Key range for k1
Cassandra Disks Zone C
Cassandra Disks Zone B
3
Replica queries local file system, Returns value
Cassandra Disks Zone A
1 Client Request
Cassandra Disks Zone B
Replica
Distributed Systems (HansArno Jacobsen)
18

Base hash function: MD5
Message Digest 5 (MD5), R. Rivest, 1992 (MD1, , MD6)
Hash function that produces a 128bit (16byte) hash value
Maps variablelength message into a fixedlength output
MD5 hash is typically expressed as a hex number (32 digits)
Its been shown that MD5 is not collision resistant
USCERT about MD5 should be considered cryptographically broken and unsuitable for further use (for security, not for caching)
SHA2 is a more appropriate cryptographic hash function
For consistent hashing, MD5 is sufficient
Distributed Systems (HansArno Jacobsen) 19

MD5 examples
MD5(Thequickbrownfoxjumpsoverthelazy dog) = 9e107d9d372bb6826bd81d3542a419d6
MD5(Thequickbrownfoxjumpsoverthelazy dog.) = e4d909c290d0fb1ca068ffaddf22cbd0
MD5()=d41d8cd98f00b204e9800998ecf8427e http://en.wikipedia.org/wiki/MD5#Algorithm
Distributed Systems (HansArno Jacobsen) 20

Selfstudy questions
HowwouldyouuseMD5andSHA2insteadofh(..) from our slides for consistent hashing?
Applyh(..),MD5,SHA2toaURL,wheredoesthe output map on the unit circle?
Discussprosandconsofhavingagivencaching server map to one vs. more points on the unit circle.
Whataretheimplicationsofaslashdoteffecton consistent hashing?
Distributed Systems (HansArno Jacobsen) 21

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] CS algorithm distributed system cache file system data structure Consistent Hashing
$25