CRDT: CONFLICT-FREE REPLICATED DATA TYPES
Distributed Systems (Hans-Arno Jacobsen)

CRDTs Units
Eventual consistency, informally
State-based objects
Eventual consistency, more formally
Conflict-free replicated data types

EVENTUAL CONSISTENCY, INFORMALLY

Eventual Consistency
Eventual consistency is desirable for large-scale distributed systems where high availability is important
Tends to be cheap to implement (e.g., via gossip) but may serve stale data
Constitutes a challenge for environments where stronger consistency is important

Handling Concurrent Writes
The premise for eventual consistency was scenarios with few (or no) concurrent writes to the same key (cf. client-centric consistency)
However, we do need a mechanism to handle concurrent writes should they happen
If there were a way to handle concurrent writes, we could support eventual consistency more broadly
We would only need to guarantee that, after processing all writes for a key, all replicas converge, no matter in what order the writes are processed (e.g., assuming gossip)

Examples
Different locations (replicas): L1, L2
Growth-only counter (G-counter)
L1: 0 W(+5) (5) W(+2) (7) W(+1) (8)
L2: 0 W(+2) (2) W(+5) (7) W(+1) (8)
Writes propagate to L2, L1, respectively
Max register
L1: 0 W(4) (4) W(2) (4) merge(5) (5)
L2: 0 W(5) (5) W(3) (5) merge(4) (5)
State propagates to L2, L1 via periodic merging
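
The two examples above can be replayed in a few lines of Python. This is a minimal sketch, not part of the original slides: the helper names run_counter and run_max_register are hypothetical, and each replica is modelled simply as a list of writes delivered in some order. It suggests why both structures converge regardless of delivery order, and why a duplicated delivery hurts the counter but not the max register.

```python
from itertools import permutations


def run_counter(increments):
    """Apply increment operations in the given delivery order."""
    value = 0
    for inc in increments:
        value += inc
    return value


def run_max_register(writes):
    """Apply writes to a max register in the given delivery order."""
    value = 0
    for w in writes:
        value = max(value, w)
    return value


# Every delivery order of the same writes ends in the same value.
counter_finals = {run_counter(order) for order in permutations([5, 2, 1])}
register_finals = {run_max_register(order) for order in permutations([4, 2, 5, 3])}

# A duplicated delivery changes the counter but not the max register.
counter_with_duplicate = run_counter([5, 2, 2, 1])
register_with_duplicate = run_max_register([4, 5, 5, 2, 3])

print(counter_finals, register_finals)
print(counter_with_duplicate, register_with_duplicate)
```

The counter therefore needs duplicate-free delivery of operations, whereas the max register tolerates repeated delivery because taking a maximum twice changes nothing.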

Self-study Questions
Think of a few basic data structures, like lists, sets, counters, binary trees, heaps, maps, etc., and visualize for yourself what happens if replicated instances of these structures are updated via gossip.
Does their state converge, no matter the update sequence?
What happens if update operations are lost or duplicated?
What mechanisms other than gossip could be used to keep these replicated structures updated without violating their convergence?
What are the pros and cons of these mechanisms?

CRDT
FROM STATE-BASED OBJECTS TO REPLICATED STATE-BASED OBJECTS

State-based objects
Mostly plain old objects
Offer update and query requests to clients
Maintain internal state
Process client requests
Perform merge requests amongst each other
Periodically merge (support infrastructure)

State-based Object
What we commonly know as an object
Comprised of
Internal state
One or more query methods
One or more update methods
A merge method

Class Average Running Example

class Avg(object):
    def __init__(self):
        self.sum = 0
        self.cnt = 0

    def query(self):
        if self.cnt != 0:
            return self.sum / self.cnt
        else:
            return 0

    def update(self, x):
        self.sum += x
        self.cnt += 1

    def merge(self, avg):
        self.sum += avg.sum
        self.cnt += avg.cnt

Average
State-based object representing a running average
Internal state: self.sum and self.cnt
Query returns the average
Update updates the average with a new value x
Merge merges one Avg instance into another one

Replicated State-based Object
State-based object replicated across multiple nodes
E.g., replicate Avg across two nodes
Both nodes have a copy of the state-based object
Clients send query and update requests to a single node
Nodes periodically send their copy of the state-based object to other nodes for merging

Timeline (Node a)
Each update has a unique operation identifier
Operation identifier is unique across replicas
Causal history is based on operation identifiers
Each state represents a snapshot of the object in time that results from the updates applied

state  internal state   query  history
a0     sum:0, cnt:0     0      {}
a1     sum:1, cnt:1     1      {0}
a2     sum:4, cnt:2     2      {0,1}

States and Causal Histories
If y = x.update() where the update has identifier i, then the causal history of y is the causal history of x union { i }.

state  internal state   query()  history
a0     sum:0, cnt:0     0        {}
a1     sum:1, cnt:1     1        {0}
b0     sum:0, cnt:0     0        {}
b1     sum:2, cnt:1     2        {1}
b2     sum:6, cnt:2     3        {1,2}

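Merge
a1 results from update(2) at a, b1 from update(4) at b
a2 results from merging b1 into a1; its causal history is the union of the causal histories of a1 and b1

state  internal state   query()  history
a0     sum:0, cnt:0     0        {}
a1     sum:2, cnt:1     2        {0}
b0     sum:0, cnt:0     0        {}
b1     sum:4, cnt:1     4        {1}
a2     sum:6, cnt:2     3        {0,1}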
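
The bookkeeping of causal histories can be made explicit in code. The following is a minimal sketch under stated assumptions: TrackedAvg, its history attribute, and the op_id parameter are hypothetical additions for illustration and are not part of the Avg class on the slides. An update adds its operation identifier to the history, and a merge takes the union of both histories.

```python
class TrackedAvg:
    """Avg from the running example, extended with a causal history (assumption)."""

    def __init__(self):
        self.sum = 0
        self.cnt = 0
        self.history = frozenset()  # set of operation identifiers applied so far

    def query(self):
        return self.sum / self.cnt if self.cnt != 0 else 0

    def update(self, x, op_id):
        self.sum += x
        self.cnt += 1
        # The history of the new state is the old history union {op_id}.
        self.history = self.history | {op_id}

    def merge(self, other):
        self.sum += other.sum
        self.cnt += other.cnt
        # The history of the merged state is the union of both histories.
        self.history = self.history | other.history


a, b = TrackedAvg(), TrackedAvg()
a.update(2, op_id=0)  # a1
b.update(4, op_id=1)  # b1
a.merge(b)            # a2
print(a.sum, a.cnt, a.query(), sorted(a.history))
```

The history is kept here only to reason about executions; a real replica would not normally need to store it.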

Nodes Periodically Propagate Their State

Self-study Questions
Think of a few basic data structures, like lists, sets, counters, binary trees, heaps, maps, etc., and visualize for yourself what happens if replicated instances of these structures are updated via gossip.
For the above data structures, specify merge operations that merge the state of two instances of a given structure.
Assuming merge happens periodically, does the state of your replicated structures converge?

CRDT
EVENTUAL CONSISTENCY, MORE FORMALLY

Eventual Consistency (EC)
A replicated state-based object is eventually consistent if, whenever two replicas of the state-based object have the same causal history, they eventually (not necessarily immediately) converge to the same internal state

Strong Eventual Consistency (SEC)
A replicated state-based object is strongly eventually consistent if, whenever two replicas of the state-based object have the same causal history, they (immediately) have the same internal state
Strong eventual consistency implies eventual consistency
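
The SEC definition can be turned into a small check over a recorded execution. This is only a sketch: the function name find_sec_violation and the snapshot format (pairs of causal history and internal state) are assumptions made for illustration. Finding no violation in one trace does not prove SEC; finding one disproves it.

```python
def find_sec_violation(snapshots):
    """Return two differing states that share a causal history, or None.

    snapshots: iterable of (causal_history, internal_state) pairs collected
    from any replicas; causal_history must be hashable (e.g., a frozenset).
    """
    first_seen = {}
    for history, state in snapshots:
        if history in first_seen and first_seen[history] != state:
            return first_seen[history], state
        first_seen.setdefault(history, state)
    return None


h01 = frozenset({0, 1})
# Illustrative traces: internal state is a (sum, cnt) pair.
diverged = [(h01, (6, 2)), (h01, (10, 3))]
agreeing = [(h01, (4, 1)), (h01, (4, 1))]

print(find_sec_violation(diverged))
print(find_sec_violation(agreeing))
```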

EC or SEC: That Is the Question
Variants of our Average object, defined next
Average
NoMergeAverage
BMergeAverage
MaxAverage
Note that some of these objects do not represent realistic (i.e., needed) functionality
These objects are meant to illustrate convergence concepts only

Average
a, b attain the same causal history but do not converge to the same internal state; they do not converge at all!
Neither eventually consistent, nor strongly eventually consistent

state  internal state    query  history
a0     sum:0, cnt:0      0      {}
a1     sum:2, cnt:1      2      {0}
a2     sum:6, cnt:2      3      {0,1}
a3     sum:16, cnt:5     3.2    {0,1}
b0     sum:0, cnt:0      0      {}
b1     sum:4, cnt:1      4      {1}
b2     sum:10, cnt:3     3.3    {0,1}
b3     sum:26, cnt:8     3.25   {0,1}
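
The numbers in this timeline can be reproduced by running the Avg class from the running example on two replicas that keep merging each other's state. This is a small sketch; the trace list and the alternating merge schedule are assumptions chosen to match the table.

```python
class Avg:
    def __init__(self):
        self.sum = 0
        self.cnt = 0

    def query(self):
        return self.sum / self.cnt if self.cnt != 0 else 0

    def update(self, x):
        self.sum += x
        self.cnt += 1

    def merge(self, avg):
        # Additive merge: re-merging an already merged state counts it again.
        self.sum += avg.sum
        self.cnt += avg.cnt


a, b = Avg(), Avg()
a.update(2)  # a1
b.update(4)  # b1

trace = []
for _ in range(2):
    a.merge(b)
    trace.append(("a", a.sum, a.cnt))
    b.merge(a)
    trace.append(("b", b.sum, b.cnt))

for node, s, c in trace:
    print(node, s, c, round(s / c, 2))
```

Each further merge adds the other replica's totals once more, so sum and cnt keep growing and the two replicas never settle on a common state.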

NoMergeAverage
The object's merge does nothing
All else is the same as for Average

a, b have the same causal history; both converge to a stable but different internal state.
Neither eventually consistent, nor strongly eventually consistent.

state  internal state   query  history
a0     sum:0, cnt:0     0      {}
a1     sum:2, cnt:1     2      {0}
a2     sum:2, cnt:1     2      {0,1}
a3     sum:2, cnt:1     2      {0,1}
b0     sum:0, cnt:0     0      {}
b1     sum:4, cnt:1     4      {1}
b2     sum:4, cnt:1     4      {0,1}
b3     sum:4, cnt:1     4      {0,1}
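
A minimal sketch of NoMergeAverage, assuming the same update and query as Avg and a merge that ignores its argument, shows the stable but different states from the table.

```python
class NoMergeAverage:
    def __init__(self):
        self.sum = 0
        self.cnt = 0

    def query(self):
        return self.sum / self.cnt if self.cnt != 0 else 0

    def update(self, x):
        self.sum += x
        self.cnt += 1

    def merge(self, other):
        # Deliberately does nothing: the other replica's state is ignored.
        pass


a, b = NoMergeAverage(), NoMergeAverage()
a.update(2)
b.update(4)
for _ in range(2):
    a.merge(b)
    b.merge(a)

print((a.sum, a.cnt), (b.sum, b.cnt))
```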

BMergeAverage
The object's merge
At b, overwrite the state with the state at a
At a, do nothing
All else is the same as for Average

a, b attain the same causal history; both eventually converge to the same internal state: eventually consistent.
a1, b1 have the same causal history but different internal state: not strongly eventually consistent.

state  internal state   query  history
a0     sum:0, cnt:0     0      {}
a1     sum:0, cnt:0     0      {0}
a2     sum:0, cnt:0     0      {0}
b0     sum:0, cnt:0     0      {}
b1     sum:4, cnt:1     4      {0}
b2     sum:0, cnt:0     0      {0}
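
The BMergeAverage behaviour can be sketched as follows. The node label passed to the constructor is a hypothetical device for telling the replicas apart, and the execution (a single update(4) at b, then one merge in each direction) is an assumption chosen to match the states listed for this slide.

```python
class BMergeAverage:
    def __init__(self, node):
        self.node = node  # "a" or "b"; hypothetical replica label
        self.sum = 0
        self.cnt = 0

    def query(self):
        return self.sum / self.cnt if self.cnt != 0 else 0

    def update(self, x):
        self.sum += x
        self.cnt += 1

    def merge(self, other):
        if self.node == "b":
            # At b: overwrite the local state with the state received from a.
            self.sum, self.cnt = other.sum, other.cnt
        # At a: do nothing.


a, b = BMergeAverage("a"), BMergeAverage("b")
b.update(4)
a.merge(b)  # a now has the same causal history as b, but its state is unchanged
state_a1, state_b1 = (a.sum, a.cnt), (b.sum, b.cnt)
b.merge(a)  # b adopts the state of a
state_a2, state_b2 = (a.sum, a.cnt), (b.sum, b.cnt)

print(state_a1, state_b1, state_a2, state_b2)
```

The replicas end up equal, but only after b has merged, and the update at b is lost in the process.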

MaxAverage
The object's merge
Pairwise max of sum and cnt
All else is the same as for Average

At a, b, all states with the same causal history have the same internal state: strongly eventually consistent.
Great!!! But what does it actually compute? Here, update(2) is overwritten by update(4)!

state  internal state   query  history
a0     sum:0, cnt:0     0      {}
a1     sum:2, cnt:1     2      {0}
a2     sum:4, cnt:1     4      {0,1}
a3     sum:4, cnt:1     4      {0,1}
b0     sum:0, cnt:0     0      {}
b1     sum:4, cnt:1     4      {1}
b2     sum:4, cnt:1     4      {0,1}
b3     sum:4, cnt:1     4      {0,1}
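
A minimal sketch of MaxAverage, assuming the same update and query as Avg and a merge that takes the pairwise maximum, reproduces the table: the replicas agree after merging, but the result is not the average of 2 and 4.

```python
class MaxAverage:
    def __init__(self):
        self.sum = 0
        self.cnt = 0

    def query(self):
        return self.sum / self.cnt if self.cnt != 0 else 0

    def update(self, x):
        self.sum += x
        self.cnt += 1

    def merge(self, other):
        # Pairwise maximum of sum and cnt.
        self.sum = max(self.sum, other.sum)
        self.cnt = max(self.cnt, other.cnt)


a, b = MaxAverage(), MaxAverage()
a.update(2)  # a1
b.update(4)  # b1
a.merge(b)   # a2
b.merge(a)   # b2
a.merge(b)   # a3: merging again changes nothing

print((a.sum, a.cnt), (b.sum, b.cnt), a.query())
```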

Lessons Learned I

Object          Observation                                                            C? (convergence)  EC?  SEC?
Average         Same causal history, different internal state                          no                no   no
NoMergeAverage  Same causal history, converge to stable but different internal state   yes               no   no
BMergeAverage   Same causal history, eventually same internal state (EC)               yes               yes  no
MaxAverage      Same causal history, always same internal state (SEC)                  yes               yes  yes

Designing a strongly eventually consistent state-based object with intuitive semantics is challenging!

Lessons Learned II
Replicated state-based object
No convergence
Convergence
Eventual consistency in this model
Strong eventual consistency in this model

Self-study Questions
Can you design Average such that it becomes EC or SEC and also offers correct averaging semantics?
Think of other data structures and design update, query, and merge operations with reasonable semantics.
Always draw timelines and state diagrams for your designs and prove EC or SEC, if possible.
Think of data structures that support multiple update operations and one or more query operations.

CRDT
CONFLICT-FREE REPLICATED DATA TYPES, 2011

Conflict-Free Replicated Data Types
A CRDT is a conflict-free replicated state-based object
A CRDT handles concurrent writes
Intuition: restrictions
Do not allow writes with arbitrary values; limit writes to operations which are guaranteed not to conflict
CRDTs are data structures with special write operations; they guarantee strong eventual consistency and are monotonic (no rollbacks)
CRDTs are no panacea but a great solution when they apply!

Conflict-Free Replicated Data Types
CRDTs can be commutative, op-based (CmRDT):
Example: A growth-only counter, which can only process increment operations
Propagate operations among replicas (duplicate-free, no-loss messaging)
CRDTs can be convergent, state-based (CvRDT):
Example: A max register, which stores the maximum value written
Propagate and merge states (idempotent)
Therefore, the value of a CRDT depends on multiple write operations or states, not just the latest one

State-based CRDTs
A CRDT is a replicated state-based object
Supports Query, Update, Merge
CmRDTs and CvRDTs are equivalent. One can be transformed into the other and vice versa.

CRDT Properties
A CRDT is a replicated state-based object that satisfies
Merge is associative (e.g., (A+(B+C)) = ((A+B)+C))
For any three state-based objects x, y, and z, merge(merge(x, y), z) is equal to merge(x, merge(y, z))
Merge is commutative (e.g., A+B = B+A)
For any two state-based objects x and y, merge(x, y) is equal to merge(y, x)
Merge is idempotent
For any state-based object x, merge(x, x) is equal to x
Every update is increasing
Let x be a state-based object and let y = update(x, ...) be the result of applying an update to x
Then, update is increasing if merge(x, y) is equal to y
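
The four properties can be spot-checked mechanically on sample states. The sketch below is an illustration under stated assumptions: states are modelled as immutable (sum, cnt) tuples, and the names check_merge_laws, check_increasing, add_merge, and max_merge are hypothetical. Passing on samples is evidence, not a proof; a single failing sample is a counterexample.

```python
from itertools import product


def check_merge_laws(merge, samples):
    """Test associativity, commutativity, and idempotence on all sample combinations."""
    return {
        "associative": all(
            merge(merge(x, y), z) == merge(x, merge(y, z))
            for x, y, z in product(samples, repeat=3)
        ),
        "commutative": all(
            merge(x, y) == merge(y, x) for x, y in product(samples, repeat=2)
        ),
        "idempotent": all(merge(x, x) == x for x in samples),
    }


def check_increasing(merge, update, samples, values):
    """Test merge(x, update(x, v)) == update(x, v) on the samples."""
    return all(
        merge(x, update(x, v)) == update(x, v) for x in samples for v in values
    )


def add_merge(x, y):  # merge of Average: add sums and counts
    return (x[0] + y[0], x[1] + y[1])


def max_merge(x, y):  # merge of MaxAverage: pairwise maximum
    return (max(x[0], y[0]), max(x[1], y[1]))


def update(x, v):  # update shared by both variants
    return (x[0] + v, x[1] + 1)


samples = [(0, 0), (2, 1), (4, 1), (6, 2)]
values = [0, 2, 4]

add_laws = check_merge_laws(add_merge, samples)
max_laws = check_merge_laws(max_merge, samples)
add_increasing = check_increasing(add_merge, update, samples, values)
max_increasing = check_increasing(max_merge, update, samples, values)

print(add_laws, add_increasing)
print(max_laws, max_increasing)
```

On these samples the additive merge fails idempotence and the increasing-update property, which is consistent with Average not converging, while the pairwise maximum passes all four checks.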

Max Register is a CRDT
The state-based object IntMax is a CRDT
IntMax wraps an integer
Merge(a, b) is the max of a, b
Update(x) adds x to the wrapped integer
Prove that IntMax is associative, commutative, idempotent, increasing

class IntMax(object):
    def __init__(self):
        self.x = 0

    def query(self):
        return self.x

    def update(self, x):
        assert x >= 0
        self.x += x

    def merge(self, other):
        self.x = max(self.x, other.x)

Establish Four Properties of CRDT
Associativity
merge(merge(a, b), c)
= max(max(a.x, b.x), c.x)
= max(a.x, max(b.x, c.x))
= merge(a, merge(b, c))
Commutativity
merge(a, b)
= max(a.x, b.x)
= max(b.x, a.x)
= merge(b, a)
Idempotence
merge(a, a)
= max(a.x, a.x)
= a.x
= a
Update is increasing
merge(a, update(a, x))
= max(a.x, a.x + x)
= a.x + x (since x >= 0)
= update(a, x)

GCounter CRDT
Replicated growth-only counter
Internal state of a GCounter replicated on n nodes is an n-length array of non-negative integers
query returns the sum of every element in the n-length array
add(x), when invoked on the i-th node, increments the i-th entry of the n-length array by x
E.g., Node 0 increments the 0th entry, Node 1 increments the 1st entry of the array, and so on
merge performs a pairwise maximum of the two arrays
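
The description above can be sketched as a small class. This is one possible rendering, not a definitive implementation: the constructor parameters n and node_id and the attribute name counts are assumptions made for illustration.

```python
class GCounter:
    def __init__(self, n, node_id):
        self.counts = [0] * n   # one entry per node
        self.node_id = node_id  # index of the entry this replica may increment

    def query(self):
        return sum(self.counts)

    def add(self, x):
        assert x >= 0  # growth-only: no decrements
        self.counts[self.node_id] += x

    def merge(self, other):
        # Pairwise maximum of the two arrays.
        self.counts = [max(s, o) for s, o in zip(self.counts, other.counts)]


n0, n1 = GCounter(2, 0), GCounter(2, 1)
n0.add(5)
n1.add(2)
n0.merge(n1)
n1.merge(n0)
n0.merge(n1)  # a repeated merge changes nothing
n0.add(1)
n1.merge(n0)

print(n0.counts, n1.counts, n0.query(), n1.query())
```

Because each node only ever raises its own entry, the pairwise maximum never discards an increment.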

PNCounter CRDT
Replicated counter supporting addition and subtraction
Internal state of a PNCounter is a pair of two GCounters named p and n
p represents the total value added to the PNCounter
n represents the total value subtracted from the PNCounter
query method returns the difference p.query() - n.query()
add(x), the first of two updates, invokes p.add(x)
sub(x), the second of two updates, invokes n.add(x)
merge performs a pairwise merge of p and n
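
A PNCounter can be sketched directly on top of a GCounter. As before, this is a minimal sketch: the constructor parameters n and node_id are assumptions for illustration, and the GCounter is repeated so that the block runs on its own.

```python
class GCounter:
    def __init__(self, n, node_id):
        self.counts = [0] * n
        self.node_id = node_id

    def query(self):
        return sum(self.counts)

    def add(self, x):
        assert x >= 0
        self.counts[self.node_id] += x

    def merge(self, other):
        self.counts = [max(s, o) for s, o in zip(self.counts, other.counts)]


class PNCounter:
    def __init__(self, n, node_id):
        self.p = GCounter(n, node_id)  # total value added
        self.n = GCounter(n, node_id)  # total value subtracted

    def query(self):
        return self.p.query() - self.n.query()

    def add(self, x):
        self.p.add(x)

    def sub(self, x):
        self.n.add(x)

    def merge(self, other):
        # Pairwise merge of p and n.
        self.p.merge(other.p)
        self.n.merge(other.n)


a, b = PNCounter(2, 0), PNCounter(2, 1)
a.add(10)
b.sub(3)
b.add(1)
a.merge(b)
b.merge(a)
b.merge(a)  # a repeated merge changes nothing

print(a.query(), b.query())
```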

GSet CRDT
Replicated growth-only set
A GSet CRDT represents a replicated set which can be added to but not removed from
Internal state of a GSet is just a set
query returns the set
add(x) adds x to the set
merge performs a set union
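
A GSet maps almost one-to-one onto Python's built-in set. The sketch below is illustrative; the attribute name items is an assumption.

```python
class GSet:
    def __init__(self):
        self.items = set()

    def query(self):
        return set(self.items)  # return a copy so callers cannot mutate the state

    def add(self, x):
        self.items.add(x)

    def merge(self, other):
        self.items |= other.items  # set union


a, b = GSet(), GSet()
a.add("x")
b.add("y")
b.add("x")
a.merge(b)
b.merge(a)
a.merge(b)  # a repeated merge changes nothing

print(sorted(a.query()), sorted(b.query()))
```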

2PSet CRDT
Replicated set supporting addition and removal
Internal state of a 2PSet is a pair of two GSets named a and r
a represents the set of values added to the 2PSet
r represents the set of values removed from the 2PSet
query method returns the set difference a.query() - r.query()
add(x), the first of two updates, invokes a.add(x)
sub(x), the second of two updates, invokes r.add(x)
merge performs a pairwise merge of a and r
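
A 2PSet can be sketched with two plain Python sets standing in for the two GSets. The class name TwoPSet is an assumption (a Python identifier cannot start with a digit). The example also shows a consequence of the design: once a value is in r, adding it again has no visible effect.

```python
class TwoPSet:
    def __init__(self):
        self.a = set()  # values added
        self.r = set()  # values removed

    def query(self):
        return self.a - self.r  # set difference

    def add(self, x):
        self.a.add(x)

    def sub(self, x):
        self.r.add(x)

    def merge(self, other):
        # Pairwise merge: union of the added sets and of the removed sets.
        self.a |= other.a
        self.r |= other.r


n0, n1 = TwoPSet(), TwoPSet()
n0.add("x")
n0.add("y")
n1.merge(n0)
n1.sub("x")   # remove x at n1
n0.add("x")   # add x again at n0, concurrently with the removal
n0.merge(n1)
n1.merge(n0)

print(sorted(n0.query()), sorted(n1.query()))
```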

Summary on CRDTs
Formalized and introduced in 2011/2014
CmRDTs and CvRDTs are equivalent!
Really neat solution if applicable
Challenge is to design new CRDTs

Self-study Questions
For each CRDT introduced, establish its four properties.
Create sample execution sequences for each CRDT and complete a timeline and a state table.
Find use cases where the introduced CRDTs apply and show how they are used.
Think of new CRDTs and repeat the above.
