Concurrency control (1)
Permits the concurrent execution different transactions
Crucial in information systems with high load banks,financialinstitution,airlinebooking
Permits to efficiently use the DBMS
Copyright By Assignmentchef assignmentchef
Maximizing the number of served transactions Minimizing response times
Application load measured in transactions per second (tps) Typical values: from tens to thousands
Concurrency control (2)
Mediates access requests to the data
Decides to grant or deny requests
Establishes the order of accesses (scheduler)
Concurrency control (3)
For the sake of lectures, we consider two simplifying assumptions:
Databases in terms of abstract objects x, y, z with numerical values
Read/write operations on object x, as read/write of the whole page where x is stored
Concurrency control: architecture
read, write begin, commit, abort Lock table
read, write
(not all and possibly in different order)
Access methods manager
Transaction manager
Concurrency manager
Secondary storage manager
Concurrency control (4)
The concurrent execution of different transactions can cause correctness issues and anomalies
Update loss
Dirty read
Inconsistent reads Ghost update
Ghost insert
Update loss
Updates by a transaction lost because overwritten by a concurrent transaction
Transaction t1
x := x + 1
w1(x) commit
Transaction t2
x := x + 1
w2(x) Initial value: 2
commit Final value: 3 instead of 4
(the update by t1 is lost)
Dirty read
A transaction reads the intermediate result of another transaction that then aborts (whose modification is then cancelled)
Transaction t1
x := x + 1 w1(x)
Transaction t2
x := x + 1
w2(x) commit
t2 reads an intemediate state which is then cancelled
Inconsistent reads
A transaction reads objects that another transaction is modifying: some read operations are before, others are after the updates
Transaction t1
bot r1(x) w1(y)
r1(x) w1(z) commit
Transaction t2
x := x + 1 w2(x) commit
t1 reads different values for x
Ghost update
A transaction observes only a subset of the effects of another transaction (observing a status of the data that does not satisfy
integrity constraints)
Transaction t1 bot
Transaction t2 bot
y := y 100 r2(z)
z := z + 100 w2(y)
w2(z) commit
r1(x) r1(y)
Constraint: x+y+z=1000;
s=1100 but the constraint is satisfied
s := x + y + z commit
Ghost insert
A transaction evaluates twice an aggregated value about a set of elements in a selection condition
select avg(Mark) from Exam
where Subject=IM
If a new tuple is inserted between an evaluation and the subsequent one, the results may be different
The anomaly cannot be prevented referring only to already known data
Concurrency control theory
Formally a transaction
is a sequence of read and write operations
has a transaction identifier assigned by the system
starts with begin transaction and ends with end transaction (omitted)
t1: r1(x) r1(y) w1(x) w1(y)
Sequence of input/output operations presented by concurrent transactions
all the operations of each transaction that committed must appear in the schedule
for each transaction, operations must appear in the schedule in the same order as in the transaction
For the study:
we consider the commit-projection and ignore transactions that
(simplifying assumption, not acceptable in practice, will be removed afterwards)
Schedule: example
Transactions
t1 : r1(x) w1(x) r1(y) w1(y) t2 : r2(y) w2(y)
Some possible schedules
S1 : r1(x) w1(x) r1(y) w1(y) r2(y) w2(y) S2 : r2(y) w2(y) r1(x) w1(x) r1(y) w1(y) S3 : r1(x) r2(y) w1(x) r1(y) w2(y) w1(y) S4 : r2(y) r1(x) w1(x) r1(y) w1(y) w2(y)
Concurrency control
Avoids schedules causing anomalies
Managed by a module that accepts or refuses operations requested
by transactions (scheduler)
Based on the identification of classes of acceptable schedules based on definitions of equivalence
serial schedule
serializable schedule
Serial and serializable schedules
Serial schedule
For each transaction ti in the schedule, all the operations in ti are
executed consecutively
n transactions n! possible serial schedules
Serializable schedule
non-serial schedule that produces the same result as a serial schedule of the same transactions
need definition of equivalence among schedules
progressive concepts: view-equivalence, conflict-equivalence, two-phase locking, timestamp-based
Serial schedule: example
Transactions
t1 : r1(x) w1(x) r1(y) w1(y) t2 : r2(y) w2(y)
Serial schedules:
S1 : r1(x) w1(x) r1(y) w1(y) r2(y) w2(y) S2 : r2(y) w2(y) r1(x) w1(x) r1(y) w1(y)
Possible schedules:
S3 : r1(x) r2(y) w1(x) r1(y) w2(y) w1(y) S4 : r2(y) r1(x) w1(x) r1(y) w1(y) w2(y)
View-serializability (1)
Preliminary definitions
ri(x) reads-from wj(x) in a schedule S if:
wj(x) precedes ri(x) in S
there is no wk(x) between wj(x) and ri(x) in S
wi(x) is a final write in a schedule S if: itisthelastwriteonxinS
S: r1(x) w1(x) w1(y) r2(x) w2(y) r2(x) read-from w1(x)
w1(x) final write on x
w2(y) final write on y
View-serializability (2)
View-equivalence
two schedules Si and Sj (with i = j) are view-equivalent (Si ov Sj) if they have
the same read-from relations the same final writes
View-serializability
a schedule is view-serializable if it is view-equivalent to a serial schedule
We denote with VSR the set of view-serializable schedules
View-serializability: example (1)
S: w0(x) r2(x) r1(x) w2(x) w2(z)
View-serializability: example (1)
S: w0(x) r2(x) r1(x) w2(x) w2(z)
read-from:
r2(x) w0(x)
r1(x) w0(x)
final write: x: w2(x) z: w2(z)
view-equivalent to the serial schedule w0(x) r1(x) r2(x) w2(x) w2(z)
view-serializable
View-serializability: example (2)
S: w0(x) r1(x) w1(x) r2(x) w1(z)
View-serializability: example (2)
S: w0(x) r1(x) w1(x) r2(x) w1(z)
read-from:
r1(x) w0(x)
r2(x) w1(x)
final write: x: w1(x) z: w1(z)
view-equivalent to the serial schedule w0(x) r1(x) w1(x) w1(z) r2(x)
view-serializable
View-serializability: example (3)
S: r1(x) r2(x) w2(x) w1(x)
View-serializability: example (3)
S: r1(x) r2(x) w2(x) w1(x)
read-from: r1(x)
final write: x: w1(x)
not view-serializable (loss of update)
View-serializability: example (4)
S: r1(x) r2(x) w2(x) r1(x)
View-serializability: example (4)
S: r1(x) r2(x) w2(x) r1(x)
read-from: r1(x)
r1(x) w2(x) final write:
x: w2(x)
not view-serializable (inconsistent read)
View-serializability: example (5)
S: r1(x) r1(y) r2(z) r2(y) w2(y) w2(z) r1(z)
View-serializability: example (5)
S: r1(x) r1(y) r2(z) r2(y) w2(y) w2(z) r1(z)
read-from: r1(x) r1(y) r2(z)
r1(z) w2(z)
final write: y: w2(y) z: w2(z)
not view-serializable (ghost update)
VSR: exercise
w1(x) r1(x) w2(y) r3(x) w1(t) r2(y) r3(t) w3(x) w2(x) r2(t)
View-serializability: complexity
Deciding if two schedules are view-equivalent: polynomial cost
Deciding view-serializability of an arbitrary schedule: NP-hard problem
it is necessary to compare the schedule with all the possible serial schedules
Note: view-equivalence not usable in practice due to its complexity It is necessary to define more restrictive but practicable conditions
Conflict between operations
Two operations ai and aj (i = j) are in conflict if they belong to two different transactions
they access the same object
at least one is a write
read-write conflicts (rw o wr) write-write conflicts (ww)
Conflict: example
t1: r1(x) r1(y) w1(x) w1(y) t2: r2(y) r2(x) w2(y)
Conflicts:
w1(x), r2(x) w1(y), r2(y) w1(y), w2(y) r1(y), w2(y)
Conflict-serializability
Conflict-equivalence
two schedules Si and Sj (i = j) are conflict-equivalent (Si oC Sj) if
they include the same operations
each pair of conflicting operations appear in the same order in both schedules
Conflict-serializability
a schedule is conflict-serializable if it is conflict-equivalent to a serial schedule
We use notation CSR to denote the set of conflict-serializable schedules
Conflict-serializability: example
S: w0(x) r1(x) w0(z) r1(z) r2(x) r3(z) w3(z) w1(x)
Conflict-serializability: example
S: w0(x) r1(x) w0(z) r1(z) r2(x) r3(z) w3(z) w1(x)
conflicts:
w0(x), r1(x)
w0(x), r2(x) w0(x), w1(x) w0(z), r1(z) w0(z), r3(z) w0(z), w3(z) r1(z), w3(z) r2(x), w1(x)
Conflict-equivalent to serial schedule w0(x) w0(z) r2(x) r1(x) r1(z) w1(x) r3(z) w3(z)
Conflict-serializability: conflict graph
Could be evaluated building the conflict graph:
each transaction is represented by a node
for each pair ai and aj of operations in conflict such that ai precedes aj in the schedule, there is an edge from ti (transaction of ai) to tj (transaction of aj )
A schedule is conflict-serializable if and only if the graph is acyclic: it is conflict-equivalent to all the topological orders of the graph
Conflict graph: example
S: w0(x) r1(x) w0(z) r1(z) r2(x) r3(z) w3(z) w1(x)
conflicts:
w0(x), r1(x)
w0(x), r2(x) w0(x), w1(x) w0(z), r1(z) w0(z), r3(z) w0(z), w3(z) r1(z), w3(z) r2(x), w1(x)
Conflict graph: example
S: w0(x) r1(x) w0(z) r1(z) r2(x) r3(z) w3(z) w1(x)
conflicts:
w0(x), r1(x)
w0(x), r2(x) w0(x), w1(x) w0(z), r1(z) w0(z), r3(z) w0(z), w3(z) r1(z), w3(z) r2(x), w1(x)
Conflict-equivalent to serial schedule w0(x) w0(z) r2(x) r1(x) r1(z) w1(x) r3(z) w3(z)
CSR vs VSR
CSR I VSR
SICSRSIVSR SIVSRSICSR
SICSRdonotknowifVSR SIVSRdonotknowifCSR
CSR vs VSR: example
t1 : r1(x) w1(x)
t2 : w2(x) blind write t3 : w3(x) blind write
S : r1(x) w2(x) w1(x) w3(x) S is view-serializable
view-equivalent to t1, t2, t3 S is not conflict-serializable
Conflict-serializability: complexity
Deciding if a schedule is conflict-serializable has linear cost in the size of the graph
However, it is practically too expensive, especially in case of distributed data
The mechanisms adopted in practice are based on more restrictive, but also more efficient, conditions
VSR-CSR: exercise (1)
r1(x) w2(y) w2(x) w1(y) r3(x) r3(y) w3(x) w3(y)
VSR-CSR: exercise (2)
w2(x) r2(y) r1(x) r3(y) r2(z) w3(x) w4(y) w1(z) w4(z)
VSR-CSR: exercise (3)
r1(x) w1(t) r3(y) w3(y) w1(x) w1(y) r2(t) r3(z) w3(t) r2(x) w2(z) w4(t)
VSR-CSR: exercise (4)
w2(y) r3(z) w2(x) r1(y) w3(z) r4(x) w1(z) w3(x) w1(x)
Concurrency control
Systems used in practice do not evaluate serializability a-posteriori, but operate in such a way to guarantee such a property
Possible techniques:
two phase locking
timestamp
mono-version multi-version
Variable describing the status of data wrt the operations that can be executed over it
two states (binary) lock unlocked (0)
locked (1)
three states lock unlocked
r_locked (read, shared)
w_locked (write, exclusive)
Locking: two states
To access data, the transaction must ask a lock on the data
The transaction releases the lock (unlock) when it does not need the
data anymore
A transaction can access data only if it obtained the corresponding lock
A transaction that follows these rules is said to be well formed according to locking
Locking: three states
To access data for reading it, the transaction must ask a r_lock on the data (shared)
To access data for writing it, the transaction must ask a w_lock on the data (exclusive)
The transaction releases the lock (unlock) when it does not need the data anymore
A transaction can access data only if it obtained the corresponding lock
A transaction that follows these rules is said to be well formed according to locking
Lock management
Module that acts as lock manager
Receives lock requests
Grants or denies lock requests based on conflict tables
If the request is denied, the requesting transaction is in wait status until the
lock is granted
Properly modifies lock tables (status and possibly counter)
Conflict table: two states lock
Conflict table: three states lock
OK (c=c+1) r_locked
no* r_locked
OK (c=c-1) c=0: unlocked c>0: r_locked
*: unless the case of upgrade of an exclusive r_lock
Two phase locking
A transaction, after releasing a lock, cannot acquire other locks
Each transaction has two phases Increasing (acquires locks)
Decreasing (releases locks) Guarantees serializability
Resource requests
increasing phase
decreasing phase
2PL schedule
The schedules produced by a scheduler using
well formed transactions according to locking conflict-based lock management
two phase locking
belong to the 2PL class
2PL vs CSR vs VSR
2PL I CSR I VSR
SI2PLSICSR SICSRSI2PL
SI2PLdonotknowifCSR SICSRdonotknowif2PL
2PL vs CSR: example
S: r1(x) w1(x) r2(x) w2(x) r3(y) w1(y) conflict-serializable
conflict-equivalent to t3, t1, t2 not 2PL
2PL: exercise (1)
r1(x) w1(z) w2(t) r3(y) r4(y) w3(t) r1(z) w2(x) r4(x) w4(y)
2PL: exercise (2)
r1(x) w3(t) r2(y) r2(z) w2(y) w1(z) r3(k) w2(t) w1(y)
Strict two phase locking
The scheduler operates without knowing how transactions will end, hence:
we need to remove the hypothesis of using a commit-projection we add a restriction to 2PL
Strict 2PL (used by commercial DBMSs)
a transaction, after releasing a lock, cannot acquire other locks
the locks of a transaction can be released only after commit/abort operations
Variations over 2PL
Base 2PL
a transaction, after releasing a lock, cannot acquire other locks
Strict 2PL (prevents dirty reads) base 2PL
a transaction can release locks only after commit/abort
Conservative 2PL (guarantees absence of deadlock) base 2PL
a transaction must acquire all the locks before starting executing operations
Concurrency control
Systems used in practice do not evaluate serializability a-posteriori, but operate in such a way to guarantee such a property
Possible techniques:
two phase locking
timestamp
mono-version
multi-version
Identifier that defines a total order among the temporal events of the system
the timestamp of an event is greater than the timestamps of the events that preceded it
no two events have the same timestamp
Possible implementations: counter
system clock
Transactions and timestamp
Each transaction is associated with a timestamp (ts)
assigned by the system at the beginning of the transaction represents the time at which the transaction started
A schedule is accepted only if it it reflects the serial order of transactions based on the timestamp value of the transactions
Timestamp (1)
Each object x has two indicators:
RTM(x): max timestamp among transactions that have read x
WTM(x): max timestamp among transactions that have written x
Timestamp (2)
Answer of the scheduler to requests
read(x, ts)
ts < WTM(x) a request denied, transaction aborted Otherwisearequest accepted; RTM(x) := max(RTM(x),ts) write(x, ts) ts < WTM(x) or ts < RTM(x)arequest denied, transaction aborted Otherwisearequest accepted; WTM(x) := tsTimestamp: examplewrite(x,8)write(x,11)read(x,10) Timestamp: examplewrite(x,8)no (t8 killed)write(x,11)read(x,10)no (t10 killed) Timestamp: exercisewrite(x,7)read(x,10)write(x,8)read(x,13)read(x,11)Concurrency control with timestamps there is no lock request on resources free from deadlock may kill many transactions behaves correctly only under the commit-projection assumption to remove the assumption it is necessary to buffer write operations (write on persistent memory only after commit) read operations of buffer data wait for commit of the writing transaction Timestamp and multiversionEach write operation of an object generates a new copy (version) of the same read operations requested by old transactions are executed on old versions (instead of being denied and transactions aborted)For each object x: there are multiple versions xi each with its own RTM and WTM each write operation generates a new version each read operation operates on the version created by the most recent write operation that precedes the read operation Transactions and timestamp with multiversionAnswer of the scheduler to requests read(x, ts) request always accepted read version xk such that WTM(xk) = max{WTM(xi) | WTM(xi) ts } RTM(xk):= max(RTM(xk),ts) write(x, ts) find version xj such that WTM(xj) = max{WTM(xi) | WTM(xi) ts } ts < RTM(xj)arequest denied, transaction aborted Otherwisearequest accepted; create new version xk and RTM(xk):=tsWTM(xk):=ts Timestamp and multiversion: examplewrite(x,7)write(x,12)read(x,14)write(x,13)write(x,10)read(x,11)Timestamp and multiversion: examplewrite(x,7)no (t7 killed)write(x,12)read(x,14)write(x,13)no (t13 killed)write(x,10)read(x,11)Timestamp and multiversion: exercisewrite(x, 7)read(x, 15)write(x, 10)read(x, 8)write(x, 18)read(x, 6)write(x, 16)TSvs2PLvsCSRvsVSR SITSSICSR SICSRSITS SITSdonotknowifCSR SICSRdonotknowifTS transactions with denied actionskilled and restarted serialization orderimplied by conflicts implied by timestamp wait for commitstrict 2PLwrite buffering It is more expensive to re-execute transactions than put them in wait 2PL is better CS: assignmentchef QQ: 1823890830 Email: [email protected]
Reviews
There are no reviews yet.