[SOLVED] CS

A bit of history
1970/1980: Relational databases
Storage is expensive

Data are normalized
Data are stored regardless of how they will be used
RDBMSs become popular
Client/server model
SQL becomes a standard for querying databases
1990: WWW and Internet
2000: Web 2.0
Storage is less expensive
e-Commerce, social media → data grow → need to scale with data!

A necessary introduction
NoSQL does not mean "Not SQL"; it more likely stands for "not relational"
NoSQL is now interpreted as "Not Only SQL"
It permits the use of SQL-like queries
NoSQL is not a single product but a collection of diverse, and sometimes related, concepts about data storage and manipulation

NoSQL: definition
There is not a generally accepted definition in the literature
Characteristics of NoSQL
schemaless
not using only SQL
generally open-source (even though the NoSQL notion is also applied to closed-source systems)
generally driven by the need to run on clusters (but graph databases do not typically fall in this class)
generally not handling consistency through ACID transactions (graph databases, however, do)

NoSQL models
There exist different kinds of NoSQL systems, and each family presents different variations
The most important families of NoSQL databases are:
Key-value
Document-based
Column-oriented
Graph-based
Aggregate-oriented vs. graph-based

Aggregate orientation
(partly taken from NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, P. J. Sadalage, M. Fowler, Addison-Wesley)

Impedance mismatch (1)
Difference between the relational model and the in-memory data structures
In-memory structures are more flexible (e.g., they can be nested)
To use more flexible in-memory structures, it is necessary to translate them into a relational representation
Impedance mismatch became more relevant with the development of object-oriented programming languages
Introduction of object-oriented databases → failure
Impedance mismatch is easier to deal with thanks to object-relational mapping frameworks
Not a real solution!

Impedance mismatch (2)
Example of a single aggregate structure mapped to many relational tables
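The mapping this slide refers to can be sketched in a few lines. This is only an illustration: the order structure, its field names, and the three target tables are hypothetical and do not come from the slides.

```python
# Hypothetical in-memory aggregate; all field names are illustrative assumptions.
order = {
    "id": 99,
    "customer_id": 1,
    "items": [
        {"product_id": 27, "price": 32.45, "quantity": 1},
        {"product_id": 31, "price": 10.00, "quantity": 2},
    ],
    "shipping_address": {"city": "Chicago"},
}


def to_relational_rows(order):
    """Split one nested aggregate into rows for three flat (assumed) tables."""
    orders = [(order["id"], order["customer_id"])]
    order_items = [
        (order["id"], item["product_id"], item["price"], item["quantity"])
        for item in order["items"]
    ]
    addresses = [(order["id"], order["shipping_address"]["city"])]
    return {"orders": orders, "order_items": order_items, "addresses": addresses}


rows = to_relational_rows(order)
for table, table_rows in rows.items():
    print(table, table_rows)
```

One nested object becomes rows in several tables, and reading it back requires joining them again; that translation work is the impedance mismatch.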

Scalability issues
Due to the considerable increase in the amount of data seen in the 2000s, scalability is paramount
Vertical scalability: more powerful machines → expensive
Horizontal scalability: clusters → less costly and more reliable

Scalability issue: clusters
Clusters are more suitable to the emerging scenarios (e.g., data generated by social networks)
RDBMSs have not been designed to operate on clusters
Designed as single-server
Need to think of an alternative to RDBMS for data management

Aggregate orientation
Intuition: operate on data in units with a more complex structure than a tuple
Aggregate: a collection of related objects to be treated as a unit
goal 1: update aggregates atomically
goal 2: communicate with storage in terms of aggregates
Aggregates fit distributed scenarios
natural unit for sharding and replication (more on this later)
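As a rough illustration of why an aggregate is a natural unit for sharding, the sketch below places each whole aggregate on exactly one shard, chosen by hashing its identifier. The `ShardedStore` class and the hash-modulo placement are assumptions made for this example, not the scheme of any particular database.

```python
import hashlib


def shard_for(aggregate_id: str, num_shards: int) -> int:
    """Pick a shard from a stable hash of the aggregate identifier."""
    digest = hashlib.sha256(aggregate_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy store: each aggregate lives, as a whole, on exactly one shard."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, aggregate_id: str, aggregate: dict) -> None:
        index = shard_for(aggregate_id, len(self.shards))
        self.shards[index][aggregate_id] = aggregate

    def get(self, aggregate_id: str):
        index = shard_for(aggregate_id, len(self.shards))
        return self.shards[index].get(aggregate_id)


store = ShardedStore(num_shards=4)
store.put("order:99", {"customer_id": 1, "items": [{"product_id": 27}]})
print(store.get("order:99"))
```

Because the whole aggregate sits on one node, reading or updating it never needs to touch a second shard.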

Aggregate orientation (example)
Application: e-commerce, need to store information on customers, products, orders, shipping addresses, billing addresses, payment data
Relational modeling
No data is replicated (normalization)
Referential integrity (foreign key constraints)
DBMS cannot use knowledge of the aggregates for storage

Aggregate orientation (example)
Application: e-commerce, need to store information on customers, products, orders, shipping addresses, billing addresses, payment data
JSON format (excerpt)
Two aggregates: customer and order
Customer contains addresses
Order contains payments that contain addresses
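The JSON excerpt itself is not reproduced here, so the following is only a guess at the shape such aggregates might take. All field names and values are hypothetical; only the nesting (customer contains addresses, order contains payments that contain addresses) follows the slide.

```python
import json

# Hypothetical customer aggregate: the addresses are nested inside it.
customer = {
    "id": 1,
    "name": "Example Customer",
    "billing_addresses": [{"city": "Chicago"}],
}

# Hypothetical order aggregate: payments are nested, and each payment nests
# its own copy of the billing address (the address is deliberately duplicated).
order = {
    "id": 99,
    "customer_id": 1,  # link to the customer aggregate by identifier
    "order_items": [{"product_id": 27, "price": 32.45}],
    "shipping_address": {"city": "Chicago"},
    "payments": [
        {"card_number": "0000-0000", "billing_address": {"city": "Chicago"}},
    ],
}

serialized = json.dumps(order, indent=2)
print(serialized)
```

Unlike the normalized relational model, the address appears in both aggregates, so each aggregate can be read or written as a single unit.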

To aggregate or not to aggregate?
Data are organized depending on how they will be accessed
Aggregation is not a property of the data, but of how data will be used by applications
focus on the unit of interaction with the storage
Not always a good idea:
A given aggregate structure can be an obstacle with a given application (-)
Fits well with operations on clusters (+)

CAP theorem (1)
Aggregate-oriented databases and ACID properties are not a good match
CAP theorem (E. Brewer, 2000)
Of three properties of shared-data systems (Consistency, Availability and tolerance to network Partitions) only two can be achieved at any given moment in time.

CAP theorem (2)
Consistency
Availability
Partition Tolerance

CAP theorem (3)
Consistency
Every request receives the correct response
Once data has been written, all future read requests operate on this version of the data
Availability
The data are available and responsive
Each request eventually receives a response
If you can access a node in the cluster, it can read and write data
Tolerance to network partitions
The cluster can survive a partitioning of the network that breaks the cluster into multiple partitions that cannot communicate with each other

CAP theorem: example (1)
Two nodes with a replica of the data, having value V0 initially
N1 runs algorithm A writing a new value V1
N2 runs algorithm B reading the data value

CAP theorem: example (2)
Ideal immediate propagation

CAP theorem: example (3)
In real-world scenarios, propagation is not immediate!

CAP theorem: example (4)
If we want a system
highly available
composed of a large number of nodes
where each node withstands network problems
then we have to accept that sometimes
N1 might see V1
N2 might see V0
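The stale read in this example can be reproduced with a toy simulation. The `Node` class below is an assumption made purely for illustration: replication is modeled as a message that sits in the peer's inbox until the peer applies it.

```python
class Node:
    """Toy replica: writes reach peers as messages that are applied later."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.inbox = []  # replication messages received but not yet applied

    def write(self, value, peers):
        self.value = value
        for peer in peers:
            peer.inbox.append(value)  # propagation is not immediate

    def read(self):
        return self.value

    def apply_pending(self):
        while self.inbox:
            self.value = self.inbox.pop(0)


n1 = Node("N1", "V0")
n2 = Node("N2", "V0")

n1.write("V1", peers=[n2])  # algorithm A on N1
stale = n2.read()           # algorithm B on N2 runs before propagation
n2.apply_pending()          # the update finally arrives
fresh = n2.read()

print(n1.read(), stale, fresh)
```

Between the write on N1 and the delivery of the update, N2 stays available and answers reads, but with the old value.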

CAP theorem: how to deal with it? (1)
Give up tolerance to network partitions
Single server: CA system
Resistance to network partitioning not required
A single machine cannot partition
CA cluster
If a partition ever happens, all the nodes in the cluster go down (no impact on the CAP theorem's definition of Availability)
Hard to guarantee
Operate on a single node

CAP theorem: how to deal with it? (2)
Give up consistency or availability
Solution adopted by NoSQL systems
They trade off consistency against availability
Not always a Boolean decision: trade a little consistency for a little availability
How much you can trade depends on the specific application domain
Book a room via a hotel booking web site replicated at two locations
C: both nodes agree on the serialization of requests
Network partitioning would compromise A
Increase A: master-slave approach
Further increase A: still accept bookings locally, resolve in case of last room
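The last option, accepting bookings locally and resolving conflicts afterwards, might look roughly like the sketch below. The `BookingSite` class, the integer timestamps, and the "earliest booking wins" rule are all assumptions for illustration; a real system could choose any domain-specific resolution policy.

```python
class BookingSite:
    """One replica of the booking site; accepts bookings without coordination."""

    def __init__(self, name, rooms_available):
        self.name = name
        self.rooms_available = rooms_available  # local view of free rooms
        self.accepted = []                      # (timestamp, guest) pairs

    def book(self, timestamp, guest):
        if self.rooms_available > 0:
            self.rooms_available -= 1
            self.accepted.append((timestamp, guest))
            return True
        return False


def reconcile(sites, total_rooms):
    """After the partition heals, keep the earliest bookings (assumed policy)."""
    all_bookings = sorted(b for site in sites for b in site.accepted)
    return all_bookings[:total_rooms], all_bookings[total_rooms:]


# One room left, and the two sites cannot communicate.
site_a = BookingSite("site A", rooms_available=1)
site_b = BookingSite("site B", rooms_available=1)
site_a.book(1, "guest-1")
site_b.book(2, "guest-2")

confirmed, rejected = reconcile([site_a, site_b], total_rooms=1)
print(confirmed, rejected)
```

Both sites stay available during the partition, at the price of an overbooking that has to be resolved later, for example by contacting the rejected guest.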

CAP theorem: how to deal with it? (3)
NoSQL is a varied world
In general, aggregate-oriented databases support atomic manipulation of a single aggregate at a time
Atomic manipulation of multiple aggregates → managed in the application
Deciding where atomicity is wanted is part of the strategy for dividing data into aggregates

CAP theorem: conclusions
Better to think about the tradeoff between consistency and latency
We can always improve consistency by involving more nodes, but
each node increases the response time
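One simplified way to see this trade-off: if a write must wait for acknowledgements from a given number of replicas, its response time is set by the slowest replica it waits for. The function and the latency figures below are made up for illustration and do not model any specific system.

```python
def write_response_time(replica_latencies_ms, acks_required):
    """Time until `acks_required` replicas have acknowledged the write,
    assuming all replicas are contacted in parallel."""
    if not 1 <= acks_required <= len(replica_latencies_ms):
        raise ValueError("acks_required must be between 1 and the replica count")
    return sorted(replica_latencies_ms)[acks_required - 1]


latencies_ms = [5, 12, 40]  # hypothetical per-replica latencies
times = [write_response_time(latencies_ms, w) for w in (1, 2, 3)]
print(times)
```

Waiting for more replicas gives a stronger guarantee that later reads see the write, but the caller waits longer for each additional node involved.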

Relationships and aggregate-orientation
Most NoSQL databases store sets of disconnected documents, values, columns
Difficult to use them for connected data and graphs.
Possible solution → embed an aggregate's identifier inside a field belonging to another aggregate
Similar to foreign keys
Joins still at the application level, expensive
The Riak key-value store allows stored values to be augmented with link metadata, but this is suitable only for simple graph structures
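A minimal sketch of the "identifier as foreign key" approach, with the join done by the application. The two dictionaries stand in for separate aggregate collections; their contents and field names are hypothetical.

```python
# Two hypothetical aggregate collections, keyed by identifier.
customers = {
    1: {"name": "Bob"},
    2: {"name": "Alice"},
}
orders = {
    99: {"customer_id": 1, "total": 32.45},   # customer_id acts like a foreign key
    100: {"customer_id": 2, "total": 10.00},
    101: {"customer_id": 1, "total": 5.50},
}


def orders_with_customer_name(orders, customers):
    """Application-level join: one extra lookup per order."""
    joined = []
    for order_id, order in orders.items():
        customer = customers[order["customer_id"]]
        joined.append((order_id, customer["name"], order["total"]))
    return joined


result = orders_with_customer_name(orders, customers)
print(result)
```

In a real store each lookup would be a separate request to the database, which is why such joins get expensive as the data becomes more connected.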

Relationships and aggregate-orientation
Direction of links between aggregates
Bob's friends: relatively easy
Who is friends with Bob? Needs to scan the entire database
Possible solution → adding backward links
increased write latency
increased storage cost
All these solutions are implementing a graph structure atop a nonnative store
some of the benefits of partial connectedness, but at substantial cost
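The cost of link direction can be illustrated with a small in-memory sketch. The `FriendStore` class is hypothetical: it keeps forward links and, as the possible solution above suggests, a second map of backward links.

```python
from collections import defaultdict


class FriendStore:
    """Toy store with forward links plus optional backward links."""

    def __init__(self):
        self.forward = defaultdict(set)   # person -> people they list as friends
        self.backward = defaultdict(set)  # person -> people who list them

    def add_friend(self, person, friend):
        self.forward[person].add(friend)
        self.backward[friend].add(person)  # second write: extra latency and storage

    def friends_of(self, person):
        return set(self.forward.get(person, ()))

    def friends_with_scan(self, person):
        # Without backward links: every stored record must be inspected.
        return {p for p, friends in self.forward.items() if person in friends}

    def friends_with_indexed(self, person):
        # With backward links: a single lookup.
        return set(self.backward.get(person, ()))


store = FriendStore()
store.add_friend("Bob", "Alice")
store.add_friend("Alice", "Bob")
store.add_friend("Dave", "Bob")

print(store.friends_of("Bob"))
print(store.friends_with_scan("Bob"), store.friends_with_indexed("Bob"))
```

Both reverse queries return the same answer, but the scan grows with the size of the database, while the indexed version pays for its speed on every write.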

Relationships and LPGs
Base idea behind graph databases is to treat connected data as connected data
Connections in domains correspond to connections in data
It is possible to add graphs to increase knowledge
Example: add
