[SOLVED] CS Information Management

$25

File Name: CS_Information_Management.zip
File Size: 235.5 KB

5/5 - (1 vote)

Information Management

Universita degli Studi di Databases (slides partially provided by Prof. Samarati)

Copyright By Assignmentchef assignmentchef

Paradigms for data distribution
Client-server architectures
Separation between the database server and the client
Distributed databases
Several database servers used by the same application
Parallel databases
Several storage devices and processors that operate in parallel to improve
performance
Replicated databases
Same data physically stored at different servers
Data warehouses
servers specialized for the management of data dedicated to decision support

Kinds of architectures
Two kinds of systems
OLTP (On-Line Transaction Processing)
for the optimized management and reliable transactions on database server
specialized on the management of hundreds or even thousands of transactions per second
OLAP (On-Line Analytical Processing)
for data analysis
operate on data warehouse servers, specialized on the management of data for decision support systems
Server typically include both OLAP and OLTP functionalities

Properties of distributed systems
Portability
possibility of transporting programs from an environment to another established at compile time
facilitated by language standards (e.g., SQL-2, SQL-3)
Interoperability
ability of interacting among heterogeneous systems
established at compile time
facilitated by standard protocols for data access (e.g., Database Connectivity (ODBC) e X-Open Distributed Transaction Processing (DTP))

Client-server architectures (1)
Interacting software processes are divided between client (request services) and server (provide services)
Requires a precise definition of service interface, listing precisely the services offered by the server
The client has an active role (makes the request)
The server has a reactive role (replies)
Usually, a client process requests a few services in a sequence to one or more server processes
Usually, a server process replies to multiple requests by different client processes

Client-server architectures (2)
The machine acting as client should
be adequate for interacting with the final user
support productivity tools (email, word processing, Internet access, and workflow management)
The machine acting as server should
have considerable memory capacity (to support buffer management) have considerable disk capacity (to store the whole database)

Client-server architectures (3)
Widely used for databases:
Client and server functions are well defined
Provides a good separation of the management activities client suitable for interaction with the user
server suitable for data management
SQL offers an ideal programming paradigm for the identification of service interfaces

Client-server architectures (4)
SQL offers an ideal programming paradigm for the identification of services interfaces
SQL queries formulated by the client and sent to the server
query results are computed by the server and sent to the client
SQL standardization, portability, and interoperability enable the development of client applications that involve different servers

Client-server architectures (5)
Often the server is multi-threaded:
Behaves as a single process that dynamically operates for different
transactions
Each execution unit of the server process for a given transaction is a thread

Client-server architectures (6)
Servers are processes that are permanently active and check: an input queue for clients requests
an output queue for query results
Often, a dispatcher process distributes requests among servers and returns replies to clients
When dispatchers can dynamically determine the number of active server processes depending on the number of requests, we say that we have a class of servers

Client-server architectures (7)
Output queue
Input queue
Database Server
Server Process

Two-tiered architectures (1)
There are two machines: a client and a server
There are different solutions for deciding to which machine each layer
is assigned
Simple approach:
the client has the user-interface level
the server has the processing and data levels
This is not the only solution

Two-tiered architectures (2)

Fat vs thin client
Fat Clients
+ More efficient for users
+ Provide higher system scalability
More complex and hard to manage
Client software is more error prone and depends on the client machine and its
operating system Thin Clients
+ Easier to manage
Trend to move computation to the client

Multi-tiered architectures
A single server is replaced by multiple servers running on different machines
A server may act as a client
Different programs in the processing level reside on different servers

Distributed databases
System of databases where at least one client interacts with more servers for the execution of an application

Distributed databases: Pros
Respond to applications requirements
organizations have a distributed structure
data distribution permits the management of data where they are generated and used
Flexibility and modularity
can be configured with continuous additions and modifications of
components
Reliability
can react to failures reducing performance instead than stopping operability

Distributed databases: classification (1)
Kind of involved DBMSs
Homogeneous DDB: all the servers use the same DBMS Heterogeneous DDB: the servers use different DBMSs
Kind of network
Local Area Network (LAN) Wide Area Network (WAN)

Distributed databases: classification (2)
Kind of DBMS
Kind of network
Homogeneous
Management and financial applications
Booking systems and financial applications
Heterogeneous
Management and inter-functional applications
Integrated booking systems and inter- banking systems

Local independency and cooperation
The distributed database can be considered, from an abstract point of view, as a single database
It should be designed in such a way to have applications that are executed on a single server, minimizing
the need of interaction
the need of data exchange

Data fragmentation (1)
Adopts algebraic operations on a relation R to divide it into fragments R1, , Rn
Horizontal fragmentation
each Ri has a subset of the tuples in R
each Ri can be interpreted as the result of a selection over R
Vertical fragmentation
each Ri has, in its schema, a subset of the attributes in R
each Ri can be interpreted as the result of a projection over R

Data fragmentation (2)
Correctness property
completeness: each data in R must be represented in one of its fragments Ri
reconstructability: R should be fully reconstructable starting from its fragments
Horizontal fragments are disjoint
do not have common tuples
Vertical fragments include the primary key of R guarantees reconstructablility

Horizontal fragmentation: example (1)
EMPLOYEE(Empnum,Name,Deptnum,Salary,Taxes)
Fragments:
EMPLOYEE1 = Empnum3 EMPLOYEE EMPLOYEE2 = Empnum>3 EMPLOYEE
To reconstruct the relation:
EMPLOYEE = EMPLOYEE1 E EMPLOYEE2

Horizontal fragmentation: example (2)

Production Administration Production Marketing
3.7M 3.5M 5.3M 3.5M
1.2M 1.1M 2.1M 1.1M
EMPLOYEE1 (Empnum3 EMPLOYEE)
Administration Production
3.7M 3.5M 5.3M
1.2M 1.1M 2.1M
EMPLOYEE2 (Empnum>3 EMPLOYEE)

Vertical fragmentation: example (1)
EMPLOYEE(Empnum,Name,Deptnum,Salary,Taxes) Fragments:
EMPLOYEE1 = OEmpnum,Name(EMPLOYEE)
EMPLOYEE2 = OEmpnum,Deptnum,Salary,Taxes(EMPLOYEE)
To reconstruct the relation:
EMPLOYEE = EMPLOYEE1 EMPLOYEE2

Horizontal fragmentation: example (2)

Production Administration Production Marketing
3.7M 3.5M 5.3M 3.5M
1.2M 1.1M 2.1M 1.1M
EMPLOYEE1 = OEmpnum,Name(EMPLOYEE) EMPLOYEE2 = OEmpnum,Deptnum,Salary,Taxes(EMPLOYEE)
EMPLOYEE1 EMPLOYEE2

Production Administration Production Marketing
3.7M 3.5M 5.3M 3.5M
1.2M 1.1M 2.1M 1.1M

Allocation schema
Describe the mapping of relations or fragments to servers where they are stored
Each fragment corresponds, at the physical level, to a file and is allocated to a specific server
fragments are stored
the original relation is a view over fragments (virtual)
Allocation can be
non-redundant: each fragment or relation is allocated to one server only
redundant: at least one fragment or relation is allocated to multiple servers

Transparency levels (1)
Distinguishing between fragmentation and allocation permits to write applications operating at different levels
From the most abstract and independent from data fragmentation To the most concrete and dependent on their physical allocation

Transparency levels (2)
fragmentation: the programmer
does not need to know the fragmentation does not need to know the allocation
allocation: the programmer
needs to know the structure of fragments does not need to know the allocation
language: the programmer
needs to know the structure of fragments needs to know the allocation
no transparency
each DBMS accepts its own SQL dialect: the system is heterogeneous and the DBMSs do not support a common interoperability standard

Transparency levels: example
SUPPLIER(Snum,Name,City)
horizontal fragments:
SUPPLIER1 = city=Milan (SUPPLIER) SUPPLIER2 = city=Rome (SUPPLIER)
allocation of horizontal fragments (with replication):
write a procedure that, given a supplier number, returns its name

Fragmentation transparency: example
The programmer
does not need to know the fragmentation does not need to know the allocation
procedure Query1(:snum,:name); select Name into :name from Supplier
where Snum = :snum;
end procedure;

Allocation transparency: example
The programmer
needs to know the structure of fragments
does not need to know the allocation
in case of redundancy does not need to indicate which copy to use for access (replication transparency)
procedure Query2(:snum,:name); select Name into :name from Supplier1
where Snum = :snum;
if :empty then
select Name into :name
from Supplier2
where Snum = :snum;
end procedure;

Language transparency: example
the programmer
needs to know the structure of fragments
needs to know the allocation
in case of replication, needs to indicate which copy to use for access
procedure Query3(:snum,:name); select Name into :name
from where Snum = :snum;
if :empty then
select Name into :name
from where Snum = :snum;
end procedure;

Query optimization
The application can be optimized through:
parallelism
submit requests in parallel in contrast to in sequence reduce the overall response time
knowledge of the logical properties of fragments query the fragment where data reside
increase efficiency but reduce flexibility

Query optimization: example
procedure Query4(:snum,:name,:city); case :city of
select Name into :name from Supplier1
where Snum = :snum;
select Name into :name from Supplier2
where Snum = :snum;
end procedure;

Fragmentation: exercise
SHOP (ID, Name, Kind, Tel, City, IDWarehouse) WAREHOUSE (ID, Street, City, Tel)
WAREHOUSE is split in 4 fragments based on the city where warehouses are located (Milan, Florence, Rome, Venice), allocated at the three corresponding shipment centres in the same cities
SHOP is split in 3 fragments based on the kind of shop (clothes, shoes, bath) allocated at three coordination centres, one for each kind of shop, in Cagliari, Siena, and Bari, respectively

Fragmentation: exercise (continuation)
Write a query that retrieves the name, kind, and telephone number of the shop with ID=123 according to fragmentation, allocation, and language transparency.

CS: assignmentchef QQ: 1823890830 Email: [email protected]

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] CS Information Management
$25