[SOLVED] CS ALL 1994 1995 ALL ALL ALL 1994 1995 ALL

$25

File Name: CS_ALL_1994_1995_ALL_ALL_ALL_1994_1995_ALL.zip
File Size: 395.64 KB

5/5 - (1 vote)

Conceptual Modeling

Dimensional Fact Model (DMF)
The Dimensional Fact Model (DFM) is a graphical conceptual model for data mart design aimed at

Copyright By Assignmentchef assignmentchef

support conceptual modeling
create an environment where queries can be formulated in an intuitive
enable requirement specification
represent a reference point for logical modeling
provide project documentation
DFM conceptual schema is a set of fact schemas that model facts, measures, dimensions, and hierarchies

A fact is a concept of interest for the decisional process Usually models a set of events that happen within a company Is dynamic and evolves over time
A measure is a numeric property of a fact and describes a quantitative aspect of interest for the analysis
A dimension is a property of a fact that has a finite domain and
describes an aspect of the fact for the analysis
dimensions

The relationship between measures and dimensions is expressed, at the extensional level (i.e., instance level) through (primary) events
A primary event is a specific occurrence of a fact
It is identified through an ennuple with a value for each dimension It has a value for each measure
A fact is a many-to-many relationship among dimensions
Event for fact SALE:
on 3/4/2019, store X sold 10 units of product Y with a receipt of 25 USD

Dimensional attributes are dimensions and the attributes, with discrete values, describing them
Hierarchy is a directed tree whose nodes are dimensional attributes and whose edges model many-to-one relationships between pairs of dimensional attributes
The dimension is the root of the tree The other nodes are its attributes
productatype productabrand typeacategory
expresses the type of a product (beer is a drink) expresses the brand of a product
each type belongs to a category (drink is food)

Hierarchy: example
dimensional attributes

Each DMF can be modelled as an ER
The fact is a relation
Each measure is an attribute Each dimension is an entity

Secondary event
Given a set of dimensional attributes, each ennuple of values identifies a secondary event
A secondary event aggregates the corresponding primary events
A secondary event has a value for each measure, summarizing the values of
the same measure in the corresponding primary events

Aggregates all the sales in Milan for product X in January 2021 The quantity will be the sum of all the tuples with these values

Descriptive and optional attributes
Descriptive attribute: property of a dimensional attribute or hierarchy Not used for aggregation
Leaf node
Optional attributes:
For some instances the attribute may
not have a value
Telephone number and address are descriptive of a store
Diet is optional for a product

Convergence
Two or more attributes are connected by two or more distinct paths, all representing functional dependencies
storea storeCitya state a country store a saleDistrict a country

Conceptual design
Starts from documentation including ER schema, relational schema,
Identify facts
For each fact
1. Build the attribute tree
2. Identify dimensions
3. Identify measures
4. Draw the fact schema

Identify facts
A fact is a concept of primary interest for the decisional process Typically events that dynamically happen in the company
Considering an ER schema, entities that represent frequently updated archives are good candidate facts
Sales is a good candidate fact Shop is not a good candidate fact

Build the attribute tree
Considering an ER schema, and entity F modeling a fact, the attribute tree is such that
Each vertex is an attribute of the ER schema
The root corresponds to the identifier of F
For each vertex v, the corresponding attribute functionally determines all the attributes represented by descendants of v

Build the attribute tree: example (1)

Build the attribute tree: example (2)

Prune the attribute tree
Remove dimension attributes (single nodes or entire subtrees) that are not relevant for the DW

Identify dimensions
Dimensions determine how events can be aggregated for the decisional process
Dimensions are chosen among the children of the root
The choice of dimensions is crucial as it defines the granularity of
primary events
Time is usually a dimension

Identify measures
Measures are chosen among numeric attributes the children of the root or applying aggregates to other attributes in the tree
A fact can be empty (i.e., with no measure)

Draw the fact schema
The attribute tree is translated into a DFM schema

Logical Modeling

Multidimensional model: Cube (1)
A cube represents a fact
Each of the axis of the cube represents a dimension of the fact Each cell in the cube represents a primary event
A slice of the cube represents a secondary event
storeCity quantity

Multidimensional model: Cube (1)
A cube can have an arbitrary number of dimensions
Cells in a cube can be empty Sparse cube: many empty cells Dense cube: few empty cells
ROLAP logical modeling does not suffer from sparsity issue

Star schema
It includes:
A dimension table DTi for each dimension, with a primary key di and different attributes for each aggregation level
A fact table FT with an external key di for each dimension and an additional attribute for each measure

Star schema: example
category department marketingGroup
unitPrice numberOfCustomers
week month day holiday quarter Year
storeCity state
country salesManager salesDistrict

Snowflake schema
Results from a (partial) normalization of a star schema
Prevents or reduces redundancies in dimension tables
Given a star schema, decompose one or more dimension table DTi in multiple tables DTi,1 DTi,n to remove functional dependencies
Each dimension table DTi,j includes
A primary key di,j
A subset of the attributes that functionally depend from di,j
Possibly external keys for other dimension tables DTi,k to enable the reconstruction of DTi

Snowflake schema: example
week month day holiday quarter Year
CategoryId
category department
marketingGroup
CategoryId
unitPrice numberOfCustomers
brand brandCity TypeId
salesManager salesDistrict cityId
storeCity state country

Snowflake schema vs star schema
Snowflake schema
reduces the storage space
requires to insert additional external keys
facilitates joins including only attributes in primary dimension tables (dimension tables are smaller)
complicates joins involving attributes in secondary dimension tables (higher number of joins)

Dimensions (1)
Dimensions are the core of the multidimensional model
Dimensions are used for Selecting data
Grouping data
Dimensions consist of dimension values Product: milk, cream
Time: 1/1/2019, 2/1/2019
Values are possibly organized in order
Used for comparing values (e.g., income of different months)

Dimensions (2)
Dimensions encode hierarchies with different levels
Dimension values are organized in a tree or in a lattice, whose top element is ALL
Product > Type > Category
Day > Month > Quarter > Year
Granularity of the bottom level is important and has an impact on scalability

Measures (1)
A measure represents a property of a fact that users want to study and analyze
A measure has
Numerical value
Aggregation formula
A measure is additive along a dimension if the SUM operator can be used to aggregate along that dimension

Measures (2)
Additive measures
Additive along all dimensions
Refer to a timeframe, at the end of which they can be evaluated cumulatively Example: number of products sold in a day
Semi-additive measures
Additive only along some dimensions
Evaluated at a particular time
Example: number of products in inventory (not additive on time)
Non-additive measures
Additive along none of the dimensions
Are evaluated at particular times and are relative Example: unit price of a product

Logical design
1. Translate DFM schema into a logical (star or snowflake) schema
2. Materialize views
3. Horizontal or vertical fragmentation of fact table

A view is a fact table storing aggregated data
Useful for the evaluation of query results based on pre-aggregated
Aggregation operators can be
Distributive: can compute aggregates based on pre-aggregates (e.g., sum, min, max)
Algebraic: require a finite number of additional information to compute aggregates from pre-aggregates (e.g., average requires the number of items involved in each pre-aggregate)
Olistic: do not permit to compute aggregates based on pre-aggregates and a finite number of additional information (e.g., median)

Materialization of views (1)
Select a subset of views to be materialized to minimize a pre-defined cost function
Minimize the evaluation of query workload
Minimize the cost of view maintenance, due to updates
Constraints that need to be taken into account
Storage space
Time necessary for updates
Maximum query response time (fixed by final users)
Maximum outdate of data in query response (fixed by final users)

Materialization of views (2)
These minimization objectives and constraints are in contrast
Traditionally, view materialization aims at minimize query response
time under system constraints (i.e., storage space and update time)
The size of the problem grows with the number of dimensional
attributes (each combination of attributes is a possible view)
Solution in two steps
Identify views useful for the query workload
Select the sub-optimal set of views using heuristic algorithms

Materialization of views (3)
It is useful to materialize a view when Directly answers to a frequent query
Permits to answer to several queries
It is not advisable to materialize a view when
Its aggregation pattern is similar to an existing view
Its aggregation pattern is very fine-grained
Materialization does not reduce of one order of magnitude the query evaluation workload

Materialization of views (4)
Materialized views need to be kept up to date
Rebuild the view at each updateanot efficient
Incremental update of the viewsamore efficient
Immediate: views are always consistent, but causes a higher cost
Deferred: lower cost, but views may be inconsistent for a time window

Operations on Multidimensional Data

Basic operations
Slice-and-dice: select and project
Selects a subsets of the cells of the cube
Roll-up: aggregates data
Applies an aggregation function (e.g., sum) over the data in a cube
It can be applied to additive measures, for which the aggregation makes sense along a given dimension
Drill-down: disaggregates data
It is the inverse of roll-up
Adds detail to a cube by disaggregating a measure along one or more dimensions

Slice-and-dice: example
Selects the quantity of pasta sold for trimester for each city
1 trim. 2019
2 trim. 2019
3 trim. 2019
4 trim. 2019

Drill-down: example
1 trim. 2019
2 trim. 2019
3 trim. 2019
4 trim. 2019

1 trim. 2019
2 trim. 2019
3 trim. 2019
4 trim. 2019
Milano-1 Milano-2
70000 60000
65000 60000
40000 70000
75000 70000
Bologna-1 Bologna-2 Bologna-3
Roma-1
Drill-down on storeCity (from city to store)

Roll-up: example
1 trim. 2019
2 trim. 2019
3 trim. 2019
4 trim. 2019

Roll-up on time (from trimester to year)

SQL extensions for OLAP
SQL:2003 extensions for OLAP
Crosstab
GROUP BY extensions ROLLUP
GROUPING SETS

Crosstab (1)
Table reporting the result of different aggregations Efficient for dense datasets
Can be computed using wizards or PIVOT clause
Toothpaste

Crosstab (2)
SELECT City, [Toothpaste] AS Toothpaste, [Soap] AS Soap FROM ( SELECT City, Product, Amount
FROM Sales) ps PIVOT ( SUM (Amount)
FOR Product IN ( [Toothpaste], [Soap]) ) AS pvt
Toothpaste
Toothpaste

Toothpaste

Crosstab (3)
Tabular representation of the results of aggregations
ALL is a dummy value and is the top of the dimensions hierarchy
Compatible with relational model
Use PIVOT to obtain a nicer view
Toothpaste
Toothpaste

Toothpaste

SQL offers three clauses to compute aggregates as extensions to the GROUP BY clause
GROUP BY ROLLUP(a1, , an)
computes aggregates (e.g., SUM, COUNT, AVG, MAX, MIN) at increasing levels of aggregation
GROUP BY [ai1,,ain] CUBE(aj1, , ajm)
computes aggregates at all the possible combinations of aggregations
GROUP BY GROUPING SETS (A1, , An)
computes aggregates according to aggregations defined by the sets of attributes in the clause

sum(Units)
fiat fiat ford
red red red
ROLLUP (1)
select Model, Year, Color, sum(Unit) from T
where Model in {fiat,ford}
and Color = red
and Year between 1994 and 1995 group by rollup(Model, Year, Color)
Roll up from right to left, producing the following aggregations: Model, Year, Color
Model, Year

ROLLUP (2)
fiat fiat ford fiat fiat ford fiat ford ALL
1994 1995 1994 1994 1995 1994 ALL ALL ALL
red red red ALL ALL ALL ALL ALL ALL

sum(Units)
fiat fiat ford
red red red
Partial ROLLUP (1)
select Model, Year, Color, sum(Unit) from T
where Model in {fiat,ford}
and Color = red
and Year between 1994 and 1995 group by Model, rollup(Year, Color)
Model is not part of the roll-up, produces the following aggregations: Model, Year, Color
Model, Year

Partial ROLLUP (2)
fiat fiat ford fiat fiat ford fiat ford
1994 1995 1994 1994 1995 1994 ALL ALL
red red red ALL ALL ALL ALL ALL

Applied to n attributes, generates n+1 aggregations in the order Original groups generated by GROUP BY clause
Sub-totals of attribute an considering the different values of the first n-1
attributes
Sub-totals of an and an-1 considering the different values of the first n-2
attributes
The overall total
Helpful on columns representing the same dimension such as time or
ROLLUP(year, month, day)
ROLLUP(country, region, city)

sum(Units)
fiat fiat ford
red red red
select Model, Year, Color, sum(Unit)
where Model in {fiat,ford}
and Color = red
and Year between 1994 and 1995
group by cube(Model, Year, Color)
Produces all possible roll-up aggregations: Model, Year, Color
Model, Year
Model, Color
Year, Color Model

fiat fiat ford fiat fiat ford fiat ford ALL ALL fiat ford ALL ALL ALL ALL
1994 1995 1994 1994 1995 1994 ALL ALL 1994 1995 ALL ALL ALL 1994 1995 ALL
red red red ALL ALL ALL red red red red ALL ALL red ALL ALL ALL

Applied to n attributes, generates 2n aggregations
Includes all the rows produced by ROLLUP
CUBE is suitable on columns that represent different dimensions For instance, subtotals for month, state, product
Partial CUBE works in a similar way as partial ROLLUP

sum(Units)
fiat fiat ford
red red red
GROUPING SETS (1)
select Model, Year, Color, sum(Unit) from T
where Model in {fiat,ford}
and Color = red
and Year between 1994 and 1995
group by grouping sets((Model,Year,Color),
(Model,Color), (Year,Color))
Produces only the aggregations specified in the clause

GROUPING SETS (2)
Model Year Color sum(Unit)
fiat 1994 red 50 fiat 1995 red 85 ford 1994 red 80 fiat ALL red 135 ford ALL red 80 ALL 1994 red 130 ALL 1995 red 85

Equivalences
CUBE(a,b)aGROUPING SETS((a,b),(a),(b),())
ROLLUP(a,b,c)aGROUPING SETS((a,b,c),(a,b),(a),())
GROUP BY GROUPING SETS ((a,b,c))aGROUP BY a,b,c
GROUP BY GROUPING SETS (a,b,c)a GROUP BY a
UNION ALL GROUP BY b UNION ALL GROUP BY c

GROUPING (1)
select Model, Year, Color, sum(Unit)
grouping(Model) as M,
grouping(Year) as Y,
from T grouping(Color) as C
where Model in {fiat,ford}
and Color = red
and Year between 1994 and 1995
group by rollup(Model, Year, Color)
Returns 1 if the result is a NULL value created by a grouping operation, 0 otherwise
Helps distinguishing between original and aggregated data, especially if NULL value appears in the table

GROUPING (2)
fiat fiat ford fiat fiat ford fiat ford ALL
1994 1995 1994 1994 1995 1994 ALL ALL ALL
red red red ALL ALL ALL ALL ALL ALL
0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1
0 0 0 1 1 1 1 1 1

GROUPING_ID (1)
Returns the decimal number corresponding to the bit vector generated by a GROUPING on all the columns in the aggregation
fiat fiat ford fiat fiat ford fiat ford ALL
1994 1995 1994 1994 1995 1994 ALL ALL ALL
red red red ALL ALL ALL ALL ALL ALL
0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1 1 1
0 0 0 1 1 1 1 1 1
0 0 0 1 1 1 3 3 7

select Item, Year, Lang, sum(Unit) from T group by cube(Item, Year, Lang)
select Year, Lang, sum(Unit) from T where Item=crime
group by cube(Year, Lang)
select Item, Year, Lang, sum(Unit) from T group by rollup(Item, Year, Lang)
select Item, Year, Lang, sum(Unit) from T
group by grouping sets((Item, Year, Lang), (Item, Lang), (Year, Lang))

Physical Modeling

Physical modeling: phases
Translation of fact schema
Materialization of views
Vertical/horizontal fragmentation
Choice of indexes
Data allocation on physical supports
The first three phases are part of the logical modeling, but strongly interplay with physical modeling

Goal of physical modeling
Optimization of query execution, depending on Available indexes
The specific DBMS
The specific instance of the dataset
Optimizers can operate based on
Rules: considers the data organization, the structure of the query, and indexes Costs: considers an estimation of query evaluation costs based on catalogs

CS: assignmentchef QQ: 1823890830 Email: [email protected]

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] CS ALL 1994 1995 ALL ALL ALL 1994 1995 ALL
$25