Microsoft PowerPoint 11- DatabaseDesign_SchemaRefinement
2018 A. Alawini & A. Parameswaran
Database Design:
Functional
Dependencies
Abdu Alawini
University of Illinois at Urbana-Champaign
CS411: Database Systems
October 8, 2018
1
2018 A. Alawini & A. Parameswaran
Announcements
HW 2 is due Oct 15 (23:59)
Project Track 1 Stage 2
And
Project Track 2 Stage 1
are due on Wednesday (10/10)
Midterm Oct, 29th in class (11-12:15)
Final Dec, 12 in class (11-12:15)
2
2018 A. Alawini & A. Parameswaran
Construct an E-R diagram for a car-insurance company
whose customers own one or more cars each. Each car
has associated with it zero to any number of recorded
accidents.
ERD Exercise
3
2018 A. Alawini & A. Parameswaran
ERD Exercise Student Solution
4
2018 A. Alawini & A. Parameswaran
Various notations for one-to-many
5
0..11..*
zero..one one..many
maximum cardinalities only minimum and maximum
cardinalities
1m
manyone
1 *
0-11+
0
2018 A. Alawini & A. Parameswaran
Various notations for many-to-many
6
1..*1..*
one..many one..many
maximum cardinalities only minimum and maximum
cardinalities
manymany
1+1+
**
mn
2018 A. Alawini & A. Parameswaran
Conceptual design:(ER & UML Models are used
for this.)
What are the entities and relationships we need?
Logical design:
Transform ER design to Relational Schema
Schema Refinement:(Normalization)
Check relational schema for redundancies and
related anomalies.
Physical Database Design and Tuning:
Consider typical workloads; (sometimes) modify the
database design; select file types and indexes.
Overview of Database Design
Were here
7
2018 A. Alawini & A. Parameswaran
Agenda
How do we obtain a good design?
Functional Dependencies and Keys
Rules about Functional Dependencies
Splitting/Combination
Trivial Dependencies
Attribute Closure
FD Closure
8
2018 A. Alawini & A. Parameswaran
Motivation
We have designed ER diagram, and translated it into a relational db
schema R = set of R1, R2, Now what?
We can do the following
implement R in SQL
start using it
However, R may not be well-designed, thus causing us a lot of problems
OR: people may start without an ER diagram, and you need to reformat
the schema R
Either way you may need to improve the schema
9
2018 A. Alawini & A. Parameswaran
Q: Is this a good design?
10Green 123-321-99(201)555-1234
10Green 123-321-99(206)572-4312
431 Purple 909-438-44(908)464-0028
Individuals with several phones:
Address SSNPhone Number
10
2018 A. Alawini & A. Parameswaran
Potential Problems
Redundancy
Update anomalies
maybe well update the address of the person with phone number (206)
572-4312 to something other than 10 Green. Then there will be two
addresses for that person.
Deletion anomalies
delete the phone number of a person; if not careful then the address can
also disappear with it.
Address SSNPhone Number
10Green 123-321-99(201)555-1234
10Green 123-321-99(206)572-4312
431 Purple 909-438-44(908)464-0028
11
2018 A. Alawini & A. Parameswaran
Better Designs Exist
SSNAddress
123-321-9910 Green
909-438-44431 Purple
SSN Phone Number
123-321-99 (201)555-1234
123-321-99 (206)572-4312
909-438-44 (908)464-0028
Break the relation into two:
Unfortunately, this is not something you will detect even
if you did principled ER design and translation
12
2018 A. Alawini & A. Parameswaran
How do We Obtain a Good Design?
Start with the original db schema R
From ER translation or otherwise
Transform it until we get a good design R*
Some desirable properties for R*
must preserve the information of R
must have minimal amount of redundancy
must be dependency preserving
(well come to this later)
must also give good query performance
13
2018 A. Alawini & A. Parameswaran
How do We Obtain a Good Design?
14
2018 A. Alawini & A. Parameswaran
Normal Forms
DB gurus have developed many normal forms
These are basically schemas obeying certain rules
Converting a schema that doesnt obey rules to one that does is
called normalization
This typically involves some kind of decomposition into smaller
tables, just like we saw earlier.
(the opposite: grouping tables together, is called
denormalization)
15
2018 A. Alawini & A. Parameswaran
Normal Forms
DB gurus have developed many normal forms
Most important ones
Boyce-Codd, 3rd, and 4th normal forms
If R* is in one of these forms, then R* is guaranteed to
achieve certain good properties
e.g., if R* is in Boyce-Codd NF, it is guaranteed to not have certain
types of redundancies
DB gurus have also developed algorithms to transform R
into R* in these normal forms
16
2018 A. Alawini & A. Parameswaran
Normal Forms (cont.)
There are also trade-offs among normal forms
Thus, our goal is to:
learn these forms
transform R into R* in one of these forms
carefully evaluate the trade-offs
To understand these normal forms well need to understand
certain types of constraints
functional dependencies and keys
17
2018 A. Alawini & A. Parameswaran
Functional Dependencies
and Keys
18
2018 A. Alawini & A. Parameswaran
Functional Dependencies
A form of constraint (hence, part of the schema)
Finding them is part of the database design
Used heavily in schema refinement
Holds for ALL instances!
Definition:
If two tuples agree on the attributes
A , A , A
1 2 n
then they must also agree on the attributes
B , B , B 1 2 m
Formally:
A , A , A
1 2 n
B , B , B 1 2 m
Where have we seen
this before?
19
2018 A. Alawini & A. Parameswaran
Examples
EmpID Name, Phone, Position
Position Phone
butPhone Position: why?
EmpID Name Phone Position
E0045 Smith 1234 Clerk
E1847 John 9876 Salesrep
E1111 Smith 9876 Salesrep
E9999 Mary 1234 Lawyer
20
2018 A. Alawini & A. Parameswaran
What a FD actually means
Knowing FD: A B holds in R(A, B, C) means that
For ALL valid instances R(A, B, C):
A determines B
Or, if two tuples share A, then they share the same B
This is the property of the world
Conversely, if: A / B, then there is no guarantee that the A
determines B property holds in a given instance (though it
might).
Trivially, it holds when you have only one tuple.
21
2018 A. Alawini & A. Parameswaran
More examples
Product: name, manufacturer price
Person:ssn name, age
Company: name stock price, president
22
2018 A. Alawini & A. Parameswaran
Q: From this, can you conclude phone SSN?
SSN Phone Number
123-321-99 (201)555-1234
123-321-99 (206)572-4312
909-438-44 (908)464-0028
909-438-44 (212)555-4000
No, you cannot. Like we discussed, this is a property of
the world, not of the current data
In general, you cannot conclude from a given instance of
a table that an FD is true. An FD is not an observation
made from a tables current tuples, it is an assertion that
must always be respected.
You can however check if a given FD is violated by the
table instance.
23
2018 A. Alawini & A. Parameswaran
Keys are a type of FD
Key of a relation R is a set of attributes that
functionally determines all attributes of R
none of its subsets determines all attributes of R
There could be many keys of a relation
Student (UIN, email, dept, age)
UIN UIN, email, dept, age
email UIN, email, dept, age
Superkey
Superset of key
a set of attributes that contains a key
Any examples for student?
24
2018 A. Alawini & A. Parameswaran
Many many FDs
MovieInfo (name, year, actor, director, studio)
Same movie can be remade multiple years, but a name, year pair
uniquely determines a movie
A movie has a single director/studio but many actors
Name, year director, studio
Name, year director; Name, year studio
Name, year / actor
A director works only with a single studio
Director studio
An actor works on a given movie only once (never for remakes), but
may work for many movies in a year
Actor, name year; actor, year / name
25
2018 A. Alawini & A. Parameswaran
Many many FDs
MovieInfo (name, year, actor, director, studio)
Name, year director, studio
Name, year director
Name, year studio
Director studio
Actor, name year
Actor, name, year director, studio
Director, actor, name studio, year
Director, name, year studio
Studio, actor, name year
Any missing
FDs?
26
2018 A. Alawini & A. Parameswaran
Rules about Functional
Dependencies
27
2018 A. Alawini & A. Parameswaran
Outline of Rules
Two examples of rules
Splitting/Combination
Trivial Dependencies
Attribute Closure
Algorithm
Uses
FD Closure
A complete set of rules:
Armstrongs axioms
Algorithm
28
2018 A. Alawini & A. Parameswaran
The Splitting/Combining Rule
A1A2An B1B2Bm
Equivalent to:
A1A2An B1;
A1A2An B2;
A1A2An Bm
Can replace one for the other.
29
2018 A. Alawini & A. Parameswaran
Trivial Functional Dependencies
A1A2An A1
In general,
A1A2An B1B2Bm
if {B1B2Bm} {A1A2An}
Example: name, UIN UIN
Why does this make sense?
30
2018 A. Alawini & A. Parameswaran
Closure of a Set of Attributes
Given a set of attributes{A1, , An} and a set of FDs S.
Problem: find all attributes B such that:
for all relations that satisfy S, they also satisfy:
A1, , An B
The closure of {A1, , An}, denoted {A1, , An} ,
is the set of all such attributes B
We will discuss the motivations for attribute closures soon
+
31
2018 A. Alawini & A. Parameswaran
Algorithm to Compute Closure
Split the FDs in S so that every FD has a single attribute on the
right. (Simplify the FDs)
Start with X={A1A2An}.
Repeat until X doesnt changedo:
If (B1B2Bm C) is in S,
such that B1,B2,Bm are in X and C is not in X:
add C to X.
// X is now the correct value of {A1A2An}
+
Why does this algorithm converge?
32
2018 A. Alawini & A. Parameswaran
Example
A BC
A DE
B D
A F B
Closure of {A,B}:
Closure of{A, F}:
Set of attributes A,B,C,D,E,F.
Functional Dependencies:
33
2018 A. Alawini & A. Parameswaran
Example
A BC
A DE
B D
A F B
Closure of {A,B}:X = {A, B, C, D, E}
Closure of{A, F}: X = {A, F, B, D, C, E}
Set of attributes A,B,C,D,E,F.
Functional Dependencies:
34
2018 A. Alawini & A. Parameswaran
1. Decompose all FDs in F
sid -> name, crn -> cid, crn -> subj
2. We start with with the attribute crn in the initial
closure = {crn}
3. We look for all FDs that has crn, the 2nd FD (crn -> cid) has crn
closure = cid U closure = {crn, cid}
4. We look for all FDs that has cid or crn, the 3rd FD (crn -> subj) has
crn
closure = subj U closure = {crn, cid, subj}
Attribute Closure Exercise
35
Given:
Student_info(sid, name, crn, subj, cid, grade), and
F={sid ->name, crn -> cid, subj}, find A+ for A = {crn}
2018 A. Alawini & A. Parameswaran
Is this algorithm correct?
Yes. See Text (Section 3.2.5) for proof.
Two parts of proof:
Anything determined to be part of {S}+ deserves to be there:
soundness
Theres nothing missing:
completeness
36
2018 A. Alawini & A. Parameswaran
Uses for Attribute Closure
Use 1: To test if X is a superkey
How?
compute X+, and check if X+ contains all attrs of R
Use 2: To check if X Y holds
How?
by checking if Y is contained in X+
37
2018 A. Alawini & A. Parameswaran
An exercise
Show that each of the following are not valid rules about
FDs, by giving example relations that satisfy the given
FDs (following the If), but not the FD that allegedly
follows (after the then).
(1) If A > B then B > A
(2) If AB > C and A > C then B > C.
(1) A = SSN, B = Name
(2) A = SSN, B = Phone, C = Name
38
Reviews
There are no reviews yet.