MONASH
INFORMATION TECHNOLOGY
Normalisation
Data Normalisation
Relations should be normalised in order to avoid anomalies which may occur when inserting, updating and deleting data.
Normalisation is a systematic series of steps for progressively refining the data model.
A formal approach to analysing relations based on their primary key (or candidate keys) and functional dependencies.
Used:
as a design technique bottom up design, and
as a way of validating structures produced via top
down design (ER model converted to a logical model see next session)
2
Sample Data
* against EMP_NAME indicates the project leader
3
Problems with sample data
JOB_CLASS invites entry errors eg. Elec. Eng. vs Elect. Engineer vs E.E. Table has redundant data
Details of a charge per hour are repeated for every occurrence of job class
Every time an employee is assigned to a project emp name repeated Relations that contain redundant information may potentially suffer from
several update anomalies
Types of update anomalies include:
Insert Anomaly
Insert a new employee only if they are assigned to a project
Delete Anomaly
Delete the only employee assigned to a project? Delete the only employee of a particular job class?
Modification (or update) Anomaly
Update a job class hourly rate need to update multiple rows
4
The Normalisation Process Goals
Creating valid relations, i.e. each relation meets the properties of the relational model. In particular:
Entity integrity
Referential integrity
No many-to-many relationship
Each cell contains a single value (is atomic).
In practical terms when implemented in an RDBMS:
Each table represents a single subject
No data item will be unnecessarily stored in more than one table.
The relationship between tables can be established (pair of PK and FK is identified).
Each table is void of insert, update and delete anomalies.
5
Representing a form as a relation
This process follows a standard approach:
arrive at a name for the form which indicates what it represents (its
subject)
determine if any attribute is multivalued (repeating) for a given entity
instance of the forms subject
if an attribute (or set of attributes) appears multiple times then the group of related attributes need to be shown enclosed in brackets to indicate there are multiple sets of these values for each instance
Looking at our SAMPLE DATA
Name: EMPLOYEE_PROJECT_ASSIGNMENT simplify name to ASSIGNMENT for lecture
ASSIGNMENT (proj_num, emp_num, emp_name, job_class, chg_hour, assign_hours)
i.e. the form consists of repeating rows (instances) of assignment data
6
Representing a form as a relation
CUSTOMER ORDER
Order Number:
Customer Number: Customer Name: Customer Address:
61384
1273
Computer Training Centre 123 Excellent St
Monash, Vic, 3000
Order Date:
12/3/2020
PRODUCT NUMBER
DESCRIPTION
QTY ORDERED
LINE PRICE
M128
Bookcase
4
800
B381
TV Cabinet
2
600
R210
Round Table
3
1500
ORDER ( orderno, orderdate, custnumb, custname, custaddress (prodno, proddesc, qtyordered, lineprice))
7
Functional Dependency Revisited
An attribute B is FUNCTIONALLY DEPENDENT on another attribute A, if a value of A determines a single value of B at any one time.
AB
PRODNO PRODDESC
CUSTNUMB CUSTNAME
ORDERNO ORDERDATE
ORDERNO independent variable, also known as the DETERMINANT
ORDERDATE dependent variable TOTALDEPENDENCY
attribute A determines B AND attribute B determines A
EMPLOYEE-NUMBER TAX-FILE-NUMBER
TAX-FILE-NUMBER EMPLOYEE-NUMBER
8
Functional Dependency
For a composite PRIMARY KEY, it is possible to have FULL or PARTIAL dependency.
FULLDEPENDENCY
occurs when an attribute is always dependent on all attributes in the
composite PK
ORDERNO, PRODNO QTYORDERED
Lack of full dependency for multiple attribute key = PARTIAL DEPENDENCY
ORDERNO, PRODNO
PRODDESC, QTYORDERED
here although qtyordered is fully dependent on orderno and prodno, only
prodno is required to determine proddesc
proddesc is said to be partially dependent on orderno and prodno
9
Functional Dependency
TRANSITIVE DEPENDENCY
occurs when Y depends on X, and Z depends on Y thus Z also
depends on X ie. X Y Z
and Y is not a candidate key (or part of a candidate key) ORDERNO CUSTNUMB CUSTNAME
Dependencies are depicted with the help of a Dependency Diagram.
Normalisation converts a relation into relations of progressively smaller number of attributes and tuples until an optimum level of decomposition is reached little or no data redundancy exists.
The output from normalisation is a set of relations that meet all conditions set in the relational model principles.
10
Unormalised Form (UNF)
The UNF representation of a relation is the representation which you have mapped from your inspection of the form
it is a single named representation (name is not pluralised)
no PK etc have as yet been identified
ASSIGNMENT (proj_num, emp_num, emp_name, job_class, chg_hour, assign_hours)
ORDER (orderno, orderdate, custnumb, custname, custaddress (prodno, proddesc, qtyordered,
lineprice))
Can ASSIGNMENT and/or ORDER be called a relation? If not, why not?
11
First Normal Form
FIRST NORMAL FORM (part of formal definition of a relation)
A RELATION IS IN FIRST NORMAL FORM (1NF) IF:
a unique primary key has been identified for each tuple/row.
it is a valid relation
Entity integrity (no part of PK is null)
Single value for each cell ie. no repeating group (multivalued attribute).
all attributes are functionally dependent on all or part of the primary key
12
UNF to 1NF
Move from UNF to 1NF by:
identify a unique identifier for the repeating
group.
remove any repeating group along with the PK of the main relation.
The PK of the new relation resulting from the removal of repeating group will normally have a composite PK made up of the PK of the main relation and the unique identifier chosen in 1. above, but this must be checked.
13
1NF to 2NF
A RELATION IS IN 2NF IF
all non key attributes are functionally dependent on
the primary key (simple definition)
used by the textbook in examples
all non key attributes are functionally dependent on any candidate key (general definition)
see textbook section 6-3, same as simple if only one candidate key
Requirement for our unit
16
2NF to 3NF
A RELATION IS IN 3NF IF
all transitive dependencies have been removed check for non key attribute dependent on another non key attribute
Move from 2NF to 3NF by removing transitive dependencies
24
Customer Order Form
CUSTOMER ORDER
Order Number:
Customer Number: Customer Name: Customer Address:
61384
1273
Computer Training Centre 123 Excellent St
Monash, Vic, 3000
Order Date:
12/3/2020
PRODUCT NUMBER
DESCRIPTION
QTY ORDERED
LINE PRICE
M128
Bookcase
4
800
B381
TV Cabinet
2
600
R210
Round Table
3
1500
ORDER ( orderno, orderdate, custnumb, custname, custaddress (prodno, proddesc, qtyordered, lineprice))
31
Assume a phone number may be shared between employees
34
Summary
Things to remember
Represent form as presented, no interpretation, to yield
starting point (UNF)
Functional dependency
Process of removing attributes in relations based on the concept of 1NF, 2NF and 3NF.
UNF to 1NF define PK & remove repeating group. 1NF to 2NF remove partial dependency.
2NF to 3NF remove transitive dependency.
40
Reviews
There are no reviews yet.