MONASH
INFORMATION TECHNOLOGY
The Relational Database Model
Overview
We now have a conceptual model for Monash Software, it is time to move to the second stage and map this to a logical model
For our unit this will involve mapping to the Relational Model in preparation for implementation in a RDBMS
Relational Model
RelationalAlgebra
2
The Relational Model
Introduced by CODD in 1970 the fundamental basis for the relational DBMS
Basic structure is the mathematical concept of a RELATION mapped to the
concept of a table (tabular representation of relation)
Relation abstract object
Table pictorial representation
Storage structure real thing eg. isam file of 1s and 0s
Relational Model Terminology
DOMAIN set of atomic (indivisible) values
specify
name
data type
data format Examples:
customer_number domain 5 character string of the form xxxdd
name domain 20 character string
address domain 30 character string containing street, town & postcode
credit_limit domain money in the range $1,000 to $99,999
3
A Relation
A relation consists of two parts heading
body
Relation Heading
Also called Relational Schema consists of a fixed set of attributes
R (A1,A2,.An)
R = relation name, Ai = attribute i
Each attribute corresponds to one underlying domain: Customer relation heading:
CUSTOMER (custno, custname, custadd, credlimit)
custno
custname
custadd
credlimit
4
Relation Body
Relation Body
Also called Relation Instance (state of the relation at any point in time)
r(R) = {t1, t2, t3, ., tm}
consists of a time-varying set of n-tuples
Relation R consists of tuples t1, t2, t3 .. tm
m = number of tuples = relation cardinality
each n-tuple is an ordered list of n values
t = < v1, v2, ….., vn>
n = number of values in tuple (no of attributes) = relation degree
In the tabular representation:
Relation heading column headings
Relation body set of data rows
custno
custname
custadd
credlimit
SMITH
Wide Rd, , 3
JONES
Narrow St, , 3
SMI13
JON44 BRO23
BROWN
Here Rd, , 3
2000
10000
10000
5
Relation Properties
No duplicate tuples
by definition sets do not contain duplicate elements
hence tuples must be unique Tuples are unordered within a relation
by definition sets are not ordered
hence tuples can only be accessed by content
No ordering of attributes within a tuple by definition sets are not ordered
6
Relation Properties contd
Tuple values are atomic cannot be divided
EMPLOYEE (eid, ename, departno, dependants)
not allowed: dependants (depname, depage) multivalued
hence no multivalued (repeating) attributes allowed, called the first normal form rule
COMPARE with tabular representation
normally nothing to prevent duplicate rows rows are ordered
columns are ordered
tables and relations are not the same thing
7
Q1. Which of the following statements is TRUE according the characteristics of the relational model?
All values in an attribute need to be from the same domain.
Each attribute needs to have a distinct name. The order of attributes and tuples matters.
Each intersection of a attribute and a tuple represent a single value.
More than one statement is TRUE
8
Functional Dependency
Functional Dependency:
A set of attributes A functionally determines an attribute B if, and only if,
for each A value, there is exactly one value of B in the relation. It is denoted as A B (A determines B, or B depends on A)
order_no order_date
prod_no prod_desc
order_no, prod_no qty_ordered
9
Relational Model Keys
A superkey of a relation R is an attribute or set of attributes which exhibits only the uniqueness property
No two tuples of R have the same value for the superkey (Uniqueness property)
t1[superkey] = t2[superkey]
A candidate key CK of a relation R is an attribute or set of attributes
which exhibits the following properties:
Uniqueness property (as above), and
No proper subset of CK has the uniqueness property
Many possible superkeys
(Minimality or Irreducibility property) ie. a minimal superkey
One candidate key is chosen to be the primary key (PK) of a relation. Remaining candidate keys are termed alternate keys (AK).
Only ONE primary key
(may be composed of
many attributes a
composite primary key) 10
Potentially many possible candidate keys
Q2. Given the following relation:
EMPLOYEE (empno, empname, empsalary, emptaxfileno) empno employee number
empname employee name
empsalary employee salary
emptaxfileno employee tax file number Possible superkey(s) are:
empno, empname, empsalary, emptaxfilenno empno
emptaxfileno, empname
empname
B and C
A, B and C
A, B, C and D
11
Q3. Given the following relation:
EMPLOYEE (empno, empname, empsalary, emptaxfileno) empno employee number
empname employee name
empsalary employee salary
emptaxfileno employee tax file number How many candidates keys exist:
0 1 2 3 4
12
Q4. Given the following relation:
EMPLOYEE (empno, empname, empsalary, emptaxfileno) empno employee number
empname employee name
empsalary employee salary
emptaxfileno employee tax file number How many primary keys exist:
0 1 2 3 4
13
Selection of a Primary key
A primary key must be chosen considering the data that may be added to the table in the future
Names, dates of birth etc are rarely unique and as such are not a good option
PK should be free of extra semantic meaning and security compliant, preferably a single attribute, preferably numeric (see Table 5.3 Coronel & Morris)
Natural vs Surrogate primary key
PATIENT_TREATMENT (patient_id, physician_id, treatment_code, pt_date, pt_time, pt_result)
Superkey
CK
PK
Issues with PK?
14
15
Null in the Relational Model
NULL is NOT a value is a representation of the fact that there is NO VALUE
Reasons for a NULL:
VALUE NOT APPLICABLE
EMP relation empno, deptno, salary, commission commission only applies to staff in sales dept
VALUE UNKNOWN
Joes salary is NULL, Joes salary is currently unknown
VALUE DOES NOT EXIST
Tax File Number is applicable to all employees but Joe may not
have a number at this time
VALUE UNDEFINED
Certain items explicitly undefined eg. divide by zero
Columns Number_of_payments, Total_payments ColumnAverage_payment_made
If Number_of_payments = 0 => Average undefined
16
Writing Relations
Relations may be represented using the following notation:
RELATION_NAME (attribute1, attribute2,)
The primary key is underlined.
Example:
STAFF (staffid, surname, initials, address, phone)
17
Relational Database
A relational database is a collection of normalised relations.
Normalisation is part of the design phase of the database and will be discussed in a later lecture.
Example relational database:
ORDER (order_id, orderdate,)
ORDER_LINE (order_id, product_id, quantity) PRODUCT (product_id, description, unit_price)
18
Foreign Key (FK)
FK: An attribute/s in a relation that exists in the same, or another relation as a Primary Key.
Referential Integrity
A Foreign Key value must either match the full
primary key in a relation or be NULL.
The pairing of PK and FK creates relationships (logical connections) between tables when implemented in a RDBMS. Hence the abstraction away from the underlying storage model.
19
MANAGER PROJECT
Q5. If the above two tables are to be created in a relational database, in which table would you assign the FK (and using which attribute) to create the logical link? For our supplied scenario a manager may manage many projects and a project can only be managed by one manager. A managers name (project_manager) and the project_code may be considered to be unique for this example:
MANAGER table using project_manager attribute. PROJECT table using project_code attribute. MANAGER table using manager_phone attribute. PROJECT table using project_manager attribute None of the above, a relationship is not needed.
20
Q6. Where are the foreign keys in these two relations? Note: a supervisor is a staff member
STAFF (staff_id, surname, initials, address, phone, dept_id, supervisor_id)
DEPARTMENT (dept_id, deptname)
dept_id in staff relation.
dept_id in department relation. staff_id in staff relation. supervisor_id in staff relation. More than one answer is correct
21
Data Integrity
Entity integrity
Primary key value must not be NULL.
No duplicate tuple property then ensures that each primary key must be unique
Referential integrity
The values of FK must either match a value of a full
PK in the related relation or be NULL. Column/Domain integrity
All values in a given column must come from the same domain (the same data type and range).
22
MANAGER
PROJECT
Q7. Suppose that the manager William K. Moor leaves the company and we delete his record from the manager table. Which of the following actions will satisfy the data integrity constraints?
The last row in PROJECT table must be deleted The PROJECT_MANAGER value in the last row of
PROJECT table must be set to NULL (empty) The PROJECT_MANAGER value in the last row of
PROJECT table must be set to any string (e.g., XYZ) X The options a and b
All of the above
23
Relational DMLs
Relational Calculus
Relational Algebra
Transform Oriented Languages (e.g. SQL)
Graphical Languages
Exhibit the closure property queries on relations produce relations
24
Relational Calculus
Based on mathematical logic.
Non-procedural.
Primarily of theoretical importance.
May be used as a yardstick for measuring the power of other relational languages (relational completeness).
Operators may be applied to any number of relations.
25
26
Relational Algebra
Relationally complete.
Procedural.
Operators only apply to at most two relations at a time. 8 basic operations:
single relation: selection, projection
cartesian product, join
union
intersection
difference
division
27
28
29
2
1
30
SQL vs Relational Algebra in the Database
31
Q8. Which of the following statements returns the student ids of the students who got more than 70 marks in the subject
1011.
A. mark > 70 ( stude (MARK))
B. mark > 70 (MARK)
C. mark > 70 AND subj = 1011 ( stude (MARK))
D. mark > 70 AND subj = 1011 (MARK)
E. stude ( mark > 70 AND subj = 1011 (MARK))
32
JOIN
Join operator used to combine data from two or more relations, based on a common attribute or attributes.
Different types: theta-join
equi-join
natural join
outer join
33
THETA JOIN (Generalised join)
(Relation_1) F (Relation_2)
F is a predicate (i.e. truth-valued function) which is
of the form Relation_1.ai Relation2.bi
CUSTOMER.cust_no ORDER.cust_no
is one of the standard arithmetic comparison operators, i.e. <, , =, , >
Most commonly, is equals (=), but can be any of the operators
EMPLOYEE.emp_sal > SALARYSCALE.step_5
34
Q9. How many rows are generated when the product (Cartesian Product) of the STUDENT and MARK relations is taken? i.e. the number of rows in STUDENT X MARK.
24
6
18
7
none of the above
35
Q10. How many columns are generated when the product (Cartesian Product) of the STUDENT and MARK relations is taken? i.e. the number of columns in STUDENT X MARK.
9
6
5
7
none of the above
36
NATURAL JOIN
STUDENT MARK
ID Name ID Subj Marks 1Alice 1100495 2Bob 2104555
Step 1: STUDENT X MARK
Step 2: delete rows where IDs do not match (select =)
1 1045 90
STUDENT. Name ID
1 Alice 1 Alice
1 Alice
2 Bob
2 Bob 2 Bob
MARK.ID Subj Marks
1 1004 95
2 1045 55
1 1045 90
1 1004 95
2 1045 55
1 1045 90
37
NATURAL JOIN
STUDENT MARK
ID Name ID Subj Marks 1Alice 1100495 2Bob 2104555
Step 1: STUDENT X MARK
Step 2: delete rows where IDs do not match (select =) Step 3: delete duplicate columns (project away)
1
1045 90
STUDENT.I Name D
MARK.ID Subj Marks
1 1004 95 1 1045 90
1 Alice
1 Alice
2 Bob 2 1045 55
38
NATURAL JOIN
STUDENT MARK
ID Name ID Subj Marks 1 Alice 1 100495 2Bob 2104555
Step 1: STUDENT X MARK
Step 2: delete rows where IDs do not match (select =) Step 3: delete duplicate columns (project away)
1
1045 90
Marks 95
90
55
ID 1 1 2
Name Subj Alice 1004 Alice 1045 Bob 1045
A natural join of STUDENT and MARK
39
Q11. Which of the following statements returns a natural join of the two relations on the student ids (sid and stude)?
sid = stude (STUDENT X MARK)
course, name, sid, subj, mark ( sid = stude (STUDENT X MARK))
sid = stude ( course, name, sid, subj, mark (STUDENT X MARK)) All of the above
None of the above
40
OUTER JOIN
MARK
ID Subj Marks
1 1004 95
2 1045 55
1 1045 90
4 1004 100
STUDENT
ID Name
1 Alice
2 Bob
3 Chris
No information for Chris (no mark, e.g. just enrolled) and the student with ID 4 (no student, e.g. quit uni)
ID 1 1 2
Name Subj Marks Alice 1004 95 Alice 1045 90 Bob 1045 55
A natural join of STUDENT and MARK
41
FULL OUTER JOIN
STUDENT
ID 1 2 3
Name Alice Bob Chris
MARK
ID Subj Marks 1 1004 95
2 1045 55
1 1045 90
4 1004 100
Get (incomplete) information of both Chris and student with ID 4
ID Name 1 Alice
1 Alice
2 Bob
3 Chris
4
Subj Marks 1004 95 1045 90 1045 55
1004 100
Null Null
Null
A full outer join of STUDENT and MARK
42
LEFT OUTER JOIN
MARK
ID Subj Marks 1 1004 95
2 1045 55
1 1045 90
4 1004 100
STUDENT
ID Name 1 Alice 2 Bob
3 Chris
Get (incomplete) information of only Chris
ID Name 1 Alice 1 Alice 2 Bob
3 Chris
Subj Marks 1004 95 1045 90 1045 55
Null Null
A left outer join of STUDENT and MARK
Memory aid: Chris is on the LEFT of the nulls.
43
RIGHT OUTER JOIN
MARK
ID Subj Marks
1 1004 95 2 104555 1 1045 90
4 1004 100
STUDENT
ID Name 1 Alice 2 Bob
3 Chris
Get (incomplete) information of the student with ID 4
ID 1 1 2 4
Name Alice Alice Bob
Subj Marks 1004 95 1045 90 1045 55 1004 100
Null
A right outer join of STUDENT and MARK.
Memory aid: the marks data is on the RIGHT of the null.
44
45
Q12. Consider the above relations.
Assume that we want to join the them to obtain the information of all students (Anne, Dave, John and Tim). Which of the following is WRONG? (Hint: Dave just enrolled!)
A. Left outer join on STUDENT and MARK
B. Right outer join on MARK and STUDENT
C. Right outer join on STUDENT and MARK
D. Full outer join on STUDENT and MARK
E. Select if (B and C are wrong)
F. Select if (B, C and D are wrong)
Reviews
There are no reviews yet.