CS5481 Data Engineering Assignment 2
Deadline: 25OCT2019 Friday, noon no late submission will be accepted
Points to note:
Different books may have slightly different descriptions of concepts, algorithms and terminologies. To ensure fair assessment and uniformity in marking, you must follow the convention used in the lecture slides or our textbook Database System Concepts. Other conventions will not be accepted.
Students are expected to generalize the concepts they have learnt during the lecture in order to finish the assignment.
You must show the steps clearly. The marker will not give you marks if he cannot understand your work.
This is an individual assignment. You must work on your own. Check http:www6.cityu.edu.hkahplagiarism.htm for The Problem of Plagiarism.
Submit the file to Canvas on or before the deadline.
The file type must be either .docx file or .pdf file.
Use your student IDs to name the file, such as 5xxxxxxx.docx or 5xxxxxxx.pdf.
Query Processing and Optimization
Consider the following database schema. Category categoryno, categoryname, Reader readerid, gender, age, gradeno, Book bookid, cno,
Loanrecord rid, bid, loandate,
The attributes rid and bid in Loanrecord are foreign keys referencing readerid in Reader and bookid in Book, respectively. The attribute cno in Book is a foreign key referencing categoryno in Category.
Suppose ONLY the following information and statistics are available and assume that all distributions are uniform and independent of each other.
Number of categories nCategory: 280
Number of readers nReader: 50,000
Number of books nBook: 4,500
Number of loan records nLoanrecord: 60,000
Blocking factor of Category fCategory: 8
Blocking factor of Reader fReader: 12
Blocking factor of Book fBook: 9
Blocking factor of Loanrecord fLoanrecord: 10
Proportion of category names containing science: 17
Number of female readers: 24,950
Range of age: from 12 to 72
Valuesofgradeno:1,2,3,4,5,6,7,8,9,10
Category, Reader and Book are sequential files in sorted order of their primary key
and Loanrecord is a sequential file in sorted order of loandate. Records do not span across blocks.
CS5481 Data Engineering Assignment 2
Consider the following SQL query that retrieves all information about female readers whose age is less than 30 with grade number above 6 and have loan records of any nonscience related books.
select
from Category c, Reader r, Book b, Loanrecord l where c.categoryname not like science
and c.categorynob.cno and b.bookidl.bid and r.readeridl.rid and r.genderfemale and r.age30 and r.gradeno6;
a Write an unoptimized relationalalgebra expression of the above SQL query that might be initially generated from an SQL translator.
b Estimate the number of records the query returns. Explain your estimation.
c Assume that only 30 memory blocks are available and consider the cost of disk block transfers only. If leftdeep join orders are considered, suggest the best evaluation plan for the query. Particularly, you need to do the followings.
i Transform the unoptimized relationalalgebra expression in Part a it into an equivalent optimized relationalalgebra expression for producing your suggested evaluation plan. Show all the steps and state clearly the rule number of the equivalence rule you used in each step.
ii Draw the fully annotated evaluation plan including exactly what algorithm is used for each operation and how the execution of the operations is coordinated.
iii Indicate in the evaluation plan the memory allocation number of memory blocks allocated to each input and output bb, but the total cannot exceed the available memory at one time.
iv Find the cost of each operation and the total cost of the whole plan in number of disk block transfers.
v State clearly any justifiable assumptions you have made in your estimation.
d If any join orders can be considered, is it possible to have a better evaluation plan for the query? If yes, you need to follow the steps in Part c to explain your answer.
e If a Btree index can be created on one attribute for one of the base relations, is it possible to have a better evaluation plan for the query? If yes, which attribute and which relation will you choose? Again, if yes, you need to follow the steps in Part c to explain your answer.
Reviews
There are no reviews yet.