Overview of the Assignment:
In this exercise, we will review database scaling and performance tuning concepts. This exercise is based on the DVD rental company database you have been exploring in the programming assignments. The assignment is broken into two parts.
Required Software
This assignment requires you to run the DVD rental database which you have used in the previous programming assignment.
You will need Notepad ++, Visual Studio Code or a similar code compare tool
Notepad++ download (windows only) https://notepad-plus-plus.org/downloads/
Visual Studio Code download: https://code.visualstudio.com/
How to use Compare plugin in Notepad++
Install the compare Plugin if you dont have it set up.
In Notepad++ go to Plugins menu at the top and select Plugins Admin
In the list of Plugins, scroll down to Compare check it off and select Install, and follow prompts.
Once Notepad++ has the compare plugin installed, use the file menu to open both files in separate windows.
Under Plugins menu select Compare, the two files will show up, side by side, and you will notice the lines which are different are highlighted.
How to use compare in Visual Studio Code
Open both files in Visual Studio Code
In the Explorer window (within Visual Studio Code) right click on one of the files and select Select for Compare
In the same Explorer window right click on the other file and select Compare with Selectd
You will now see both files side by side with differences highlighted.
Preparing for the Assignment
Review normalization and database tuning concepts as outlined in the weekly module notes and the text.Note that the assignment has two parts.
Review the DVD rental company schema provided.
Please Use the Assignment Template to provide your answers.
Assignment 2.0, Part A: Data Load
This assignment requires you to load additional data into the DVD rental database which you have used in the previous programming assignment.Steps below will introduce you to some of the data load challenges and skills which are used within data management.Please review very carefully.
Outline of the process:
Inspect the DDL files and any changes you made in the original schema in programming assignment part 1
Create and run a DDL script to make adjustments to your database.
Load additional data into your database.
Step 1 Inspect the DDL
Download the tuningSampleData file provided for your DBMS.In it you will find directory for your DBMS (Oracle, SQL server and PostgreSQL).
Within each directory you will find the following files:
Original DLL files for the Netflix DB:
Create Table (i.e. NetFlix_Oracle_Create_Tables_Original.sql)
Crete Indexes (i.e. NetFlix_Oracle_Create_Indexes_Original.sql)
Updated DLL files for the Netflix DB:
Create Table (i.e. NetFlix_Oracle_Create_Tables_Updated.sql)
Crete Indexes (i.e. NetFlix_Oracle_Create_Indexes_Updated.sql)
DML files to load the data:
Original inserts file (i.e. NetFlix_Oracle_Inserts.sql): this file is provided as a reference, you will not need to run it
For the files listed below, the GUI tools may not be able to open the files because of their size, directions on how to load these files will be in step #3
dvd_script.sql: contains additional DVD table data
movieperson_script.sql : contains additional movie person table data
movieperson_role_script.sql: contains additional movie person role table data
dvd_quantity_script.sql: script to update quantities within the DVD table
member_script: contains new member data
rental_queue_script.sql: contains additional records for rental queue table
rental_script.sql: contains rental data.Please note that this script specifically will have almost 500,000 new records and may take up-to 45 minutes to insert depending on your DBMS environment.
Your first step is to Inspect and compare the DDL in the original and updated create table and create index scripts.There are some very minor changes we want you to notice.Use the Notepad++, Visual Studio Code or similar tool to compare the files to identify the changes (see directions on how to do this on page 1)
In addition, you will need to review the DDL changes you have made in Programming Assignment Part 1 such as how you modified the Rental table, and decide what adjustments need to be made.The insert scripts are meant to run on the DDL that is provided, and any changes you might not consider may cause the scripts not to load the data correctly.
Document what DDL changes you need to make and provide the SQL and the screenshots showing the changes being made.
Loading the test data
You will need to place the files into a specific folder on your computer and reference the SQL script to run these files.
Oracle:
To load the data into Oracle, run the following commands. It is assumed that the files have been placed in C:Netflix folder, you can replace the C:Netflix with the location where you placed the files.
Tip: execute each line at a time to review that it ran without issues.The rental script may take over 40 minutes to run.
@C:Netflixdvd_script.sql
@C:Netflixmovieperson_script.sql
@C:Netflixmovieperson_role_script.sql
@C:Netflixdvd_quantity_script.sql
@C:Netflixmember_script.sql
@C:Netflixrental_queue_script.sql
@C:Netflixrental_script.sql
SQL Server:
To load the data into SQL Server, you will need to turn onSQLCMD mode where you can execute a SQLCMD script. To enable this mode clickQueryin menu bar then selectSQLCMD Mode.You can now run the following commands below. It is assumed that the files have been placed in C:Netflix folder, you can replace the C:Netflix with the location where you placed the files.
:setvar path C:Netflix
:r $(path)dvd_script.sql
:setvar path C:Netflix
:r $(path)movieperson_script.sql
:setvar path C:Netflix
:r $(path)movieperson_role_script.sql
:setvar path C:Netflix
:r $(path)dvd_quantity_script.sql
:setvar path C:Netflix
:r $(path)member_script.sql
:setvar path C:Netflix
:r $(path)rental_queue_script.sql
:setvar path C:Netflix
:r $(path)rental_script.sql
Document the record count from the loaded tables by providing the screenshot.
If you get stuck and need to start over:
You will need to run the following scripts to clear the data, then insert then insert data from the original inserts into these specific tables, and then run the load again.If using Oracle or PostgreSQL, make sure to commit after deleting the data and before you reinsert it as well as after the original inserts and before loading the new test data.
First delete data from these tables:
delete from Rental;
delete from MoviePersonRole;
delete from RentalQueue;
delete from MoviePerson;
delete from DVD;
Now insert data from the original inserts file (NetFlix_Oracle_Inserts.sql) for these tables, then repeat step 3 above.
Assignment 2.0, Part B Questions
Suppose that the small firm for which you designed the DVD rental database has established an internet presence and their sales are growing rapidly. You have been asked to examine the database to make sure that it can grow to support at least a million transactions per day.Your assignment is to examine the DVD rental schema provided and identify performance and other scaling problems. The frequently occurring operational transactions are:
Millions of customers updating their wish list (DVD rental queue) daily.
The receipt of a DVD, with the corresponding mailing of the next in-stock DVD from the customers wish list (the rental records).
Monthly billing of each customers credit card.
The scaling issues result from customers adding data to the rental queue, the rental and the payments table, while staff frequently query from these tables for business-driven decisions.Performance degradation has been observed both from customer point of view when renting DVDs, adding Movies to the queue and making payments, and staff quiring the data.
Write a compact textual description (approximately two thirds of a page single spaced) to discuss the following topics:
What changes would you make to the provided schema so that it can scale to handle a million or more transactions per day? Identify the scalability-limiting schema features and propose changes that would make them scalable.
Would you denormalize? If you would denormalize, how would you maintain the denormalizations? Justify your design changes regardless of your decision about whether to denormalize.
Please provide parts of the ERD design changes you are proposing with your suggested changes and paste it below.
Student solution to Part B, Question 1:
A key part of growing the DVD rental business has been targeted email marketing to members. One of the suggestions to improve performance of marketing analysis has been to implement a RentalHistory table. A major concern is that as the rental history table grows, these queries may take longer and consume increasing resources. Our client is aware that as they continue to grow, they will need a data warehouse to support these marketing analyses. They do not want to undertake the effort at this time, the client does want to improve the performance of the marketing queries against the existing OLTP schema. In a later assignment you will design a dimensional data warehouse for our DVD rental business client, so we are not asking for that solution now.
Within about half a page (single spaced), discuss the following topics:
How would you design a denormalized RentalHistory table to reduce the number of tables that need to be joined?
How would you maintain this data, meaning how and when would it be populated and updated? Compare and contrast some different approaches (think stored procedures vs. triggers).
Provide an ERD design for the RentalHistory table with your suggestions and the explanations of your changes including an ERD of the RentalHistory table below.
Student solution to Part B, Question 2:
Assignment 2.0, Part C Questions
For the following questions in this exercise, assume that your DVD rental client is finding that the following queries are becoming problematically slow as the database grows in size the additional sample data which you loaded will give you some context.
For each query, analyze why it may be slow, and propose ways to improve query performance either by changing the query; adding, deleting or modifying indexes, and or suggesting design changes.
You can implement the query and check its execution plan before and after your changes are made.Look to rewrite the query and explain why you made the change, then showing it in action via screenshots, as well as any additional suggestions you may have. It is important to explain why you are proposing the change.
Please note that these concepts are not exclusive in order to solve these questions:
Understanding differences between joining large data sets to small data sets either through subqueries or joins
Filtering on NULLS
Understanding NOT IN, NOT EXISTS, DISTINCT, searching on NULLs and different data types (e.g. search on a string vs. numeric type attributes)
Indexing concepts (different kind of indexes and where these might come into play)
Reducing number of similar operations
Design changes such as denormalization and or fragmentation (be careful about redundancy), and primary/foreign-key maintenance.
Stored procedures, triggers, views, materialized/indexed views
Use Explain Plan within the DBMS which can show the improvements and may provide hints on where the bottlenecks may be.
Use Explain Plan (Oracle)
Display Estimated Execution Plan (SQL Server)
Explain Analyze (PostgreSQL)
Below query is used to identify DVDs that are similar to the DVDs that the member has rented, in this case based on the director of the movie.
SELECT DVD.DVDTitle AS DVD Title,
SUBSTR(MoviePerson.PersonFirstName,1,10) || ||
SUBSTR(MoviePerson.PersonLastName,1,10) AS Director Name
FROM DVD
JOIN MoviePersonRole ON DVD.DVDId = MoviePersonRole.DVDId
JOIN Role ON MoviePersonRole.RoleId = Role.RoleId
JOIN MoviePerson ON MoviePersonRole.PersonId = MoviePerson.PersonId
WHERE
Role.RoleName = Director
AND ((MoviePerson.PersonLastName = Spielberg)
OR
(MoviePerson.PersonLastName = Hitchcock AND MoviePerson.PersonFirstName = Alfred))
ORDER BY MoviePerson.PersonLastName, MoviePerson.PersonFirstName;
What performance issues do you see with this query?Examine the execution plan to confirm your concerns.Are you seeing something within the execution plan which you did not consider?
Your recommendation to improve performance:
Queries like the following are run frequently to identify DVDs that a member has not rented.Note that the MemberID is passed from the application layer and may be different each time the query is run.
SELECT DVDTitle AS DVD Title
FROM DVD
WHERE DVDId NOT IN
(SELECT DISTINCT DVDId FROM Rental WHERE MemberId = 123);
What performance issues do you see with this query?Examine the execution plan to confirm your concerns.Are you seeing something within the execution plan which you did not consider?
Your recommendation to improve performance:
The following view assembles the data for a DVD in a form useful for output to a user interface, so it is used extensively by the application.
CREATE VIEW DVDView AS
SELECT DVDId, DVDTitle, Genre.GenreName AS Genre,
Rating.RatingName AS Rating
FROM DVD
JOIN Genre ON DVD.GenreId = Genre.GenreId
JOIN Rating ON DVD.RatingId = Rating.RatingId;
Queries against this view take forms, such as:
SELECT * FROM DVDView WHERE Genre = Horror AND Rating = R;
What performance issues do you see with this query?Examine the execution plan to confirm your concerns.Are you seeing something within the execution plan which you did not consider?
Your recommendation to improve performance:
Review your solution to question 7 from Programming Assignment Part 1.Customer is reporting that it is running slow.Recall the question asked to list all the Members and the DVD Copies they have currently rented and not returned using a subquery.A currently Rented DVD is where the RentalReturnedDate is NULL.The attributes returned were Members name, DVDTitle, Genre, Rating, a director if one exists in the database, DVD Copy, and the request and ship dates for each DVD.The Customer just cares about which movies have been rented and not returned at minimum.
What performance issues do you see with the query you wrote?Examine the execution plan to confirm your concerns.Are you seeing something within the execution plan which you did not consider?
Your recommendation to improve performance:
Review your solution to question 9 from Programming Assignment Part 1.Customer is reporting that it is running slow.Recall that the question asked for a query to list the DVD titles and how many times each one has been rented. The result should be DVD Title, Genre, Rating and number of rentals for each DVD.
What performance issues do you see with this query?Examine the execution plan to confirm your concerns.Are you seeing something within the execution plan which you did not consider?
Your recommendation to improve performance:
There are many ways to improve the scalable performance of the following stored procedure, which is running increasingly slowly as the database grows. Identify the changes to improve the scalable performance of this stored procedure.
Oracle Version:
This stored procedure is invoked when we have a lost DVD event. When given RentalId, MemberId, Amount charged (currently $25) and transaction date, this procedure will:
1. Enter a charge in the Payment table
2. Update the Rental table
3. Update the DVD table
CREATE OR REPLACE PROCEDURE Lost_DVD P prefix denotes parameters
(P_RentalId IN NUMBER, The rental for which DVD was lost
P_MemberId IN NUMBER, The member
P_Amount IN NUMBER, The amount to charge
P_Trans_Date IN DATE) The transaction date and time
AS The V prefix denotes variables.
V_DVDId NUMBER := 0.0;
V_PaymentId NUMBER := 0;
V_UntilDate DATE := NULL;
BEGIN
Get DVDId
SELECT DVDId into V_DVDId From RENTAL WHERE RentalId = P_RentalId;
Get PaymentId
SELECT COUNT(*)+1 into V_PaymentId FROM Payment;
Get AmountPaidUntilDate
SELECT AmountPaidUntilDate INTO V_UntilDate FROM Payment
WHERE MemberId = P_MemberId
AND PaymentId =
(SELECT MAX(PaymentId) FROM Payment WHERE MemberId=P_MemberId);
Now, make the changes to the database
INSERT INTO Payment(PaymentId, MemberId, AmountPaid,
AmountPaidDate,AmountPaidUntilDate)
VALUES(V_PaymentId,P_MemberId,P_Amount,P_Trans_Date,V_UntilDate);
UPDATE Rental SET RentalReturnedDate = P_Trans_Date
WHERE RentalId = P_RentalId;
UPDATE DVD SET DVDQuantityOnRent = DVDQuantityOnRent 1
WHERE DVDId = V_DVDId;
UPDATE DVD SET DVDLostQuantity = DVDLostQuantity + 1
WHERE DVDId = V_DVDId;
END Lost_DVD;
SQL Server Version:
This stored procedure is invoked when we have a lost-DVD event. When given RentalId, MemberId, Amount charged (currently $25) and transaction date; this procedure will:
1. Enter a charge in the Payment table
2. Update the Rental table
3. Update the DVD table
CREATE PROCEDURE Lost_DVD
@P prefix denotes parameters
@P_RentalId NUMERIC, The rental for which DVD was lost
@P_MemberId NUMERIC, The member
@P_Amount NUMERIC, The amount to charge
@P_Trans_Date DATE The transaction date and time
AS
BEGIN
The @V prefix denotes variables.
declare @V_DVDId NUMERIC= 0.0;
declare @V_PaymentId NUMERIC = 0;
declare @V_UntilDate DATE = NULL;
Get DVDId
SELECT DVDId INTO V_DVDID FROM RENTAL WHERE RentalId = @P_RentalId;
Get PaymentId
set @V_PaymentId= (SELECT COUNT(*)+1 FROM Payment);
Get AmountPaidUntilDate
SELECT AmountPaidUntilDate INTO V_UntilDate FROM Payment
WHERE MemberId = @P_MemberId
AND PaymentId =
(SELECT MAX(PaymentId) FROM Payment WHERE MemberId=@P_MemberId);
Now, make the changes to the database
INSERT INTO Payment(PaymentId, MemberId, AmountPaid,
AmountPaidDate,AmountPaidUntilDate)
VALUES(@V_PaymentId,@P_MemberId,@P_Amount,@P_Trans_Date,@V_UntilDate);
UPDATE Rental SET RentalReturnedDate = @P_Trans_Date
WHERE RentalId = @P_RentalId;
UPDATE DVD SET DVDQuantityOnRent = DVDQuantityOnRent 1
WHERE DVDId = @V_DVDId;
UPDATE DVD SET DVDLostQuantity = DVDLostQuantity + 1
WHERE DVDId = @V_DVDId;
END;
What performance issues do you see with this stored procedure:
Your recommendation to improve performance:
Contact your facilitator if you have any questions about how to approach this assignment.
Contact your facilitator if you have any questions about how to approach this assignment.
Reviews
There are no reviews yet.