This assignment is the second of a mini project where you are demonstrating your understanding of the modules of this class.
The mini project is about building a small ETL program in python. Each pat of the project focuses on different portions / stages of the ETL process.
Assignment Background
Summarizing your acquired knowledge from module 5, we are going to focus on the L part of the ETL process. For reference, ETL stands for Extract, Transform, Load. We are going to focus here on Load.
Assignment Statement
- Save data in a database
- Read data from a database
You are required to supply Last Name_First Name_Project_Part2.py and Last Name_First Name_create_dbs.py. Please upload as a single zip file.
Requirements:
Lets create a place to store each dataset separately.
- Create a create_dbs.py file.
- In this file created in step 1, write python code using sqlite3 to:
- Create a baseball.db database
- Create 1 table named baseball_stats with the following columns:
- player_name
- games_played
- average
- salary
- Create a baseball.db database
The baseball.db SQLite3 database should look like this:
Write a Python script to store baseball and stock data in SQLitedatabases.
The baseball.db SQLite3 database file should have one table:
Baseball_stats ======================== player_name text games_played int average real salary real
The stocks.db SQLite3 database file should have one table:
stock_stats ======================== company_name text ticker text country text price real exchange_rate real shares_outstanding real net_income real market_value real pe_ratio real
Baseball_stats
========================
player_name text
games_played int
average real
salary real
- In this file created in step 1, write python code using sqlite3 to:
- Create a stocks.db database:
- Create 1 table named stock_stats
- company_name
- ticker
- exchange_country
- price
- exchange_rate
- share_outstanding
- net_income
- market_value_usd
- pe_ratio
The stocks.db SQLite3 database should look like this:
stock_stats
========================
company_name text
ticker text
country text
price real
exchange_rate real
shares_outstanding real
net_income real
market_value real
pe_ratio real
- Create AbstractDAO class. It should have the methods:
- it should have 1 (instance) member: db_name
- insert_records(records) Should raise the NotImplementedError
- select_all() Should raise the NotImplementedError
- connect():
- connect to the database identified by db_name
- returns the created connection
- Create BaseballStatsDAO class
- Class should inherit AbstractDAO
- Class should implement the methods listed:
- insert_records:
- takes a list of records as parameter ( BaseballStatsDAO takes BaseballStatRecord)
- call the method connect()
- using the returned connection, create a cursor.
- For each record in the list, write and execute an INSERT INTO statement to save the records information to the correct table.
- insert_records:
Example for baseball: cursor.execute(INSERT INTO baseball_stats VALUES ( ?, ? , ? , ?), (name, number_games_played, avg, salary))
- Commit the connection
- Close the connection
- select_all
- call the method connect()
- using the returned connection, create a cursor.
- create an empty deque to hold the records in memory
- write and execute a SELECT statement to get all the records of the table for the DAO
Example for baseball: cursor.execute(SELECT player_name, games_played, average, salary FROM baseball_stats;)
- For each row fetched, iterate with a for loop over the result of your select command
- Create a new record (BaseballStatRecord)
- Add the record to the deque
- Close the connection
- Return the deque
- Create StockStatsDAO class
- Class should inherit AbstractDAO
- Class should implement the methods listed:
- insert_records:
- takes a list of records as parameter (StockStatsDAO takes StockStatRecord)
- invokes the method connect()
- using the returned connection, create a cursor.
- For each record in the list, write and execute an INSERT INTO statement to save the records information to the correct table.
- insert_records:
Example for stocks: see insert statement for baseball stats
- Commit the connection
- Close the connection
- select_all
- invokes the method connect()
- using the returned connection, create a cursor.
- create an empty deque to hold the records in memory
- write and execute a SELECT statement to get all the records of the table for the DAO
Example for stocks: refer to select statement for Baseball
- For each row fetched, iterate with a for loop over the result of your execute:
- Create a new record (StockStatRecord)
- Add the record to the deque
- Close the connection
- Return the deque
Use the code written for Project Part 1. If needed, a partial solution can be provided by your facilitator. This will only be done on a case by case basis and only in extreme circumstances. A 10-point deduction will be made if a solution is needed.
This section loads the records into the correct database using the classes from Project Part1
- load MLB2008.csv using the BaseballCSVReader
- load StockValuations.csv using the StocksCSVReader
- Instantiate a new DAO instance for BaseballStats
- Instantiate a new DAO instance for StocksStats
- Insert the loaded records into baseball database using Baseball DAOs insert_records.
- Insert the loaded records into stocks database using Stocks DAOs insert_records.
Awesome! Now we have data in the database. Lets use it!
Stocks stats:
- Using the instance of StockStatsDAO select_all the records
- Calculate and print the number of tickers by exchange_country using a dictionary.
Baseball stats:
- Using the instance of BaseballStatsDAO select_all the records
- Compute the average salary and enter into a dictionary by batting average
- Print the dictionary formatting the salary to 2 decimal places.
NOTE: use round(record.avg, 3)) The rounding needs to happen at average time
Code/Comment Format
Good code includes well named variables that are consistent from the beginning to the end of the program. Naming of objects should be self-explanatory. For instance, iterator_for_noun_list is much better than i.
Every program consists of a sequence of paragraphs, each of which has objectives, and which builds on the previous paragraphs. We are mostly interested in objectives that are valid at the end of the program so we can verify the programs design. The following is a preferred form for such paragraph headings. The # sign is adequate when the comment is a single line.
#This is an in-line comment used to document the code for you, or anyone else, that intends
#To extend the code
In-line comments are helpful when one has to go back to the code 6 months later to make changes.
For doc strings, python allows the use of triple quotes. The triple quotes can be either single or double quotes. A doc sting is generally used as user documentation. It does not need to include details of the implementation of the program, but instead it provides documentation as how to use the API for the program (input, output etc.)
For example:
This is an example of a doc string
It allows multiple lines within the string.
This is an example of a doc string
It allows multiple lines within the string.
This becomes significant when using functions, classes etc. as the triple quotes help to self-document the parameters and return values of the function.
Sample Output: The output is a sample and is not complete
0.083 400,000.00
0.130 600,000.00
0.132 1,500,000.00
0.147 6,333,333.00
0.158 14,726,910.00
0.159 3,850,000.00
IE 1
CA 10
ZA 1
JP 37
NO 2
SE 7
Reviews
There are no reviews yet.