5/5 - (1 vote)

1. Introduction

In this homework you need to practice some basic usages of MySQL, including creating databases, creating tables, loading csv files, loading SQL files, and using MySQL command to find the answer of tasks. After this homework, you will be capable of querying and analyzing your data by MySQL from zero to one.

There will be two datasets for this homework. The first one is COVID-19 data in South Korea, and the second one is European Soccer Dataset, both of these are downloaded from Kaggle Dataset (feel free to google them).

For the first dataset, you need to create a database based on our setting, and load the csv files into your created database. There are 6 easier questions you need to solve by SQL. For the second dataset, you need to load our sql file directly. There are 6 advanced problems you will meet. Read the following content for more details.

2. Tasks Part 1 Data Science for COVID-19 in South Korea

In this part, you need to create tables based on the provided DB schema, and load csv files into the database.

A. Create Tables

First, download the COVID-19 data from here.

You can refer to this page to see the meaning of the columns.

Then you should create tables based on the following setting. Notice that you must make the detail of your tables the same as our description, including

table name, attribute name, attribute type, primary key, foreign key, null.

Please paste the screenshot of your tables by using the `describe` command to your report, it will take 5% of your grades in this homework.

Table Name	Attribute Name	Type	Primary Key	Foreign Key	NULL
patient_info	patient_id	varchar(10)	YES		NO
	sex	varchar(10)
	age	int

	province	varchar(20)
	city	varchar(20)
	infection_case	varchar(100)
search_trend	date	date	YES		NO
	cold	float
	flu	float
	pneumonia	float
	coronavirus	float
time	date	date	YES		NO
	test	int
	negative	int
	confirmed	int
	released	int
	deceased	int
time_age	date	date	YES		NO
	age	int	YES		NO
	confirmed	int
	deceased	int
time_gender	date	date	YES		NO
	sex	varchar(10)	YES		NO
	confirmed	int
	deceased	int
time_province	date	date	YES		NO
	province	varchar(20)	YES		NO
	confirmed	int
	released	int
	deceased	int
region	code	int	YES		NO
	province	varchar(20)
	city	varchar(20)
	elementary_school_co unt	int
	kindergarten_count	int
	university_count	int
	elderly_population_rat io	float
	elderly_alone_ratio	float
	nursing_home_count	int
weather	code	int	YES	region (code)	NO
	date	date	YES		NO
	avg_temp	float
	most_wind_direction	int
	avg_relative_humidity	float

Please answer the following question in your report, it will take 10% of this homework.

(3%) What is the difference between type char and type varchar?
(3%) How many bytes it should take for tinyint, smallint, mediumint,

int? (e.g. 8 bytes for bigint)

And whats the range they can express? (e.g. from -1000 to 1000) tinyint, smallint, mediumint, int

(e.g. 8 bytes for bigint)

(e.g. from -1000 to 1000)

(4%) What do you think about this DB schema? If you can change this table architecture, how would you modify it and why?

B. Load CSV Data

After creating the database, you need to load the downloaded csv files into your database.

Here we dont restrict the method you use, but you have to check the data is loaded successfully by yourself. The following number is the data records for each table.

Table Name	# of Data Records
patient_info	5164
search_trend	1642
time	163
time_age	1089
time_gender	242
time_province	2771
region	243
weather	26271

C. Query Tasks

In this part, here are 6 query tasks you need to write. Please read the following rules carefully.

You are only allowed to use one query (one delimiter) to find the answer, and you dont have to explain your SQL. Noted that the column names of your query answers should be the same as our examples.

For homework submission, please write every query task into a single `sql` file, named as 1.sql, 2.sql, etc.

(5%) How many days have the word cold been searched over 2000 times in one day? cold

cnt

5566

(5%) How many ways are there for those men under 30, living in Seoul Gangnam-gu being infected? List out in the alphabetical order as the following example.

infection_case

eating

talking

sleeping

(5%) Find out the province, city, and the elementary school count in which the elementary school count is the top three most and the name of the province is different from the city. List out in decreasing order of the count.

province	city	cnt
Apple	Taipei	5566
Banana	Hsinchu	3344
Cherry	Taichung	1122

(5%) Find out the provinces whose days count of the average relative humidity larger than 70 are the top three most in May of 2016, and list their days count in decreasing order.

(5%) Find out the province where the elderly population ratio is larger than the average and the date with maximum confirmed in one day. List out in the order of date increasingly. Notice that the average of the elderly population ratio is the average of those provinces which have the same name as the city.

province	date
Apple	2020-01-01
Banana	2020-02-02

(5%) How many accumulated-confirmed, added-confirmed,

accumulated-dead, added-dead are there while the search number of the word coronavirus is larger than two standard deviations? List your answer in ascending order by date and round coronavirus to second decimal place. The standard deviation should be calculated by the period from 2019-12-15 to 2020-06-29.

The added-confirmed and the added-dead should be calculated by the accumulated count of that day minus the accumulated count of the previous day. coronavirus

date	coronavirus	confirmed_accumulate	confirmed_add	dead_accumulate	dead_add
2020-01-01	66.00	3	0	12	0
2020-01-02	55.12	23	20	34	22

Part 2 European Soccer Database

In this Part, instead of creating tables and loading csv files, you need to load the provided DB file directly. Please download the file from here.

D. Load SQL File

Here we provide simple steps for the Linux environment. You can also load the SQL file by other methods, like execute the sql file using the source command.

Firstly, create a database
Then, back to your shell and enter command mysql -u {user_name} -p {DB Name} < hw1_part2.sql

(You can google IO Redirection for more detail of the above mechanism)

E. Query Tasks

In this part, you are also only allowed to use one query (one delimiter) to find the answer, and you dont have to explain your SQL, except task 11 and task 12. Noted that the column names of your query answers should be the same as our examples.

For task 11 and 12, take screenshots of your queries, and write your analysis into the report. Try to explain whatre your queries doing, why you write these queries, whats the meaning of the result, whats your conclusion, etc.

For submission, also write every query task into a single `.sql` file, named as 7.sql,

8.sql , 11.sql, 12.sql

For the meaning of each table and column in part2, please refer to this page.

(10%) List the average long_shots score(round to the second decimal place) of the players who had participated in the Italy Serie A league during 2015/2016 season with respect to the preferred foot. You should calculate the long_shots score by the newest data of each player.

preferred_foot	avg_long_shots
left	30.87
right	20.87

(10%) During the 2015/2016 season, for each of the leagues, if we have known that the average height of members in one team is over 180, what is the probability that the team can win?

The numerator is the winning count of those over-180-teams, and the denominator is the count of those over-180-teams. Round the probability to fourth decimal place. List out in the alphabetical order.

(e.g. In two matches, A and B, one team of A is over-180-teams and wins the game. Both team of B are over-180-teams and one of the team win the game, then the win probability is )

(e.g. In two matches, A and B, one team of A is over-180-teams and ties the game. Both team of B are not over-180-teams, then the win probability is 0) (You need to calculate the record of the same team in different matches. For example, team a1 and team a2 take match A, and team a1 and team b1 take match

B, then you need to consider a1 multiple times)

name	prob
AppleLeague	0.5566
BananaLeague	0.3344

(10%) The win point can be calculated by the following rule: For each match, the winning team will get two points. The loser will get zero point. If the match is a draw, both of the teams will get one point. The win points of each team is the point divided by the match count the team participating in during the whole season. The top five teams with the highest win points are called the greats of the season. Find out the average winning score (round to the second decimal place) and the teams long name of the greats of the season during the 2015/2016 season. List out in the decreasing order of their win points.

team_long_name	avg_win_score
AppleTeam	5.55
BananaTeam	4.44
CherryTeam	3.33
JellyTeam	2.22
OreoTeam	1.11

(15%) We call it a landslide victory if there is a larger than or equal to five score gap between two teams in a match. And we call it an upset if any one of the sports betting companies has a higher betting odds on a team with landslide victory. Find out that for an upset, what is the average age of the player at that time and the average rating of the players of each team from the previous six months? Round the score to second decimal place and list out in the order of

match id increasingly.

(If a player has scores multiple times, please average all of them)

(Here we want to see the average data of players on the home side and away side in the upset matches. You dont need to consider which team the team belongs to)

id	home_player_avg_age	away_player_avg_age	home_player_avg_rating	away_player_avg_rating
1	22.22	23.23	60.60	58.58
2	23.23	24.24	62.62	75.75
3	24.24	21.21	55.55	99.99

(10%) Do the home team with home advantage has much more opportunity to win the game, or the team with a higher average score(which can be one of the overall_rating, dribbling, strength, interceptions, or the average of the four scores, calculated by the latest attribute before the player participating in the match) of the whole team players? Answer by your own view with one SQL query.

(10%) You are a gambler of sport lottery. Analyzing this dataset with SQL and finding out the better way to place a bet. You can answer by your own view with multiple SQL queries. Focus on observation to the dataset and explain your analysis.

4. Discussion

TAs had opened a channel HW1 on New E3 forum of the course, you can post questions about the homework on the forum. TAs will answer questions as soon as possible.

Discussion rules:

Do not ask for the answer to the homework.
Check if someone has asked the same question before asking.
We encourage you to answer other students questions, but again, do not give the answer of the homework. Reply the messages to answer questions.
Since we have this discussion forum, do not send email to ask questions about the homework unless the questions are personal and you do not want to ask publicly.

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Whatsapp Us

[Solved] DBMS-Introduction to Database Systems Individual Homework 1- SQL tasks in MySQL

1. Introduction

2. Tasks Part 1 Data Science for COVID-19 in South Korea

A. Create Tables

B. Load CSV Data

C. Query Tasks

Part 2 European Soccer Database

D. Load SQL File

E. Query Tasks

4. Discussion

Reviews

Whatsapp Us

[Solved] DBMS-Introduction to Database Systems Individual Homework 1- SQL tasks in MySQL

1. Introduction

2. Tasks Part 1 Data Science for COVID-19 in South Korea

A. Create Tables

B. Load CSV Data

C. Query Tasks

Part 2 European Soccer Database

D. Load SQL File

E. Query Tasks

4. Discussion

Reviews

Related products

[SOLVED] COMP9315 24T1 Assignment 1

[SOLVED] COMP9315 Assignment 2

[SOLVED] Cs 327 homework 1 basic concepts/ dbms architecture

[Solved] DBMS-Assignment-1

[SOLVED] Dbms lab: assignment 3: database connectivity

[Solved] DBMS Lab Project-Database Development