[Solved] DBMS-Introduction to Database Systems Individual Homework 1- SQL tasks in MySQL

30 $

File Name: DBMS-Introduction_to_Database_Systems_Individual_Homework_1-_SQL_tasks_in_MySQL.zip
File Size: 744.18 KB

SKU: [Solved] DBMS-Introduction to Database Systems Individual Homework 1- SQL tasks in MySQL Category: Tag:

Or Upload Your Assignment Here:


1. Introduction

In this homework you need to practice some basic usages of MySQL, including creating databases, creating tables, loading csv files, loading SQL files, and using MySQL command to find the answer of tasks. After this homework, you will be capable of querying and analyzing your data by MySQL from zero to one.

There will be two datasets for this homework. The first one is COVID-19 data in South Korea, and the second one is European Soccer Dataset, both of these are downloaded from Kaggle Dataset (feel free to google them).

For the first dataset, you need to create a database based on our setting, and load the csv files into your created database. There are 6 easier questions you need to solve by SQL. For the second dataset, you need to load our sql file directly. There are 6 advanced problems you will meet. Read the following content for more details.

2. Tasks ◈ Part 1 – Data Science for COVID-19 in South Korea

In this part, you need to create tables based on the provided DB schema, and load csv files into the database.

A. Create Tables

First, download the COVID-19 data from here.

You can refer to this page to see the meaning of the columns.

Then you should create tables based on the following setting. Notice that you must make the detail of your tables the same as our description, including

‘table name’, ‘attribute name’, ‘attribute type’, ‘primary key’, ‘foreign key’, ‘null’.

Please paste the screenshot of your tables by using the `describe` command to your report, it will take 5% of your grades in this homework.

Table Name Attribute Name Type Primary Key Foreign Key NULL
patient_info patient_id varchar(10) YES NO
sex varchar(10)
age int
province varchar(20)
city varchar(20)
infection_case varchar(100)
search_trend date date YES NO
cold float
flu float
pneumonia float
coronavirus float
time date date YES NO
test int
negative int
confirmed int
released int
deceased int
time_age date date YES NO
age int YES NO
confirmed int
deceased int
time_gender date date YES NO
sex varchar(10) YES NO
confirmed int
deceased int
time_province date date YES NO
province varchar(20) YES NO
confirmed int
released int
deceased int
region code int YES NO
province varchar(20)
city varchar(20)
elementary_school_co unt int
kindergarten_count int
university_count int
elderly_population_rat io float
elderly_alone_ratio float
nursing_home_count int
weather code int YES region (code) NO
date date YES NO
avg_temp float
most_wind_direction int
avg_relative_humidity float

Please answer the following question in your report, it will take 10% of this homework.

  1. (3%) What is the difference between type “char” and type “varchar”? ?
  2. (3%) How many bytes it should take for “tinyint”, “smallint”, “mediumint”,

“int”? (e.g. 8 bytes for “bigint”)

And what’s the range they can express? (e.g. from -1000 to 1000) :“tinyint”, “smallint”, “mediumint”, “int”?

(e.g. 8 bytes for “bigint”)

?(e.g. from -1000 to 1000)

  1. (4%) What do you think about this DB schema? If you can change this table architecture, how would you modify it and why?

B. Load CSV Data

After creating the database, you need to load the downloaded csv files into your database.

Here we don’t restrict the method you use, but you have to check the data is loaded successfully by yourself. The following number is the data records for each table.

Table Name # of Data Records
patient_info 5164
search_trend 1642
time 163
time_age 1089
time_gender 242
time_province 2771
region 243
weather 26271

C. Query Tasks

In this part, here are 6 query tasks you need to write. Please read the following rules carefully.

You are only allowed to use one query (one delimiter) to find the answer, and you don’t have to explain your SQL. Noted that the column names of your query answers should be the same as our examples.

For homework submission, please write every query task into a single `sql` file, named as “1.sql”, “2.sql”, etc.

  1. (5%) How many days have the word “cold” been searched over 2000 times in one day? “cold”。
cnt
5566
  1. (5%) How many ways are there for those men under 30, living in Seoul Gangnam-gu being infected? List out in the alphabetical order as the following example.
infection_case
eating
talking
sleeping
  1. (5%) Find out the province, city, and the elementary school count in which the elementary school count is the top three most and the name of the province is different from the city. List out in decreasing order of the count.
province city cnt
Apple Taipei 5566
Banana Hsinchu 3344
Cherry Taichung 1122
  1. (5%) Find out the provinces whose days count of the average relative humidity larger than 70 are the top three most in May of 2016, and list their days count in decreasing order.
  1. (5%) Find out the province where the elderly population ratio is larger than the average and the date with maximum confirmed in one day. List out in the order of date increasingly. Notice that the average of the elderly population ratio is the average of those provinces which have the same name as the city.
province date
Apple 2020-01-01
Banana 2020-02-02
  1. (5%) How many “accumulated-confirmed”, “added-confirmed”,

“accumulated-dead”, “added-dead” are there while the search number of the word “coronavirus” is larger than two standard deviations? List your answer in ascending order by date and round coronavirus to second decimal place. The standard deviation should be calculated by the period from 2019-12-15 to 2020-06-29.

The added-confirmed and the added-dead should be calculated by “the accumulated count of that day minus the accumulated count of the previous day“. :coronavirus

date coronavirus confirmed_accumulate confirmed_add dead_accumulate dead_add
2020-01-01 66.00 3 0 12 0
2020-01-02 55.12 23 20 34 22

◈ Part 2 – European Soccer Database

In this Part, instead of creating tables and loading csv files, you need to load the provided DB file directly. Please download the file from here.

D. Load SQL File

Here we provide simple steps for the Linux environment. You can also load the SQL file by other methods, like execute the sql file using the “source” command.

  1. Firstly, create a database
  2. Then, back to your shell and enter command mysql -u {user_name} -p {DB Name} < hw1_part2.sql

(You can google “IO Redirection” for more detail of the above mechanism)

E. Query Tasks

In this part, you are also only allowed to use one query (one delimiter) to find the answer, and you don’t have to explain your SQL, except task 11 and task 12. Noted that the column names of your query answers should be the same as our examples.

For task 11 and 12, take screenshots of your queries, and write your analysis into the report. Try to explain what’re your queries doing, why you write these queries, what’s the meaning of the result, what’s your conclusion, etc.

For submission, also write every query task into a single `.sql` file, named as “7.sql”,

“8.sql” …, “11.sql”, “12.sql”

For the meaning of each table and column in part2, please refer to this page.

  1. (10%) List the average long_shots score(round to the second decimal place) of the players who had participated in the Italy Serie A league during 2015/2016 season with respect to the preferred foot. You should calculate the long_shots score by the newest data of each player.
preferred_foot avg_long_shots
left 30.87
right 20.87
  1. (10%) During the 2015/2016 season, for each of the leagues, if we have known that the average height of members in one team is over 180, what is the probability that the team can win?

The numerator is the winning count of those over-180-teams, and the denominator is the count of those over-180-teams. Round the probability to fourth decimal place. List out in the alphabetical order.

(e.g. In two matches, A and B, one team of A is over-180-teams and wins the game. Both team of B are over-180-teams and one of the team win the game, then the win probability is ⅔)

(e.g. In two matches, A and B, one team of A is over-180-teams and ties the game. Both team of B are not over-180-teams, then the win probability is 0) (You need to calculate the record of the same team in different matches. For example, team a1 and team a2 take match A, and team a1 and team b1 take match

B, then you need to consider a1 multiple times)

name prob
AppleLeague 0.5566
BananaLeague 0.3344
  1. (10%) The “win point” can be calculated by the following rule: “For each match, the winning team will get two points. The loser will get zero point. If the match is a draw, both of the teams will get one point. The win points of each team is the point divided by the match count the team participating in during the whole season.” The top five teams with the highest win points are called “the greats of the season”. Find out the average winning score (round to the second decimal place) and the team’s long name of the greats of the season during the 2015/2016 season. List out in the decreasing order of their win points.
team_long_name avg_win_score
AppleTeam 5.55
BananaTeam 4.44
CherryTeam 3.33
JellyTeam 2.22
OreoTeam 1.11
  1. (15%) We call it a landslide victory if there is a “larger than or equal to five” score gap between two teams in a match. And we call it an upset if any one of the sports betting companies has a higher betting odds on a team with landslide victory. Find out that for an upset, what is the average age of the player at that time and the average rating of the players of each team from the previous six months? Round the score to second decimal place and list out in the order of

match id increasingly.

(If a player has scores multiple times, please average all of them)

(Here we want to see the average data of players on the home side and away side in the upset matches. You don’t need to consider which team the team belongs to)

id home_player_avg_age away_player_avg_age home_player_avg_rating away_player_avg_rating
1 22.22 23.23 60.60 58.58
2 23.23 24.24 62.62 75.75
3 24.24 21.21 55.55 99.99
  1. (10%) Do the home team with home advantage has much more opportunity to win the game, or the team with a higher average score(which can be one of the overall_rating, dribbling, strength, interceptions, or the average of the four scores, calculated by the latest attribute before the player participating in the match) of the whole team players? Answer by your own view with one SQL query.
  1. (10%) You are a gambler of sport lottery. Analyzing this dataset with SQL and finding out the better way to place a bet. You can answer by your own view with multiple SQL queries. Focus on observation to the dataset and explain your analysis.

4. Discussion

TAs had opened a channel HW1 on New E3 forum of the course, you can post questions about the homework on the forum. TAs will answer questions as soon as possible.

Discussion rules:

  1. Do not ask for the answer to the homework.
  2. Check if someone has asked the same question before asking.
  3. We encourage you to answer other students’ questions, but again, do not give the answer of the homework. Reply the messages to answer questions.
  4. Since we have this discussion forum, do not send email to ask questions about the homework unless the questions are personal and you do not want to ask publicly.

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[Solved] DBMS-Introduction to Database Systems Individual Homework 1- SQL tasks in MySQL
30 $