In this assignment, we will work with a database that could support the online tool MarkUs. MarkUs is “an open-source tool which recreates the ease and flexibility of grading assignments with pen on paper, within a web application. It also allows students and instructors to form groups, and collaborate on assignments.” (reference: http://markusproject.org/). By now, you have used MarkUs to hand in a few pieces of course work, so you have experienced some of its features.
Download these starter files from the course webpage:
- The database schema, ddl
- A very small sample data set (with just enough data to show you how to format your own), sql
You can simply download the files from a browser and save them into a directory of your choice. You can also use the command line to download the files. For example:
> wget http://www.teach.cs.toronto.edu/~csc343h/winter/assignments/a2/schema.ddl
Your code for this assignment must work on any database instance (including ones with empty tables) that satisfies the schema, so make sure you read and understand it.
The schema definition uses several types of integrity constraints:
- Some attributes must be present (“NOT NULL”).
- Every table has a primary key (“PRIMARY KEY”). The attributes that are part of a primary key form a key in the sense we are used to (no repeats), and none of them may be NULL.
- Some tables define a set of attributes to be (“UNIQUE”). The attributes that are part of a set declared to be UNIQUE form a key in the sense we are used to (no repeats), but one or more of them can be NULL.
- Some tables use foreign key constraints (“REFERENCES”). By default, they will refer to the primary key of the other table.
Warmup: Getting to know the schema
To get familiar with the schema, ask yourself questions like these (but don’t hand in your answers):
- How would the database record that a student is working solo on an assignment?
- How would the database record that an assignment does not permit groups, that is, all students must work solo?
- Why doesn’t the Grader table have to record which assignment the grader is grading the group on?
- Can different graders mark the various members of a group in an assignment?
- Can different graders mark the various elements of the rubric for an assignment?
- In a rubric, what is the difference between “out of” and “weight”?
- How would one compute a group’s total grade for an assignment?
- How is the total grade for a group on an assignment recorded? Can it be released before the a grade was assigned?
Part 1: SQL Statements
In this section, you will write SQL statements to perform queries. For each query, you will insert the final results into a table; we will check the contents of that table to see if your query was correct.
We will provide files named q1.sql, q2.sql, …, q10.sql Each of these files defines the schema for the result table for that query, so that you will know exactly what attributes we are expecting, of what type, and in what order. Put each of your queries into the appropriate file, and make your final statement in each file be:
INSERT INTO qX (SELECT … complete your SQL query here … )
where X is the number of the query. This will populate the answer table with the correct tuples.
You are encouraged to use views to make your queries more readable. However, each file should be entirely self-contained, and not depend on any other files; each will be run separately on a fresh database instance, and so (for example) any views you create in q1.sql will not be accessible in q5.sql.
At the beginning of the DDL file, we set the search path to markus, so that the full name of every table etc. that we define is actually markus.whatever. You may set the search path to markus at the top of your query files too, so that you do not have to use the markus prefix throughout. But do not use the schema called public.
The output from your queries must exactly match the specifications in the question, including attribute names, attribute types, and attribute order. Row order does not matter.
Write SQL queries for each of the following. Note that every query can be written in SQL, and that the questions are not in order of difficulty.
Throughout, when we ask for an assignment grade, report it as a percentage, such as 82.5.
- We’d like to compare the grade distributions across assignments.
For each assignment, report the average grade, the number of grades between 80 and 100 percent inclusive, the number of grades between 60 and 79 percent inclusive, the number of grades between 50 and 59 percent inclusive, and the number of grades below 50 percent. Where there were no grades in a given range, report 0.
|assignment id||The assignment ID|
|average mark percent||The average grade for this assignment, as a percent|
|num 80 100||The number of grades between 80 and 100 percent inclusive|
|num 60 79||The number of grades between 60 and 79 percent inclusive|
|num 50 59||The number of grades between 50 and 59 percent inclusive|
|num 0 49||The number of grades below 50 percent|
|Everyone?||Every assignment should appear, even if there are no grades associated with it yet.|
|Duplicates?||No assignment should appear twice.|
- Getting soft? We’d like to know if graders give higher and higher grades over time, perhaps due to fatigue. To investigate this, we only want to examine graders who’ve had lots of grading experience throughout the course.
For this question when we consider a grader’s average on an assignment, we will mean their average across individual students, not groups. For instance, the grade earned by a group of three students will contribute three times to the average.
Find graders who meet all of these criteria:
- They have graded (that is, they have been assigned to at least one group) on every assignment.
- They have completed grading (that is, there is a grade recorded in the Result table) for at least 10 groups on each assignment.
- The average grade they have given has gone up consistently from assignment to assignment over time (based on the assignment due date).
Report their name, the average across assignments of their average grade, and the increase between their average for the first assignment and their average for the last assignment. (For example, if the former is 72.5 percent and the latter is 82.9 percent, the increase is 10.4 percentage points.) You may assume that no two assignments have the same due date; this means that there is unambiguously one first and one last assignment.
|ta name||The name of a grader who meets the criteria, in this form:first name plus surname with a blank in between|
|average mark all assignments||The average across assignments of their average grade|
|mark change first last||The increase between their average for the first assignment and their average for the last assignment, as a number of percentage points.|
|Everyone?||Include only graders who meet the criteria.|
|Duplicates?||No grader can appear twice.|
- Solo superior. We are interested in the performance of students who work alone vs in groups.
Find assignments where the average grade of those who worked alone is greater than the average grade earned by groups. For each, report the assignment ID and description, the number of students declared to be working alone and their average grade, the number of students (not groups) declared to be working in groups and the average grade across those groups (not students), and finally, the average number of students involved in each group, (include in this calculation those who worked solo).
If groups have been declared for an assignment but no grades have been recorded, you will be able to report counts but not average grades — the averages will have to be null.
|assignment id||ID of the assignment|
|description||Description of the assignment|
|num solo||The number of students declared to be working alone|
|average solo||The average grade among students who worked alone, or null if there were none who worked alone and have a grade recorded in Result.|
|num collaborators||The number of students (not groups) declared to be working in groups|
|average collaborators||The average grade across those groups (not students), or null if there were no groups that have a grade recorded in Result.|
|average students per group||The average number of students involved in each group,(include those who worked solo in this calculation)|
|Everyone?||Every assignment should appear, even if no grades are available for it yet.|
|Duplicates?||No assignment should appear twice.|
- Grader report We want to make sure that graders are giving consistent grades.
For each assignment that has any graders declared, and each grader of that assignment, report the number of groups they have already completed grading (that is, there is a grade recorded in the Result table), the number they have been assigned but have not yet graded, and the minimum and maximum grade they have given.
|assignment id||The ID of an assignment that has one or more graders declared|
|username||A grader who is declared for that assignment|
|num marked||The number of groups they have already graded|
|num not marked||The the number they have been assigned but have not yet graded|
|min mark||The minimum grade they have given for that assignment, or null if they have given no grades|
|max mark||The maximum grade they have given for that assignment, or null if they have given no grades|
|Everyone?||Every assignment that has graders declared should appear.|
|Duplicates?||No assignment should appear twice.|
- Uneven workloads We want to make sure that graders have had fairly even workloads.
Find assignments where the number of groups assigned to each grader has a range greater than 10. For instance, if grader 1 was assigned 45 groups, grader 2 was assigned 58, and grader 3 was assigned 47, the range was 13 and this assignment should be reported. For each grader of these assignments, report the assignment, the grader, and the number of groups they are assigned to grade.
|assignment id||The ID of an assignment with a range (as described above) greater than 10|
|username||A grader for that assignment|
|num assigned||The number of groups this grader has been assigned|
|Everyone?||Every assignment with a range over 10 should appear.|
|Duplicates?||No assignment should appear more than once.No assignment-grader pair should appear more than once.|
- Steady work. We’d like groups to submit work early and often, replacing early submissions that are flawed or incomplete with improved ones as they work steadily on the assignment.
For each group on assignment A1 (the assignment whose description is ‘A1’), report the group ID, the name of the first file submitted by anyone in the group, when it was submitted, and the username of the group member who submitted it, the name of the last file submitted, when it was submitted, and the username of the group member who submitted it, and the time between submission of the first and last file.
It is possible that a group submitted only 1 file. In that case the first file and the last file submitted are the same. It is also possible that two files could be submitted at the same time. In that case, report a row for every first-last combination for the group.
|group id||The ID of an A1 group|
|first file||The name of the first file submitted by anyone in the group||or null if no file was submitted|
|first time||The timestamp for its submission|
|first submitter||The username of the group member who submitted it|
|last file||The name of the last file submitted by anyone in the group|
|last time||The timestamp for its submission|
|last submitter||The username of the group member who submitted it|
|elapsed time||The time between the first and last submission for this group|
|Everyone?||Every group defined for A1 should appear.|
|Duplicates?||A group may occur more than once if there is a tie for its first and/or last submission.|
- High coverage We are interested in identifying graders who have broad experience in this course.
Report the username of all graders who have been assigned at least one group (the group could be solo or larger) for every assignment and have been assigned to grade every student (whether in a solo or larger group) on at least one assignment.
|ta||The username of a grader who meets the criteria.|
|Everyone?||Only graders who meet the criteria should appear.|
|Duplicates?||No grader should appear more than once.|
- Never solo by choice We are interested in the performance of students who chose to work in multi-student groups wherever possible.
For this question, assume that at least one assignment allows groups.
Find students who never worked solo on an assignment that allows groups, and who submitted at least one file for every assignment (indicating that they did contribute to the group). Report their username, their average grade on the assignments that allowed groups, and their average grade on the assignments that did not allow groups.
|username||The username of a student who meets the criteria|
|group average||Their average grade on the assignments that allowed groups.(Assume that at least one assignment allows groups.)|
|solo average||Their average grade on the assignments that did not allow groups, or null if there were none.|
|Everyone?||Every student who meets the criteria should appear.|
|Duplicates?||No student should appear twice.|
- Inseparable Report pairs of students who each did group work whenever the assignment permitted it, and always worked together (possibly with other students in a larger group).
|student1||The username of the student in the pair that comes first alphabetically|
|student2||The username of the student in the pair that comes second alphabetically|
|Everyone?||Only pairs that meet the criteria should appear.|
|Duplicates?||No pair should appear twice.|
- A1 report We’d like a full report on the A1 grades per group, including a categorization.
Compute the grade out of 100 for each group on assignment A1, the difference between their grade and the average A1 grade across groups (negative if they are below average; positive if they are above average), and either “above”, “at”, or “below” to indicate whether they are above, at or below this average.
|group id||The ID of a group for the assignment whose description is ‘A1’|
|mark||The group’s grade on A1, as a percentage, or null if they have no grade|
|compared to average||The difference between the group’s grade and the average grade or null if they have no grade|
|status||Either “above”, “at”, or “below” to indicate whether they are above, at or below this average, or null if they have no grade|
|Everyone?||Every group declared for A1 should appear.|
|Duplicates?||No group should appear more than once.|
Part 2: Embedded SQL
Part 2 will be provided in the next few days.
You will need to look up details about built in types (such as timestamps) and how to work with them. You may find the coalesce feature helpful. Become familiar with the documentation for postgreSQL. (But don’t waste time searching without purpose for features that you hope will save you.)
In your JDBC code for Part 2, some of your SQL queries may be very long strings. You should write them on multiple lines for readability, and to keep your code within an 80-character line length. But you can’t split a Java string over multiple lines. You’ll need to break the string into pieces and use + to concatenate them together. Don’t forget to put a blank at the end of each piece so that when they are concatenated you will have valid SQL. Example:
String sqlText =
“select client_id ”
“from Request r join Billed b on r.request_id = b.request_id ” + “where amount > 50”;
This makes the string read like a query at the postgreSQL shell, except with some extra quotes and concatenation characters.
Here are some common mistakes and the error messages they generate (you may remember these from the exercise we did in class):
- You forget the colon:
cdf> java -cp /local/packages/jdbc-postgresql/postgresql-9.4.1212.jar Example Error: Could not find or load main class Example
- You ran it on a machine other than dbsrv1
cdf> java -cp /local/packages/jdbc-postgresql/postgresql-9.4.1212.jar: Example SQL Exception.<Message>: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
Aside: Your prompt on cdf is probably different from mine. I set mine to “cdf>”.
Important: How we will assess the correctness of your code
We will be testing your code in the CS Teaching Labs environment using PostgreSQL. More specifically, we will test your queries and your JDBC code using an autotester integrated into MarkUs. You will be able to run a self-test on MarkUs, using the same autotesting infrastructure. It is your responsibility to make sure your code runs in this environment before the deadline! Code which works on your machine but not on our autotester will earn a grade of zero for correctness, and will not be eligible for a remark request.
The self-test will confirm that your code connects to ours as expected, for instance, that your attribute types and attribute order are as specified (for the SQL query part) and that your method names and parameters are consistent with ours (for the JDBC part). It will also check that your code produces the correct results, but only for one very simple test case. The self-test will not run your code through a thorough test suite. It’s part of your job to plan and implement a thorough set of tests for your own code. If you do, it will sail through the thorough autotesting that we will ultimately use when grading.