COM00142M
Department of Computer Science
Advanced Programming
I. Module Learning Outcomes
The module learning outcomes (MLOs) for this module are as follows:
MLO 1. Demonstrate a critical understanding of the theory and application of advanced programming techniques.
MLO 2. Design and implement programs for real-world problems.
MLO 3. Communicate design decisions for the selection, storage and manipulation of data.
MLO 4. Critically evaluate the legal and ethical impact of software developments in real-world contexts.
This assessment addresses all the module learning outcomes listed above.
II. Assessment Background/Scenario
Your task is to design and develop a prototype application that demonstrates how data from the given data set can be formatted, reshaped and used to generate specific outputs. Your application can be a single program or a collection of programs that together provide the functionality described below.
Data Set (CSV)
The dataset contains online activity logs for 152 university students enrolled in a blended Computer Science course. The dataset is further split into three CSV files: “USER_LOG”, “ACTIVITY_LOG”, and “COMPONENT_CODES”. The “USER_LOG” CSV consists of:
● Date
● Time
● User Full Name *Anonymized
The “ACTIVITY_LOG” CSV file contains:
● User Full Name *Anonymized
● Component
● Action
● Target
The “COMPONENT_CODES” CSV file contains:
● Component
● Code
Functional requirements
The application should provide the following basic functionality:
● A means to load the initial data set (CSV file(s) provided) and translate it into a suitable format – either XML, JSON or an entity relationship structure (not CSV).
● A means to back up the suitable format using either files or a database. This should preserve the current state of the data when the program is closed and make it available when the program is reopened.
● A process for cleaning and preparing the data set, managing inconsistencies, errors and missing values. Cleaning can be done at either the CSV stage or after you have translated the data set into a new format and is required to be done before you apply any of the data manipulations and outputs detailed below.
● A graphical user interface(s) for interacting with the data set(s) that enables the user to:
o Load the initial data set (the CSV file(s)).
o Apply the cleaning, transformation, REMOVE and RESHAPE to produce a prepared data set.
o Load the prepared data set (from its translated format).
o Manipulate the range of values used to generate OUTPUT STATISTICS and GRAPHS and to perform CORRELATION analysis.
o Use the prepared data set to generate OUTPUT STATISTICS, GRAPHS and CORRELATION results.
It should be assumed that this prototype application will be able to handle other sets of data generated from the same source, i.e. data with the same column and row structure in CSV format, but containing different values and anomalies. However, the application is not required to be generic (i.e. to work with multiple unknown data sets). Given this, best practice regarding code reuse, encapsulation and a well-defined programming interface should be applied where applicable.
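As an illustration only, the load, clean, translate and back-up steps could be sketched as follows. This is a minimal sketch that assumes JSON as the target format and uses only the Python standard library. The function names, the sample `Code` values and the cleaning policy (trim whitespace, drop rows with a missing value) are hypothetical choices and are not prescribed by this brief.

```python
import csv
import io
import json
from pathlib import Path

# Made-up sample in the COMPONENT_CODES layout; the Code values are invented.
SAMPLE_COMPONENT_CODES = "Component,Code\nQuiz,Q1\n Lecture ,L1\nSurvey,\n"


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts, one per row, keyed by the header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def clean_records(records):
    """Trim whitespace and drop malformed rows (one possible cleaning policy)."""
    cleaned = []
    for row in records:
        if None in row:  # the row had more fields than the header
            continue
        stripped = {key.strip(): (value or "").strip() for key, value in row.items()}
        if all(stripped.values()):  # keep only rows with no missing value
            cleaned.append(stripped)
    return cleaned


def save_json(records, path):
    """Back up the translated data so it survives closing the program."""
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")


def load_json(path):
    """Reload the prepared data set from its translated (JSON) format."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Keeping each step in its own small function is one way to meet the expectation of code reuse and a well-defined programming interface: the GUI can call the same functions whether the data comes from the original CSV or from the saved JSON.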
Data manipulation and outputs
Your prototype application needs to be able to perform the following actions on the data set, once it has been translated into your selected format. First, determine whether NumPy or Pandas is more appropriate for this dataset. Next, decide whether it is more appropriate to split the data into manageable chunks before performing the following actions. Finally, apply the actions in the order given; the later ones are more challenging to achieve.
REMOVE: No outputs should include any data from the components System and Folder.
RENAME: The column “User Full Name *Anonymized” should be renamed as User_ID both in ACTIVITY_LOG and USER_LOG CSVs.
MERGE: Merge the suitable CSVs for analysing user interactions with each component.
RESHAPE: Reshape the data using a pivot operation.
COUNT: Count each user’s interactions with each Component for each month. Add this count as a new field to the new structure.
OUTPUT STATISTICS: Produce the mean, mode and median for the components: Quiz, Lecture, Assignment, Attendance, and Survey.
a. For each month
b. For the entire 13-week academic semester
OUTPUT CORRELATION: Produce a suitable graph that displays user interactions with the following components: Assignment, Quiz, Lecture, Book, Project, and Course. Determine whether there is any significant correlation between ‘User_ID’ and ‘Component’. You will need to select an appropriate visualisation to demonstrate this.
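The actions from REMOVE through OUTPUT STATISTICS can be sketched with Pandas as shown below. This is a hedged illustration on a tiny made-up data set, not a model solution. It assumes that row i of USER_LOG and row i of ACTIVITY_LOG describe the same event (so the two can be merged on their row index), that dates are written day/month/year, and that the first value is reported when a column has several modes. None of these assumptions is confirmed by this brief, and the user identifiers and dates are invented.

```python
import pandas as pd

# Invented rows that only mimic the column layout described in this brief.
user_log = pd.DataFrame({
    "Date": ["01/09/2023", "01/09/2023", "02/09/2023", "15/09/2023",
             "02/10/2023", "03/10/2023", "04/10/2023"],
    "Time": ["09:00:00"] * 7,
    "User Full Name *Anonymized": ["1", "1", "1", "2", "1", "2", "2"],
})
activity_log = pd.DataFrame({
    "User Full Name *Anonymized": ["1", "1", "1", "2", "1", "2", "2"],
    "Component": ["Quiz", "System", "Quiz", "Quiz", "Lecture", "Quiz", "Folder"],
    "Action": ["Viewed"] * 7,
    "Target": ["Course"] * 7,
})

# REMOVE: drop every row that belongs to System or Folder.
activity_log = activity_log[~activity_log["Component"].isin(["System", "Folder"])]

# RENAME: use User_ID in both logs.
new_name = {"User Full Name *Anonymized": "User_ID"}
user_log = user_log.rename(columns=new_name)
activity_log = activity_log.rename(columns=new_name)

# MERGE: inner join on the row index (ASSUMPTION: rows align positionally).
merged = user_log.merge(activity_log.drop(columns="User_ID"),
                        left_index=True, right_index=True)

# COUNT: interactions per user, month and component.
dates = pd.to_datetime(merged["Date"], format="%d/%m/%Y")  # ASSUMED date format
merged["Month"] = dates.dt.to_period("M").astype(str)
counts = (merged.groupby(["User_ID", "Month", "Component"])
          .size().reset_index(name="Interactions"))

# RESHAPE: one row per user and month, one column per component.
pivot = counts.pivot_table(index=["User_ID", "Month"], columns="Component",
                           values="Interactions", aggfunc="sum", fill_value=0)

# OUTPUT STATISTICS (a): mean, median and mode for each month.
wanted = [name for name in ["Quiz", "Lecture", "Assignment", "Attendance", "Survey"]
          if name in pivot.columns]
by_month = pivot[wanted].groupby(level="Month")
monthly = pd.concat({
    "mean": by_month.mean(),
    "median": by_month.median(),
    "mode": by_month.agg(lambda s: s.mode().iloc[0]),  # first mode if several
}, axis=1)

# OUTPUT STATISTICS (b): the same measures over each user's semester total.
per_user = pivot[wanted].groupby(level="User_ID").sum()
semester = pd.DataFrame({
    "mean": per_user.mean(),
    "median": per_user.median(),
    "mode": per_user.mode().iloc[0],  # first mode if several
})
```

Filtering before merging keeps the removed components out of every later structure. Because the filter preserves the original row labels, the inner join on the index also discards the matching USER_LOG rows.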
Non-functional requirements
● The GUI interface must be able to provide appropriate feedback to confirm or deny a user’s actions.
● The application must be able to handle internal and user-generated errors.
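One possible way to combine these two requirements is for every GUI action to call a handler that catches its own errors and returns a status message for the interface to display. The sketch below shows only that handler logic and leaves the choice of GUI toolkit open. The function name `load_component_codes` and the message wording are hypothetical, and only the COMPONENT_CODES header described above is checked.

```python
import csv
from pathlib import Path

# Header of the COMPONENT_CODES file as described in this brief.
EXPECTED_COLUMNS = {"Component", "Code"}


def load_component_codes(path):
    """Try to load the file; return (ok, message) for the GUI to show the user."""
    try:
        with Path(path).open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            header = set(reader.fieldnames or [])
            if header != EXPECTED_COLUMNS:
                # User-generated error: the wrong file was selected.
                return False, f"Unexpected columns: {sorted(header)}"
            rows = list(reader)
    except FileNotFoundError:
        return False, f"File not found: {path}"
    except (OSError, UnicodeDecodeError, csv.Error) as error:
        # Internal or environmental error: unreadable or malformed file.
        return False, f"Could not read {path}: {error}"
    return True, f"Loaded {len(rows)} rows from {path}"
```

A button callback could then write the returned message to a status label, so that the user always receives confirmation or a clear reason for refusal instead of an unhandled traceback.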
Technical requirements
A. The application is built using core Python, in any version from 3.7 to 3.12.
B. The application uses one or more of the advanced Application Programming Interfaces (APIs) introduced on this module, such as NumPy, Pandas, Seaborn and Matplotlib. It should NOT use alternative APIs for this functionality; however, appropriate Python core libraries can be used to access/query a database.
C. The application MUST run within the Anaconda environment using a Jupyter notebook.
D. The application and its parts must not run concurrently, and must NOT use Python threads.
The requirements specified here are the constraints within which you need to produce your prototype application. They are not negotiable.
III. Assessment Task(s)
This assessment has two tasks:
A. Design and implement a suitable prototype application that meets the specified requirements as either a single program, or a series of clearly identifiable programs. The program(s) submitted MUST be able to run under the constraints of the technical requirements section.
B. Produce a report that addresses the questions below and demonstrates your approach to the design and development of your prototype application, clearly justifying the decisions you have made. You should support your discussion with appropriate reference to relevant sources, using the correct citation and reference structure as indicated in the guide to the IEEE referencing system.
Where requested, you should select code samples from your software development that demonstrate specific algorithms and interactions. All code samples should be captured as images (screenshots), appropriately labelled, and presented in the appendix. You should refer to and discuss these within the context of each question. Do NOT include screenshots in the body of your report. For further guidance on using appendices, please see the ‘Submission Formatting’ page in Canvas.
The report consists of three main sections, each containing a series of questions that address the learning outcomes. Each question has an indicative word count, showing what would be considered a reasonable response within the whole report. You may choose to redistribute this across questions; however, you must not exceed a total of 3,000 words and a maximum of 12 pages in the appendices. There is no limit on the number of references you provide. For further guidance on word counts and the required formatting of your report, please see the ‘Submission Formatting’ page in Canvas.
Section 1:
Theory supported by code samples (40%, 1,200 words plus up to 6 pages in the appendices)
Evidence for learning outcome: Demonstrate critical understanding of the theory and application of advanced programming techniques; design and implement programs for real-world problems. [MLO1, MLO2]
1a) [20 marks] Identify ONE part of your program design (such as processing the initial data set) that has the potential to be redesigned concurrently, using Python Threads. Clearly identify the program part and justify its selection and potential. Then discuss any specific issues that would need to be considered to refactor this part, and the wider impact of this refactoring on your whole program design. You should consider how data and/or communications will be passed between concurrent aspects, such as threads, and justify which Python constructs would support this redevelopment effectively.
It is expected that this question can be reasonably addressed within 600 words, with no more than 2 pages in the appendix for either pseudo code, diagrams or code samples that support your discussion. This section will require appropriate citations to achieve a pass.
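To make the phrase "how data and/or communications will be passed between concurrent aspects" concrete, the sketch below shows one standard-library construct, `queue.Queue`, passing chunks of rows from a worker thread to the main thread. It is an illustration for the report discussion only, since Technical requirement D forbids threads in the submitted application. The function names, the chunk size and the row-counting task are hypothetical.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream of chunks


def producer(rows, chunk_size, out_queue):
    """Worker thread: split rows into chunks and hand them over via the queue."""
    for start in range(0, len(rows), chunk_size):
        out_queue.put(rows[start:start + chunk_size])
    out_queue.put(SENTINEL)  # tell the consumer that no more chunks will come


def count_rows_concurrently(rows, chunk_size=2):
    """Main thread: consume chunks as they arrive and count the rows seen."""
    chunks = queue.Queue(maxsize=4)  # bounded, so the producer cannot run far ahead
    worker = threading.Thread(target=producer, args=(rows, chunk_size, chunks))
    worker.start()
    total = 0
    while True:
        chunk = chunks.get()  # blocks until the worker has produced something
        if chunk is SENTINEL:
            break
        total += len(chunk)
    worker.join()
    return total
```

`queue.Queue` handles its own locking, so the two threads never touch shared state directly. This is the kind of property a justification of the chosen constructs could address.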
1b) [20 marks] With specific reference to GUI interface constructs (such as text labels and buttons), and best practice regarding interface layouts, discuss how your GUI design and implementation supports THREE of the user interactions required by your prototype application. You should then justify your design decision for each, providing comparative examples to support your approach. You should aim to demonstrate as wide a range of interface constructs/layouts as your prototype application supports.
It is expected that this question can be reasonably addressed within 600 words, with no more than 4 pages in the appendix for GUI layout diagrams (wireframes OR screenshots) AND code samples that support your discussion. This section will require appropriate citations to achieve a pass.
Section 2:
Design decisions supported by code samples (40%, 1,200 words, up to 6 pages in the appendices)
Evidence for learning outcome: Communicate design decisions for the selection, storage and manipulation of data; design and implement programs for real-world problems. [MLO3, MLO2]
2a) [10 marks] With specific reference to the data manipulation requirements, REMOVE and RESHAPE, discuss your reasoning for your selected data format (JSON, XML, or entity relationship structure), and what advantages/disadvantages it has demonstrated in this context.
It is expected that this question can be reasonably addressed within 400 words with no more than 1 page of appendices for code samples, or data format samples. This section will require appropriate citations to achieve a pass.
NOTE: Failure to submit a functional program (or programs) in the Jupyter notebook format may result in a grade of zero for 2a only.
2b) [30 marks] For each of OUTPUT STATISTICS, GRAPHS, and CORRELATION discuss and demonstrate, via appropriate code samples and program output, the following:
● Any additional cleaning you have undertaken and justify it in the context of the relevant output(s). State clearly if you have carried out no additional cleaning, and justify why you chose not to do so.
● Explain why the APIs you selected for data analysis were chosen over other available options, focusing on how they are suited to producing the desired outputs.
● Provide a clear code example of how you have applied the selected APIs to achieve each output.
● What you observe from each output and what conclusion/s you can draw from it, if any.
It is expected that this question can be reasonably addressed within 800 words, with no more than 5 pages in the appendix for code samples, and screenshots of output and visualisations that support your discussion.
Section 3:
Reflection on the ethical, moral and legal aspects (20%, 600 words)
Evidence for learning outcome: Critically evaluate the legal and ethical impact of software developments within real-world contexts. [MLO4]
3) [20 marks] Reflect on the ethical, moral and legal aspects of computing, as discussed in the module, and demonstrate an awareness of how these need to be considered in the role of a software engineer. Critically evaluate the following statement by building an effective ‘for’ or ‘against’ argument. This should be supported by the literature, using comparative examples, and recognition of the opposition’s position where appropriate.
“The moderation of social media platforms by their owners/operators is robust, fair, and effective at removing problematic content. Consequently, software engineers should not be required to consider the ethical, moral or legal consequences of employing user-submitted social media content as training data for machine learning.”
It is expected that this question can be reasonably addressed within 600 words. This section will require appropriate citations to achieve a pass.
IV. Deliverables
The appendices limit (12 pages) for this assessment supersedes that stated on the ‘Submission Formatting’ page in Canvas. Other than this, your assignment should be laid out following all other formatting guidelines that are specified in the ‘Submission Formatting’ page in Canvas.
You should submit two files as follows:
● A completed report answering the given questions, as a single file in either .docx or .pdf format. This should NOT be included in the zipped file and should not exceed the given word counts or page limits.
● A single zipped file containing your program or programs. If a database has been used, you should produce a file dump of the data/table structure to include here. This should NOT contain the original data set.
Using a database:
Where you have opted to use an SQL or relational database (other than Mongo), include the following after your list of references:
A. Name of database and link to download (install package)
B. Version number of the database used
C. The name of the Jupyter notebook that creates and populates the database
D. The point in your code where local host and the port are set (make this clear)
You should make sure that your submitted code contains all the code required to set up and populate your database via a local host connection.
Referencing
You are required to use the IEEE referencing style for citing books, articles, and all other sources (such as websites) used in your assignment.
Good referencing is essential in order to meet the standards of academic integrity set by the University. All your sources must be acknowledged, regardless of whether you’ve included direct quotes or not. Visit your Academic Integrity Tutorial module in Canvas for additional guidance on effective referencing.
V. Marking Criteria
Section/Task | Criteria | Available marks
Section 1. Theory supported by code samples | |
Functional program(s) | An implementation of your software design, using the specified platform(s), to demonstrate MLO2 as well as allow verification of your report discussion. |
1a. Adaptation to a concurrent model | Appropriate concurrent mechanisms/constructs have been selected for the refactoring. These and potential issues/impacts have been discussed in the context of the given scenario and requirements. | 20
1b. Implementing user interaction | Appropriate GUI constructs and layouts have been selected to support the required interactions. There is a clear rationale for their selection given best practice in GUI constructs and layout. | 20
Section 2. Design decisions supported by code | |
Functional program(s) | An implementation of your software design, using the specified platform(s), to demonstrate MLO2 as well as allow verification of your report discussion. |
2a. Selected data format | An effective format has been selected and a rational argument is presented for how it supports the nature of the data and the type of analysis required to meet the prototype application’s requirements. Failure to submit a functional implementation may result in a grade of zero for this question only (2a). | 10
2b. Generating outputs | Appropriate code constructs, internal data structures and visual representations have been selected and applied to achieve the given requirements. Considerations have been made for any anomalies within the data set. There is a clear justification for design decisions, and accurate observations are made given the application’s output. | 30
Section 3. Reflection on ethics, morals and legal aspects | |
Ethics, moral and legal | Clear and appropriate examples from the literature are used to build an effective argument to support a ‘for’ or ‘against’ position on the statement. | 20
TOTAL | | 100