COMP0035 Coursework 01 2024 Coursework specification

  1. Table of contents

    Introduction
    Coursework specification
    Getting started
    General requirements and constraints
    Section 1: Data exploration and preparation
    Section 2: Database design and creation
    Section 3: Tools
    Section 4: References
    Submission
    Marking
    Module learning outcomes
    Mark allocation
    Grading criteria
    Appendices
    Code quality
    Code that does not fully function
    Guidance on Moodle

    Version: 1. 28/09/24

  2. Introduction

    The aim of the combined coursework in this module is for you to select and apply some of the relevant software development and data science techniques that are used in a typical project lifecycle.

    Coursework 1 focuses on data preparation and database design.

    Coursework 2 continues from coursework 1, focusing on requirements, application design and testing.

    This document specifies coursework 1 which is worth 40% of the assessment marks available for the module. This is an individual coursework.

    You will submit a written report and a repository of code files that together meet the requirements detailed in this specification.

    Aim to make progress each week of the first five weeks of the module, in line with the module’s teaching activities.

  3. Coursework specification

    1. Getting started

      1. Select a dataset using the ‘group’ selection task in Moodle Week 1 (https://moodle.ucl.ac.uk/mod/choicegroup/view.php?id=6089982). Each ‘group’ option is associated with a data set. ‘Group selection’ is a Moodle term for the type of task; the coursework itself is individual.

      2. Accept a GitHub classroom assignment. This creates the repository. Instructions are also given in Tutorial 1.

        1. Login to GitHub.com.

        2. Click on the GitHub classroom link (https://classroom.github.com/a/zqVIaThf)

        3. Accept the assignment.

        4. If prompted, accept to join the comp0035-ucl organisation.

      3. Download the dataset for your group choice and add it to your repository. Use the links in Moodle (Resources > Datasets). For files > 25MB use GitHub large file storage (https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-git-large-file-storage).

    2. General requirements and constraints

      • Compile all written work into a single report in either PDF or Markdown format. Name the document coursework1.

      • The report supports the code and techniques used in the coursework. It is not an essay; be succinct. There are no word limits.

      • Demonstrate regular use of source code control using GitHub. Create the repository using the GitHub classroom assignment. Keep the repository private. Keep the repository in the ucl-comp0035 organisation.

      • You must use the data set allocated to you on Moodle.

      • This is an individual coursework. Do not collude with other students using the same data set.

      • Use of code AI tools is permitted when writing code. UCL recommends using Microsoft Copilot (https://liveuclac.sharepoint.com/sites/Office365/SitePages/Bing-Enterprise-Chat.aspx) with your UCL credentials. Any use of AI tools must be stated in the ‘References’ section.

      • Use relevant techniques from the course, or from data science and/or software engineering processes. Provide references for techniques not included in the course material.

      • Diagrams can be hand-drawn and scanned. Using software to draw them does not increase marks.

    3. Section 1: Data exploration and preparation

      The purpose of this section is:

      • to use Python pandas to describe the data set structure and content, and as a result demonstrate that you understand the data set.

      • to use Python pandas to prepare the data for later use in developing applications. The data you prepare will be used in COMP0034 coursework to create charts in a dashboard app.

      • to demonstrate that you can write code that is reusable and understood by other developers.

      • to demonstrate that you can apply relevant software engineering and data science techniques. Code quality is also assessed.

      Use only Python and pandas. matplotlib may be used where pandas DataFrame.plot() (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html) is not sufficient.

      Create charts where they support your exploration and preparation, but do not focus on the visual aesthetic as this is not assessed.

      You may need to prepare the data in order to complete the exploration, so your code may not split neatly between 1.1 and 1.2. This is OK; the code structure does not need to exactly match the report structure.

      1. Section 1.1 Data exploration

        1. Code: Write Python code to explore and describe the data structure and content, including but not limited to size, attributes and their data types, statistics and the distribution of the data. Consider potential data quality issues.

        2. Report: Describe the results of your exploration of the data. Do not include the code in the report.
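
A minimal sketch of the kind of exploration code described in step 1 is given here. It assumes a small in-memory DataFrame with hypothetical columns (region, year, visitors) standing in for your allocated data set; the function name describe_dataframe is also illustrative and not part of the specification.

```python
import pandas as pd


def describe_dataframe(df: pd.DataFrame) -> dict:
    """Collect basic facts about a DataFrame's structure and content.

    Returns a dict with the shape, column data types, count of missing
    values per column and number of fully duplicated rows.
    """
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Hypothetical sample data standing in for the real data set file.
sample = pd.DataFrame(
    {
        "region": ["North", "South", "South", None],
        "year": [2020, 2021, 2021, 2022],
        "visitors": [120.0, 95.5, 95.5, None],
    }
)

summary = describe_dataframe(sample)
print(summary["shape"])
print(summary["missing"])
# Summary statistics for the numeric columns.
print(sample.describe())
```

With a real data set you would replace the sample DataFrame with one loaded using pd.read_csv() or pd.read_excel(), and the missing and duplicate counts would feed into the data quality discussion in the report.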

      2. Section 1.2 Data preparation

        1. Report: Briefly describe a target audience and state at least 3 questions that they might be interested to explore using the data. This defines the purpose for which you will prepare the data.

        2. Code: Write Python code to prepare the data so that it can be used to try to answer the questions for the audience described in step 1. Aim to have sufficient data, and avoid unnecessary data. The prepared data should be in a format that can be read into one or more pandas DataFrames from a file (.csv or .xlsx). If relevant, address any data quality issues identified in section 1.1.

        3. Report: Explain how you ensured the data is relevant for the purpose.

        4. Include the original and prepared versions of your data set files in your repository.
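
The preparation steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the columns (region, year, visitors, notes), the function name prepare_data and the file name prepared.csv are all hypothetical, and the cleaning choices (dropping rows with missing values and duplicates) may not suit every data set.

```python
import tempfile
from pathlib import Path

import pandas as pd


def prepare_data(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a cleaned copy of df restricted to the given columns.

    Keeps only the requested columns, drops rows with missing values,
    removes duplicate rows and resets the index.
    """
    prepared = df[columns].dropna().drop_duplicates()
    return prepared.reset_index(drop=True)


# Hypothetical raw data; 'notes' is not needed for the audience's questions.
raw = pd.DataFrame(
    {
        "region": ["North", "South", "South", None],
        "year": [2020, 2021, 2021, 2022],
        "visitors": [120.0, 95.5, 95.5, 80.0],
        "notes": ["a", "b", "b", "c"],
    }
)

prepared = prepare_data(raw, ["region", "year", "visitors"])

# Save the prepared data and read it back to confirm the file is usable.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "prepared.csv"
    prepared.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path)

print(reloaded)
```

A temporary directory is used here only to keep the example self-contained; in the coursework the prepared file would be saved in the repository alongside the original data.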

    4. Section 2: Database design and creation

      The purpose of this section is:

      • to demonstrate that you understand the structure of a relational database and the principles of normalisation by designing an appropriate database and drawing this as an entity relationship diagram (ERD).

      • to demonstrate that you can write Python code to create an SQLite database based on the ERD. The database you create can be used in COMP0034 coursework in a data driven web application.

      1. Section 2.1: Database design

        Design a relational database that can store the data (based on either the prepared or the raw data set, your choice). Consider normalisation.

        Document the design as an Entity Relationship Diagram (ERD) that includes the following details as a minimum:

        • table(s)

        • attributes in each table

        • data type of each attribute

        • primary key attribute for each table

        • foreign key attribute(s) if relevant

        • relationship lines between tables

        Include the ERD in your report. An explanation is not required, though you may discuss your normalisation if relevant.

      2. Section 2.2: Database code

        Write Python code that:

        • creates a database structure based on the ERD for an SQLite database file.

        • takes the data from the dataset file and saves it to the SQLite database file. Note: do not create a database that requires a server such as MySQL or PostgreSQL.

        The quality of the code is assessed.

        Use relevant Python packages, i.e. pandas and sqlite3.
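
One possible shape for this code is sketched below, assuming a hypothetical two-table design (region and visit) and hypothetical column names; your own tables should follow your ERD. It uses sqlite3 to create the structure and pandas DataFrame.to_sql() to save the rows.

```python
import sqlite3

import pandas as pd

# Hypothetical two-table schema; a real schema should follow your own ERD.
SCHEMA = """
CREATE TABLE IF NOT EXISTS region (
    region_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS visit (
    visit_id INTEGER PRIMARY KEY,
    region_id INTEGER NOT NULL,
    year INTEGER NOT NULL,
    visitors REAL,
    FOREIGN KEY (region_id) REFERENCES region (region_id)
);
"""


def create_database(df: pd.DataFrame, db_path: str) -> sqlite3.Connection:
    """Create the tables and load the rows of df into them."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA foreign_keys = ON;")
    conn.executescript(SCHEMA)

    # One row per distinct region name, with a generated integer key.
    regions = pd.DataFrame({"name": sorted(df["region"].unique())})
    regions.insert(0, "region_id", list(range(1, len(regions) + 1)))
    regions.to_sql("region", conn, if_exists="append", index=False)

    # Replace the region name with its foreign key before saving visits.
    visits = df.merge(regions, left_on="region", right_on="name")
    visits = visits[["region_id", "year", "visitors"]]
    visits.to_sql("visit", conn, if_exists="append", index=False)
    conn.commit()
    return conn


# Hypothetical prepared data; in practice this would come from pd.read_csv().
data = pd.DataFrame(
    {
        "region": ["North", "South", "South"],
        "year": [2020, 2020, 2021],
        "visitors": [120.0, 95.5, 101.0],
    }
)

# ":memory:" keeps the example self-contained; pass a file path such as
# "coursework.db" to write an SQLite database file instead.
connection = create_database(data, ":memory:")
rows = connection.execute(
    "SELECT r.name, COUNT(*) FROM visit v "
    "JOIN region r ON r.region_id = v.region_id "
    "GROUP BY r.name ORDER BY r.name"
).fetchall()
print(rows)
connection.close()
```

Creating the tables with explicit SQL before loading the data, rather than letting to_sql() create them, keeps the primary and foreign keys from the ERD in the database structure.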

    5. Section 3: Tools

      The purpose of this section is to demonstrate appropriate and effective use of relevant software engineering tools.

      1. Section 3.1 Environment management

        Provide relevant files and instructions that allow the marker to set up and run your code in a Python virtual environment. They will use pip and setuptools with the commands:

        pip install -r requirements.txt
        pip install -e .

        As a minimum, edit the files that were provided in the starter code of the repository:

        • requirements.txt: list the packages used in your code

        • pyproject.toml: provide basic project details and code package location

        • README.md: provide instructions to install and run your code for the data preparation and the database creation

      2. Section 3.2: Source code control

        Add the URL for your repository to the report. Make regular use of source code control.

      3. Section 3.3: Linting

        Use a Python linter to demonstrate how your code meets Python style standards such as PEP8, PEP257. For example:

        • state which Python linter you used.

        • provide evidence of the results of running the linter.

        • if issues are reported by the linter, address these and then run the linter again and show the results.

        • if any issue cannot be addressed, explain why not.
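
As a rough illustration of what such linters typically check, the short module below is written to follow PEP 8 naming and layout and PEP 257 docstring conventions. The module content and the function name mean_of_values are hypothetical, and the exact rules enforced depend on the linter and configuration you choose.

```python
"""Utility functions for summarising numeric values."""


def mean_of_values(values: list[float]) -> float:
    """Return the arithmetic mean of a non-empty list of numbers.

    Raises:
        ValueError: If values is empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


if __name__ == "__main__":
    print(mean_of_values([2.0, 4.0, 6.0]))
```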

    6. Section 4: References

      Include code references in comments in the code files close to where it is used. Include all other references, if used, in the report.

      1. Section 4.1 Reference use of AI

        State either that you used AI, or state that you did not.

        If you used AI, include the details stated in the UCL guidance (https://library-guides.ucl.ac.uk/referencing-plagiarism/acknowledging-AI#s-lg-box-wrapper-19164308).

      2. Section 4.2 Dataset attribution

        Comply with any license condition required for your data set (given in the data set link in Moodle > Resources > Data sets).

        Each license is different and tells you what has to be cited; e.g. see open government licence v3 (https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/). Typically, but not always, ‘attribution’ is required: i.e. include a statement listing who owns the data and its location.

  4. Submission

    Refer to Moodle > Assessment for the deadline date and time.

    Submit your work on Moodle in the assignment submission. The submission states the upload format:

    .zip for the code (and report if in markdown) plus .pdf for the report (if not in markdown).

    GitHub is not an acceptable alternative for submission, though its facility to download the code files as zip may be useful to you.

    Make sure all files are in the submission. URLs linking to external files cannot be marked as they could be changed after the submission time. The only exception is where the original data files are too large to upload to Moodle; in this exceptional situation, list the URL(s) to the data files in your report or the README.md instead.

    Do not include your .venv folder in the zip file, as this creates unnecessarily large zip files.

    Table: Submission checklist

    | Section | Report | Code files |
    |---|---|---|
    | 1. Data exploration and preparation | Description and explanation. | Python code to explore/describe the data. Python code to prepare the data. Original data set. Prepared data set. |
    | 2. Database design and creation | Entity Relationship Diagram (ERD). | Python code to create the database. SQLite database file. |
    | 3. Tools | Source code control: URL to GitHub repository. Linting evidence. | Environment management: requirements.txt, pyproject.toml, README.md |
    | 4. References | Statement of AI use. Data set attribution. Other references if used. | Include code references within the code files. |

  5. Marking

    1. Module learning outcomes

The module’s published learning outcomes that are assessed in this coursework are indicated in the table.
