FIT5196 – S2 – 2024 Assessment 2
-
Introduction
This is a group assessment worth 40% of the total mark for FIT5196. It consists of three tasks related to data analysis and manipulation.
-
Task 1: Data Cleansing (50%)
-
Input and Output Files
Input files: Group_dirty_data.csv, Group_outlier_data.csv, Group_missing_data.csv, warehouse.csv
Output files: Group_dirty_data_solution.csv, Group_outlier_data_solution.csv, Group_missing_data_solution.csv, Group_ass2_task1.ipynb, Group_ass2_task1.py
-
Dataset Description
The dataset contains transactional retail data from an online electronics store (DigiCO) in Melbourne, Australia. Each row represents a single order, with columns such as order_id, customer_id, date, etc.
-
Tasks
-
Detect and fix errors in Group_dirty_data.csv.
-
Impute the missing values in Group_missing_data.csv.
-
Detect and remove outlier rows in Group_outlier_data.csv (with respect to the delivery_charges attribute only).
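The spec does not say how outliers should be detected. One common starting point, sketched here under that assumption, is the univariate interquartile-range (IQR) rule applied to delivery_charges alone. The function name remove_iqr_outliers and the fence multiplier k = 1.5 are illustrative choices, not requirements of the assessment, and the sample rows are toy values. Because the rule looks at one column in isolation, it ignores any dependence of delivery_charges on other attributes, so treat it as a baseline rather than the expected solution.

```python
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, column: str = "delivery_charges",
                        k: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: drop rows whose `column` value falls outside
    [Q1 - k*IQR, Q3 + k*IQR]. Other columns are returned unchanged."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Series.between is inclusive at both ends; NaN values evaluate to False,
    # so rows with a missing `column` value are dropped as well.
    keep = df[column].between(lower, upper)
    return df.loc[keep].reset_index(drop=True)


if __name__ == "__main__":
    # Toy values for illustration only, not taken from the assessment files.
    orders = pd.DataFrame({
        "order_id": ["A", "B", "C", "D", "E"],
        "delivery_charges": [70.0, 72.5, 75.0, 71.0, 500.0],
    })
    print(remove_iqr_outliers(orders)["order_id"].tolist())  # ['A', 'B', 'C', 'D']
```

Here Q1 = 71, Q3 = 75 and IQR = 4, so the fences are 65 and 81 and only the 500.0 row is removed.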
-
-
Methodology
The Group_ass2_task1.ipynb notebook should demonstrate the methodology used to obtain correct results. This includes using appropriate Python functions for input, processing, and output, and presenting the solution efficiently and clearly.
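One way to keep the input, process, and output steps visibly separate in the notebook is sketched below. The function names are hypothetical, and the median imputation is only a placeholder process step: it ignores any relationships between columns that the missing-data task may require you to exploit, and the order_price column in the demo is an assumed name with toy values.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load(path: Path) -> pd.DataFrame:
    """Input step: read one of the assessment CSV files."""
    return pd.read_csv(path)


def impute_numeric_median(df: pd.DataFrame) -> pd.DataFrame:
    """Placeholder process step: fill numeric NaNs with each column's median."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out


def save(df: pd.DataFrame, path: Path) -> None:
    """Output step: index=False keeps the columns identical to the input file."""
    df.to_csv(path, index=False)


def run(in_path: Path, out_path: Path, process) -> pd.DataFrame:
    """Chain input -> process -> output and return the processed frame."""
    result = process(load(in_path))
    save(result, out_path)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "in.csv", Path(tmp) / "out.csv"
        pd.DataFrame({"order_id": ["A", "B", "C"],
                      "order_price": [10.0, None, 30.0]}).to_csv(src, index=False)
        print(run(src, dst, impute_numeric_median)["order_price"].tolist())  # [10.0, 20.0, 30.0]
```

Passing the process step as an argument lets the same load/save code serve all three Task 1 files, with a different cleaning function for each.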
-
-
Task 2: Data Reshaping (15%)
-
Input and Output Files
Input file: suburb_info.xlsx
Output file: Group_ass2_task2.ipynb
-
Task Description
Study the effect of different normalisation/transformation methods on the columns number_of_houses, number_of_units, population, aus_born_perc, median_income, and median_house_price, in order to prepare the data for a linear regression model that predicts median_house_price.
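A minimal way to start the comparison, assuming the spreadsheet has already been loaded into a DataFrame (for example with pandas.read_excel), is to apply each candidate transform and measure how it changes each column's distribution. The sketch uses sample skewness as the yardstick; the helper names and the choice of metric are assumptions, and the demo data are toy values. Min-max scaling and z-score standardisation are positive linear maps, so they change the scale but not the shape: their skewness equals the raw value. Non-linear transforms such as log1p are the ones that can reduce skew.

```python
import numpy as np
import pandas as pd


def min_max(s: pd.Series) -> pd.Series:
    """Rescale to [0, 1]; assumes the column is not constant."""
    return (s - s.min()) / (s.max() - s.min())


def z_score(s: pd.Series) -> pd.Series:
    """Standardise to mean 0 and (population) standard deviation 1."""
    return (s - s.mean()) / s.std(ddof=0)


def log_transform(s: pd.Series) -> pd.Series:
    """log(1 + x); assumes non-negative values such as counts, percentages, prices."""
    return np.log1p(s)


TRANSFORMS = {
    "raw": lambda s: s,
    "min_max": min_max,
    "z_score": z_score,
    "log1p": log_transform,
}


def skew_table(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Rows = columns of df, columns = transforms, values = sample skewness."""
    columns = list(columns)
    rows = [[f(df[col]).skew() for f in TRANSFORMS.values()] for col in columns]
    return pd.DataFrame(rows, index=columns, columns=list(TRANSFORMS))


if __name__ == "__main__":
    # Toy right-skewed data, not taken from suburb_info.xlsx.
    toy = pd.DataFrame({"median_income": [1, 2, 2, 3, 3, 3, 10, 50, 200]})
    print(skew_table(toy, ["median_income"]).round(2))
```

Applied to the six Task 2 columns, a table like this shows which columns still need a shape-changing transform before fitting the regression.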
-
-
Task 3: Project Reflective Report (15%)
-
Input and Output Files
Input file: None
Output file: Group_report.pdf
-
Tasks
-
Feedback Session (Week 10 Applied Session): Present your progress and future plans, record the TA’s suggestions, and continue working based on those suggestions.
-
Group Reflection Presentation (Hurdle): Present your methodology and answer questions during the Week 12 applied sessions. Attendance is mandatory.
-
Reflective Report: Write a report based on the feedback received, the solutions tailored in response to it, and any related findings.
-
-
-
Submission Requirements
Submit 7 files: Group_dirty_data_solution.csv, Group_missing_data_solution.csv, Group_outlier_data_solution.csv, Group_ass2_task1.ipynb, Group_ass2_task1.py, Group_ass2_task2.ipynb, and Group_report.pdf.
Zip all files into Group_ass2.zip.
Follow file naming standards and ensure files are parsable and readable.
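Before zipping, a short script can confirm that every required file is present and that the CSV, notebook, and .py files parse. The sketch below is one possible approach, not an official checker: the helper names are hypothetical, the file list follows the submission requirement, and the prefix "Group" is kept exactly as the spec writes it, so substitute whatever prefix your group's naming standard requires. It checks only that files parse, not that their contents are correct.

```python
import json
import zipfile
from pathlib import Path

import pandas as pd

# File-name templates; "{p}" is replaced by the group prefix.
REQUIRED = [
    "{p}_dirty_data_solution.csv",
    "{p}_missing_data_solution.csv",
    "{p}_outlier_data_solution.csv",
    "{p}_ass2_task1.ipynb",
    "{p}_ass2_task1.py",
    "{p}_ass2_task2.ipynb",
    "{p}_report.pdf",
]


def check_files(folder: Path, prefix: str = "Group") -> list:
    """Return the required file names, raising if any is missing or unparsable."""
    names = [t.format(p=prefix) for t in REQUIRED]
    missing = [n for n in names if not (folder / n).is_file()]
    if missing:
        raise FileNotFoundError(f"missing: {missing}")
    for n in names:
        path = folder / n
        if path.suffix == ".csv":
            pd.read_csv(path)  # raises if the CSV cannot be parsed
        elif path.suffix == ".ipynb":
            json.loads(path.read_text(encoding="utf-8"))  # notebooks are JSON
        elif path.suffix == ".py":
            compile(path.read_text(encoding="utf-8"), str(path), "exec")  # syntax check
    return names


def build_zip(folder, prefix: str = "Group") -> Path:
    """Write <prefix>_ass2.zip in `folder`, storing files at the archive root."""
    folder = Path(folder)
    names = check_files(folder, prefix)
    zip_path = folder / f"{prefix}_ass2.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for n in names:
            zf.write(folder / n, arcname=n)  # arcname drops the folder path
    return zip_path
```

Calling build_zip on the folder that holds the seven files either returns the path of the archive or stops with an error naming the problem, so nothing incomplete gets zipped.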
-
Appendix
Instructions for generating .py files from notebooks, and details of the submission checklist.
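Jupyter's `jupyter nbconvert --to script` command is the usual way to export a notebook as a .py file. If a dependency-free alternative is useful, the sketch below (a hypothetical helper, not necessarily the method the appendix describes) reads the .ipynb JSON directly and concatenates the code cells. Unlike nbconvert, it drops markdown cells entirely and does not translate IPython magics such as %matplotlib inline, so remove those from the notebook or the resulting .py will not run.

```python
import json
from pathlib import Path


def notebook_to_script(ipynb_path, py_path=None) -> Path:
    """Write the code cells of a notebook to a .py file (markdown cells are dropped).

    By default the output sits next to the notebook with the same stem.
    """
    ipynb_path = Path(ipynb_path)
    py_path = Path(py_path) if py_path else ipynb_path.with_suffix(".py")
    nb = json.loads(ipynb_path.read_text(encoding="utf-8"))
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # nbformat v4 stores a cell's source either as one string or as a list of lines.
        text = "".join(src) if isinstance(src, list) else src
        chunks.append(text.rstrip("\n"))
    # Separate cells with a blank line and end the file with a newline.
    py_path.write_text("\n\n".join(chunks) + "\n", encoding="utf-8")
    return py_path
```

For example, notebook_to_script("Group_ass2_task1.ipynb") would write Group_ass2_task1.py alongside the notebook.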