MSIN0097
Predictive Analytics Individual Coursework
A P MOORE
INDIVIDUAL COURSEWORK
Friday 26th February 2021 60% of module mark
2000 words
BRIEF
The individual coursework task is to identify a dataset and explore building a predictive model using the methods and techniques presented in the first 5 weeks of the course.
There are six main steps:
1. Obtain a dataset and explain the problem you are trying to solve.
This will characterise the type of predictive model you can build.
2. Explore the data to gain insights.
Visualize and explain the main trends in the data.
3. Prepare the data to better expose the underlying data patterns to Machine Learning algorithms.
4. Explore different models and shortlist the best ones.
5. Fine-tune your models and combine them into a better solution.
6. Present your final solution with any summary conclusions.
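The six steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a required structure: it assumes scikit-learn is available, uses its bundled breast cancer dataset as a stand-in for whatever dataset you choose, and the two candidate models and the small parameter grid are arbitrary choices made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Step 1: obtain a dataset (a bundled binary classification set, assumed here
# purely for illustration) and hold out a test set for the final evaluation.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 2: explore the training data (simple summaries here; plots in a notebook).
print("training shape:", X_train.shape)
feature_means = X_train.mean(axis=0)

# Steps 3 and 4: put preparation and model into one pipeline per candidate,
# then compare candidates with 5-fold cross-validation on the training set.
candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "forest": make_pipeline(RandomForestClassifier(n_estimators=100, random_state=0)),
}
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}
print("cross-validated accuracy:", cv_scores)

# Step 5: fine-tune a shortlisted model (the logistic regression pipeline here;
# in practice, tune whichever models you shortlisted).
param_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(candidates["logreg"], param_grid, cv=5)
search.fit(X_train, y_train)

# Step 6: report the final solution on the held-out test set, used only once.
test_accuracy = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print("test accuracy:", round(test_accuracy, 3))
```

Keeping the scaler inside the pipeline means it is refitted on each cross-validation fold, so no information from the validation folds leaks into the preparation step.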
GUIDANCE
END-TO-END NOTEBOOK
DATASETS
Useful places for ML datasets:
Tabular & cleaned: https://github.com/EpistasisLab/pmlb/tree/master/datasets
By domain: https://datasetlist.com
By application: https://github.com/awesomedata/awesome-public-datasets
Search engine: https://datasetsearch.research.google.com
@rasbt
CRISP CYCLE
DATA DEVELOPMENT LIFECYCLES
[Figure: data development lifecycle diagram with stages numbered 1 to 5]
GUIDANCE
Fast First Pass
Make a first pass through the project steps as fast as possible. This will give you confidence that you have all the parts that you need and a baseline from which to improve.
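A fast first pass can be as small as the sketch below: a trivial baseline and one simple model, both scored the same way. It assumes scikit-learn and again uses the bundled breast cancer dataset as a placeholder for your own data; the choice of logistic regression as the first model is arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder dataset (assumption for illustration only).
X, y = load_breast_cancer(return_X_y=True)

# Baseline: always predict the most frequent class, ignoring the features.
baseline = DummyClassifier(strategy="most_frequent")
baseline_acc = cross_val_score(baseline, X, y, cv=5).mean()

# First real model: default settings, no tuning yet.
first_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
first_acc = cross_val_score(first_model, X, y, cv=5).mean()

print("baseline accuracy:   ", round(baseline_acc, 3))
print("first model accuracy:", round(first_acc, 3))
```

Any later change to the data preparation or the model can then be judged against these two numbers.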
Cycles
The process is not linear but cyclic. You will loop between steps, and probably spend most of your time in tight loops between steps 3-4 or 3-4-5 until you achieve a level of accuracy that is sufficient or you run out of time.
The write-up in the final submitted Notebook can be more linear: you do not need to include all of your work (i.e. every dead end), and it should be concise and consistent.
Attempt Every Step
It is easy to skip steps, especially if you are not confident or familiar with the tasks of that step. Try to do something at each step in the process, even if it does not improve accuracy. You can always build upon it later. Don't skip steps; just reduce their contribution to your final submission as necessary.
Ratchet Accuracy
The goal of the project is to achieve good model performance (whichever metric you use to measure this). Every step contributes towards this goal.
Set some simple benchmarks early on. Treat changes that you make as experiments that potentially increase accuracy.
Performance is a ratchet that can only move in one direction (better, not worse).
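One way to make the ratchet concrete is a small experiment log that records every result but only promotes a change when it beats the best score so far. The sketch below is a hypothetical helper, not part of any library, and the scores passed to it are made-up placeholder numbers used only to show the behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentLog:
    """Records every experiment; keeps a change only if it beats the best so far."""

    best_name: str = ""
    best_score: float = float("-inf")
    history: list = field(default_factory=list)

    def record(self, name: str, score: float) -> bool:
        """Log one experiment and return True if it becomes the new best."""
        self.history.append((name, score))
        improved = score > self.best_score
        if improved:
            self.best_name, self.best_score = name, score
        return improved


# Placeholder scores (hypothetical), assuming a higher score is better.
log = ExperimentLog()
log.record("baseline: majority class", 0.63)
log.record("logistic regression", 0.95)
kept = log.record("logistic regression + extra feature", 0.94)  # worse, so not kept

print("kept last change:", kept)
print("current best:", log.best_name, log.best_score)
```

For a metric where lower is better, such as an error measure, the comparison would need to be reversed.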
Adapt As Needed
Modify the steps as you need on a project, especially as you become more experienced with using the Notebook.
The final submitted Notebook does not need to preserve the suggested structure if you think something else is more appropriate.
A NOTE ON GRADES
KEY DATES
Submission Friday 26th February 2021, 10 am
TEACHING SUPPORT
Kamil Tylinski Teaching Assistant
[email protected]
Jiangbo Shangguan Teaching Assistant
[email protected]
Bartos Kultys Teaching Assistant
[email protected]
Editha Nemsic Teaching Assistant
[email protected]
Dr Viviana Culmone Teaching Assistant
[email protected]
Walter Hernandez Teaching Assistant
[email protected]