Create a data science-related notebook, a standalone application, web application or any other kind of artefact with which you apply machine learning and data mining/analysis techniques to a chosen real-world problem domain.
Some ideas for possible projects:
- Current events: COVID19 data analysis; macroeconomic impacts of COVID19.
- Time-series analysis, time-series forecasting.
- Recommender engine: create an application for making recommendations based on user preferences.
- Fitness data: analysis of your personal or some group’s FitBit data.
- Twitter: sentiment analysis, text classification, semantic analysis, network visualisation, geospatial visualisation, data storage etc.
- Facebook: network visualisation, geospatial visualisation, network analysis, natural language processing, data storage etc.
- Data journalism: data visualisation – implementation of interactive graphs (web enabled), infographics.
- A live Kaggle competition problem dataset: https://www.kaggle.com/competitions (see notes below).
- Web app that performs some data-related service.
- Process mining.
- …or something entirely different.
Topics NOT to cover:
- Currency markets, Bitcoin, share market stock prices
- Closed Kaggle competition datasets
- Previously researched topics for which there are existing notebooks
- Definitely NO to the TITANIC dataset
OTHER NOTES
DATA SOURCES
This is a recommendation, not a requirement: be as original as you can with your data sources. Some datasets are very popular and have come up repeatedly in assignments over the years. Unfortunately, because they are popular, many online sources have published scripts for those datasets. In many cases, related assignment submissions involve some form of plagiarism. While the internet is a big place, we have seen a lot of these scripts before and copied work is easy to catch. Unless you are going to do something genuinely novel with a well-used data source (you will know it is well-used if you can easily find Python kernels for it), avoid these data sources. The safest bet is a dataset that is integrated from multiple disparate sources.
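As a rough illustration of what integrating disparate sources can look like, the sketch below joins two small CSV extracts on a shared key and derives a new column. Everything in it is an assumption made for illustration: the column names, the values, and the helper names `read_rows` and `integrate` are hypothetical and are not part of the assignment.

```python
import csv
import io

# Hypothetical source A: weekly case counts per region (made-up values).
CASES_CSV = """region,week,cases
north,2020-W14,120
south,2020-W14,85
north,2020-W15,140
"""

# Hypothetical source B: population per region (made-up values).
POPULATION_CSV = """region,population
north,50000
south,20000
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def integrate(cases_rows, population_rows):
    """Join the two sources on 'region' and add a per-1000 rate."""
    population = {row["region"]: int(row["population"]) for row in population_rows}
    merged = []
    for row in cases_rows:
        pop = population.get(row["region"])
        if pop is None:
            # Skip rows whose region is missing from the second source.
            continue
        cases = int(row["cases"])
        merged.append({
            "region": row["region"],
            "week": row["week"],
            "cases": cases,
            "cases_per_1000": round(1000 * cases / pop, 2),
        })
    return merged


merged_rows = integrate(read_rows(CASES_CSV), read_rows(POPULATION_CSV))
for row in merged_rows:
    print(row)
```

In a real project the two sources would typically come from different providers and need cleaning before the keys line up; a library such as pandas is a common choice for larger joins.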
WARNING ABOUT CHOOSING A KAGGLE DATASET
Discuss this with the lecturer first. A high standard is set when marking Kaggle-related submissions. If you use a Kaggle dataset, we recommend you do not look at related Kaggle kernels, as there can be a temptation to copy what you see. Copying without attribution is plagiarism, which could lead to zero marks for this assignment. Be aware that markers are familiar with Kaggle kernels, in part due to marking assignments for other papers and cohorts. We will also be looking through related kernels prior to marking.
You are encouraged to use Python; however, this is not an absolute pre-requisite for all parts of your project.
If you choose to build a GUI-based application, Python has libraries that facilitate this; alternatively, you can use Qt or technologies like .NET, which allow you to call the Python methods that implement the logic in your application.
In previous years, some students have created web-based applications which have front-end and back-end components that both serve webpages and perform some data science related tasks. If you have web development skills, then you are encouraged to pursue this. It is sufficient that your application run on localhost.
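For orientation only, a localhost back-end that performs a small data-related task can be sketched with the Python standard library. The `/summary` route, the sample values, and the names `summarise`, `SummaryHandler` and `run` are hypothetical choices for this sketch; a real project would more likely use a web framework and its own data.

```python
import json
import statistics
from http.server import BaseHTTPRequestHandler, HTTPServer

# Made-up sample data standing in for a real dataset.
SAMPLE_VALUES = [3.0, 5.0, 8.0, 13.0]


def summarise(values):
    """Return simple descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


class SummaryHandler(BaseHTTPRequestHandler):
    """Serve the summary as JSON at the hypothetical /summary route."""

    def do_GET(self):
        if self.path != "/summary":
            self.send_error(404, "Not found")
            return
        body = json.dumps(summarise(SAMPLE_VALUES)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Start the server on localhost only; blocks until interrupted."""
    server = HTTPServer(("127.0.0.1", port), SummaryHandler)
    server.serve_forever()
```

Calling `run()` and then opening http://127.0.0.1:8000/summary in a browser should return the JSON summary; the server is not started automatically so the analysis function can be imported and tested on its own.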
We will go ahead and conduct presentations despite the current constraints. We will aim for live team presentations over Zoom. Each person in the group will need to present. The presentations will be short and to the point. We would like you to aim for a presentation using only a handful of PowerPoint slides, lasting up to 15 minutes, or an application demo lasting up to 20 minutes. Make your presentation interesting. Don’t focus on technical details. Consider your audience to be tech-savvy executives. Focus instead on the story that you are trying to tell and sell to the audience/decision makers. The presentations will be marked in part by your peers.
Make sure you do these four things:
- Submit all your code, including experimental code, as a mixture of .py and Notebook files, as appropriate for your project. Each project should include at least one Notebook that contains all the key findings and summaries.
- Submit a separate document (or include this at the top of a notebook) that details what each team member contributed to the assignment. Not all contributors will be awarded the same mark. Each team member must submit their own account of how every team member contributed.
- Each member of the class will be marked individually.
- Watch and mark others’ presentations.