[Solved] CSE510 Project 1-Data Analysis

$25

File Name: CSE510_Project_1-Data_Analysis.zip
File Size: 282.6 KB

SKU: [Solved] CSE510 Project 1-Data Analysis Category: Tag:
5/5 - (1 vote)

Our first project is focused on preprocessing, analysing and visualizing real-world datasets. Applying the basic statistical methods and extracting valuable summary about it. During the third part of the project you are expected to employ multiple datasets and extract insights from them.

Dataset

Requirements to the dataset for Part I and II:

  • Represent the real-world data
  • Contain at least 50k entries
  • Should be different from the one used in class

Possible resources includes:

Part I: Perform data analysis of the dataset [20 points]

  1. How many entries and variables does the data set comprise?
  2. What types of data is included?
  3. Are there any data missing?
  4. Provide the main statistics about the entries of the dataset (mean, std, etc.)
  5. Visualize the data (min 3 graphs), e.g. correlation between different variables. Are there any interesting patterns?

Part II: Apply ML analysis [40 points]

  1. Choose the features and targets in the dataset.
  2. Preprocess the dataset for training (e.g. cleaning and filling the missing variables, split between training/testing/validation).
  3. Apply ML algorithms (min 3 algorithms) to model the target variable. This can be either classification or regression task. You can use any of the libraries with inbuilt ML functions.
  4. Provide the comparison of the results of different ML models you have used. This can be in the form of graph representation and your reasoning about the results.

Part III: Employ multiple datasets and extract insights [40 points]

  1. Choose any related dataset to your current one. Combine the two into one dataset. The combined dataset doesnt have size requirements.
  2. Choose the correlated variables.
  3. Perform statistical analysis on finding the correlation between selected features from both datasets. Examples:
    1. Find the correlation between the crime and the number of schools in the area.
    2. Find the correlation between the traffic and the population in the area
  4. Analyse the results and any interesting patterns.

Submit the Project

  • Submit at UBLearns > Assignments
  • The code of your implementations should be written in Python. You can submit multiple files, but they all need to have a clear name.
  • All project files should be packed in a ZIP file named zip (e.g. avereshc_project1.zip ).
  • Your Jupyter notebook should be saved with the results. If you are submitting python scripts, after extracting the ZIP file and executing command python main.py in the first level directory, all the generated results and plots you used in your report should appear printed out in a clear manner.
  • In your report include the answers to questions for each part. You can complete the report in a separate pdf file or in Jupyter notebook along with your code.
  • Include all the references that have been used to complete the project.

Late Days Policy

Up to 5 free late days can be used throughout the course. They can be applied towards any project. No need to inform the instructor, late submission will be tracked at UBlearns.

Important Information

This project should be done individually. The standing policy of the Department is that all students involved in an academic integrity violation (e.g. plagiarism in any way, shape, or form) will receive an F grade for the course. Refer to the Academi c Integrity website for more information.

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[Solved] CSE510 Project 1-Data Analysis
$25