Cardiovascular Health Analysis
Notebooks as Business Reports
Due Friday 20th October 2023 18:00
Cardiovascular health is a key determinant of overall health and well-being. Cardiovascular diseases, including heart disease and stroke, are among the leading causes of death worldwide. By understanding and optimizing cardiovascular health through lifestyle choices and medical intervention when necessary, individuals can significantly reduce their risk of cardiovascular diseases and improve their overall quality of life.
This assignment seeks to uncover valuable insights into how demographic, health, and lifestyle factors relate to cardiovascular disease by analysing the cardiac health dataset. The analysis will be conducted using Python in Google Colab notebooks, drawing on SQLite databases, data exploration, data visualization, and industry best practices in programming.
Assignment Objectives
Analyse the provided dataset using Python in Google Colab notebooks, working with SQLite databases, data exploration, and data visualization while applying industry best practices in programming.
Deliver two notebooks: a business report (which includes code, analysis, and discussion) and a development notebook which includes pseudocode, testing, and any other industry best practice not observable from the business report.
Learning Objectives
- Understand and work with SQLite databases.
- Perform data exploration and analysis using Python and relevant libraries.
- Create effective data visualizations.
- Apply industry best practices in programming, such as adding comments, creating a modular design, reusing code, and using version control with GitHub.
- Select and apply appropriate data analysis techniques.
- Interpret and communicate findings effectively.
- Demonstrate critical thinking and problem-solving skills.
The Business Report Notebook must run on a Google Colab instance and require no additional steps other than running code cells within the notebook.
Note: You can have code cells in the notebook set up the Colab instance, for example, to copy data, Python scripts, or other notebooks. But other than running code cells, your notebook should require no further interaction from the user/reader of the notebook.
Tasks:
- Set up the environment:
  - Create a new Google Colab notebook.
  - Connect the notebook to your GitHub account.
  - Import the necessary libraries (sqlite3, pandas, Matplotlib, and ipywidgets).
- Access the database:
  - Connect to the cardiohealth SQLite database using the sqlite3 library.
  - Examine the schema of the database and understand the structure of the tables.
- Data extraction and manipulation:
  - Write SQL queries to extract relevant information from the tables.
  - Use pandas to load the query results into data frames and perform data manipulation tasks such as filtering, grouping, and aggregation.
  - Clean and pre-process the data, addressing any missing or inconsistent values.
- Interpretation and conclusion:
  - Summarise the main insights you have gained from the data analysis.
  - Discuss any limitations of your analysis and suggest possible improvements.
  - Reflect on the usability and effectiveness of Python notebooks.
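As a rough illustration of the database access and extraction tasks above, the sketch below builds a small in-memory SQLite database so that it runs anywhere. The table name `cardio`, the file name in the comment, and the three rows are assumptions made up for this example; the real database may be organised differently, so treat this as one possible starting point rather than the required approach.

```python
import sqlite3

import pandas as pd

# Stand-in for the real database. In Colab you would connect to the provided
# file instead, e.g. sqlite3.connect("cardiohealth.db") (hypothetical name).
conn = sqlite3.connect(":memory:")

# Illustrative rows only: made up for this sketch, not taken from the dataset.
conn.execute(
    "CREATE TABLE cardio (age INTEGER, gender INTEGER, height REAL, "
    "weight REAL, cardio INTEGER)"
)
conn.executemany(
    "INSERT INTO cardio VALUES (?, ?, ?, ?, ?)",
    [
        (18250, 1, 160.0, 60.0, 0),
        (21900, 2, 175.0, 90.0, 1),
        (20075, 1, 165.0, None, 1),  # weight is missing on purpose
    ],
)


def list_tables(connection):
    """Return the names of all tables in the database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def describe_table(connection, table_name):
    """Return (column name, declared type) pairs for one table."""
    # PRAGMA statements cannot take bound parameters, so table_name should
    # come from a trusted source such as list_tables().
    rows = connection.execute(f"PRAGMA table_info({table_name})").fetchall()
    return [(row[1], row[2]) for row in rows]


tables = list_tables(conn)
schema = describe_table(conn, "cardio")

# Load a query result into a DataFrame, then drop rows with missing values.
raw_df = pd.read_sql_query("SELECT * FROM cardio", conn)
clean_df = raw_df.dropna()
gender_counts = clean_df.groupby("gender").size()

conn.close()
```

Dropping incomplete rows is only one way to handle missing values; whichever choice you make is worth justifying in the business report.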
Analysing and Visualising Cardio Health Dataset to Gain Valuable Insights
Note that you may need to pre-process and clean the data to ensure accurate results.
Dataset
- age: Age of the individual (in days).
- gender: Gender of the individual (1 for female, 2 for male).
- height: Height of the individual (in cm).
- weight: Weight of the individual (in kg).
- ap_hi: Systolic blood pressure.
- ap_lo: Diastolic blood pressure.
- cholesterol: Cholesterol level (1: normal, 2: above normal, 3: well above normal).
- gluc: Glucose level (1: normal, 2: above normal, 3: well above normal).
- smoke: Smoking status (0 for non-smoker, 1 for smoker).
- alco: Alcohol consumption status (0 for non-drinker, 1 for drinker).
- active: Physical activity status (0 for inactive, 1 for active).
- cardio: Cardiovascular disease presence (0 for no disease, 1 for disease).
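Because age is stored in days and height in centimetres, some derived columns are likely to be useful before analysis. The sketch below shows one way to add age in years and BMI (weight in kg divided by height in metres squared). The function name `add_derived_columns`, the `DAYS_PER_YEAR` constant, and the sample rows are assumptions for illustration only.

```python
import pandas as pd

# Assumption: average year length, allowing for leap years.
DAYS_PER_YEAR = 365.25


def add_derived_columns(df):
    """Return a copy of df with age_years and bmi columns added."""
    result = df.copy()
    result["age_years"] = result["age"] / DAYS_PER_YEAR
    height_m = result["height"] / 100  # centimetres to metres
    result["bmi"] = result["weight"] / height_m ** 2
    return result


# Illustrative rows only: made up for this sketch, not taken from the dataset.
sample = pd.DataFrame(
    {
        "age": [18250, 21900],
        "height": [160.0, 175.0],
        "weight": [64.0, 90.0],
    }
)
derived = add_derived_columns(sample)
```

Returning a copy keeps the original data frame unchanged, which makes it easier to re-run notebook cells without compounding transformations.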
Examine the relationship between the occurrence of cardiovascular disease and the following factors within the provided cardio health dataset:
- Choose one: Investigate either Age groups or Gender.
- Choose one: Explore either BMI (Body Mass Index) or Blood pressure (Systolic and Diastolic).
- Choose one: Analyse either BMI and Cholesterol, Glucose and Blood pressure, or Cholesterol and Blood pressure.
- Choose one: Study either the connection between Smoking and physical activity or Alcohol and physical activity.
Please select one option from each group and assess how it impacts the presence or absence of cardiovascular disease.
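One simple way to assess how a factor relates to the presence of disease is to compare the share of individuals with `cardio == 1` across groups. The sketch below does this for age bands and for a two-factor combination. The helper name `disease_rate_by`, the age band boundaries, and the sample rows are all assumptions chosen for illustration; the appropriate grouping for the real data is an analysis decision.

```python
import pandas as pd

# Illustrative rows only: made up for this sketch, not taken from the dataset.
sample = pd.DataFrame(
    {
        "age_years": [42, 47, 55, 58, 63, 61],
        "smoke": [0, 1, 0, 1, 0, 1],
        "active": [1, 1, 0, 0, 1, 0],
        "cardio": [0, 0, 1, 0, 1, 1],
    }
)


def disease_rate_by(df, factors):
    """Return the share of rows with cardio == 1 for each group."""
    # cardio is coded 0/1, so its mean within a group is the disease rate.
    return df.groupby(factors, observed=True)["cardio"].mean()


# Hypothetical ten-year age bands; each band includes its lower bound only.
sample["age_group"] = pd.cut(
    sample["age_years"],
    bins=[40, 50, 60, 70],
    labels=["40-49", "50-59", "60-69"],
    right=False,
)

rate_by_age = disease_rate_by(sample, "age_group")
rate_by_lifestyle = disease_rate_by(sample, ["smoke", "active"])
```

The resulting Series can be passed to a bar chart for the report, and the same helper can be reused for each of the four chosen options.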
GitHub
Version control is an industry best practice technique for monitoring changes to a file or group of files over time and reverting to a previous version. For this assignment, you are required to create a new PRIVATE GitHub repository to store the notebook and any support files. The assignment GitHub repository will contain:
- README
- Non-Conformance Report (if applicable)
- Notebooks required for the assignment
- Python scripts required for the assignment
- Any other relevant documents
Evaluation
As an IS Professional, you are expected to meet the specification to the best of your ability. This specification is to be treated as the output of a meeting between yourself and a client. Your instructor will take on the role of the client. If you want to implement any functionality or behaviour not described in this specification, please seek approval from the client (your instructor) before you begin writing your program.
Your submission will be assessed to see if it correctly applies the behaviours mentioned in this document. This problem specification completely describes all behaviours to be tested. You are also expected to provide clear and detailed explanations of how you analysed the task, approached problem-solving, and carried out the coding process.
You may only use programming constructs taught in the unit or demonstrated in the textbook. If you plan to use any advanced Python features not introduced in this unit, please seek approval from your instructor before you begin writing the program.
The code must follow the programming style and naming conventions set out in PEP 8, which include:
- Meaningful names for projects, variables, methods, and controls.
- Correct capitalisation of variables and methods.
- Appropriate use of comments.
- Reference any relevant forums, websites, or videos that you used.
- Use of whitespace and indentation so the program is easy to read.
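To make these conventions concrete, here is a minimal sketch of PEP 8 naming, commenting, and spacing. The class `BloodPressureReading` and the function `mean_pulse_pressure` are hypothetical names invented for this example, not part of the specification.

```python
class BloodPressureReading:
    """One systolic/diastolic pair. Class names use CapWords."""

    def __init__(self, systolic, diastolic):
        self.systolic = systolic
        self.diastolic = diastolic

    def pulse_pressure(self):
        """Return systolic minus diastolic pressure."""
        return self.systolic - self.diastolic


def mean_pulse_pressure(readings):
    """Return the average pulse pressure of a non-empty list of readings.

    Function and variable names use lowercase_with_underscores.
    """
    # Four-space indentation and spaces around operators aid readability.
    total = sum(reading.pulse_pressure() for reading in readings)
    return total / len(readings)


readings = [BloodPressureReading(120, 80), BloodPressureReading(140, 90)]
average = mean_pulse_pressure(readings)
```

Two blank lines separate top-level definitions and one blank line separates methods, which is the spacing PEP 8 recommends.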
Submission Guidelines
Save your Google Colab notebook(s) as .ipynb files and push them to your GitHub repository. Write a brief README.md file describing the assignment and the purpose of the repository. Your GitHub repo should be private and contain all documents relevant to this assignment.
Submit both the link to your GitHub repository and a zip file of the repository, containing the notebooks and the README.md file.
This assignment is to be completed individually. The assignment is due 18:00 Friday 20th October 2023. The entire assignment GitHub project folder must be submitted as a single compressed archive file to the unit’s Blackboard site submission link.
Non-Conformance Report (NCR)
A non-conformance report (NCR) is a document that addresses issues where there has been a deviation from the project specification or where work fails to meet agreed quality standards. If you cannot implement some functionality or have difficulty meeting any of the requirements, you will need to provide an NCR. Examples include being unable to produce the plots or deviating from the style guide. For each non-conformance issue, you need to document:
- The problem
- Severity and impact
- How it occurred
- How to prevent it from happening again
- Plan or time estimate to fix
Grading Criteria
Your assignment will be graded based on the following criteria:
- Clarity and organization of your code (comments, modular design, code reuse).
- Proper use of version control with GitHub.
- Quality and completeness of the business report (literate programming, clear explanations, and visualizations).
- Effectiveness of the code testing notebook in identifying and resolving issues.
- Overall data analysis quality, including insights and findings based on the cardio health dataset.
- Critical thinking and problem-solving skills.
Academic Integrity
Curtin’s Academic Integrity policy must be followed in all submissions. For more details, go to the Academic Integrity tab in Blackboard or the Academic Integrity website. Both submissions must adhere to the Copyright Act of 1968 as well as the ‘Digital Agenda’ revisions to the Copyright Act.