[Solved] CS8803 Module2

In this assignment, you'll begin exploring relationships in data by computing some basic statistical measures on one of three datasets. This is a good time to learn or refresh your Python coding skills.

Step 1: Select one of the following datasets to complete this assignment:

  • [mental-health-in-tech-survey.csv] Mental Health in Tech Survey: Survey on Mental Health in the Tech Workplace in 2014 https://osmihelp.org/research/

    Dependent Variables:

      o treatment: Have you sought treatment for a mental health condition? (Yes/No)
      o mental_health_consequence: Do you think that discussing a mental health issue with your employer would have negative consequences? (Yes/Maybe/No)
      o phys_health_consequence: Do you think that discussing a physical health issue with your employer would have negative consequences? (Yes/Maybe/No)

  • [diabetic_data.csv] Diabetes 130 US hospitals for years 1999-2008: Diabetes readmission https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008

    Dependent Variables:

      o time_in_hospital: a numeric value representing the number of days between admission and discharge
      o readmitted: days to inpatient readmission (<30 if the patient was readmitted in less than 30 days, >30 if the patient was readmitted in more than 30 days, and No for no record of readmission)

  • [compas-scores-two-years.csv] COMPAS Recidivism Racial Bias: Racial bias in inmate COMPAS reoffense risk scores for Florida (ProPublica) https://github.com/propublica/compas-analysis

    Dependent Variables:

      o decile_score: a numeric value between 1 and 10 corresponding to the recidivism risk score generated by the COMPAS software (a small number corresponds to a low risk, a larger number to a high risk)
      o two_year_recid: a numeric indicator of whether the defendant recidivated within two years of the previous charge (0: no, did not recidivate; 1: yes, did recidivate)

Step 2: Explore the data by answering the following questions:

  • Which dataset did you select?
  • How many observations are in the dataset?
  • How many variables are in the dataset?
  • Does this dataset seem to belong to a regulated domain in law as discussed in the lectures? If yes, which one?
  • How many variables in the dataset are associated with a legally recognized protected class? In a table, list the variables associated with a protected class, and for each one identify the protected class and the associated legal precedent/law as discussed in the lectures.
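The first two counts can be read directly from the CSV file. Below is a minimal sketch using only the Python standard library. The inline sample rows and the `describe_csv` helper are hypothetical stand-ins; in practice you would open the dataset file you selected instead of the `io.StringIO` buffer.

```python
import csv
import io

# Hypothetical three-row sample standing in for the real CSV file.
SAMPLE_CSV = """id,nationality,pregnant,decision
1,US,N,Y
2,MX,Y,N
3,US,N,Y
"""

def describe_csv(handle):
    """Return (number of observations, number of variables) for a CSV file object."""
    reader = csv.reader(handle)
    header = next(reader)                    # first row holds the variable names
    n_obs = sum(1 for row in reader if row)  # count non-empty data rows
    return n_obs, len(header)

n_obs, n_vars = describe_csv(io.StringIO(SAMPLE_CSV))
print(f"Number of Observations: {n_obs:,}")
print(f"Number of Variables: {n_vars}")
```

Deciding which of those variables map to a protected class still requires reading the column descriptions by hand; the code only supplies the counts.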

Example Output (associated with a different dataset)

Dataset: Housing Decisions in Metro-Atlanta

Number of Observations: 1,400

Number of Variables: 16

Regulated Domain in Law: Housing (Fair Housing Act)

Number of Protected Class Variables: 2

Variable         Protected Class    Law
nationality      National origin    Civil Rights Act of 1964, 1991
pregnant (y/n)   Pregnancy          Pregnancy Discrimination Act

Step 3: Determine the relationships between dependent and independent variables

The frequency of a value represents the number of times a value occurs in a data set. Compute the frequency of each value associated with each dependent variable (listed in Step 1) as a function of all of the protected class variables (independent variables) identified in Step 2. Create histogram(s) comparing the frequency values of the dependent variable as a function of the independent variable. Hint: For variables that are continuous, you might consider creating intervals that represent the data. For categorical/ordinal/nominal values, you might consider converting to numerical values.
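The hint can be illustrated with a short sketch, assuming a continuous variable made of positive integers (such as a day count) and an ordinal Yes/Maybe/No variable. The interval width of 3, the `to_interval` helper, the `CODES` mapping, and the sample values are all hypothetical choices for illustration.

```python
from collections import Counter

def to_interval(value, width=3):
    """Map a positive integer to a label such as '1-3' for intervals of the given width."""
    low = ((value - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"

# Hypothetical values for a continuous variable such as days in hospital.
days = [1, 2, 2, 4, 5, 7, 9, 3, 6, 1]
interval_counts = Counter(to_interval(d) for d in days)

# Hypothetical numeric coding for an ordinal Yes/Maybe/No variable.
CODES = {"No": 0, "Maybe": 1, "Yes": 2}
answers = ["Yes", "No", "Maybe", "No"]
encoded = [CODES[a] for a in answers]

# Print intervals in ascending order of their lower bound.
for label in sorted(interval_counts, key=lambda s: int(s.split("-")[0])):
    print(label, interval_counts[label])
print(encoded)
```

The interval width changes the shape of the resulting histogram, so it is worth stating the width you chose in the report.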

Example Output for One Dependent-Independent Variable Combination:

Independent Variable          Dependent Variable: Housing Decision (Y/N)
(Protected Class Variable)
Pregnant: Y                   Frequency of Y: 50     Frequency of N: 120
Pregnant: N                   Frequency of Y: 130    Frequency of N: 20
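One way to produce a table like the example is to count outcome values within each protected-class group. The sketch below uses only the standard library and rebuilds hypothetical records from the example counts above; the `frequency_table` helper is an illustrative name, not part of the assignment. Libraries such as pandas (`pandas.crosstab`) and matplotlib are commonly used for the same task on the real datasets.

```python
from collections import Counter

# Hypothetical records rebuilt from the example counts above:
# (pregnant, housing decision) pairs.
records = (
    [("Y", "Y")] * 50 + [("Y", "N")] * 120
    + [("N", "Y")] * 130 + [("N", "N")] * 20
)

def frequency_table(pairs):
    """Count dependent-variable values within each independent-variable group."""
    table = {}
    for group, outcome in pairs:
        table.setdefault(group, Counter())[outcome] += 1
    return table

table = frequency_table(records)
for group in ("Y", "N"):
    counts = table[group]
    print(f"Pregnant {group}: Frequency of Y: {counts['Y']}  Frequency of N: {counts['N']}")
    # Crude text histogram: one '#' per 10 observations.
    for outcome in ("Y", "N"):
        print(f"  {outcome} {'#' * (counts[outcome] // 10)}")
```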

Step 4: Show how data can be manipulated

Select one protected class variable (independent variable) and one dependent variable. 1) Create a graph to support the fairness hypothesis: The system is fair. There is no difference in the outcomes. 2) Create a graph to support the bias hypothesis: The system is biased. There is a difference in the outcomes. For each, provide a brief description of your manipulations.

Example Output:

  • Fair Hypothesis: As seen from this graph, housing decisions are not dependent on the pregnancy status of women. [Manipulations: Used line graph; Increased Scale to ±50; Mapped the ratio of positive Y decisions (i.e., 50/180 versus 130/180); No label on the Y-Axis].

[Graph: Difference in Housing Decisions Based on Pregnancy]

  • Bias Hypothesis: As seen from this graph, housing decisions are significantly dependent on the pregnancy status of women. [This hypothesis was easily supported by the data, so it didn't require much manipulation: Used stacked bar graph; Reduced Scale; Reworded labels].
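The ratio quoted in the fair-hypothesis example divides each group's positive decisions by all positive decisions (50 + 130 = 180), which is not the same as the approval rate within each group. The sketch below computes both quantities from the Step 3 example counts; the variable names are hypothetical. Which quantity is plotted, and on what axis scale, changes how large the gap appears.

```python
# Counts taken from the example frequency table in Step 3.
approved = {"pregnant": 50, "not_pregnant": 130}
denied = {"pregnant": 120, "not_pregnant": 20}

total_approved = sum(approved.values())  # 50 + 130 = 180

summary = {}
for group in approved:
    # Share of all approvals that went to this group (the 50/180 vs 130/180 ratio).
    share_of_approvals = approved[group] / total_approved
    # Approval rate within the group: approvals divided by the group's size.
    approval_rate = approved[group] / (approved[group] + denied[group])
    summary[group] = (share_of_approvals, approval_rate)
    print(f"{group}: share of approvals {share_of_approvals:.2f}, "
          f"approval rate {approval_rate:.2f}")
```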

Step 5: Given your selected protected class variable (independent variable), calculate the average (mean, median, and mode) values of the protected class group (Hint: Variables might need to be converted to numerical values as needed). Run the random sampling method using 50% of the data to create a reduced dataset. Calculate the average (mean, median, and mode) values of the protected class group. Indicate if there is a difference (or not) between the original dataset and the reduced dataset for any of the averages. Provide all results.

Protected Class Variable (Pregnant)   Mean            Median       Mode
Original Data Set                     0 (NO)          0 (NO)       0 (NO)
Reduced Data Set                      0 (NO)          1 (YES)      0 (NO)
Difference                            No Difference   Difference   No Difference
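A minimal sketch of the Step 5 procedure, using only the standard library: compute the three averages, draw a 50% random sample without replacement, and compare. The 0/1 coded column, the 70/30 split, the seed value, and the `averages` helper are hypothetical; with pandas, `DataFrame.sample(frac=0.5)` is a common alternative for the sampling step.

```python
import random
import statistics

# Hypothetical protected-class column coded as 0 (NO) / 1 (YES).
pregnant = [0] * 70 + [1] * 30

def averages(values):
    """Return the mean, median, and mode of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }

rng = random.Random(42)  # fixed seed so the sample is reproducible
reduced = rng.sample(pregnant, k=len(pregnant) // 2)  # 50% without replacement

original_stats = averages(pregnant)
reduced_stats = averages(reduced)
for name in ("mean", "median", "mode"):
    same = original_stats[name] == reduced_stats[name]
    print(name, original_stats[name], reduced_stats[name],
          "No Difference" if same else "Difference")
```

The comparison here uses exact equality; for the mean you may prefer to report the numeric difference, since a random sample rarely reproduces it exactly.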

Step 6: Using your reduced dataset from Step 5, repeat Step 3 (frequency and histogram) with your selected independent variable and your selected dependent variable (from Step 4). Explain any differences (in no more than 2 sentences). If you used the random sampling method, would members associated with the protected class variable benefit or be harmed? Explain your reasoning (in no more than 2 sentences).

Step 7: Turn in a report documenting your outputs.
