Explanation: This assignment is intended to give you more practice delving into the Add Health data set and in manipulating additional variables.
Instructions:
- Make sure your Rmd file has no local file system dependencies (i.e., anyone should be able to recreate the output HTML using only the Rmd source file).
- Make a copy of this Rmd file and add answers below each question. The code that generated the answers should be included, as well as the complete source code for the document.
- Change the YAML header above to identify yourself and include contact information.
- For any tables or figures, include captions and cross-references and any other document automation methods as necessary.
- Make sure your output HTML file looks appealing to the reader.
- Upload the final Rmd to your github repository.
- Download assn_id.txt and include the URL to your Rmd file on github.com.
- Create a zip file from your copy of assn_id.txt and upload the zip file to the Canvas site for Assignment 9. The zip file should contain only the text file. Do not include any additional files in the zip file; everything should be able to run from the file you uploaded to github.com. Please use zip format and not 7z or any other compression/archive format.
1 Using the full household roster (you'll need to go back to the full raw data source, 21600-0001-Data.dta), create the following variables for each respondent. Document any decisions that you make regarding missing values, definitions, etc. in your narrative as well as in the R code. Include a frequency tabulation and a histogram of each result.
I start by pulling in the full dataset from GitHub and listing the variables.
# packages used below (loaded in the setup chunk of the Rmd source)
library(tidyverse)
library(magrittr)
library(kableExtra)

# read the full data set directly from GitHub (no local file dependencies)
add_helth <- haven::read_dta("https://github.com/dmccoomes/csde502_winter_2021_dcoomes/raw/main/Homework/homework_09/data/21600-0001-Data.dta")

# build a table of variable metadata from the Stata attributes
metadata <- bind_cols(
    # variable name
    varname = colnames(add_helth),
    # label
    varlabel = lapply(add_helth, function(x) attributes(x)$label) %>%
        unlist(),
    # format
    varformat = lapply(add_helth, function(x) attributes(x)$format.stata) %>%
        unlist(),
    # values
    varvalues = lapply(add_helth, function(x) attributes(x)$labels) %>%
        # names the variable label vector
        lapply(., function(x) names(x)) %>%
        # as character
        as.character() %>%
        # remove the c() construction
        str_remove_all("^c\\(|\\)$")
)

# searchable table of all variables
DT::datatable(metadata)
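To see what the metadata step does without downloading the full file, the following base R sketch applies the same idea to a small made-up data frame. The data frame `toy`, its value labels, and the helper `attr_or_na()` are hypothetical; the `label` and `labels` attributes are set by hand to imitate what `haven::read_dta()` attaches. The helper returns `NA` for a variable without a label, so every column keeps exactly one row in the result.

```r
# Toy data frame that imitates labelled Stata variables (hypothetical values and labels)
toy <- data.frame(S27 = c(4, 7, 99), AID = c("a", "b", "c"))
attr(toy$S27, "label") <- "How many people live in household?"
attr(toy$S27, "labels") <- c("one" = 1, "not regular household" = 7, "multiple response" = 99)
# AID deliberately has no variable label and no value labels

# Hypothetical helper: return an attribute as one string, or NA if it is absent
attr_or_na <- function(x, which) {
    # exact = TRUE stops "label" from partially matching "labels"
    a <- attr(x, which, exact = TRUE)
    if (is.null(a)) return(NA_character_)
    # value labels are a named vector (keep the names); a variable label is plain text
    paste(if (is.null(names(a))) a else names(a), collapse = ", ")
}

toy_metadata <- data.frame(
    varname = colnames(toy),
    varlabel = vapply(toy, attr_or_na, character(1), which = "label"),
    varvalues = vapply(toy, attr_or_na, character(1), which = "labels"),
    row.names = NULL,
    stringsAsFactors = FALSE
)
print(toy_metadata)
```

Keeping one row per variable matters because `unlist()` on a list that contains `NULL` entries drops them, which would leave the label column shorter than the name column.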
1.1 Total number in household
I use the question "How many people live in household?" to construct the total number in the household. Observations coded 7 or 99 are set to missing, which excludes respondents who reported that they don't live in a regular household. As Table 1.1 and Figure 1.1 show, more households have 4 members than any other size, and the distribution of those who answered is right-skewed.
# copy S27 and set codes 7 and 99 to missing
add_helth %<>%
    mutate(num_house = S27) %>%
    mutate(num_house = ifelse(num_house == 7 | num_house == 99, NA, num_house))

# frequency table with percent and cumulative percent
add_helth %>%
    group_by(num_house) %>%
    summarize(n = n()) %>%
    mutate(`%` = n / sum(n) * 100) %>%
    mutate(`%` = `%` %>% round(1)) %>%
    mutate(`cum %` = round(cumsum(n / sum(n) * 100), 1)) %>%
    kable(caption = "Total number of individuals living in household") %>%
    kable_styling(full_width = FALSE, position = "left", bootstrap_options = c("striped", "hover"))
Table 1.1: Total number of individuals living in household
num_house | n | % | cum % |
1 | 24 | 0.4 | 0.4 |
2 | 239 | 3.7 | 4.0 |
3 | 853 | 13.1 | 17.2 |
4 | 1564 | 24.0 | 41.2 |
5 | 1095 | 16.8 | 58.0 |
6 | 822 | 12.6 | 70.7 |
NA | 1907 | 29.3 | 100.0 |
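The recode-and-tabulate pattern can be checked on a small vector before running it on the full data. This is a minimal base R sketch with made-up values; `x`, `num_house_toy`, `freq`, and `valid_pct` are illustrative names, and treating 7 and 99 as codes to set to missing follows the recode in this section rather than the codebook.

```r
# Made-up responses; 7 and 99 stand in for the codes that are set to missing
x <- c(4, 4, 5, 3, 7, 99, 4, 6, 2, NA)

# Recode the sentinel codes to NA (values that are already NA stay NA)
num_house_toy <- ifelse(x %in% c(7, 99), NA, x)

# Frequency table that keeps NA as its own row, with all cases in the denominator
n <- table(num_house_toy, useNA = "ifany")
freq <- data.frame(
    num_house = names(n),
    n = as.vector(n),
    pct = round(as.vector(n) / sum(n) * 100, 1),
    cum_pct = round(cumsum(as.vector(n)) / sum(n) * 100, 1)
)
print(freq)

# Percentages among valid answers only, for comparison
valid_pct <- round(prop.table(table(num_house_toy)) * 100, 1)
print(valid_pct)
```

Table 1.1 uses all respondents as the denominator, so its NA row takes 29.3% of the total; the second table in the sketch shows the alternative of reporting percentages among valid answers only.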
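# one bin per distinct non-missing value (unique() also counts NA, hence the -1)
bins <- length(unique(add_helth$num_house)) - 1

ggplot(data = add_helth, mapping = aes(x = num_house)) +
    geom_histogram(bins = bins, color = "red", fill = "white") +
    theme_bw() +
    labs(x = "Number of people per household", y = "Count")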
Figure 1.1: Histogram of the number of people per household
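For integer-valued counts, bin edges placed at half-integers give exactly one bar per household size. The sketch below uses base R's `hist()` with `plot = FALSE` on made-up data to show the resulting counts; the vector `hh` and the object names are hypothetical.

```r
# Made-up household sizes with some missing values
hh <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, NA, NA)

# Edges at 0.5, 1.5, ..., 6.5 so each integer sits in the middle of its own bin
edges <- seq(min(hh, na.rm = TRUE) - 0.5, max(hh, na.rm = TRUE) + 0.5, by = 1)

# hist() ignores NA values; plot = FALSE returns the counts without drawing
h <- hist(hh, breaks = edges, plot = FALSE)
print(data.frame(size = h$mids, count = h$counts))
```

With ggplot2, `geom_bar()` or `geom_histogram(binwidth = 1)` should give the same one-bar-per-value result without computing the number of bins by hand.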
1.2 Number of sisters
1.3 Number of brothers
1.4 Total number of siblings
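Sections 1.2 to 1.4 are not answered in this document. As a starting point only, the following base R sketch counts sisters and brothers from a wide household roster and sums them. Everything in it is an assumption: the data frame `roster`, the columns `rel_1` to `rel_3`, the helper `count_rel()`, and the string codes "sister" and "brother" are invented for illustration and are not the Add Health variable names or codes, which should be taken from the codebook.

```r
# Hypothetical wide roster: one row per respondent, one column per household member
roster <- data.frame(
    id    = 1:4,
    rel_1 = c("mother", "mother", "sister", NA),
    rel_2 = c("sister", "father", "brother", NA),
    rel_3 = c("sister", "brother", NA, NA),
    stringsAsFactors = FALSE
)
rel_cols <- c("rel_1", "rel_2", "rel_3")

# Hypothetical helper: count roster entries equal to `code` for each respondent;
# empty slots (NA) are treated as "not a match"
count_rel <- function(data, cols, code) {
    unname(rowSums(data[, cols, drop = FALSE] == code, na.rm = TRUE))
}

roster$n_sisters  <- count_rel(roster, rel_cols, "sister")
roster$n_brothers <- count_rel(roster, rel_cols, "brother")
roster$n_siblings <- roster$n_sisters + roster$n_brothers

# Decision to document: a respondent with an entirely empty roster is set to NA, not 0
empty <- rowSums(!is.na(roster[, rel_cols])) == 0
roster[empty, c("n_sisters", "n_brothers", "n_siblings")] <- NA

print(roster[, c("id", "n_sisters", "n_brothers", "n_siblings")])
print(table(roster$n_siblings, useNA = "ifany"))
```

Whether an empty roster means "no siblings" or "no information" is one of the missing-value decisions the assignment asks to be documented; the sketch picks the second reading.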
2 What proportion of students live with two biological parents? Include the analysis in your R code.
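This question is not answered in the document. A minimal sketch of the calculation, assuming two hypothetical indicator columns `bio_mom` and `bio_dad` (1 = lives with that biological parent, 0 = does not, NA = unknown) that would first have to be derived from the real roster variables:

```r
# Hypothetical indicators, not Add Health variable names
parents <- data.frame(
    bio_mom = c(1, 1, 0, 1, NA, 1),
    bio_dad = c(1, 0, 0, 1, 1, NA)
)

# TRUE only when both are 1; FALSE when either is known to be 0; NA otherwise
parents$two_bio <- parents$bio_mom == 1 & parents$bio_dad == 1

# Proportion among respondents whose status could be determined
prop_two_bio <- mean(parents$two_bio, na.rm = TRUE)

# Number of respondents left out of the denominator
n_undetermined <- sum(is.na(parents$two_bio))

print(round(prop_two_bio, 3))
print(n_undetermined)
```

Reporting the number of undetermined cases next to the proportion makes the choice of denominator visible to the reader.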
3 Calculate the number of household members that are NOT biological mother, biological father, full brother or full sister. Create a contingency table and histogram for this variable.
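This question is also unanswered in the document. The sketch below shows one way the count and a contingency table could be built in base R, under stated assumptions: the roster `hh_roster`, its columns, the relationship strings, and the choice to cross-tabulate against "lives with at least one biological parent" are all hypothetical and would need to be replaced with the actual Add Health variables and codes.

```r
# Hypothetical wide roster with relationship strings (not Add Health codes)
hh_roster <- data.frame(
    id    = 1:5,
    rel_1 = c("bio mother", "bio mother", "grandmother", "bio father", "aunt"),
    rel_2 = c("bio father", "stepfather", "full sister", "full brother", "cousin"),
    rel_3 = c("full sister", "half brother", NA, "uncle", "cousin"),
    stringsAsFactors = FALSE
)
core <- c("bio mother", "bio father", "full brother", "full sister")

rel <- as.matrix(hh_roster[, c("rel_1", "rel_2", "rel_3")])

# A slot counts as "other" when it is filled and its relationship is not in `core`
is_other <- !is.na(rel) & !matrix(rel %in% core, nrow = nrow(rel))
hh_roster$n_other <- unname(rowSums(is_other))

# Second variable for the contingency table: any biological parent in the household
is_bio_parent <- matrix(rel %in% c("bio mother", "bio father"), nrow = nrow(rel))
hh_roster$has_bio_parent <- unname(rowSums(is_bio_parent) > 0)

ct <- table(n_other = hh_roster$n_other, has_bio_parent = hh_roster$has_bio_parent)
print(ct)
```

The same half-integer binning used for household size would apply to a histogram of `n_other`, since it is also an integer count.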
3.1 Source code
cat(readLines(con = "dcoomes_hw_09.Rmd"), sep = "\n")
---
title: "CSDE 502 Winter 2021, Assignment 8"
author: "[dcoomes](mailto:[email protected])"
output:
    bookdown::html_document2:
        number_sections: true
        self_contained: true
        code_folding: hide
        toc: true
        toc_float:
            collapsed: true
            smooth_scroll: false
    pdf_document:
        number_sections: true
        toc: true
        fig_cap: yes
        keep_tex: yes
urlcolor: blue
---

```{r, warning=FALSE, message=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message = FALSE)
library(captioner)
library(tidyverse)
library(magrittr)
library(kableExtra)

figure_nums <- captioner(prefix = "Figure")
table_nums <- captioner(prefix = "Table")
```

___Explanation___:
This assignment is intended to give you more practice delving into the Add Health data set and in manipulating additional variables.

___Instructions___:

1. Make sure your Rmd file has no local file system dependencies (i.e., anyone should be able to recreate the output HTML using only the Rmd source file).
1. Make a copy of this Rmd file and add answers below each question. The code that generated the answers should be included, as well as the complete source code for the document.
1. Change the YAML header above to identify yourself and include contact information.
1. For any tables or figures, include captions and cross-references and any other document automation methods as necessary.
1. Make sure your output HTML file looks appealing to the reader.
1. Upload the final Rmd to your github repository.
1. Download [`assn_id.txt`](http://staff.washington.edu/phurvitz/csde502_winter_2021/assignments/assn_id.txt) and include the URL to your Rmd file on github.com.
1. Create a zip file from your copy of `assn_id.txt` and upload the zip file to the Canvas site for Assignment 9. ___The zip file should contain only the text file. Do not include any additional files in the zip file; everything should be able to run from the file you uploaded to github.com. Please use zip format and not 7z or any other compression/archive format.___

# __Using the full household roster (you'll need to go back the full raw data source, [21600-0001-Data.dta](http://staff.washington.edu/phurvitz/csde502_winter_2021/data/21600-0001-Data.dta.zip)), create the following variables for each respondent. Document any decisions that you make regarding missing values, definitions, etc. in your narrative as well as in the R code. Include a frequency tabulation and a histogram of each result.__

Starting by pulling in the full dataset from GitHub and listing the variables.

```{r, cache=TRUE, results="hide"}
add_helth <- haven::read_dta("https://github.com/dmccoomes/csde502_winter_2021_dcoomes/raw/main/Homework/homework_09/data/21600-0001-Data.dta")

metadata <- bind_cols(
    # variable name
    varname = colnames(add_helth),
    # label
    varlabel = lapply(add_helth, function(x) attributes(x)$label) %>%
        unlist(),
    # format
    varformat = lapply(add_helth, function(x) attributes(x)$format.stata) %>%
        unlist(),
    # values
    varvalues = lapply(add_helth, function(x) attributes(x)$labels) %>%
        # names the variable label vector
        lapply(., function(x) names(x)) %>%
        # as character
        as.character() %>%
        # remove the c() construction
        str_remove_all("^c\\(|\\)$")
)

DT::datatable(metadata)
```

## __Total number in household__

I will use the question "How many people live in household?" to construct the total number in the household. I will not include any observations that reported they don't live in a regular household. As we can see from **`r table_nums(name="numtable", display="cite")`** and **`r figure_nums(name="numhist", display="cite")`** more households have 4 members as compared to other numbers, and the distribution of those that answered is right-skewed.

```{r}
add_helth %<>%
    mutate(num_house=S27) %>%
    mutate(num_house=ifelse(num_house==7|num_house==99, NA, num_house))
```

```{r numtable}
add_helth %>%
    group_by(num_house) %>%
    summarize(n=n()) %>%
    mutate(`%`=n/sum(n)*100) %>%
    mutate(`%`=`%` %>% round(1)) %>%
    mutate(`cum %`= round(cumsum(n/sum(n)*100), 1)) %>%
    kable(caption="Total number of individuals living in household") %>%
    kable_styling(full_width=FALSE, position="left", bootstrap_options = c("striped", "hover"))
```

```{r numhist, fig.cap="Histogram of the number of people per household"}
bins <- length(unique(add_helth$num_house))-1

ggplot(data=add_helth, mapping=aes(x=num_house)) +
    geom_histogram(bins=bins, color="red", fill="white") +
    theme_bw() +
    labs(x="Number of people per household", y="Count")
```

## __Number of sisters__

## __Number of brothers__

## __Total number of siblings__

# __What proportion of students live with two biological parents? Include the analysis in your R code.__

# __Calculate the number of household members that are NOT biological mother, biological father, full brother or full sister. Create a contingency table and histogram for this variable.__

## Source code
```{r comment=""}
cat(readLines(con = "dcoomes_hw_09.Rmd"), sep = "\n")
```