
Instructions:

Fill in your name and UWNetID above.
Put answers to the questions on this document, using the 00Answers Word style so your answers are clearly distinguished from the questions.
Create a PDF file from this document.
Create a single zip file including this document as a PDF file, along with the RDS file and R code file.
Upload the single zip file to Canvas.

Explanation:

For this assignment, you will be perusing some of the documentation for the Add Health Wave 1 data set. You will use the documentation to make some updates to a data frame containing some of the Add Health data, and then save the data frame as an RDS file. You will update a metadata table that partially describes the data set and changes you made to the variable names and variable labels.

To open a Stata version 13 file in R, there are two main options:

Use haven::read_dta(). haven stores each variable label as a label attribute on its column; use labelled::var_label() to view or update the labels. (labelled::foreign_to_labelled() is intended for data read with the foreign package rather than haven.)
Use readstata13::read.dta13(). Variable labels for this format are available, e.g., for a data frame named dat, as attributes(dat)$var.labels. This is a character vector that can be updated by assigning a new value to the specified element, e.g., attributes(dat)$var.labels[1] <- "foo".
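Both packages keep the labels as ordinary R attributes, so the mechanics can be sketched in base R without a .dta file. The sketch below assumes haven's convention (a label attribute on each column) and readstata13's convention (a var.labels character vector on the data frame, in column order); the toy data frame and label text are hypothetical. Looking the column up by name with match() avoids hard-coding its position.

```r
# Toy stand-in for the imported Add Health data (hypothetical values)
dat <- data.frame(aid = c("00000001", "00000002"),
                  h1gi1m = c(10L, 5L),
                  stringsAsFactors = FALSE)

# haven-style: each column carries its own "label" attribute,
# which is what labelled::var_label() reads and writes
attr(dat$h1gi1m, "label") <- "original label"
attr(dat$h1gi1m, "label") <- "birth month"     # overwrite with the new label

# readstata13-style: one character vector of labels on the data frame,
# stored in the same order as the columns
attr(dat, "var.labels") <- c("case identifier", "original label")

# Update a single label by variable name instead of a hard-coded position
i <- match("h1gi1m", colnames(dat))
attributes(dat)$var.labels[i] <- "birth month"

print(attr(dat$h1gi1m, "label"))   # "birth month"
print(attr(dat, "var.labels"))     # "case identifier" "birth month"
```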

To save the RDS file, use the base function saveRDS().
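Because saveRDS() serializes the whole R object, variable labels and other attributes are stored along with the data. A minimal round-trip sketch with a made-up two-row data frame; tempfile() is used only so the example does not write into the working directory, and for the assignment you would supply your own file name.

```r
# Hypothetical toy data frame with one labelled column
dat <- data.frame(aid = c("00000001", "00000002"), bmonth = c(10L, 5L),
                  stringsAsFactors = FALSE)
attr(dat$bmonth, "label") <- "birth month"   # attributes travel with the object

# Write the object to an .rds file and read it back
rds_path <- tempfile(fileext = ".rds")
saveRDS(dat, file = rds_path)
dat2 <- readRDS(rds_path)

identical(dat, dat2)   # TRUE: values, names, and attributes are preserved
```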

Here is a base R code snippet that will rename a single variable:

colnames(data_frame)[grep("^original_variable_name$", colnames(data_frame))] <- "new_variable_name"

The grep() function finds the position of the named variable among the column names of the data frame. The characters ^ and $ are regular-expression anchors that match the start and end of the string, ensuring that the pattern matches the whole name rather than several similarly named variables.
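A small illustration of what the anchors change, using column names borrowed from Table 1 and a hypothetical toy data frame:

```r
# Several Table 1 names contain the string "race"
cols <- c("onerace", "observedrace", "raceother", "race")
grep("race", cols)     # unanchored: 1 2 3 4
grep("^race$", cols)   # anchored: 4 only

# Renaming one column of a toy data frame (hypothetical values) with an
# anchored pattern; both the pattern and the new name are quoted strings
dat <- data.frame(aid = "00000001", h1gi6a = 1L, h1gi6b = 0L)
colnames(dat)[grep("^h1gi6a$", colnames(dat))] <- "white"
colnames(dat)          # "aid" "white" "h1gi6b"
```

If the pattern matches nothing, grep() returns integer(0) and the assignment silently does nothing, so it is worth checking colnames() afterwards.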

It is much simpler with the tidyverse (dplyr::rename()) and the magrittr assignment pipe %<>%:

library(dplyr)
library(magrittr)
data_frame %<>% rename(new_variable_name = old_variable_name)
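With a dozen or more variables to rename, it can help to keep the Table 1 mapping in one named vector and rename everything in a single step. The base R sketch below uses match(), which compares whole names exactly, so no ^/$ anchors are needed; the toy data frame is hypothetical and only a few Table 1 rows are included. Recent dplyr versions accept the same kind of named vector via rename(all_of(lookup)).

```r
# Lookup of new name = original name, following Table 1
# (only a few rows shown; extend it with the remaining variables)
lookup <- c(bmonth = "h1gi1m", byear = "h1gi1y", hispanic = "h1gi4",
            white = "h1gi6a", AI = "h1gi6c")

# Hypothetical toy data frame with the original Add Health column names
dat <- data.frame(aid = "00000001", h1gi1m = 0L, h1gi1y = 0L,
                  h1gi4 = 0L, h1gi6a = 0L, h1gi6c = 0L)

# Position of each original name; stop if any is missing from the data
pos <- match(lookup, colnames(dat))
if (anyNA(pos)) stop("not found: ", paste(lookup[is.na(pos)], collapse = ", "))

colnames(dat)[pos] <- names(lookup)
colnames(dat)   # "aid" "bmonth" "byear" "hispanic" "white" "AI"
```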

Additional hints for searching the PDF documentation:

Use pdfgrep (it should be available through Linux and macOS package managers; for Windows, search for a Windows build or use Cygwin).
Use the R pdftools package. Its pdf_text() function returns one text string per page, so it could be used in a loop over the PDF files to create a data frame with the name of the PDF file, the page number, and the text of each page. The stringr function str_match() could then be used to identify the file names and page numbers where specific text strings occur. As a minimal example, the following shows that the string h1gi1m is found on page 1 of INH01PUB.PDF. Converting the PDF text to lowercase simplifies the matching:

> library(stringr)
> x <- pdftools::pdf_text(pdf = "INH01PUB.PDF")
> str_match(string = x %>% str_to_lower(), pattern = "h1gi1m")
      [,1]
 [1,] "h1gi1m"
 [2,] NA
 [3,] NA
 [4,] NA
 [5,] NA
 [6,] NA
 [7,] NA
 [8,] NA
 [9,] NA
[10,] NA
[11,] NA
[12,] NA
[13,] NA
[14,] NA
[15,] NA
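The loop over several PDF files can be sketched as a small search function. This is a hypothetical helper, not part of pdftools: it swaps str_match() for base R grepl(fixed = TRUE) so that it needs no packages, and it works on a named list holding one character vector of page texts per file (the shape pdftools::pdf_text() returns for each file). The file names and page texts below are made up; the commented lines show how the list might be built from real files.

```r
# Hypothetical helper: return file name and page number of every page whose
# lowercased text contains `pattern` (matched as a fixed string, not a regex)
find_pages <- function(pages_by_file, pattern) {
  hits <- lapply(names(pages_by_file), function(f) {
    txt <- tolower(pages_by_file[[f]])
    pg <- which(grepl(tolower(pattern), txt, fixed = TRUE))
    data.frame(file = rep(f, length(pg)), page = pg, stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)
}

# Stand-in for real PDF text; with pdftools installed, something like
#   files <- list.files(pattern = "\\.pdf$", ignore.case = TRUE)
#   pages_by_file <- setNames(lapply(files, pdftools::pdf_text), files)
pages_by_file <- list(
  "A.PDF" = c("H1GI1M birth month", "other text"),
  "B.PDF" = c("nothing here", "see h1gi1m again")
)

res <- find_pages(pages_by_file, "h1gi1m")
print(res)
```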

Questions:

Explore the Add Health website (http://www.cpc.unc.edu/projects/addhealth) and answer the following questions (making sure to cite as necessary):
What was the sampling frame for this study?

The sampling frame for the Add Health study was all high schools included in the Quality Education Database (QED). High school was defined as schools with an 11th grade and more than 30 students.

What were the three kinds of respondents at Wave I?

What was the instrument with the largest sample size?

Is it possible for a respondent to be in Wave III without being in Wave II?

What is the time span of the Add Health data collection (all waves)?

What is the difference between the public and the restricted-use Add Health data?

Describe a research question that you might be able to answer using the Add Health dataset.

Download the public-use Add Health documentation at https://canvas.uw.edu/courses/1434040/files. Answer the following questions:

In what PDF document is the documentation for the race items for the Wave I In-Home questionnaire?

How many respondents were of Hispanic/Latino origin?

What is the Knowledge Quiz in the Wave I In-Home questionnaire?

What is the unique identifier for the In-home data?

Download the Stata 13 format file AHwave1_v1.dta (http://staff.washington.edu/phurvitz/csde502_winter_2021/data/AHwave1_v1.dta).

Fill in the grey missing cells in Table 1 below based on the data and/or documentation. Ideally, use the documentation to familiarize yourself with the structure of the codebooks.
Using questions 6 and 8 in INH01PUB.PDF, create a new variable named race that uses recoded values (white = 1; black/African American = 2; American Indian = 3; Asian/Pacific Islander = 4; other = 5; unknown/missing = 9).
Rename the variables and update the variable labels using Table 1 as a guide, then save the data frame as an RDS file. Use a single R code file for your edits to the data file.
Update the status in Table 1 as needed.
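One way to approach the race recode is to count how many of the five question 6 race boxes are marked: exactly one mark decides the category directly, several marks fall back on the question 8 answer, and no usable mark gives 9. The base R sketch below is an outline under stated assumptions, not the official coding. It assumes code 1 means "marked" for all five items (as Table 1 lists for h1gi6a), and it takes the question 8 coding as a user-supplied lookup (onerace_map), because the actual response codes must be read from INH01PUB.PDF. The toy data and the mapping in the usage lines are hypothetical.

```r
# Race codes: white = 1, black = 2, American Indian = 3,
# Asian/Pacific Islander = 4, other = 5, unknown/missing = 9.
# ASSUMPTIONS: in `marks`, 1 = marked and any other value (0, 6, 8, NA)
# counts as not marked; `onerace` holds raw question 8 answers, translated
# to race codes by `onerace_map` (names = raw codes). Verify in INH01PUB.PDF.
recode_race <- function(marks, onerace, onerace_map) {
  # marks: five columns in the order white, black, AI, asian, other
  marked <- as.matrix(marks) == 1
  marked[is.na(marked)] <- FALSE
  n_marked <- rowSums(marked)

  # For rows with exactly one mark, this gives the column number of that mark
  code <- as.integer(marked %*% seq_len(ncol(marked)))

  race <- rep(9L, nrow(marked))            # default: unknown/missing
  single <- n_marked == 1
  race[single] <- code[single]

  # Several marks: use the question 8 answer if it maps to a race code
  multi <- n_marked > 1
  mapped <- unname(onerace_map[as.character(onerace[multi])])
  race[multi] <- ifelse(is.na(mapped), 9L, mapped)
  race
}

# Hypothetical toy data: row 1 marks white only; row 2 marks black and
# asian and answers question 8 with raw code 4; row 3 refused every item
marks <- data.frame(h1gi6a = c(1L, 0L, 6L), h1gi6b = c(0L, 1L, 6L),
                    h1gi6c = c(0L, 0L, 6L), h1gi6d = c(0L, 1L, 6L),
                    h1gi6e = c(0L, 0L, 6L))
onerace <- c(NA, 4L, NA)
onerace_map <- c("1" = 1L, "2" = 2L, "3" = 3L, "4" = 4L, "5" = 5L)  # HYPOTHETICAL
recode_race(marks, onerace, onerace_map)   # 1 4 9
```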

Table 1: Codebook for variables from Add Health Wave 1 data

newvariablename	originalvariablename	status*	datatype	values	newvariablelabel	codebookfilename
aid	aid	unchanged	text	8 digit string	unique case (student) identifier	SECTAPUB.PDF
imonth	imonth	unchanged	integer	1 to 12	month interview completed	SECTAPUB.PDF
iday	iday	unchanged			day interview completed	SECTAPUB.PDF
iyear	iyear	unchanged		94, 95		SECTAPUB.PDF
bio_sex	bio_sex				interviewer confirmed sex
bmonth	h1gi1m				birth month	INH01PUB.PDF
byear	h1gi1y				birth year
hispanic	h1gi4	renamed			Hispanic/Latino	INH01PUB.PDF
white	h1gi6a	renamed		0 = not marked; 1 = marked; 6 = refused; 8 = don't know	race white	INH01PUB.PDF
black					race black or African American	INH01PUB.PDF
AI	h1gi6c				race American Indian or Native American	INH01PUB.PDF
asian	h1gi6d				race Asian or Pacific Islander	INH01PUB.PDF
raceother	h1gi6e				race other	INH01PUB.PDF
onerace					one category best describes racial background	INH01PUB.PDF
observedrace	h1gi9				interviewer observed race	INH01PUB.PDF
health	h1gh1				how is your health
race	not applicable				race recoded as white; black/African American; American Indian; Asian/Pacific Islander; other; unknown/missing

*status categories: unchanged, renamed, missing defined, derived

[Solved] CSDE502 Homework6
