STA 141A
Fall 2019 Homework 1
Due: October 14
Submit the assignment electronically on Canvas. You are encouraged to post questions about the homework on Piazza. The electronic submission must be a PDF document containing your answers AND the R code you have developed.
Honor Code: The code and the results derived from it constitute my own work. I have consulted the following resources regarding this assignment: (ADD: names of persons or web resources, if any, excluding the instructor, TAs, and materials posted on the course website)
1. For each of the following cases, describe the best possible data structure (e.g., array, data frame, list, table, etc.) for representing the data. Also, write appropriate R code to answer the questions that follow, treating the data as given.
(a) Data: A study of the health effects of air quality in 10 major cities of the world involves daily measurements on four variables: average temperature (temp), total precipitation (precip), maximum PM10 concentration (PM10), and number of deaths among the elderly population (death). Measurements are available for five years.
(i) What is the average number of deaths for each of the cities on days where the PM10 concentration is greater than 20?
(ii) What is the average PM10 concentration for each of the cities on days with no precipitation and average temperature above 80 degrees F?
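As a rough illustration of the kind of structure and code part (a) has in mind, the sketch below uses a long-format data frame with one row per city and day. This is one possible choice, not the only acceptable one. The data are simulated, and the object name aq, the city labels, the start date, and the simulation settings are all assumptions made for the example; only the variable names temp, precip, PM10, and death come from the problem.

```r
# Simulated stand-in for the air-quality data (all values are made up).
set.seed(1)
cities <- paste0("City", 1:10)                       # hypothetical city labels
dates  <- seq(as.Date("2015-01-01"), by = "day", length.out = 5 * 365)

# One row per (city, day); the four measurements are columns.
aq <- data.frame(
  city = rep(cities, each = length(dates)),
  date = rep(dates, times = length(cities))
)
n <- nrow(aq)
aq$temp   <- rnorm(n, mean = 70, sd = 15)            # degrees F
aq$precip <- ifelse(runif(n) < 0.7, 0, rexp(n))      # 0 means a dry day
aq$PM10   <- rgamma(n, shape = 4, scale = 6)
aq$death  <- rpois(n, lambda = 20)

# (i) Mean number of deaths per city on days with PM10 > 20.
deaths_by_city <- with(subset(aq, PM10 > 20),
                       tapply(death, city, mean))

# (ii) Mean PM10 per city on dry days with average temperature above 80 F.
pm10_by_city <- with(subset(aq, precip == 0 & temp > 80),
                     tapply(PM10, city, mean))
```

Both results are named vectors with one entry per city. A four-dimensional array (city by day by variable) would also hold these data, but the data frame makes the row filtering in (i) and (ii) more direct.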
(b) Data: The data consist of records of patient visits to a clinic. The measurements for each patient are: date of visit (visit), age of patient in years (age), gender with values M or F (gender), weight in lb (weight), systolic blood pressure (BP.sys), diastolic blood pressure (BP.dia), and blood glucose level in mg/dl (glucose). Blood pressure is recorded in the standard unit as a numeric value between 0 and 600.
(i) How many times did each patient visit the clinic?
(ii) What is the average systolic blood pressure level for each of the patients with maximum weight (during the study period) greater than 180 lb?
(iii) What is the average blood glucose level for each of the patients with age at least 40 years at the first visit?
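One way to approach part (b) is sketched below with a data frame holding one row per visit. The problem does not name a patient identifier, so the column id is an assumption introduced here, as are the object name visits and every value in the toy data set.

```r
# Toy visit records (made-up values); 'id' is a hypothetical patient identifier.
visits <- data.frame(
  id      = c(1, 1, 1, 2, 2, 3),
  visit   = as.Date(c("2019-01-05", "2019-03-10", "2019-06-01",
                      "2019-02-14", "2019-05-20", "2019-04-02")),
  age     = c(45, 45, 45, 38, 38, 52),
  gender  = c("M", "M", "M", "F", "F", "F"),
  weight  = c(185, 182, 179, 150, 155, 190),
  BP.sys  = c(130, 128, 126, 118, 120, 140),
  BP.dia  = c(85, 84, 82, 76, 78, 90),
  glucose = c(105, 110, 100, 90, 95, 120)
)

# (i) Number of visits per patient.
n_visits <- table(visits$id)

# (ii) Mean systolic BP for patients whose maximum weight exceeds 180 lb.
# ave() repeats each patient's maximum weight on every one of their rows.
heavy    <- visits[ave(visits$weight, visits$id, FUN = max) > 180, ]
bp_heavy <- tapply(heavy$BP.sys, heavy$id, mean)

# (iii) Mean glucose for patients aged at least 40 at their first visit.
ord       <- visits[order(visits$id, visits$visit), ]
first     <- ord[!duplicated(ord$id), ]              # earliest visit per patient
older_ids <- first$id[first$age >= 40]
older     <- visits[visits$id %in% older_ids, ]
glu_older <- tapply(older$glucose, older$id, mean)
```

A list of per-patient data frames would also work, but a single data frame keeps the group-wise summaries down to one tapply() call each.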
2. Suppose you have
four types of animals: cat, dog, cow, squirrel;
four possible colors: white, black, brown, red;
five possible attributes: big, small, angry, cute, finicky.
(a) Generate a random sample of size 100, with replacement, from each of the three lists above. Name the resulting character vectors Animal, Color, and Attribute.
(b) Write R code to combine the results to produce phrases (character strings) describing the animals, as in this example: big white dog.
(c) Create a frequency distribution (or contingency table) of the different types of animals together with colors and attributes based on the sampled data.
(d) Use the result in part (c) to obtain the frequency distribution of: (i) Animal vs. Color; (ii) Animal vs. Attribute; (iii) Animal.
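The steps in parts (a) through (d) could be sketched roughly as follows. The vector names Animal, Color, and Attribute come from the problem; the seed and the helper names animal_types, color_types, attribute_types, phrases, and tab3 are assumptions chosen for the example.

```r
set.seed(141)   # arbitrary seed, only so the sketch is reproducible

animal_types    <- c("cat", "dog", "cow", "squirrel")
color_types     <- c("white", "black", "brown", "red")
attribute_types <- c("big", "small", "angry", "cute", "finicky")

# (a) Samples of size 100, drawn with replacement.
Animal    <- sample(animal_types,    100, replace = TRUE)
Color     <- sample(color_types,     100, replace = TRUE)
Attribute <- sample(attribute_types, 100, replace = TRUE)

# (b) Phrases such as "big white dog"; paste() works element by element.
phrases <- paste(Attribute, Color, Animal)

# (c) Three-way contingency table of the sampled data.
tab3 <- table(Animal, Color, Attribute)

# (d) Lower-dimensional tables obtained by summing over unwanted margins.
animal_color     <- margin.table(tab3, c(1, 2))   # Animal vs. Color
animal_attribute <- margin.table(tab3, c(1, 3))   # Animal vs. Attribute
animal_only      <- margin.table(tab3, 1)         # Animal
```

The apply(tab3, c(1, 2), sum) form gives the same counts as margin.table(tab3, c(1, 2)), so either can be used for part (d).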
3. Give an informative graphical statistical summary of the following datasets (available with base R). In each case, write a very brief description (maximum of 100 words) highlighting the findings. You may use up to 2 plots to illustrate the features of each data set.
(a) AirPassengers : Monthly airline passenger numbers during 1949-1960.
(b) EuStockMarkets : Daily closing prices of major European stock indices during 1991-1998.
(c) trees : Girth, height and volume for Black Cherry trees.
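As a possible starting point, the sketch below draws a few base-graphics plots for these datasets. The choice of plots and the axis labels are assumptions for illustration; other displays may be equally or more informative, and the written description of the findings is left to the reader.

```r
# (a) AirPassengers: time plot, plus the distribution of counts by calendar month.
plot(AirPassengers, ylab = "Passengers (thousands)")
boxplot(AirPassengers ~ cycle(AirPassengers),
        xlab = "Month", ylab = "Passengers (thousands)")

# (b) EuStockMarkets: one time-series panel per index.
plot(EuStockMarkets, main = "European stock indices")

# (c) trees: scatterplot matrix of the three measurements.
pairs(trees)
```

Here cycle() returns the position of each observation within the year (1 to 12), which is what groups the boxplots by month.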