MATH 208 Final Exam December 8th-11th, 2020
Question 1 [50 points]
The data for this question comes from the STAR dataset from the AER library. Below is a summary and five sample
rows of a modified version of that dataset containing information from a study examining the effect of reducing class
size on student performance in primary school.
str(STAR_data)
data.frame: 3114 obs. of 6 variables:
$ student_ID: int 1 2 3 4 5 6 7 8 9 10
$ stark : Factor w/ 3 levels regular,small,..: 2 2 1 2 1 1 2 2 1 3
$ star1 : Factor w/ 3 levels regular,small,..: 2 2 1 2 1 1 2 2 1 3
$ readk : int 447 450 448 447 431 451 478 455 430 437
$ read1 : int 507 579 651 533 558 548 514 530 490 503
$ read2 : int 568 588 614 608 608 596 569 608 622 552
STAR_data %>% slice(sample(1:n(), 5))
student_ID stark star1 readk read1 read2
1 1127 regular regular+aide 455 571 669
2 1556 regular+aide regular 456 483 560
3 856 regular regular+aide 450 512 571
4 611 regular regular 416 553 618
5 2296 regular+aide regular+aide 451 629 643
Besides the Student ID, we will focus on four other measures from the data: stark and star1, which indicate the
type of class in kindergarten and grade 1, respectively (regular, small, or regular+aide); and readk, read1,
and read2 which are reading scores from kindergarten, grade 1 and grade 2 respectively.
(a) [5 pts] Write a line of code that will generate the following tibble (or data.frame) with the total number of
students who were in each type of class in kindergarten:
# A tibble: 3 x 2
# Groups: stark [3]
stark n
1 regular 1067
2 small 987
3 regular+aide 1060
(b) [5 pts] Write a line of code that will generate the following tibble (or data.frame) with the total number of
students who were in each combination of type of class in kindergarten and grade 1, as below:
count_table
# A tibble: 9 x 3
# Groups: stark, star1 [9]
stark star1 n
1 regular regular 518
2 regular small 85
3 regular regular+aide 464
4 small regular 29
5 small small 924
6 small regular+aide 34
7 regular+aide regular 491
8 regular+aide small 85
9 regular+aide regular+aide 484
(c) [5 pts] Assume the tibble from part (b) is called count_table as above. Now write a line of code that
produces a tibble which gives, for each class type in kindergarten, the proportion of students in each class type
in grade 1:
Here is some code which creates an object STAR_what.
STAR_what <- STAR_data %>%
  pivot_longer(cols=readk:read2,names_to="Test",values_to="Score") %>%
  select(-student_ID)
(d) [5 pts] What class of object is STAR_what?
In class we used xtabs to create contingency tables of counts of combinations of qualitative variables, as in this
example:
STAR_who_denom <- xtabs(~star1+Test+stark,data=STAR_what)
STAR_who_denom
, , stark = regular
star1 read1 read2 readk
regular 518 518 518
small 85 85 85
regular+aide 464 464 464
, , stark = small
star1 read1 read2 readk
regular 29 29 29
small 924 924 924
regular+aide 34 34 34
, , stark = regular+aide
star1 read1 read2 readk
regular 491 491 491
small 85 85 85
regular+aide 484 484 484
(e) [5 pts] What will the code STAR_who_num[1,3,2] return as output?
xtabs can also be used to sum up values of another variable for different combinations of star1, Test and stark
by putting the variable name in front of the ~. For example, we can find the total of all scores by using
STAR_who_num <- xtabs(Score~star1+Test+stark,data=STAR_what)
STAR_who_num
, , stark = regular
star1 read1 read2 readk
regular 273728 306238 228798
small 45797 50785 37660
regular+aide 249580 276710 205622
, , stark = small
star1 read1 read2 readk
regular 15396 17009 12617
small 500773 552478 413608
regular+aide 18338 20488 14927
, , stark = regular+aide
star1 read1 read2 readk
regular 261220 290488 218272
small 44596 49270 37070
regular+aide 258514 286343 212980
(f) [5 pts] Using STAR_who_num and STAR_who_denom, write a single line of code that assigns the average score
for each star1 by Test by stark combination to an object called STAR_avg as seen below:
, , stark = regular
star1 read1 read2 readk
regular 528.4324 591.1931 441.6950
small 538.7882 597.4706 443.0588
regular+aide 537.8879 596.3578 443.1509
, , stark = small
star1 read1 read2 readk
regular 530.8966 586.5172 435.0690
small 541.9621 597.9199 447.6277
regular+aide 539.3529 602.5882 439.0294
, , stark = regular+aide
star1 read1 read2 readk
regular 532.0163 591.6253 444.5458
small 524.6588 579.6471 436.1176
regular+aide 534.1198 591.6178 440.0413
(g) [10 pts] Write a line of code that creates an array that contains the difference between the average read2 and
readk scores for each stark by star1 combination using STAR_avg above.
star1 regular small regular+aide
regular 149.4981 151.4483 147.0794
small 154.4118 150.2922 143.5294
regular+aide 153.2069 163.5588 151.5764
(h) [10 pts] Write code (possibly multiple lines) using the original STAR_what to produce a tibble containing the
same rows and columns as the object in part (g).
# A tibble: 3 x 4
# Groups: star1 [3]
star1 regular small `regular+aide`
1 regular 149. 151. 147.
2 small 154. 150. 144.
3 regular+aide 153. 164. 152.
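The two uses of xtabs described in this question (counting combinations, and summing another variable over those combinations) can be tried out on a small data frame. The sketch below uses made-up data: the object toy and its columns group, test and score are hypothetical and are not part of the STAR data. It is meant only to show how the two kinds of tables behave, under the assumption that both tables are built from the same factors.

```r
# Toy data, made up for illustration (not the STAR data)
toy <- data.frame(
  group = factor(c("a", "a", "b", "b", "b")),
  test  = factor(c("t1", "t2", "t1", "t1", "t2")),
  score = c(10, 20, 30, 50, 40)
)

# Number of rows in each group-by-test combination
toy_counts <- xtabs(~ group + test, data = toy)

# Total of score in each group-by-test combination
toy_sums <- xtabs(score ~ group + test, data = toy)

# Tables with identical dimensions combine element by element
toy_means <- toy_sums / toy_counts

# Cells are indexed with one index per dimension,
# either by position or by level name
toy_counts[2, 1]       # group "b", test "t1"
toy_means["b", "t1"]   # same cell, by name
```

With three factors on the right-hand side of the ~, the result has three dimensions and a cell is selected with three indices, in the order the factors appear in the formula.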
END OF QUESTION 1
Question 2 [50 points]
We will re-use the same data that was used in Question 1. The description is repeated below for your convenience.
The data for this question comes from the STAR dataset from the AER library. Below is a summary and five sample
rows of a modified version of that dataset containing information from a study examining the effect of reducing class
size on student performance in primary school.
str(STAR_data)
data.frame: 3114 obs. of 6 variables:
$ student_ID: int 1 2 3 4 5 6 7 8 9 10
$ stark : Factor w/ 3 levels regular,small,..: 2 2 1 2 1 1 2 2 1 3
$ star1 : Factor w/ 3 levels regular,small,..: 2 2 1 2 1 1 2 2 1 3
$ readk : int 447 450 448 447 431 451 478 455 430 437
$ read1 : int 507 579 651 533 558 548 514 530 490 503
$ read2 : int 568 588 614 608 608 596 569 608 622 552
STAR_data %>% slice(sample(1:n(), 5))
student_ID stark star1 readk read1 read2
1 2159 regular regular 465 564 622
2 2171 regular regular+aide 410 494 586
3 187 regular regular+aide 436 521 566
4 1320 small small 443 558 659
5 1946 regular+aide regular 545 519 584
Besides the Student ID, we will focus on four other measures from the data: stark and star1, which indicate the
type of class in kindergarten and grade 1, respectively (regular, small, or regular+aide); and readk, read1,
and read2 which are reading scores from kindergarten, grade 1 and grade 2 respectively.
(a) [6 pts] Below are partially obscured code and two plots of the values of class types for kindergarten and grade 1.
p1<-ggplot(STAR_data,aes(x=star1,fill=stark)) + geom_YYYYYYY() +
  scale_fill_viridis_d() + ggtitle("Plot 1") + theme_bw()
p2<-ggplot(STAR_data) + geom_XXXXXXX(aes(x=product(stark,star1),fill=stark)) +
  scale_fill_viridis_d() + ggtitle("Plot 2") + theme_bw()
grid.arrange(grobs=list(p1,p2),nrow=2,ncol=1)
[Figure: Plot 1 and Plot 2, with class types regular, small and regular+aide]
Identify these two plots by name:
Plot 1: Plot 2:
(b) [8 pts] Using these plots, describe the association between stark and star1. In particular, what
does knowing the grade 1 class type tell us about the possible kindergarten class type for the students
in this sample?
(c) [6 pts] Although these plots look similar, they are in fact different. There are two important differences in how
these plots were constructed, one of which is more obvious than the other. Explain what those two differences are.
(d) [6 pts] Write a line of code to create new factor variables in STAR_data for stark and star1 named stark_mod
and star1_mod which combine the regular and regular+aide levels into a single level called not small.
Below is a figure along with the code (partially obscured) which generated it.
[Figure: Plot e, with categories not small and small]
ggplot(STAR_data,aes(x=_______,fill=________,y=read2)) +
  geom_______() + ggtitle("Plot e") + theme_bw()
(e) [4 pts] What are the missing geometry and aesthetics that generated the figure above (that is,
what are the words that are missing in the code above for Plot e)?
(f) [5 pts] Based on these plots, do you think there is evidence of an association between the modified type of
class variables and the grade 1 reading test score? Explain your answer in 3 sentences or fewer.
Below is a plot of the reading test scores for kindergarten and grade 1 for the STAR_data by levels of the modified
kindergarten class type.
[Figure: Panels g1 and g2, with axis ticks from 350 to 600]
(g) [4 pts] Identify the two kinds of plots in Panel g1 and g2 by name (note that there are two of the same kind
of plot in each panel).
Panel g1: Panel g2:
(h) [6 pts] From Panels g1 and g2, would you conclude that there is an association between readk and read1 in
either group? Does the association between the two reading test variables seem to vary by levels of the modified
kindergarten class type variable? Explain your answers in 4 sentences or fewer.
(i) [5 pts] Which of the following plots could also be used to assess the association between reading scores in
kindergarten and grade 1 (assuming that neither variable is transformed)? Circle all that apply.
A. Line chart B. 2-d density plot C. Treemap D. 2-d histogram
END OF QUESTION
Question 3 [50 points]
The goal of this task is to write functions to identify certain repeated patterns of characters in long character vectors,
a basic form of a more complicated task that is often used in gene sequencing.
For every part of this question, you will assume that the user gives you a vector where each element of the vector
contains a single lower-case letter. For example, the user may specify:
c("b", "c", "b", "d", "c", "a", "b", "b", "d", "c")
(a) [15 pts] Write a function below using a for loop (and possibly other control statements) which takes a
character vector as an argument and returns the length of the longest sequence of repeated letter b for an
arbitrary vector. For the example vector above, the length of the longest sequence of repeated
b values is 2. It does not matter if the longest sequence length occurs multiple times; you only need to
report it once.
(b) [15 pts] Now assume that if the user inputs a vector that includes a certain stopping character, then you
should immediately stop analyzing the sequence and return a value of NA. If the input vector does not include
the stopping character, then it proceeds as in part (a) to return the length of the longest sequence of repeated
letter b values. For example, if the stopping character is a, then for the example above, your function
should return NA. But if the stopping character is f, then for the example above it should return 2 as before.
Modify your function from part (a) to complete this task. Your function should take two arguments: the input
character vector and a stopping character whose default value is f.
(c) [10 pts] Now assume that you want to write code to create a data frame or tibble that contains the longest run
in the vector for each letter of the alphabet, except for the single special stopping character specified by the
user. If a non-stopping letter does NOT appear in the vector, it should not appear in the table. In other
words, if the stopping character is f, then applying your code to the example vector above would return:
# A tibble: 4 x 2
letter longest
But if the stopping character is a, then your function should return NA for all letters, i.e.
# A tibble: 4 x 2
letter longest
Write code below that uses your function from part (b) to produce the desired result. You do not need to write a
separate function for this part, but you can if you think it is helpful.
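The idea of a "run" of repeated letters used throughout this question can be checked by hand with base R's rle function, which splits a vector into stretches of consecutive equal values. The sketch below applies it to the example vector from the question. It is offered only as a way to inspect runs and is not the for loop approach that part (a) asks for; the names x, runs and longest_b are chosen here for illustration.

```r
# Example vector from the question
x <- c("b", "c", "b", "d", "c", "a", "b", "b", "d", "c")

# rle() records each stretch of consecutive equal values
runs <- rle(x)
runs$values    # the letter in each run
runs$lengths   # how many times that letter repeats in the run

# Longest run of the letter "b"
longest_b <- max(runs$lengths[runs$values == "b"])
longest_b
```

For this vector the longest run of b has length 2, which agrees with the value stated in part (a).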
(d) [10 pts] Finally, use your code from part (c) to obtain a list with 26 elements, where each element is
the tibble from part (c) for one of the 26 possible stopping characters. You do not need to write a
separate function for this part, but you can if you think it is helpful.
END OF QUESTION
Question 4 [30 points]
In this question, you will write code to simulate a board game based on the fable, The Tortoise and the Hare.
The idea of the game is as follows:
(a) There are 100 spaces on the board and each piece must travel in order through the board.
(b) Both characters start on space 0.
(c) The Hare always gets to move first. The Hare randomly moves forward 5 spaces (when running) or moves
forward 0 spaces (when sleeping), with equal probability.
(d) Then the Tortoise moves forward either 2 spaces or 4 spaces, with equal probability.
(e) The game ends when one of the characters reaches a total of 100 spaces or greater.
[10 points] Write a function below, one_turn, which simulates a single turn in the game, i.e. steps (c) and (d)
above. The function should take two arguments, the current space of the Hare and the current space of the Tortoise.
The function should return the updated space of the Hare and the updated space of the Tortoise after one turn.
Hint: You can use the sample function in R to choose the number of spaces each player moves forward.
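As a small illustration of the hint, the sketch below draws one move for each character with sample. When no prob argument is given, sample picks each listed value with equal probability, which matches steps (c) and (d). The seed value and the names hare_move and tortoise_move are arbitrary choices made here, and this is not a complete one_turn function.

```r
set.seed(208)  # arbitrary seed, used only so the draws can be repeated

# One Hare move: 0 spaces (sleeping) or 5 spaces (running)
hare_move <- sample(c(0, 5), size = 1)

# One Tortoise move: 2 or 4 spaces
tortoise_move <- sample(c(2, 4), size = 1)

c(hare = hare_move, tortoise = tortoise_move)
```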
[20 points] Write a new function which uses your one_turn function to simulate one entire game, from steps (a)
to (e) above. Your function should take in one argument: a random seed so that you can replicate the results of the
game. Your function should return a list containing two elements: the name of the winner of the game (i.e. Hare
or Tortoise) and a tibble containing the history of all spaces travelled by both players.
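The role of the random seed can be seen in a short sketch: calling set.seed with the same value before drawing random numbers reproduces the same draws. The helper draw_moves below is hypothetical and exists only to show this behaviour; it does not simulate the game.

```r
# Hypothetical helper: ten Hare-style moves drawn after fixing the seed
draw_moves <- function(seed) {
  set.seed(seed)
  sample(c(0, 5), size = 10, replace = TRUE)
}

first_run  <- draw_moves(2020)
second_run <- draw_moves(2020)

# The same seed gives the same sequence of moves
identical(first_run, second_run)
```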