Homework 3
Stats 20 Lec 1 and 2 Fall 2019
General Guidelines
Please use R Markdown for your submission. Include the following files:
Your .Rmd file
The compiled/knitted HTML or PDF document (do not knit to Word).
The knitted document should be clear, be well-formatted, and contain all relevant R code, output, and explanations.
Please read and review the Bowdoin Computer Science Department collaboration policy: https://turing.bowdoin.edu/dept/collab.php
Collaboration on this homework must adhere to the Level 1 collaboration policy described at the above link.
EVERYTHING you submit MUST be 100% your original work product. Any student suspected of plagiarizing, in whole or in part, any portion of this assignment, will be immediately referred to the Dean of Students office without warning.
Question 1
Suppose Andy Dwyer tracks his commute time to his women's studies class for ten days and records the following times (in minutes):
17 16 20 24 22 15 21 15 17 22
(a) On which days did Andy have a commute time that was more than one standard deviation away (longer or shorter) from the average (mean) commute time? What were those commute times?
(b) On which days did Andy have a commute time that was within one standard deviation (longer or shorter) of the mean commute time? What were those commute times?
(c) What proportion of days did Andy have a commute time that was within one standard deviation of the mean commute time?
Hint: Can arithmetic operators/functions for numeric vectors work for logical vectors? What do sum() and mean() compute for logical vectors?
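As a small illustration of the hint, the sketch below uses a made-up numeric vector (not Andy's data) to show that a comparison produces a logical vector and that sum() and mean() treat TRUE as 1 and FALSE as 0. The names scores, above, n_above, prop_above, and idx_above are hypothetical.

```r
# Hypothetical data, unrelated to the commute times above.
scores <- c(3, 8, 5, 10, 4)

# A comparison returns a logical vector of the same length.
above <- scores > 4

# Arithmetic functions coerce TRUE to 1 and FALSE to 0.
n_above    <- sum(above)   # number of TRUE values
prop_above <- mean(above)  # proportion of TRUE values

# which() gives the positions of the TRUE values.
idx_above <- which(above)
```

The same pattern should carry over to any condition that can be written as a comparison on a numeric vector.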
Question 2
Using seq() and rep() as needed, create the following vectors in R.
(a) 0 0 0 0 0 0 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4
(b) 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6
(c) 1 2 3 4 5 6 2 3 4 5 6 7 3 4 5 6 7 8 4 5 6 7 8 9 5 6 7 8 9 10
(d) Write two functions, each with two arguments x and n, that will extend the patterns in (a) and (b). In particular, the argument x should specify the maximum number in the sequence, and n should specify the number of times each number is repeated. Verify that your functions work to generate the sequences in (a) and (b).
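For reference, the difference between the each and times arguments of rep() can be sketched on a short made-up sequence. The variable names below are hypothetical, and the values are chosen so that they do not reproduce the vectors asked for in (a) through (c).

```r
# each = repeats every element in place before moving on.
by_each <- rep(1:3, each = 2)    # 1 1 2 2 3 3

# times = repeats the whole vector end to end.
by_times <- rep(1:3, times = 2)  # 1 2 3 1 2 3

# seq() builds evenly spaced values; rep() can then repeat them.
evens <- rep(seq(2, 6, by = 2), times = 2)  # 2 4 6 2 4 6

# Vectors of equal length add element by element.
shifted <- by_times + rep(c(0, 10), each = 3)  # 1 2 3 11 12 13
```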

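The following code is used in Questions 3 and 4. The listing that defines mixed_1, mixed_2, and mixed_3 appears after Question 4; printing the three vectors in that order gives:
## [1] 1 1 0 4 0 3
## [1] "1" "1" "0" "4" "0" "3"
## [1] "TRUE"  "TRUE"  "FALSE" "4"     "0"     "3"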
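As background for the questions that follow, here is a minimal sketch of how c() coerces mixed inputs to a single mode. The values and variable names are hypothetical and are not the mixed_* vectors themselves.

```r
# Hypothetical values chosen only to show the coercion order.
lgl_num <- c(TRUE, 5)      # logical + numeric   -> numeric:   1 5
num_chr <- c(5, "a")       # numeric + character -> character: "5" "a"
lgl_chr <- c(TRUE, "a")    # logical + character -> character: "TRUE" "a"

# Nested calls are evaluated from the inside out, so the inner c()
# is coerced first and its result is what the outer c() sees.
nested <- c(c(TRUE, 5), "a")   # "1" "5" "a"

class(lgl_num)
class(nested)
```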
Question 3
(a) Explain why mixed_2 and mixed_3 do not return the same vector.
(b) Use the same input values of TRUE, TRUE, FALSE, 4, 0, 3 with the c() function to create the following vector:
## [1] "TRUE" "1"    "0"    "4"    "0"    "3"
Question 4
Type Casting: The as.logical(), as.numeric(), and as.character() functions allow us to coerce (or cast) a vector into one of a different mode.
For example:
as.logical(c(TRUE,FALSE,TRUE))
## [1]  TRUE FALSE  TRUE
as.numeric(c(4,0,3))
## [1] 4 0 3
as.character(c(TRUE,FALSE,TRUE))
## [1] "TRUE"  "FALSE" "TRUE"
(a) Explain why as.numeric(mixed_2) and as.numeric(mixed_3) produce different results.
(b) Explain why as.logical(mixed_2) and as.logical(mixed_3) produce different results.
(c) Use type casting functions to coerce mixed_2 into a meaningful logical vector (i.e., with no NA values).
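The casting functions behave differently depending on the mode of their input. The sketch below shows this on a made-up character vector (chr and the other names are hypothetical), as a reference point rather than an answer to the parts above.

```r
# Hypothetical character vector, not one of the mixed_* vectors.
chr <- c("TRUE", "0", "2", "abc")

# as.logical() on characters only recognizes spellings such as
# "TRUE", "true", "T", "FALSE", "false", "F"; anything else becomes NA.
as_lgl <- as.logical(chr)    # TRUE NA NA NA

# as.numeric() on characters parses numbers; other strings become NA
# (with a warning, suppressed here to keep the output clean).
as_num <- suppressWarnings(as.numeric(chr))   # NA 0 2 NA

# as.logical() on numbers maps 0 to FALSE and any other number to TRUE.
num_to_lgl <- as.logical(c(0, 2, -1))   # FALSE TRUE TRUE
```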
Code for Questions 3 and 4:
mixed_1 <- c(TRUE,TRUE,FALSE,4,0,3)
mixed_1
mixed_2 <- c(c(TRUE,TRUE,FALSE,4,0),"3")
mixed_2
mixed_3 <- c(TRUE,TRUE,FALSE,4,0,"3")
mixed_3
Question 5
This question highlights why vectorized operations are preferred over for() loops whenever possible.
The system.time() function measures the time it takes your computer to evaluate expressions. The input can be any R command. To input multiple commands, enclose the commands in curly braces {}. Similar to the behavior of loops, the input commands will be executed but not printed unless called within print().
For example, consider the following commands:
# How much time does it take to make a sequence of 10 million entries?
system.time(x <- seq(1,1e8,by=10))
##    user  system elapsed
##   0.452   0.113   0.677
# How much time does it take to make two sequences of 10 million entries?
system.time({x <- seq(1,1e8,by=10)
y <- seq(1e8,1,by=-10)})
##    user  system elapsed
##   0.472   0.160   0.709
The user time is the CPU time spent executing the command itself, the system time is the CPU time the operating system spent on behalf of the command, and the elapsed time is the actual elapsed time (e.g., if we were timing with a clock). The times are shown in seconds.
For the components of this question, execute the following commands:
X <- rnorm(1e4)
Y <- rnorm(1e4)
(a) Repeated Vector Allocation: Create a storage vector Z of length 0. Write a for() loop such that the ith iteration of the loop executes the following steps:
(1) Compute the sum of the ith entry of X with the ith entry of Y.
(2) Append the sum from (1) to the end of the current vector Z and save the result as Z.
Use system.time() to measure how long the for() loop takes to execute.
(b) Repeated Vector Assignment: Create a storage vector Z of length 1e4. Write a for() loop such that the ith iteration of the loop executes the following steps:
(1) Compute the sum of the ith entry of X with the ith entry of Y.
(2) Assign the sum from (1) to the ith entry of Z.
Use system.time() to measure how long the for() loop takes to execute.
(c) Vectorization: Use vectorization (not a loop) to compute the sums of the corresponding entries of X and Y and save the sums to a vector Z. Use system.time() to measure how long the sums take to execute. Compare the computation times (elapsed) between the three approaches.
Note: If all computation times print as 0 (e.g., if you have a fast computer), extend the lengths of X and Y by factors of 10 until there are meaningful comparisons between (a), (b), and (c).
Question 6
Consider the while() loop below that computes all Fibonacci numbers less than 500.
# fib1 and fib2 will represent the two latest terms in the sequence.
fib1 <- 1 # Initialize fib1
fib2 <- 1 # Initialize fib2
# Create the vector to store the output from the while loop.
full_fib <- c(fib1,fib2)
# While the sum of the last two terms is less than 500, execute the following commands.
while(fib1 + fib2 < 500){
  # Save the latest term to old_fib2.
  old_fib2 <- fib2
  # Compute the sum of the latest two terms and assign the sum to be the new latest term.
  fib2 <- fib1 + fib2
  # Append the latest term to the end of the full_fib vector with all previous terms.
  full_fib <- c(full_fib,fib2)
  # Save the previously latest term (now the second to last term) to fib1.
  fib1 <- old_fib2
}
# Print the output from the while loop.
full_fib
## [1]   1   1   2   3   5   8  13  21  34  55  89 144 233 377
(a) The variable old_fib2 is not actually necessary. Rewrite the while() loop with the update of fib1 based on just the current values of fib1 and fib2.
(b) In fact, fib1 and fib2 are not necessary either. Rewrite the while() loop without using any variables except full_fib.
(c) Determine the number of Fibonacci numbers less than 5000000.
Question 7
Write a function that inputs a numeric vector and implements the sorting algorithm described below.
For every index position of the vector, starting with position 1 and ending with the second-to-last index position:
(1) Find the least valued element to the right of the index position.
(2) If the least valued element to the right of the index position is less than the value in the index position, switch the positions of those two elements.
The body of the function cannot use any function related to which(), min(), max(), sort(), order(), etc. The output of the function should be the sorted values from the input vector. Verify that your function works on Andy's commute times from Question 1.
Question 8
Download the DNA.RData file from CCLE and save it to your current working directory. Then run the command load("DNA.RData") to create the DNA vector in your workspace. The DNA vector represents a nucleotide sequence of DNA (deoxyribonucleic acid). The letters A, C, G, and T respectively represent the four nucleotide bases of a DNA strand: adenine, cytosine, guanine, and thymine.
We are interested in finding the sequence string "G","A","T","T","A","C","A" in the DNA vector.
Note: Only functions or syntax discussed in the lecture notes may be used. No credit will be given for use of regex or other outside functions.
(a) Write a loop that finds the sequence string in the DNA vector. How many iterations of the loop were executed?
Hint: An ideal loop will iterate fewer times than the starting index of the desired sequence string.
(b) What are the indices, if any, for where the sequence string occurs in the DNA vector? Verify your results by extracting the sequence string from DNA.
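One way to test whether a short pattern sits at a given position of a longer character vector is to compare a slice of the vector to the pattern element by element. Below is a minimal sketch on a made-up vector. The names toy_seq, pattern, start, and window are hypothetical, and whether all() is among the functions permitted for this question is an assumption to check against the lecture notes.

```r
# Hypothetical toy data, not the DNA vector from DNA.RData.
toy_seq <- c("A", "C", "G", "A", "T", "C")
pattern <- c("G", "A", "T")

# Extract the slice of toy_seq that starts at position 3 and has
# the same length as the pattern.
start  <- 3
window <- toy_seq[start:(start + length(pattern) - 1)]

# The pattern occurs at `start` only if every element matches.
match_at_3 <- all(window == pattern)         # TRUE
match_at_1 <- all(toy_seq[1:3] == pattern)   # FALSE
```

A loop over candidate starting positions could reuse this comparison at each step; how to skip positions that cannot start a match is left to the question.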
