Lab5_UNI
Part 1 (Iris)
Background
The R data description follows:
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.
Task
1) Using ggplot, as opposed to base R, produce the same plot constructed by the following code. That is, plot Petal Length versus Sepal Length split by Species. The colors of the points should be split according to Species. Also overlay three regression lines on the plot, one for each Species level. Make sure to include an appropriate legend and labels on the plot. Note: The function coef() extracts the intercept and the slope of an estimated line.
# Base plot
plot(iris$Sepal.Length, iris$Petal.Length, col = iris$Species,
     xlab = "Sepal", ylab = "Petal", main = "Gabriel's Plot")
# loop to construct each LOBF
for (i in 1:length(levels(iris$Species))) {
  extract <- iris$Species == levels(iris$Species)[i]
  abline(lm(iris$Petal.Length[extract] ~ iris$Sepal.Length[extract]), col = i)
}
# Legend
legend("right", legend = levels(iris$Species),
       fill = 1:length(levels(iris$Species)), cex = .75)
# Add points and text
points(iris$Sepal.Length[15], iris$Petal.Length[15], pch = "*", col = "black")
text(iris$Sepal.Length[15] + .4, iris$Petal.Length[15], "(5.8,1.2)", col = "black")
points(iris$Sepal.Length[99], iris$Petal.Length[99], pch = "*", col = "red")
text(iris$Sepal.Length[99] + .35, iris$Petal.Length[99], "(5.1,3)", col = "red")
points(iris$Sepal.Length[107], iris$Petal.Length[107], pch = "*", col = "green")
text(iris$Sepal.Length[107], iris$Petal.Length[107] + .35, "(4.9,4.5)", col = "green")
Solution goes below:
library(ggplot2)
## Plot
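One possible approach to the task, offered as a minimal sketch rather than the intended solution: fit one line per species with lm(), pull the intercept and slope out with coef(), and hand them to geom_abline() so the lines span the whole panel as abline() does in base R. The helper data frame `fits` and the title and axis text are illustrative choices, not part of the assignment. The starred points and coordinate labels from the base plot are left out.

```r
library(ggplot2)

# Fit one regression line per species; coef() returns c(intercept, slope).
# `fits` is a hypothetical helper data frame introduced for this sketch.
fits <- do.call(rbind, lapply(levels(iris$Species), function(s) {
  b <- coef(lm(Petal.Length ~ Sepal.Length, data = iris[iris$Species == s, ]))
  data.frame(Species = s, intercept = b[[1]], slope = b[[2]])
}))
fits$Species <- factor(fits$Species, levels = levels(iris$Species))

# geom_abline() does not inherit the plot-level aesthetics, so the colour
# mapping is repeated in its own aes().
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point() +
  geom_abline(data = fits,
              aes(intercept = intercept, slope = slope, color = Species)) +
  labs(title = "Petal length versus sepal length by species",
       x = "Sepal length (cm)", y = "Petal length (cm)", color = "Species")
print(p)
```

If lines restricted to each species' data range are acceptable, geom_smooth(method = "lm", se = FALSE) would be a shorter alternative that does not need coef().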
Part 2 (World's Richest)
Background
We consider a data set containing information about the world's richest people. The data set is taken from the World Top Incomes Database (WTID) hosted by the Paris School of Economics [http://top-incomes.gmond.parisschoolofeconomics.eu]. It is derived from income tax reports and compiles information about the very highest incomes in various countries over time, trying as hard as possible to produce numbers that are comparable across time and space.
Tasks
2) Open the file and make a new variable (data frame) containing only the Year, P99, P99.5 and P99.9 variables; these are the income levels which put someone at the 99th, 99.5th, and 99.9th percentile of income. What was P99 in 1993? P99.5 in 1942? You must identify these using your code rather than looking up the values manually. The code for this part is given below.
Solution goes below:
wtid <- read.csv("wtid-report.csv", as.is = TRUE)
wtid <- wtid[, c("Year", "P99.income.threshold", "P99.5.income.threshold",
                 "P99.9.income.threshold")]
names(wtid) <- c("Year", "P99", "P99.5", "P99.9")
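The lookup by year can be done with logical indexing. The sketch below uses a toy data frame, `wtid_toy`, whose income values are invented placeholders (they are not taken from wtid-report.csv); with the real file loaded, the same expressions would be applied to `wtid`.

```r
# Toy stand-in for wtid. The income values are made-up placeholders for
# illustration only and are NOT the real thresholds.
wtid_toy <- data.frame(
  Year  = c(1942, 1993),
  P99   = c(100, 200),
  P99.5 = c(150, 300),
  P99.9 = c(400, 900)
)

# Select the row whose Year matches, then the column of interest.
p99_1993  <- wtid_toy$P99[wtid_toy$Year == 1993]
p995_1942 <- wtid_toy[wtid_toy$Year == 1942, "P99.5"]

print(p99_1993)
print(p995_1942)
```

Both indexing styles shown (`$` followed by a logical vector, and `[row, column]`) return the same kind of result; which to use is a matter of taste.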
3) Using ggplot, display three line plots on the same graph showing the income threshold amount against time for each group: P99, P99.5 and P99.9. Make sure the axes are labeled appropriately, and in particular that the horizontal axis is labeled with years between 1913 and 2012, not just numbers from 1 to 100. Also make sure a legend is displayed that describes the multiple time series plot. Write one or two sentences describing how income inequality has changed over time.
Solution goes below:
## Plot
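A minimal sketch of a multiple time series plot with a legend, assuming a data frame shaped like the one built in the previous task. The data frame `wtid_toy` and its values are invented placeholders, so the picture says nothing about real incomes. Mapping a constant string to color inside aes() is what makes ggplot draw a legend entry for each line.

```r
library(ggplot2)

# Toy stand-in for wtid; the values are placeholders, NOT real data.
wtid_toy <- data.frame(
  Year  = c(1913, 1950, 1980, 2012),
  P99   = c(10, 20, 30, 40),
  P99.5 = c(15, 30, 45, 60),
  P99.9 = c(40, 80, 120, 160)
)

# Using Year on the x axis puts actual years, not row indices, on the axis.
p <- ggplot(wtid_toy, aes(x = Year)) +
  geom_line(aes(y = P99,   color = "P99")) +
  geom_line(aes(y = P99.5, color = "P99.5")) +
  geom_line(aes(y = P99.9, color = "P99.9")) +
  labs(title = "Income thresholds over time",
       x = "Year", y = "Income threshold", color = "Percentile")
print(p)
```

Reshaping the data to long format and using a single geom_line() with color mapped to the group column is an equally valid design; the three-layer version avoids an extra package.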
Part 3 (Titanic)
Background
In this part we'll be studying a data set which provides information on the survival rates of passengers on the fatal voyage of the ocean liner Titanic. The data set provides information on each passenger including, for example, economic status, sex, age, cabin, name, and survival status. This is a training data set taken from the Kaggle competition website; for more information on Kaggle competitions, please refer to https://www.kaggle.com. Students should download the data set from Canvas.
Tasks
4) Run the following code and describe what the two plots show.
# Read in data
titanic <- read.table("Titanic.txt", header = TRUE, as.is = TRUE)
head(titanic)
##   PassengerId Survived Pclass
## 1           1        0      3
## 2           2        1      1
## 3           3        1      3
## 4           4        1      1
## 5           5        0      3
## 6           6        0      3
##                                                  Name    Sex Age SibSp Parch
## 1                             Braund, Mr. Owen Harris   male  22     1     0
## 2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female  38     1     0
## 3                              Heikkinen, Miss. Laina female  26     0     0
## 4        Futrelle, Mrs. Jacques Heath (Lily May Peel) female  35     1     0
## 5                            Allen, Mr. William Henry   male  35     0     0
## 6                                    Moran, Mr. James   male  NA     0     0
##             Ticket    Fare Cabin Embarked
## 1        A/5 21171  7.2500              S
## 2         PC 17599 71.2833   C85        C
## 3 STON/O2. 3101282  7.9250              S
## 4           113803 53.1000  C123        S
## 5           373450  8.0500              S
## 6           330877  8.4583              Q
library(ggplot2)
# Plot 1
ggplot(data = titanic) +
  geom_bar(aes(x = Sex, fill = factor(Survived))) +
  labs(title = "Title", fill = "Survived")
# Plot 2
ggplot(data = titanic) +
  geom_bar(aes(x = factor(Survived), fill = factor(Survived))) +
  facet_grid(~Sex) +
  labs(title = "Title", fill = "Survived", x = "")
5) Create a similar plot with the variable Pclass. The easiest way to produce this plot is to facet by Pclass. Make sure to include appropriate labels and titles. Describe your findings.
Solution goes below:
# Plots
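A sketch of the faceting idea, swapping Sex for Pclass in the second plot from the previous task. To stay self-contained it uses only the six passengers printed by head(titanic) above, stored in a hypothetical data frame `titanic_head`; with the full file loaded, `titanic` would be used instead. The title and axis text are illustrative.

```r
library(ggplot2)

# The first six passengers as printed by head(titanic); a stand-in for the
# full data set, which is not bundled with this example.
titanic_head <- data.frame(
  Survived = c(0, 1, 1, 1, 0, 0),
  Pclass   = c(3, 1, 3, 1, 3, 3),
  Sex      = c("male", "female", "female", "female", "male", "male")
)

# One panel per passenger class, one bar per survival outcome.
p <- ggplot(data = titanic_head) +
  geom_bar(aes(x = factor(Survived), fill = factor(Survived))) +
  facet_grid(~Pclass) +
  labs(title = "Survival counts by passenger class",
       x = "Survived (0 = no, 1 = yes)", y = "Number of passengers",
       fill = "Survived")
print(p)
```

With only six rows the panels are sparse (no second-class passengers appear at all), so any conclusions should be drawn from the full data set.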
6) Create one more plot of your choice related to the Titanic data set. Describe what information your plot is conveying.
Solution goes below:
# Plots
Part 4 (Simulating and Graphing Probability Density)
7) Simulate n = 1000 random draws from a beta distribution with parameters α = 3 and β = 1. Plot a histogram of the simulated cases using ggplot. Also overlay the beta density on the histogram. Hint: look up the beta distribution using ?rbeta.
Solution goes below:
# Sim and plots
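A minimal sketch of the simulation and overlay, assuming the two parameters in the task correspond to the shape1 = 3 and shape2 = 1 arguments of rbeta() and dbeta(). The seed, bin count, and colors are arbitrary choices. The histogram is drawn on the density scale so that it is comparable to the theoretical curve.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, used only to make the draws reproducible
draws <- data.frame(x = rbeta(1000, shape1 = 3, shape2 = 1))

p <- ggplot(draws, aes(x = x)) +
  # after_stat(density) rescales bar heights so the histogram integrates to 1
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 fill = "grey80", color = "white") +
  # overlay the theoretical Beta(3, 1) density
  stat_function(fun = dbeta, args = list(shape1 = 3, shape2 = 1),
                color = "red") +
  labs(title = "Simulated Beta(3, 1) draws with theoretical density",
       x = "x", y = "Density")
print(p)
```

Without the density rescaling the bars would show counts, and the density curve would sit almost flat along the bottom of the plot.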