
---
title: "STAT340 Lecture 02: Random Variables"
author: and Wu
date: September 2021
output: html_document
---


These notes will discuss the most fundamental object in statistics: random variables.

We use random variables, within the framework of probability theory, to model how our data came to be.

We will first introduce the idea of a random variable (and its associated distribution) and review probability theory.

Then, we will walk through a number of different basic distributions and discuss the kinds of data for which these different models are appropriate.

## Learning objectives

After this lesson, you will be able to

* Explain what a random variable is
* Identify appropriate random variables for modeling different real-world events and explain why one choice might be better or worse than another
* Combine random variables to build simple models of real-world phenomena
* Compute the probabilities of simple events under different probability distributions

## What is a random variable?

Consider the following quantities/events:

* Whether or not a coin flip comes up heads or tails.
* How many people in the treatment group of a vaccine trial are hospitalized.
* The water level measured in Lake Mendota on a given day.
* How many customers arrive at a store between 2pm and 3pm today.
* The number of days between installing a lightbulb and when it burns out.

All of these are examples of events that we might reasonably model according to different *random variables*.

Later in your studies you will learn a more formal definition of what a random variable is.
For now, let's be content with saying that a random variable is a (random) number $X$ about which we can compute quantities of the form $\Pr[ X \in S ]$, where $S$ is a set.

## Aside: probability refresher

Before moving on, let's briefly review some basic ideas from probability theory.

We have a set of possible *outcomes*, usually denoted $\Omega$.

* When we flip a coin, it can land either heads ($H$) or tails ($T$), so the outcome set is $\Omega = \{H, T\}$.
* When we roll a die, there are six possible outcomes, $\Omega = \{1,2,3,4,5,6\}$.
* In other settings, the outcomes might be an infinite set.
Ex: if we measure the depth of Lake Mendota, the outcome may be any positive real number (at least theoretically, anyway!)

In the vast majority of situations, $\Omega$ will be either discrete (e.g., $\{1,2,\dots\}$) or continuous (e.g., $[0,1]$) and we call the associated random variable discrete or continuous, respectively.

A subset $E \subseteq \Omega$ of the outcome space is called an *event*.

A *probability* is a function that maps events to numbers, with the properties that

* $\Pr[ E ] \in [0,1]$ for all events $E$
* $\Pr[ \Omega ] = 1$
* For events $E_1, E_2 \subseteq \Omega$ with $E_1 \cap E_2 = \emptyset$, $\Pr[ E_1 \cup E_2 ] = \Pr[ E_1 ] + \Pr[ E_2 ]$

Two events $E_1$ and $E_2$ are *independent* if $\Pr[ E_1 \cap E_2 ] = \Pr[ E_1 ] \Pr[ E_2 ]$.

Two random variables $X$ and $Y$ are independent if for all sets $S_1,S_2$, we have $\Pr[ X \in S_1 ~\&~ Y \in S_2 ] = \Pr[ X \in S_1 ] \Pr[ Y \in S_2 ]$.

Roughly speaking, two random variables are independent if learning information about one of them doesn't tell you anything about the other.

* For example, if each of us flips a coin, it is reasonable to model them as being independent.
* Learning whether my coin landed heads or tails doesn't tell us anything about your coin (the short simulation sketch below illustrates this).
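
To make this concrete, here is a minimal simulation sketch (added here for illustration; the repetition count `nreps` is arbitrary): we flip two fair coins many times and check that the empirical frequency of "both heads" is close to the product of the individual frequencies.

```{r}
# Simulate many pairs of independent fair coin flips.
nreps <- 1e5;
my_coin <- sample( c('H','T'), nreps, replace=TRUE );
your_coin <- sample( c('H','T'), nreps, replace=TRUE );

# Empirical estimate of Pr[ my coin is H and your coin is H ]...
mean( my_coin=='H' & your_coin=='H' );
# ...should be close to Pr[ my coin is H ] * Pr[ your coin is H ] = 1/4.
mean( my_coin=='H' ) * mean( your_coin=='H' );
```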

__Example: Coin flipping__

Consider a coin toss, in which the possible outcomes are $\Omega = \{ H, T \}$.

This is a *discrete random variable*, because the outcome set $Omega$ is discrete.

If we have a fair coin, then it is sensible that $\Pr[ \{H\} ] = \Pr[ \{T\} ] = 1/2$.

__Exercise (optional):__ verify that this probability satisfies the above properties!

We will see in a moment that this is a special case of a Bernoulli random variable, which you are probably already familiar with.

__Example: Six-sided die__

If we roll a die, the outcome space is $\Omega = \{1,2,3,4,5,6\}$, and the events are all the subsets of this six-element set.

So, for example, we can talk about the event that we roll an odd number, $E_{\text{odd}} = \{1,3,5\}$, or the event that we roll a number larger than $4$, $E_{>4} = \{5,6\}$.
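
As a quick illustration (a small sketch added here; `nrolls` is arbitrary), we can estimate the probabilities of these two events by simulating die rolls with `sample`:

```{r}
# Simulate many rolls of a fair six-sided die.
nrolls <- 1e5;
rolls <- sample( 1:6, nrolls, replace=TRUE );

# Empirical estimate of Pr[ E_odd ] = Pr[ {1,3,5} ]; should be close to 3/6 = 1/2.
mean( rolls %in% c(1,3,5) );
# Empirical estimate of Pr[ E_{>4} ] = Pr[ {5,6} ]; should be close to 2/6 = 1/3.
mean( rolls > 4 );
```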

__Example: Human heights__

Consider our human height example from our previous lecture.

We pick a random person and measure their height in, say, centimeters.
__What is the outcome space?__

* One option: the outcome space is the set of positive reals, in which case this is a *continuous random variable*.
* Alternatively, we could assume that the outcome space is the set of all real numbers.

This highlights the importance of specifying our assumptions and the outcome space we are working with in a particular problem.
We will see these kinds of issues again and again this semester.

### A note on models, assumptions and approximations

Note that we are already making an approximation: our outcome sets aren't really exhaustive here.

When you toss a coin, there are possible outcomes other than heads and tails.

* Perhaps the coin lands on its side (I have personally seen this happen with a nickel flipped onto an old wooden floor).
* Similarly, perhaps the die lands on its side.

We can see a kind of idealization in our human height example.

* We can only measure a height to some finite precision (say, two decimal places), so it is a bit silly to take the outcome space to be the real numbers.
* After all, if we can only measure a height to two decimal places, then there is no way to ever obtain the event "height is 160.3333333 centimeters."

These kinds of approximations and idealizations are good to be aware of, but they usually don't bother us much.

We will see below and in future lectures the kinds of approximation errors that are more concerning and warrant our attention.

## Random variables

A random variable is specified by a probability.

That is, a random variable $X$ is specified by an outcome set $\Omega$ and a function that specifies probabilities of the form $\Pr[ X \in E ]$ where $E \subseteq \Omega$ is an event.

Let's look at some commonly-used random variables.
In the process, we will discuss some of the real-world phenomena to which these random variables are best-suited.

### Bernoulli

A Bernoulli random variable has outcome set $\Omega = \{0,1\}$.

As discussed above, to specify a probability on this set, it is enough for us to specify $\Pr[ \{0\} ]$ and $\Pr[ \{1\} ]$.

Typically, we do this by specifying the *success probability* $p = \Pr[ \{1\} ] \in [0,1]$.
Once we have done this, it is immediate that (check!) $\Pr[ \{0\} ] = 1-p$.

Note that we can check that this gives us a probability by verifying that it sums to 1:
$$\Pr[ \Omega ] = \Pr[ \{0\} \cup \{1\} ] = \Pr[ \{0\} ] + \Pr[ \{1\} ] = (1-p) + p = 1.$$

Bernoulli random variables are commonly used to model "yes or no" events.
That is, events of the form "whether or not event $A$ happens."
Common examples:

* Coin flips
* Whether or not a person gets sick with a disease
* Whether or not a team wins a game.

If $Z$ is a Bernoulli random variable with probability of success $p$, then we write $Z \sim \operatorname{Bernoulli}(p)$.

We read this as something like "$Z$ is distributed as Bernoulli $p$."

### Binomial

A Bernoulli random variable is like a single coin flip.

What if we flip many coins, all with the same probability of coming up heads?

Then the total number of heads is distributed as a *binomial* random variable.

In particular, we describe a binomial distribution by specifying two *parameters*:

1. the number of trials (i.e., coins flipped) $n$, often called the *size* parameter, and
2. the success probability $p$ (i.e., the probability that an individual coin lands heads).

Often we will write $\operatorname{Binomial}(n,p)$ to denote this distribution.

So if $X$ is a Binomial random variable with $n$ trials and success probability $p$, we write $X \sim \operatorname{Binomial}(n,p)$.

__Example:__ modeling COVID-19

In a population of 250,000 people (approximately the population of Madison), we may imagine that each person has some probability $p$ of becoming seriously ill with COVID-19.

Then, in a sense, the total number of people in Madison who become seriously ill with COVID-19 is like the total number of probability-$p$ coin flips that land heads when we flip $250,000$ coins.

We might then model the number of COVID-19 patients by a binomial random variable with $n=250,000$ and $p=0.01$ (just to be clear, we are completely making up this choice of $p$ here, just for the sake of example!).

We can generate binomial random variables using the `rbinom` function. Think `r` for random.

```{r}
# rbinom takes three arguments.
# The first is the number of random variables we want to generate
# (confusingly, this is called n in the R docs).
# The size argument specifies the number of coins to flip, i.e., n in our notation above.
# The prob argument specifies the probability that one coin lands heads, i.e., p in our notation above.
rbinom(1, size=10, prob=0.3) # produces a random number from {0,1,2,...,10}, with 2,3,4 being most common.
```

```{r}
# If we repeat the experiment a few times, we get different random values.
rbinom(1, size=10, prob=0.3);
rbinom(1, size=10, prob=0.3);
rbinom(1, size=10, prob=0.3);
rbinom(1, size=10, prob=0.3);
rbinom(1, size=10, prob=0.3);
```
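
Returning to the (made-up) COVID-19 example above, here is a small sketch of what that model generates; remember that $n=250{,}000$ is roughly Madison's population and $p=0.01$ is an invented value, not a real estimate.

```{r}
# Five hypothetical draws from the (made-up) Binomial(250000, 0.01) model above.
# Each draw is a possible count of seriously ill people; values cluster near 250000*0.01 = 2500.
rbinom(5, size=250000, prob=0.01);
```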

We can also use the binomial to generate Bernoulli random variables, by setting the `size` argument to 1:

```{r}
rbinom(1, size=1, prob=0.5); # 1 is heads, 0 is tails
```

### Geometric

Let's consider a different coin-flipping experiment.
We flip a coin repeatedly and we count how many flips it takes before it lands heads.

So perhaps we flip the coin and it comes up heads immediately, in which case we would count zero (because there were no flips before the one where the coin landed heads).
If we flipped the coin and it came up heads for the first time on the fourth toss, then we would count three, and so on.

This game describes the geometric distribution.

Its behavior is controlled by a single parameter, the probability $p$ of landing heads.

The geometric distribution is a natural model for "time to failure" experiments.

For example, suppose we install a light bulb, and measure how many days until the lightbulb burns out (one such experiment has been ongoing for a very long time!).

We can generate random geometric random variables using the `rgeom` function:

```{r}
rgeom(1, prob=0.5); # Generate one geometric random variable with p=0.5. Most likely outcomes: 0,1,2
```

The probability that a $\operatorname{Geom}(p)$ random variable $X$ takes a particular value $k$ ($k=0,1,2,\dots$) is given by $\Pr[ X = k ] = (1-p)^k p$.

This is the *probability mass function* of the geometric distribution.
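
As a quick sanity check (added here for illustration), R's built-in geometric PMF `dgeom` uses this same parameterization, counting failures before the first success, so we can compare it against the formula directly:

```{r}
# Compare the formula (1-p)^k * p against R's built-in geometric PMF.
p <- 0.3;
k <- 0:5;
(1-p)^k * p;      # PMF computed "by hand" from the formula above
dgeom(k, prob=p); # should agree with the line above
```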

Let's plot this as a function of $k$:

```{r}
library(ggplot2)

k <- seq(0,15); p <- 0.3;
df <- data.frame('k'=k, 'Probk'=p*(1-p)^k );
pp <- ggplot(df, aes(x=k, y=Probk) ) + geom_col();
pp
```

Looking at the plot, we see that the geometric distribution puts most of its probability close to zero: the most likely outcomes are 0, then 1, then 2, and so on.

We plotted the distribution only up to $k=15$, but a geometric random variable can, technically, take any non-negative integer as a value.

For any value of $k$, $\Pr[ X = k ] = p(1-p)^k$ is non-zero (as long as $0 < p < 1$).
So for any non-negative integer, there is a small but non-zero probability that a geometric random variable takes that integer as a value.

We say that the geometric random variable has *infinite support*.
The support of a discrete random variable is the set of values that have non-zero probability mass.
A random variable has *infinite* support if this set is infinite.

__Exercise:__ verify that this is a bona fide probability by checking that $\sum_{k=0}^\infty p(1-p)^k = 1$.

### Refresher: expectation

Before we continue with more random variables, let's take a pause to discuss one more important probability concept: expectation.

You will hopefully recall from previous courses in probability and/or statistics the notion of expectation of a random variable.

__Expectation: long-run averages__

The expectation of a random variable $X$, which we write $\mathbb{E} X$, is the "long-run average" of the random variable.

Roughly speaking, the expectation is what we would see on average if we observed many independent copies of $X$.
That is, we observe $X_1,X_2,\dots,X_n$, and consider their average, $\bar{X} = n^{-1} \sum_{i=1}^n X_i$.

The *law of large numbers* (LLN) states that in a certain sense, as $n$ gets large, $\bar{X}$ gets very close to $\mathbb{E} X$ (actually, there are two LLNs, the weak law and the strong law, but that's a matter for a later course!).

By analogy with our calculus class, we would *like* to say something like
$$\lim_{n \rightarrow \infty} \frac{1}{n} \sum_{i=1}^n X_i = \mathbb{E} X.$$
But $n^{-1} \sum_i X_i$ is a random sum, so how can we take a limit?

Well, again, the details are a matter for your probability theory class, but roughly speaking, for $n$ large, with high probability, $\bar{X}$ is close to $\mathbb{E} X$.

__Expectation: formal definition__

More formally, if $X$ is a discrete random variable, we define its expectation to be
$$\mathbb{E} X = \sum_k k \Pr[ X = k],$$
where the sum is over all $k$ such that $\Pr[ X=k ] > 0$.

* Note that this set could be finite or infinite.
* If the set is infinite, the sum might not converge, in which case we say that the expectation is either infinite or doesn't exist. But that won't be an issue this semester.

__Question:__ can you see how this definition is indeed like the average behavior of $X$?

__Exercise:__ compute the expectation of a Bernoulli random variable with success probability $p$. What about a $\operatorname{Binomial}(n,p)$ random variable? __Hint:__ the expectation of a sum of RVs is the sum of their expectations. Write the Binomial RV as a sum of Bernoullis.

__Important take-away:__ the law of large numbers says that if we take the average of a bunch of independent RVs, the average will be close to the expected value.

* Sometimes it's hard to compute the expected value exactly (e.g., because the math is hard; not all sums are nice!)
* This is where Monte Carlo methods come in: instead of trying to compute the expectation exactly, we just generate lots of RVs and take their average!
* If we generate enough RVs, the LLN says we can get as close as we want (see the short sketch below).
* We'll have lots to say about this in our lecture notes on Monte Carlo methods, coming soon.
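
Here is a minimal sketch of that idea (added here for illustration; the number of draws `nreps` is arbitrary): we average many independent $\operatorname{Binomial}(10, 0.3)$ draws, and the average lands close to the exact expectation you computed in the exercise above.

```{r}
# Monte Carlo illustration of the law of large numbers.
nreps <- 1e5;
X <- rbinom(nreps, size=10, prob=0.3); # many independent Binomial(10, 0.3) draws
mean(X); # should be close to the exact expectation from the exercise above, 10*0.3 = 3
```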

### Poisson

Let's look at one more discrete distribution.

Suppose we are going fishing on Lake Mendota, and we want to model how many fish we catch in an hour.

A common choice for this situation is the *Poisson* distribution (named after a French mathematician named Poisson, but poisson is also French for fish).

The Poisson distribution is a common choice for modeling arrivals or other events that happen over a span of time.
Common examples include

* customers arriving to a store
* calls to a phone line
* photons or other particles hitting a detector
* cars arriving at an intersection

The Poisson distribution has probability mass function
$$\Pr[ X=k ] = \frac{ \lambda^k e^{-\lambda} }{ k! }, \qquad k=0,1,2,\dots$$
The parameter $\lambda > 0$ controls the average behavior of the random variable: larger choices of $\lambda$ mean that the resulting random variable is larger, on average (we will make this statement more precise in a few lectures).
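
As a quick check of the formula (a sketch added here; `dpois` is R's built-in Poisson PMF, which we will meet again shortly), we can compare it against R:

```{r}
# Compare the Poisson PMF formula against R's built-in dpois.
lambda <- 10.5;
k <- 0:5;
lambda^k * exp(-lambda) / factorial(k); # PMF computed "by hand" from the formula above
dpois(k, lambda);                       # should agree with the line above
```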

We can generate Poisson random variables using `rpois`:

```{r}
rpois(1, lambda=10.5); # Generate Poisson RV with lambda=10.5; most likely value is 10.
```

What if I want several random Poissons, instead of just one?

The `n` argument to `rpois` (and all the other random variable generation functions) specifies the number of variables to generate.

So, for example, to get ten random Poissons, we can write

```{r}
rpois(10, lambda=10.5); # Generate 10 Poisson RVs with the same parameter lambda=10.5
```

Once again, the Poisson distribution has infinite support, since $\Pr[X=k] > 0$ for all $k=0,1,2,\dots$, but let's plot its first few values.

```{r}
k <- seq(0,30); lambda <- 10.5; # On average, we should get back the value 10.5
df <- data.frame('k'=k, 'Probk'=dpois(k, lambda) );
pp <- ggplot(df, aes(x=k, y=Probk) ) + geom_col();
pp
```

The function `dpois` above evaluates the Poisson probability mass function.
The R documentation calls this a density, which is correct, but... well, we will return to this.
For now, just remember "`r` for random", "`d` for density".

### Aside: approximating one random variable with another

Interestingly, we can obtain the Poisson distribution from the binomial distribution.

Let's make two assumptions about our fish population:

* There are many fish in the lake. Let's call the number of fish $N$, which is a large number.
* For each fish, there is a *small* probability $p$ that we catch it (the same probability for each fish, for the sake of simplicity).

If we let $N$ get arbitrarily large ("infinite"; a limit like you remember from calculus) while $p$ stays "small", the binomial distribution becomes (in the limit) the Poisson distribution with rate $\lambda = Np$.

For this reason, the Poisson is often a good approximation to the binomial when $N$ is large and $p$ is small.

Just to illustrate, let's plot the density of the binomial with $N$ really large and $p$ really small, but chosen so that $Np = 10.5$ to match $\lambda = 10.5$ above.

```{r}
k <- seq(0,30); lambda <- 10.5;
N <- 1e6; p <- lambda/N; # On average, we should get back the value lambda

poisprob <- dpois(k, lambda);             # Vector of Poisson probabilities
binomprob <- dbinom( k, size=N, prob=p ); # Binomial probabilities

# We need a column in our data frame encoding which of the two distributions a number comes from.
# This isn't the only way to do this, but it is the easiest way to get things to play nice with
# the ggplot2 facet_wrap, which displays separate plots for different values in a particular column.
dist <- c( rep('Poisson', length(k)), rep('Binom', length(k)) );

# Construct our data frame. Note that we have to repeat the k column,
# because our data frame is going to look like
# dist      k    Probk
# Poisson   0    dpois( 0, lambda )
# Poisson   1    dpois( 1, lambda )
# ...       ...  ...
# Poisson   30   dpois( 30, lambda )
# Binom     0    dbinom( 0, N, p )
# ...       ...  ...
# Binom     30   dbinom( 30, N, p )
df <- data.frame('dist'=dist, 'k'=rep(k,2), 'Probk'=c(poisprob,binomprob) );
pp <- ggplot(df, aes(x=k, y=Probk) ) + geom_col() + facet_wrap(~dist);
# facet_wrap tells ggplot to create a separate plot for each group (i.e., value) in the dist column.
pp
```

We will see several examples like this during the semester, in which two distributions become (approximately) equivalent if we fiddle with the parameters in the right way.

## Continuous random variables

So far we've seen a few different discrete random variables. Their sets of possible values are discrete sets like $\{0,1\}$ or $\{0,1,2,\dots\}$.

This is in contrast to *continuous* random variables, which take values in "continuous" sets like the interval $[0,1]$ or the real line $\mathbb{R}$.

Discrete random variables have probability mass functions, like $\Pr[ X=k ] = p(1-p)^k$, $k=0,1,2,\dots$.
In contrast, continuous random variables have *probability density functions*, which we will usually write as $f(x)$ or $f(t)$.

These random variables are a little trickier to think about at first, because it doesn't make sense to ask about the probability that a continuous random variable takes a specific value.
That is, $\Pr[ X = k ]$ doesn't really make sense when $X$ is continuous (actually, in a precise sense this does make sense, but the probability of any particular value is zero).
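
To preview where this is going (a small sketch added here, using the uniform distribution on $[0,1]$ as an assumed example), probabilities for continuous random variables are attached to *intervals* rather than to individual points:

```{r}
# A continuous random variable: uniform on [0,1].
runif(1); # generate one uniform random value

# Intervals get positive probability: Pr[ 0.2 <= X <= 0.5 ] = 0.3 for Uniform(0,1).
punif(0.5) - punif(0.2);

# But any single point has probability zero; an informal Monte Carlo check:
mean( runif(1e5) == 0.5 ); # essentially never exactly equal to 0.5
```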
