ML estimation with PDF
Theoretical analysis
Question 1: maximum likelihood estimator?
For n iid observations xi of the height of the river, the likelihood can be written as following
L(a;x1,…,x_{n}) = ~n fH(xi)
i=1
~ _{in}
e~ 1 i=1 x2
2a i
xi
The loglikelihood can be derived from the likelihood as follows
l(a; x1, …, x_{n}) = log(L(a; x1, …, x_{n}))
1 ~
ln(xi) nln(a) 2a
i=1
n
Dl n 1 ~
Da(a; x1, …, x_{n}) =
a 2a^{2}
i=1
Then one can derive the maximum likelihood estimator by setting the partial derivative to 0.
Dl
_{Da}(~an; x1, …, x_{n}) = 0 ~~
Question 2: method of moments estimator?
~E[X] = xfH(x)dx
~ 0 +0′) x2 a_ e_x2
= 2a dx
~ 0 +~ e_x2
= 2a dx
0
4/7ra
= 2
One can estimate the expectation using the arithmetic mean. Hence, the method of moments estimator ~a_{n} is:


Question 3: properties of ~a_{n}? a) Unbiased?
~n1E[~a_{n}] = 2ni=1  E[X^{2}i ] 
=  12  E[X^{2}] 
1 f00 3
= I_{ e}a dx
2j0 a
~ 
^{+}^{00}= xe^{ 2a} dx 0
E[~a_{n}] = a
~a_{n} is unbiased.
 b) Optimal? Let us derive the variance of the estimator ~a_{n}.
V ar[~a_{n}] =
~4n^{2}
i=1
Var[X^{2}]
1 ~E[X^{4}] ~ E[X^{2}]^{2}~ = 4n
a
(
–
We know from question a) that E[X2] =
1 Thus,V ar[
_{n}] = ____
4aE[X^{2}]
E[X^{2}]^{2}
4n
2E[~a_{n}] = 2a.
a^{2}V ar[~a_{n}] =n 
Let us now compute the Fisher information I(a)
a2 n 1 n ~
~a2_____ l(a; x1~ ~~~~ xn)= a^{2} – a^{3}
i=1
= E[~_{Da}_{2}l(a;x1,~~~~x_{n})]( 1

= a2 aI_{n}(a) = n
a2
a2
In(a)
n
1E[X2])
~ +00 x5
E[X4] = a e~x2
2a dx
0= 4 f+00x3edx
0= 4aE[X2]
)
2
1V ar[~a_{n}] = In(a) 
Its variance equals the Cramer–Rao lower bound and it is unbiased. Hence, ~a_{n} minimizes the mean squared error. So ~a_{n} is both optimal.
 Efficient? Since ~a_{n} is unbiased and optimal. Therefore, ~a_{n} is efficient because its variance is equal to the CramerRao lower bound.
 Asymptotically Gaussian? The maximum likelihood estimator is asymptotically gaussian. Hence, ~a_{n} is asymptotically gaussian. I(a) = In(a)
n = a2 1
/n(~a_{n} a) d~~~~~nboo  N(0, a2) 
Application on real data
Question 1: p function of a?
Let p the probability that a disaster happens during one year.
p = 1 – FH(6)
~ ~ x a e~ ^{x}^{2}= 2a dx
~ 6 – e x2 ~~
2a
=
6
18
p = e~ a
Question 2: probability of at most one disaster?
During one thousand years, if at most one disaster happened, it means either there was no disasters, or therewas only one. Let us derive p1, the probability that at most one disaster happens during one thousand years.
p1 = (1 – p)^{999} = (1 ~ e~ 18 a )999
Question 3: estimation of the probability?
X = c(2.5, 1.8, 2.9, 0.9, 2.1, 1.7, 2.2, 2.8) n = length(X)a = sum(X~2)/(2*n)p = (1exp(18/a))~999 
Regarding the set of 8 observations, one can estimate a~ = 2.42. The probability p1 can be estimated:
p1 = 0.557 .
Exercise 1: Rayleigh distribution
(a)
The parameter a of the Rayleigh distribution was estimated with the maximum likelihood estimator ~a_{n}. It was found that a 2.42.
(b)
implemented in R. One has to be careful that the scale a used in R corresponds to i/a.
One can generate more samples following a Rayleigh distribution by using the Rayleigh distribution function
n = 100000
X = rrayleigh(n, scale=sqrt(a))
hist(X, nclass=50, xlim = c(0,10), main = “Rayleigh implemented in R”)
Rayleigh implemented in R
0 2 4 6 8 10
X
If one has only access to uniform distribution and would like to output a Rayleigh distribution, one can use
the inverse distribution function.
~ x
ae~ ^{t}^{2}
t
F (x) = 2a dt
0
= 1 – e~  _{x}22a 
/
F (x) = u ~~ x = 2a ln(1 – u)
If U follows a uniform distribution on [0, 1], one can generate samples following the Rayleigh distribution
using the uniform distribution.
/
U U[0, 1] =~ 2a ln(U) Rayleigh(a)
n = 100000
U = runif(n)
X = sqrt(2*a*log(U))
hist(X, nclass=50, xlim = c(0,10), main = “Simulated Rayleigh”)
Simulated Rayleigh
0 2 4 6 8 10
X
(c)
Empirically, one can verify that the MLE is unbiased by averaging N samples of the MLE ~an,1, …, ~an,N with whatever value for n. For computing resources reasons, let’s take n = 10 and average over N = 100000 samples of n observations.
_{>J}^{N}1 E[~a_{n} a]Nk=1  (~an,k – a) 
=  1N  _{>J}^{N}k=1  _{>J}^{n}( 12ni=1  x2 i~k – a) 
N = 100000n = 10E = 0for (k in 1:N){X = rrayleigh(n, scale=sqrt(a))E = E + 1/(2*n)*sum(X~2) – a}E = E/NE 
## [1] 0.002336315
E[~a,a]a  ~ 9.7 x 10^{}^{4} << 1. Hence, empirically, the estimator is unbiased. 
Empirically, one can verify the efficiency of the MLE estimator by computing its variance and compare it to the inverse of the Fisher information. One needs an unbiased estimator of the variance, knowing that the mean is a.
_{X}N1 V ar[~a_{n}]Nk=1  (~an~k – a)^{2} 
=  1N  _{X}Nk=1  _{X}n( 12ni=1  x2 i~k – a  _{)}2 
N = 100000 n = 10I = n/a~2 # Fisher informationV = 0for (k in 1:N){X = rrayleigh(n, scale=sqrt(a))dV = 1/(2*n)*sum(X~2) – aV = V + dV~2}V = V/NV 
## [1] 0.5887242
1  _{a}2  ~ 0.5847 
I_{n}(a)  n 
V ar[~a_{n}] – ^{ 1 }I,(a)1 I,(a)  ~ 0.00683 << 1 
Hence, one can say that V ar[~a_{n}] = 1
I,(a) and the estimator is efficient empirically.
v’n(~a_{n} a) d~~~~~nboo  N(0, a2) 
The asymptotic normality means that for n large, /n(~a_{n} – a) N(0, a^{2}). Thus one can plot several samples of the random variable Z_{n} = ~n(~a_{n} – a) and check whether the distribution looks Gaussian.

n = 10000 # Size of the observations for each a_n N = 10000 # Number of samples of Z_n
Z_n = rep(0,N)
for (k in 1:N){
X = rrayleigh(n, scale=sqrt(a))
a_n = 1/(2*n)*sum(X~2)
Z_n[k] = sqrt(n)*(a_n – a)
}
hist(Z_n, breaks=40, main = “Asymptotic normality”)
Asymptotic normality
−5 0 5 10
Z_n
The MLE estimator is asymptotically normal. One can verify the standard deviation a = 2.418.
ML estimation with PMF
Statistical modelling and theoretical analysis
Question 1: Belonging to exponential family
We study the random variable X that follows a geometric distribution with parameter q E ]0; 1[. We have :Vk E N^{*}, P(X = k) = q(1 _ q)^{k}^{_}^{1}
The likelihood function can be written as :
L(x,q) = 1N*(x) x q(1 _ q)^{x}^{_}^{1}
= 1N*(x) x q x (x_1) ln(1_q)
= 1N*(x) x q
1 q x xln(1_q)
We can notice that the model is dominated and the distribution domain where L(x, q) > 0 is N^{*} which does not depend on q. Thus the distribution domain is homogeneous. We then define:
We can then write the likelihood like :
L(x, q) = h(x)φ(q) exp (Q(q)S(x))
We can conclude that a geometric distribution belongs to the exponential family and S is a sufficient statistic.Since S is linearly independent with itself, we also deduce that the model is identifiable. Question 2: Computation of the Fisher Information Matrix
We saw in question 1 that the model was dominated and the distribution domain was homogeneous. We can also easily show that L(x, q) is twice differentiable for variable q and integrable, since it is a polynomial function. Thus the model is regular. We note l(x, q) the loglikelihood of the model:
Vx E N* l(xq) = ln(q(1 – q)x_1)
= ln(q) + (x 1) ln(1 q)
We can now deduce the score function:
Vx E N^{*},s_{q}(x) = ^{a}l(x,q)
aq
1
=
q We can note that the score function is an affine transform of X, thus it is squareintegrable because X is, so the Fisher Information Matrix is welldefined. We showed previously that the model was regular, thus we have:
I(q) = E_{q} (s_{q}(X)^{2})
= Eq ( aq2 2___ l(X,q))
Eq (–1 X 1 _{q}2 (1 _q)2)E_{q}(X)_1 1
and E(X) =
(1q)^{2} q






(1q)^{2} qq^{2}
q^{2}(1 – q)2 + q^{2}(1 – q)^{2}
1q^{2}(1 – q)
Question 3: Maximum likelihood estimator
Let X1, …, X_{n} a nsample following the same distribution as X. The likelihood of the model is :
L(x1,…,x_{n},q) = ~n q(1 – _{q})xi_1
i=1
( q (1
– q)
q)
n
= nln ( 1____ q) + ln(1 q) xi

i=1
We look for ~q_{n} that maximizes the likelihood of the nsample. Thus it satisfies two equations :
aq
a^{2}
___ l(x1,…,x_{n},~q_{n}) < 0 (2)

~ 1 __________ ~ ~_{n}
+ 1 i=1__ xi
n ~ = 0
~q_{n} 1 ~ ~q_{n} ~q_{n}
~_{n}
n i=1 xi
~q_{n}(1 ~q_{n}) 1
~~  n = ~q_{n}  ni=1  xi 
1 _{=}1
~~ ~q_{n} n
The maximum likelihood estimator q^{—}_{n} is
Question 4: Asymptotic behavior of the estimator
Now we are going to study the asymptotic behavior of the estimator. According to the Central limit theorem, we have:
n (Xn (0,V (X)) where V (X) = 1 q
q ) q^{2}
~ ~ ~ ~
L
~~ ~n X_{n} ~ 1 0, 1 ~ q
~~ N __
q q2
Then, we use the delta method. We define: g : x 1 which is differentiable in^{1}. We have:
x q
n(g (Xn)g(1))L 0‘g,()2 1 q)
N
q qq2
~~~n(~q_{n}_q)^{L}~*N(0,(1_q)q^{2}) ~~ vn(~q_{n}q)^{ L}~N(0,I~^{1}(q))
This estimator is asymptotically normal and its asymptotic variance is the Cramer Rao bound, thus the estimator is asymptotically efficient. This was expected because this is a maximum likelihood estimator.
Question 5: Asymoptotic confidence interval
Finally, we build an asymptotic confidence interval for q. On pose:
X_{n} = 1 n Xi
n i=1
Then, we know that : ^{ }^{nq} t(n – 1), where t is the student law. Since the law is symmetric, we can
Sn )
~n(X_{n}~ write: ~t^{n}~^{1}
~~2 ~ Sn ~ tn~1
q ) ~~2 where t^{k} ~is the unique real number that verifies P(t(k) < t^{k}~) = 1 – a.
Finally, we have:
– tn~1
~/2 ~
S_{n} 1
~~ ~ ~_{n}t^{n}~^{1} ~ Sn
~~2 ~ X_{n} ~ ~_{n}t^{n}~^{1}
~~2
q
1~~ _______________________________ ~ q ~X_{n} + ^{S}n~_{n}t^{n}~^{1}~~2  1  
X Sntn–1n – Vn a/2 
Application on real data
Question 1: Estimation of the fraud probability
X = c(44, 9, 11, 59, 81, 19, 89, 10, 24, 07, 21, 90, 38, 01, 15, 22, 29, 19, 37, 26, 219,  2, 57, 11, 34 
n = length(X) p = (n)/sum(X) 
The probability of fraud_{ p}_{fraud} can be estimated with the estimator studied above: pfraud = 0.026 . We take 1 – ~= 0.95, we can deduce a confidence interval:
t_alpha = 2.021 #quantile for student law for n = 40, closest value available.
Xn=mean(X)
Sn=sqrt(mean((XXn)**2)*n/(n1))
a = 1/(Xn+Sn*t_alpha/sqrt(n))
b = 1/(XnSn*t_alpha/sqrt(n))
With a 95% confidence, we know that_{ p}_{fraud} is between 0.019 and 0.043 . If we have n0 = 20000 validatedtickets, we can estimate the number nfraud of fraudsters. For 1000 users, there are 26 fraudsters and 974

Exercise 2: Geometric distribution
The parameter_{ p}_{fraud} of the geometric distribution was estimated with the maximum likelihood estimator ~p_{n}. It was found that_{ p}_{fraud} ~0.026 .
One has only access to uniform distribution and would like to output a geometric distribution. We start by randomly drawing a number q between 0 and 1, according to the uniform distribution. Then, we realize the following segmentation of the interval [0, 1]:
+ook^{}1 k
1 I I ( i1 ( i1
[
0, 1] = U pf raud^{1} – pfraud) , pf raud^{1} – pfraud)
k=1 i=1 i=1
= +~~ [1 (1 _ pfraud)k^{ 1}~1 (1 _ pfraud)^{k}] k=1
Thus, if we draw q, we look for k that satisfies:
( k1 (
1 1 pfraud) q 1 1 pfraud)
~~ (1 pfraud)^{k}^{–}^{1} < q 1< (1 pfraud)^{k} ~~ (1 pfraud)^{k} < 1 q <(1 pfraud)^{k}^{–}^{1}
~~ kln(1 _ pfraud) <ln(1q) < (k1)ln(1pf_{rau}d) ~~ k1 < ln(1q) <k
ln(1 – pfraud)
Since the probability to draw an integer is 0, we can choose k = [ln(1–q)
ln(1^{–}pfraud)^{1}
If U follows a uniform distribution on [0, 1], one can generate samples following the geometric distribution using the uniform distribution.
U ~ U[0, 1] =~ ~ ln(1 ~ U) ln(1pfraud)1 ~G(pfraud)
n = 100000
U = runif(n)
X = ceiling(log(1U)/log(1p))
hist(X, nclass=800, xlim = c(0,200), main = “Simulated Geometric”)
Simulated Geometric
0 50 100 150 200
X
(c)
We have shown that:
n(~q_{n}~q)^{ d}~~N(0,q^{2}(1_q))
The asymptotic normality means that for n large, Jn(~a_{n} – a) N(0, a^{2}). Thus one can plot several samples of the random variable Z_{n} = ~n(~q_{n} – q) and check whether the distribution looks Gaussian.
n = 10000 # Size of the observations for each q_n N = 10000 # Number of samples of Z_nZ_n = rep(0,N) for (k in 1:N) {U = runif(n)X = ceiling(log(1U)/log(1p))p_n = 1/mean(X)Z_n[k] = sqrt(n)*(p_n – p)}hist(Z_n, breaks=100, main = “Asymptotic normality”) 
Asymptotic normality
−0.10 −0.05 0.00 0.05 0.10
Z_n
var_emp = mean(Z_n**2) var_theo = p~2*(1p) 
The MLE estimator is asymptotically normal.An estimation of the variance gives a = 6.72 x 10~^{4}, which is close to the value : p2fraude * (1 pfraude) = 6.67 X 10−4
(d)We have shown that a 95% confidence interval is :1________________________________________________________________________________________________________________________________________________ – q –X_{n} + ^{S,}~_{n}t^{n}^{–}^{1}~~2  1 
X_{n} – ^{S,}~_{n}t^{n}^{–}^{1}~~2 
1________________________ 1
For a 95% confidence interval. On obtain: ~n . We use the 39 values given and
Xn+2.021_ Sn ~n < q < Xn2.021 Sn
we simulated 5000 times a 39 dataset, we estimate q and compute the % of q in the confidence interval.
n = 39 # Size of the observations for each q_n N = 50000 # Number of samples of Z_ncount = 0q_n = rep(0,N)for (k in 1:N){U = runif(n)X = ceiling(log(1U)/log(1p))p_n = 1/mean(X)q_n[k] = p_nif (a<p_n & p_n<b){count = count + 1}}hist(q_n, breaks=100, main = “Confidence interval”) 
Confidence interval




We obtain that 0.986 of the simulations give an estimation of q in the confidence interv
Reviews
There are no reviews yet.