- Phase transition in PCA spike model: Consider a finite sample of $n$ i.i.d. vectors $x_1, x_2, \ldots, x_n$ drawn from the $p$-dimensional Gaussian distribution $\mathcal{N}(0,\, \sigma^2 I_{p\times p} + \lambda_0 uu^T)$, where $\lambda_0/\sigma^2$ is the signal-to-noise ratio (SNR) and $u \in \mathbb{R}^p$ is a unit-norm vector. In class we showed that the largest eigenvalue $\lambda$ of the sample covariance matrix $S_n$ pops outside the support of the Marchenko-Pastur distribution if
\[ \lambda_0 > \sigma^2 \sqrt{\gamma}, \qquad \gamma = p/n, \]
or equivalently, if
\[ \mathrm{SNR} > \sqrt{\gamma}. \]
(Notice that $\sqrt{\gamma} < (1 + \sqrt{\gamma})^2$, that is, $\lambda_0$ can be buried well inside the support of the Marchenko-Pastur distribution and still the largest eigenvalue pops outside its support.) All the following questions refer to the limit $n \to \infty$ (with $p/n \to \gamma$) and to almost sure values:
- Find $\lambda$ given $\mathrm{SNR} > \sqrt{\gamma}$.
- Use your previous answer to explain how the SNR can be estimated from the eigenvalues of the sample covariance matrix.
- Find the squared correlation between the eigenvector $v$ of the sample covariance matrix (corresponding to the largest eigenvalue $\lambda$) and the true signal component $u$, as a function of the SNR, $p$ and $n$. That is, find $|\langle u, v\rangle|^2$.
- Confirm your result using MATLAB, Python, or R simulations (e.g. set $u = e_1$ and choose $\sigma = 1$ and $\lambda_0$ at different levels. Compute the largest eigenvalue and its associated eigenvector, with a comparison to the true ones.)
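A minimal Python/NumPy sketch of such a simulation is given below. Only $u = e_1$ and $\sigma = 1$ follow the suggestion above; the sample sizes $n = 4000$, $p = 1000$ and the $\lambda_0$ values are arbitrary choices.

```python
# Minimal sketch (Python/NumPy), assuming u = e_1, sigma = 1, and a few SNR levels.
import numpy as np

rng = np.random.default_rng(0)
n, p = 4000, 1000            # gamma = p/n = 0.25
sigma2 = 1.0
u = np.zeros(p); u[0] = 1.0  # true spike direction e_1

for lam0 in [0.1, 0.5, 1.0, 2.0]:            # SNR = lam0 / sigma^2
    # Sample x_i ~ N(0, sigma^2 I + lam0 u u^T) via x = sigma*g + sqrt(lam0)*z*u
    G = rng.standard_normal((n, p)) * np.sqrt(sigma2)
    z = rng.standard_normal(n)
    X = G + np.sqrt(lam0) * np.outer(z, u)

    S = X.T @ X / n                          # sample covariance S_n (p x p)
    evals, evecs = np.linalg.eigh(S)
    lam_max, v = evals[-1], evecs[:, -1]

    mp_edge = sigma2 * (1 + np.sqrt(p / n)) ** 2   # right edge of Marchenko-Pastur support
    print(f"lam0={lam0:4.1f}  largest eig={lam_max:6.3f}  "
          f"MP edge={mp_edge:5.3f}  |<u,v>|^2={np.dot(u, v)**2:5.3f}")
```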
- Exploring S&P500 Stock Prices: Take the Standard & Poor's 500 data:
https://github.com/yao-lab/yao-lab.github.io/blob/master/data/snp452-data.mat which contains the data matrix $X \in \mathbb{R}^{p\times n}$ of $n = 1258$ consecutive observation days and $p = 452$ daily closing stock prices, and the cell variable stock collects the names, codes, and the affiliated industrial sectors of the 452 stocks. Use MATLAB, Python, or R for the following exploration.
- Take the logarithmic prices $Y = \log X$;
- For each observation time $t \in \{1,\ldots,1257\}$, calculate logarithmic price jumps
\[ \Delta Y_{i,t} = Y_{i,t} - Y_{i,t-1}, \qquad i \in \{1,\ldots,452\}; \]
- Construct the realized covariance matrix $\hat{\Sigma} \in \mathbb{R}^{452\times 452}$ by
\[ \hat{\Sigma}_{ij} = \frac{1}{1257} \sum_{\tau=1}^{1257} \Delta Y_{i,\tau}\, \Delta Y_{j,\tau}; \]
- Compute the eigenvalues (and eigenvectors) of $\hat{\Sigma}$ and store them in descending order as $\{\lambda_k,\ k = 1,\ldots,p\}$.
- Horn's Parallel Analysis: the following procedure describes a so-called Parallel Analysis of PCA using random permutations on data. Given the matrix $[\Delta Y_{i,t}]$, apply random permutations $\pi_i : \{1,\ldots,t\} \to \{1,\ldots,t\}$ on each of its rows, $\Delta Y_{i,t} \mapsto \Delta Y_{i,\pi_i(t)}$, such that
\[ [\Delta Y_{i,\pi_i(t)}] = \begin{bmatrix}
\Delta Y_{1,1} & \Delta Y_{1,2} & \Delta Y_{1,3} & \cdots & \Delta Y_{1,t} \\
\Delta Y_{2,\pi_2(1)} & \Delta Y_{2,\pi_2(2)} & \Delta Y_{2,\pi_2(3)} & \cdots & \Delta Y_{2,\pi_2(t)} \\
\Delta Y_{3,\pi_3(1)} & \Delta Y_{3,\pi_3(2)} & \Delta Y_{3,\pi_3(3)} & \cdots & \Delta Y_{3,\pi_3(t)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\Delta Y_{n,\pi_n(1)} & \Delta Y_{n,\pi_n(2)} & \Delta Y_{n,\pi_n(3)} & \cdots & \Delta Y_{n,\pi_n(t)}
\end{bmatrix}. \]
Define $\tilde{\Sigma}$ as the null covariance matrix constructed from the permuted data in the same way as $\hat{\Sigma}$. Repeat this $R$ times and compute the eigenvalues of $\tilde{\Sigma}_r$ for each $1 \le r \le R$. Evaluate the p-value for each estimated eigenvalue $\lambda_k$ by $(N_k+1)/(R+1)$, where $N_k$ is the number of times that $\lambda_k$ is less than the $k$-th largest eigenvalue of $\tilde{\Sigma}_r$ over $1 \le r \le R$. Eigenvalues with small p-values are less likely to arise from the spectrum of a randomly permuted matrix and are thus considered to be signal. Draw your own conclusions from your observations and analysis of this data; a Python sketch of the full pipeline is given after this problem. A reference is: Buja and Eyuboglu, "Remarks on Parallel Analysis", Multivariate Behavioral Research, 27(4): 509-540, 1992.
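A sketch of the whole pipeline in Python follows. It assumes the .mat file exposes the price matrix under the key 'X' (as the variable name in the problem statement suggests) and uses $R = 100$ permutations; here every row is permuted independently, which is statistically equivalent to the display above where the first row is left in its original order.

```python
# Sketch of the S&P500 pipeline: log returns, realized covariance, Horn's Parallel Analysis.
import numpy as np
from scipy.io import loadmat

data = loadmat("snp452-data.mat")
X = data["X"]                      # price matrix; key 'X' assumed from the problem statement
if X.shape[0] != 452:              # orient so that rows are the p = 452 stocks
    X = X.T

Y = np.log(X)                      # logarithmic prices
dY = np.diff(Y, axis=1)            # logarithmic price jumps, 452 x 1257
T = dY.shape[1]

Sigma = dY @ dY.T / T              # realized covariance matrix, 452 x 452
evals = np.sort(np.linalg.eigvalsh(Sigma))[::-1]   # eigenvalues in descending order

# Horn's Parallel Analysis: permute each row of dY independently, rebuild the
# null covariance, and compare eigenvalues rank by rank.
rng = np.random.default_rng(0)
R = 100
N_k = np.zeros(len(evals))         # counts: lambda_k < k-th largest null eigenvalue
for r in range(R):
    dY_perm = np.array([rng.permutation(row) for row in dY])
    Sigma_null = dY_perm @ dY_perm.T / T
    null_evals = np.sort(np.linalg.eigvalsh(Sigma_null))[::-1]
    N_k += (evals < null_evals)

p_values = (N_k + 1) / (R + 1)
print("top eigenvalues:", evals[:5])
print("eigenvalues with p-value < 0.05:", int(np.sum(p_values < 0.05)))
```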
- *Finite rank perturbations of random symmetric matrices: Wigner's semi-circle law (proved by Eugene Wigner in 1951) concerns the limiting distribution of the eigenvalues of random symmetric matrices. It states, for example, that the limiting eigenvalue distribution of $n \times n$ symmetric matrices whose entries $w_{ij}$ on and above the diagonal ($i \le j$) are i.i.d. Gaussians $\mathcal{N}\!\left(0, \tfrac{1}{4n}\right)$ (and the entries below the diagonal are determined by symmetrization, i.e., $w_{ji} = w_{ij}$) is the semi-circle:
\[ p(t) = \frac{2}{\pi}\sqrt{1 - t^2}, \]
where the distribution is supported in the interval $[-1, 1]$.
- Confirm Wigner's semi-circle law using MATLAB, Python, or R simulations (take, e.g., $n = 400$); a Python sketch is given after the hints below.
- Find the largest eigenvalue of a rank-1 perturbation of a Wigner matrix. That is, find the largest eigenvalue of the matrix
\[ W + \lambda_0 uu^T, \]
where $W$ is an $n \times n$ random symmetric matrix as above, and $u$ is some deterministic unit-norm vector. Determine the value of $\lambda_0$ for which a phase transition occurs. What is the correlation between the top eigenvector of $W + \lambda_0 uu^T$ and the vector $u$ as a function of $\lambda_0$? Use techniques similar to the ones we used in class for analyzing finite-rank perturbations of sample covariance matrices.
[Some hints about the homework] For the Wigner matrix (with the $\mathcal{N}(0, \frac{1}{4n})$ normalization above), the answer is: above the phase transition at $\lambda_0 = \tfrac{1}{2}$, the largest eigenvalue is
\[ \lambda = \lambda_0 + \frac{1}{4\lambda_0}, \]
and the top eigenvector $v$ satisfies
\[ |\langle u, v\rangle|^2 = 1 - \frac{1}{4\lambda_0^2}. \]
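A minimal Python/NumPy sketch of both simulation parts appears below ($n = 400$ as suggested; $u = e_1$ and the $\lambda_0$ values are arbitrary choices).

```python
# Minimal sketch: Wigner semi-circle with the N(0, 1/(4n)) normalization above,
# plus the rank-1 perturbation W + lam0 * u u^T for a few values of lam0.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 400

# Symmetric Wigner matrix with entries N(0, 1/(4n)) on and above the diagonal.
A = rng.standard_normal((n, n)) / (2 * np.sqrt(n))
W = np.triu(A) + np.triu(A, 1).T

# Compare the eigenvalue histogram to the semi-circle density (2/pi) sqrt(1 - t^2).
eigs = np.linalg.eigvalsh(W)
t = np.linspace(-1, 1, 200)
plt.hist(eigs, bins=40, density=True, alpha=0.5, label="eigenvalues of W")
plt.plot(t, (2 / np.pi) * np.sqrt(1 - t**2), "r", label="semi-circle density")
plt.legend(); plt.show()

# Rank-1 perturbation: track the largest eigenvalue and |<u, v>|^2 as lam0 grows.
u = np.zeros(n); u[0] = 1.0
for lam0 in [0.25, 0.5, 1.0, 2.0]:
    evals, evecs = np.linalg.eigh(W + lam0 * np.outer(u, u))
    lam_max, v = evals[-1], evecs[:, -1]
    print(f"lam0={lam0:4.2f}  largest eig={lam_max:6.3f}  |<u,v>|^2={np.dot(u, v)**2:5.3f}")
```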