- SNPs of World-wide Populations: This dataset contains a data matrix X Rnp of about p = 650,000 columns of SNPs (Single Nucleid Polymorphisms) and n = 1064 rows of peoples around the world (but there are 21 rows mostly with missing values). Each element is of three choices, 0 (for AA), 1 (for AC), 2 (for CC), and some missing values marked by 9. https://drive.google.com/file/d/1KMLPEG91mnzdK2pUlq2BkjOx2BsaZy9s/view?usp=sharing which is big (151MB in zip and 2GB original txt). Moreover, the following file contains the region where each people comes from, as well as two variables ind1 andind2 such that X(ind1,ind2) removes all missing values.
https://github.com/yao-lab/yao-lab.github.io/blob/master/data/HGDP_region.mat Another cleaned dataset is due to Quanhua MU and Yoonhee Nam:
- Genotyped data of the 1043 (n) subjects. 0(AA), 1(AC), 2(CC). Missing values are removed, only autosomal SNPs were selected (p 400K). Google drive link:
https://drive.google.com/file/d/1a9I8_akfCMHBRrPMdnWkjyL9fKcQbJJq/view?usp= sharing
- Sample Information of 1043 subjects. Google drive link: https://drive.google.com/file/d/11Q-8B57WDQnrIV92b-h_WLqDGviiYsm2/view?usp= sharing
A good reference for this data can be the following paper in Science, http://www.sciencemag.org/content/319/5866/1100.abstract
Explore the genetic variation of those persons with their geographic variations, by MDS/PCA. Since p is big, explore random projections for dimensionality reduction.
- Phase Transition in Compressed Sensing: Let A Rnd be a Gaussian random matrix, e. Aij N(0,1). In the following experiments, fix d = 20. For each n = 1,,d, and each k = 1,,d, repeat the following procedure 50 times:
1
Homework 4. Random Projections 2
- Construct a sparse vector x0 Rd with k nonzero entries. The locations of the nonzero entries are selected at random and each nonzero equals 1 with equal probability;
- Draw a standard Gaussian random matrix A Rnd, and set b = Ax0;
- Solve the following linear programming problem to obtain an optimal point x,
minx kxk1 := X|xi|
s.t. Ax = b,
for example, matlab toolbox cvx can be an easy solver;
(d) Declare success if kx x0k 103;
After repeating 50 times, compute the success probability p(n,k); draw a figure with x-axis for k and y-axis for n, to visualize the success probability. For example, matlab command imagesc(p) can be a choice.
Can you try to give an analysis of the phenomenon observed? The following paper by Tropp et al. may give you a good starting point to think.
- Dennis Amelunxen, Martin Lotz, Michael B. McCoy, Joel A. Tropp. Living on the edge: Phase transitions in convex programs with random data. arXiv:1303.6672. URL:
Reviews
There are no reviews yet.