(EE270) Large Scale Matrix Computation, Optimization and Learning - HW #1


1. Linear Algebra

Are the following statements true or false? If true, prove it; if false, show a counterexample.

(a) The inverse of a symmetric matrix is itself symmetric.

(b) All $2 \times 2$ orthogonal matrices have the following form:

$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$.

(c) Let A be a symmetric positive semi-definite matrix. Then A can be written as $A = CC^T$ for some matrix C.
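As an informal numerical aid (my own addition, not part of the assignment), the snippet below spot-checks statement (a) on a random symmetric matrix; a single random check proves nothing, but it can suggest whether a statement is plausibly true before attempting a proof or hunting for a counterexample.

import numpy as np

rng = np.random.default_rng(0)
n = 5
S = rng.standard_normal((n, n))
A = S + S.T                          # a random symmetric (almost surely invertible) matrix

# Statement (a): is the inverse of a symmetric matrix itself symmetric?
A_inv = np.linalg.inv(A)
print(np.allclose(A_inv, A_inv.T))   # expected: True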

2. Divide and Conquer Matrix Multiplication

If $A$ is a matrix, then $A^2 = AA$ is the square of $A$.

  • Show that five multiplications are sufficient to compute the square of a $2 \times 2$ matrix

$A = \begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix}$, where $a_1, a_2, a_3, a_4$ are scalars.

  • Generalize the formula in part (a) to a $2 \times 2$ block matrix

$A = \begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix}$, where $A_1, A_2, A_3, A_4$ are arbitrary matrices.

  • Instead of using the classical matrix multiplication (three-loop) algorithm for computing $A^2$, we may apply the block formula you derived in (b) to reduce a $2n \times 2n$ problem to several $n \times n$ computations, which can be tackled with classical matrix multiplication. Compare the total number of arithmetic operations. Generate random $2n \times 2n$ matrices $A$ and plot the wall-clock time of the classical matrix multiplication algorithm and of the algorithm using the formula in (b) to compute $A^2$, for $n = 4, \ldots, 10000$ (or as large as your system memory allows). You can use standard packages for matrix multiplication, e.g., numpy.matmul.
  • Show that if you have an algorithm for squaring an $n \times n$ matrix in $O(n^c)$ time, then you can use it to multiply any two arbitrary $n \times n$ matrices in $O(n^c)$ time. [Hint: Consider multiplying two matrices $A$ and $B$. Can you define a matrix whose square contains $AB$?] A small numerical illustration of one such construction is sketched below.
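The following minimal numpy sketch (my own illustration, not part of the assignment) squares the block matrix $M = \begin{bmatrix} 0 & A \\ B & 0 \end{bmatrix}$, whose square is $\begin{bmatrix} AB & 0 \\ 0 & BA \end{bmatrix}$, so its top-left block recovers the product $AB$; this is one standard construction for the hint above, not necessarily the one intended in the lecture.

import numpy as np

# Hedged sketch: recover a general product A @ B from a single matrix squaring.
n = 5
A = np.random.randn(n, n)
B = np.random.randn(n, n)

# Form the 2n x 2n block matrix M = [[0, A], [B, 0]].
M = np.block([[np.zeros((n, n)), A],
              [B, np.zeros((n, n))]])

# M @ M equals [[A @ B, 0], [0, B @ A]], so the top-left block is A @ B.
M2 = M @ M
print(np.allclose(M2[:n, :n], A @ B))  # expected: True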
3. Probability (30 pts)

  • Random variables X and Y have a joint distribution p(x, y). Prove the following results. You can assume continuous distributions for simplicity.
    i. $E[X] = E_Y[E_X[X \mid Y]]$
    ii. $E[I[X \in C]] = P(X \in C)$, where $I[X \in C]$ is the indicator function[1] of an arbitrary set C.
    iii. $\mathrm{var}[X] = E_Y[\mathrm{var}_X[X \mid Y]] + \mathrm{var}_Y[E_X[X \mid Y]]$
    iv. If X and Y are independent, then $E[XY] = E[X]E[Y]$.
    v. If X and Y take values in $\{0, 1\}$ and $E[XY] = E[X]E[Y]$, then X and Y are independent.
  • Show that the approximate randomized counting algorithm described in Lemma 1 of the Lecture 2 slides (page 14) is unbiased:

$E[\hat{n}] = n$, (1)

where $\hat{n}$ denotes the estimate returned by the algorithm.

  • Prove the variance formula in Lemma 2 of the Lecture 2 slides (page 38) for approximate matrix multiplication $AB \approx CR$:

$E\big[\|AB - CR\|_F^2\big] = \sum_{k=1}^{d} \frac{\|A^{(k)}\|_2^2\,\|B_{(k)}\|_2^2}{c\,p_k} - \frac{1}{c}\|AB\|_F^2$, (2)

where $\{p_k\}_{k=1}^{d}$ are the sampling probabilities, $c$ is the number of samples, $A^{(k)}$ denotes the $k$-th column of $A$, and $B_{(k)}$ denotes the $k$-th row of $B$.
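To build intuition for the column/row sampling estimator $CR$, here is a minimal numpy sketch; it assumes the standard scheme in which index $k$ is sampled with probability $p_k$ and the sampled column/row pair is rescaled by $1/\sqrt{c\,p_k}$ (the lecture's exact notation may differ), and it checks empirically that $E[CR] = AB$.

import numpy as np

rng = np.random.default_rng(0)
n, d, m, c = 40, 30, 20, 10           # A is n x d, B is d x m, c samples
A = rng.standard_normal((n, d))
B = rng.standard_normal((d, m))

# Norm-based sampling probabilities over the d column/row indices (assumed scheme).
p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
p = p / p.sum()

def cr_estimate():
    # Sample c indices i.i.d. with probabilities p and rescale by 1/sqrt(c * p_k),
    # so that E[C @ R] = A @ B.
    idx = rng.choice(d, size=c, p=p)
    C = A[:, idx] / np.sqrt(c * p[idx])
    R = B[idx, :] / np.sqrt(c * p[idx])[:, None]
    return C @ R

avg = np.mean([cr_estimate() for _ in range(5000)], axis=0)
print(np.max(np.abs(avg - A @ B)))    # small, up to Monte Carlo error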

4. Positive (Semi-)Definite Matrices

Let A be a real, symmetric $d \times d$ matrix. We say A is positive semi-definite (PSD) if, for all $x \in \mathbb{R}^d$, $x^T A x \ge 0$. We say A is positive definite (PD) if, for all $x \ne 0$, $x^T A x > 0$. We write $A \succeq 0$ when A is PSD, and $A \succ 0$ when A is PD.

The spectral theorem says that every real symmetric matrix A can be expressed as $A = U \Lambda U^T$, where U is a $d \times d$ matrix such that $U U^T = U^T U = I$ (called an orthogonal matrix), and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$. Multiplying on the right by U we see that $AU = U\Lambda$. If we let $u_i$ denote the $i$-th column of U, we have $A u_i = \lambda_i u_i$ for each $i$. This expression reveals that the $\lambda_i$ are eigenvalues of A, and the corresponding columns $u_i$ are eigenvectors associated with $\lambda_i$.

  • A is PSD iff $\lambda_i \ge 0$ for each $i$.
  • A is PD iff $\lambda_i > 0$ for each $i$. Hint: Use the following representation

$U \Lambda U^T = \sum_{i=1}^{d} \lambda_i u_i u_i^T.$
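As a numerical companion to this problem (my own sketch, not part of the assignment), the snippet below uses numpy.linalg.eigh to compute the spectral decomposition of a random symmetric PSD matrix and checks that its eigenvalues are nonnegative and that the quadratic form is nonnegative for a random vector.

import numpy as np

rng = np.random.default_rng(1)
d = 6
C = rng.standard_normal((d, d))
A = C @ C.T                                      # symmetric PSD by construction

# Spectral decomposition A = U diag(lam) U^T for the symmetric matrix A.
lam, U = np.linalg.eigh(A)
print(np.all(lam >= -1e-10))                     # eigenvalues are nonnegative
print(np.allclose(U @ np.diag(lam) @ U.T, A))    # A = U Lambda U^T

# Quadratic form check: x^T A x >= 0 for a random x.
x = rng.standard_normal(d)
print(x @ A @ x >= -1e-10)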

5. Norms

  • For $p = 1, 2, \infty$, verify that the functions $\|\cdot\|_p$ are norms. Then, for a vector $x \in \mathbb{R}^n$, show that

$\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2 \le n\,\|x\|_\infty$

and for each inequality, provide an example demonstrating that the inequality can be tight. (A numerical spot-check of this chain appears in the sketch after this problem's items.)

  • For vectors $x, y \in \mathbb{R}^n$, show that $|x^T y| \le \|x\|_2 \|y\|_2$, with equality if and only if x and y are linearly dependent. More generally, show that $|x^T y| \le \|x\|_1 \|y\|_\infty$. Note that these are special cases of Hölder's inequality.
  • For $A \in \mathbb{R}^{m \times n}$, show that $\mathrm{Trace}(A^T A) = \sum_{ij} A_{ij}^2$, and show that $\sqrt{\sum_{ij} A_{ij}^2}$ is a norm on $\mathbb{R}^{m \times n}$. This is the Frobenius norm, denoted $\|\cdot\|_F$. Show that, in addition to satisfying the defining properties of a norm, the Frobenius norm is submultiplicative, in that

$\|AB\|_F \le \|A\|_F \|B\|_F$

whenever the dimensions are such that the product AB is defined.

  • Recall the definition of the spectral norm of an $m \times n$ matrix A: $\|A\|_2 = \sqrt{\lambda_{\max}(A^T A)} = \sigma_{\max}(A)$, where $\lambda_{\max}(A^T A)$ is the largest eigenvalue of $A^T A$ and $\sigma_{\max}(A)$ is the largest singular value of A. Show that the Frobenius norm and the spectral norm are unitarily invariant: if U and V are unitary (orthogonal in the real case) matrices, then $\|U^T A V\|_\xi = \|A\|_\xi$ for $\xi = 2, F$.
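The following small numpy sketch (my own addition, not part of the assignment) numerically spot-checks the vector norm chain from the first item, the Frobenius submultiplicativity bound, and the unitary invariance of the spectral and Frobenius norms on a random example.

import numpy as np

rng = np.random.default_rng(2)
n = 8
x = rng.standard_normal(n)

# Norm chain: ||x||_inf <= ||x||_2 <= ||x||_1 <= sqrt(n)||x||_2 <= n||x||_inf
vals = [np.linalg.norm(x, np.inf), np.linalg.norm(x, 2), np.linalg.norm(x, 1),
        np.sqrt(n) * np.linalg.norm(x, 2), n * np.linalg.norm(x, np.inf)]
print(all(vals[i] <= vals[i + 1] + 1e-12 for i in range(4)))

# Frobenius submultiplicativity: ||A @ B||_F <= ||A||_F * ||B||_F
A, B = rng.standard_normal((5, 6)), rng.standard_normal((6, 7))
print(np.linalg.norm(A @ B, 'fro') <= np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro'))

# Unitary invariance: ||U^T A V|| = ||A|| for the spectral (ord=2) and Frobenius norms.
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))
for ord_ in (2, 'fro'):
    print(np.isclose(np.linalg.norm(U.T @ A @ V, ord_), np.linalg.norm(A, ord_)))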

6. Approximate Matrix Multiplication

Here, we will consider the empirical performance of random sampling and random projection algorithms for approximating the product of two matrices. You may use Matlab, C, R, or any other software package you prefer for your implementations. Please be sure to describe what you used in sufficient detail that someone else could reproduce your results. Let A be an $n \times d$ matrix with $n \gg d$, and consider approximating the product $A^T A$. First, generate the matrices A from one of three different classes of distributions introduced below.

  • Generate a matrix A from the multivariate normal $N(1_d, \Sigma)$, where the $(i, j)$-th element of $\Sigma$ is $\Sigma_{ij} = 2 \times 0.5^{|i-j|}$. (Refer to this as GA data.)
  • Generate a matrix A from the multivariate t-distribution with 3 degrees of freedom and the covariance matrix $\Sigma$ as before. (Refer to this as T3 data.)
  • Generate a matrix A from the multivariate t-distribution with 1 degree of freedom and the covariance matrix $\Sigma$ as before. (Refer to this as T1 data.)
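One possible data-generation sketch in Python (numpy assumed; the multivariate t draws use the standard normal-over-chi-squared construction, which may differ in details from what the course intends):

import numpy as np

rng = np.random.default_rng(3)
n, d = 500, 50

# Covariance with (i, j)-th entry Sigma_ij = 2 * 0.5^|i - j|.
idx = np.arange(d)
Sigma = 2.0 * 0.5 ** np.abs(idx[:, None] - idx[None, :])
mean = np.ones(d)

# GA data: multivariate normal N(1_d, Sigma).
A_GA = rng.multivariate_normal(mean, Sigma, size=n)

def multivariate_t(mean, Sigma, df, size, rng):
    # Standard construction: mean + z / sqrt(chi2_df / df) with z ~ N(0, Sigma).
    z = rng.multivariate_normal(np.zeros(len(mean)), Sigma, size=size)
    g = rng.chisquare(df, size=size) / df
    return mean + z / np.sqrt(g)[:, None]

A_T3 = multivariate_t(mean, Sigma, df=3, size=n, rng=rng)   # T3 data
A_T1 = multivariate_t(mean, Sigma, df=1, size=n, rng=rng)   # T1 data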

To start, consider matrices of size $n \times d$ equal to $500 \times 50$. (So, you should have three matrices, one matrix A generated in each of the above ways.)

  • For each matrix, approximate the product $A^T A$ with the random sampling algorithm we discussed in class, i.e., by sampling with respect to a probability distribution that depends on the norm squared of the rows of the input matrix (a sketch of this sampling scheme is given after this list). Plot the probability distribution. Does it look uniform or nonuniform? Plot the performance of the spectral and Frobenius norm error as a function of the number of samples.
  • For each matrix, approximate the product $A^T A$ with the random sampling algorithm we discussed in class, except that the uniform distribution, rather than the norm-squared distribution, should be used to construct the random sample. Plot the performance of the spectral and Frobenius norm error as a function of the number of samples. For which matrices are the results similar, and for which are they different from when the norm-squared distribution is used?
  • Now you will implement the matrix approximation technique on the MNIST dataset for handwritten digit classification. Details about the MNIST dataset can be found at http://yann.lecun.com/exdb/mnist/. We provide the dataset as a .mat file so that you can easily import it into Matlab by using load('mnist matrix.mat'). To import the dataset in Python you can use:

import scipy.io
data = scipy.io.loadmat('mnist matrix.mat')

In the .mat file you will find one matrix, $A \in \mathbb{R}^{60000 \times 784}$. For this matrix, approximate the product $A^T A$ with the random sampling algorithm we discussed in class, i.e., by sampling with respect to a probability distribution that depends on the norm squared of the rows of the input matrix. Plot the probability distribution. Does it look uniform or nonuniform? Plot the performance of the spectral and Frobenius norm error as a function of the number of samples.
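Here is a hedged outline in Python of the norm-squared row-sampling approximation of $A^T A$ and the error-versus-samples plot requested above; variable names, sample sizes, and plotting choices are my own assumptions.

import numpy as np
import matplotlib.pyplot as plt

def approx_ata(A, c, probs, rng):
    # Sample c row indices i.i.d. from probs and rescale so the estimator is unbiased:
    # A^T A = sum_i a_i a_i^T is estimated by (1/c) sum_t a_{i_t} a_{i_t}^T / p_{i_t}.
    idx = rng.choice(A.shape[0], size=c, p=probs)
    S = A[idx, :] / np.sqrt(c * probs[idx])[:, None]
    return S.T @ S

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 50))        # substitute the GA / T3 / T1 / MNIST matrix here
AtA = A.T @ A

row_norms2 = np.sum(A ** 2, axis=1)
p_norm = row_norms2 / row_norms2.sum()    # norm-squared sampling distribution (plot this)

cs = [10, 25, 50, 100, 200, 400]
spec_err = [np.linalg.norm(AtA - approx_ata(A, c, p_norm, rng), 2) for c in cs]
fro_err = [np.linalg.norm(AtA - approx_ata(A, c, p_norm, rng), 'fro') for c in cs]

plt.loglog(cs, spec_err, 'o-', label='spectral norm error')
plt.loglog(cs, fro_err, 's-', label='Frobenius norm error')
plt.xlabel('number of sampled rows c')
plt.ylabel('error')
plt.legend()
plt.show()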

[1] $I[X \in C] := 1$ if $X \in C$ and 0 otherwise.
