
CS373 Data Mining and
Machine Learning

Lecture 12
Jean Honorio

Purdue University

Except for the first two and last two slides, the presentation was taken from
http://www.cs.cmu.edu/~bapoczos/Classes/ML10715_2015Fall/

Linear Algebra: Eigen Decomposition

Σ = U Λ U^T, where Σ is a symmetric matrix, U is an orthonormal matrix (U^T U = I) whose columns are the eigenvectors, and Λ is the diagonal matrix of eigenvalues.

Python:

import numpy as np
import numpy.linalg as la

Sigma = np.array([[ 5.2,  3.3, -2. ],
                  [ 3.3,  8.1,  4. ],
                  [-2. ,  4. ,  6.5]])
lam, U = la.eig(Sigma)

>>> lam
array([ 0.38,  7.7 , 11.71])
>>> U
array([[-0.61,  0.75,  0.25],
       [ 0.55,  0.18,  0.81],
       [-0.56, -0.64,  0.53]])

>>> np.dot(U.T, U)
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

>>> np.dot(U, np.dot(np.diag(lam), U.T))
array([[ 5.2,  3.3, -2. ],
       [ 3.3,  8.1,  4. ],
       [-2. ,  4. ,  6.5]])

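As a quick sanity check (not part of the original slides), each eigenpair returned above should satisfy Σ u_i = λ_i u_i, and the decomposition should reconstruct Σ; a minimal sketch reusing the same Sigma:

import numpy as np
import numpy.linalg as la

Sigma = np.array([[ 5.2,  3.3, -2. ],
                  [ 3.3,  8.1,  4. ],
                  [-2. ,  4. ,  6.5]])
lam, U = la.eig(Sigma)

# Each eigenpair satisfies Sigma u_i = lam_i u_i (up to round-off).
for i in range(len(lam)):
    assert np.allclose(Sigma @ U[:, i], lam[i] * U[:, i])

# The decomposition reconstructs Sigma.
assert np.allclose(U @ np.diag(lam) @ U.T, Sigma)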

Linear Algebra: Singular Value Decomposition

X = U S V^T, where U and V^T are orthonormal matrices (U^T U = I, V^T V = I) and S is the diagonal matrix of singular values.

Python:

import numpy as np
import numpy.linalg as la

X = np.array([[ 0.6, -0.7],
              [ 2.5,  1.9],
              [-1.6, -0.9],
              [-2.8,  0.8]])
U, s, Vt = la.svd(X, False)

>>> U
array([[-0.09,  0.39],
       [-0.69, -0.54],
       [ 0.42,  0.2 ],
       [ 0.58, -0.72]])
>>> s
array([ 4.24,  2.13])
>>> Vt
array([[-0.96, -0.27],
       [ 0.27, -0.96]])

>>> np.dot(Vt, Vt.T)
array([[ 1.,  0.],
       [ 0.,  1.]])

# U * Diag(s) * Vt
>>> np.dot(U, np.dot(np.diag(s), Vt))
array([[ 0.6, -0.7],
       [ 2.5,  1.9],
       [-1.6, -0.9],
       [-2.8,  0.8]])


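Not part of the original slides, but a useful bridge to the PCA algorithms below: the SVD of X and the eigendecomposition of X^T X are related by X^T X = V S^2 V^T. A minimal check reusing the same X:

import numpy as np
import numpy.linalg as la

X = np.array([[ 0.6, -0.7],
              [ 2.5,  1.9],
              [-1.6, -0.9],
              [-2.8,  0.8]])
U, s, Vt = la.svd(X, False)

# Eigenvalues of X^T X are the squared singular values of X, and the
# eigenvectors of X^T X are the right singular vectors (rows of Vt), up to sign.
lam, V = la.eigh(X.T @ X)          # eigh returns eigenvalues in ascending order
assert np.allclose(lam[::-1], s**2)
for i in range(len(s)):
    assert np.allclose(np.abs(V[:, ::-1][:, i]), np.abs(Vt[i]))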

Motivation


PCA Applications

Data Visualization
Data Compression
Noise Reduction

Data Visualization

Example: given 53 blood and urine samples (features) from 65 people, how can we visualize the measurements?

Data Visualization

Matrix format (65 x 53): rows are instances (one per person), columns are features.

     H-WBC    H-RBC    H-Hgb    H-Hct    H-MCV     H-MCH    H-MCHC
A1   8.0000   4.8200   14.1000  41.0000  85.0000   29.0000  34.0000
A2   7.3000   5.0200   14.7000  43.0000  86.0000   29.0000  34.0000
A3   4.3000   4.4800   14.1000  41.0000  91.0000   32.0000  35.0000
A4   7.5000   4.4700   14.9000  45.0000  101.0000  33.0000  33.0000
A5   7.3000   5.5200   15.4000  46.0000  84.0000   28.0000  33.0000
A6   6.9000   4.8600   16.0000  47.0000  97.0000   33.0000  34.0000
A7   7.8000   4.6800   14.7000  43.0000  92.0000   31.0000  34.0000
A8   8.6000   4.8200   15.8000  42.0000  88.0000   33.0000  37.0000
A9   5.1000   4.7100   14.0000  43.0000  92.0000   30.0000  32.0000

Difficult to see the correlations between the features.

Data Visualization

Spectral format (65 curves, one for each person): measurement value plotted against measurement index.

Difficult to compare the different patients.

Data Visualization

[Figure: bi-variate plot of C-LDH against C-Triglycerides; tri-variate plot of M-EPI against C-LDH and C-Triglycerides.]

How can we visualize the other variables? It is difficult to see in 4- or higher-dimensional spaces.

Data Visualization

Is there a representation better than the coordinate axes?

Is it really necessary to show all 53 dimensions? What if there are strong correlations between the features?

How could we find the smallest subspace of the 53-D space that keeps the most information about the original data?

A solution: Principal Component Analysis.

PCA Algorithms


Principal Component Analysis

PCA: an orthogonal projection of the data onto a lower-dimensional linear space that
maximizes the variance of the projected data (the purple line in the slide figure), and
minimizes the mean squared distance between each data point and its projection (the sum of the blue lines in the slide figure).

Principal Component Analysis

Idea: given data points in a d-dimensional space, project them into a lower-dimensional space while preserving as much information as possible.
Find the best planar approximation of 3-D data.
Find the best 12-D approximation of 104-D data.

In particular, choose the projection that minimizes the squared error in reconstructing the original data (see the sketch below).
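As an illustration (mine, not from the slides), a minimal sketch of projecting synthetic 3-D data onto its best-fitting plane and measuring the squared reconstruction error; using np.linalg.svd here anticipates PCA algorithm III below:

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 3-D data lying close to a 2-D plane.
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(100, 3))
Xc = X - X.mean(axis=0)                      # center the data

# Top-2 principal directions via SVD of the centered data.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:2]                                   # basis of the best-fitting plane (2 x 3)

X_proj = Xc @ W.T                            # 2-D coordinates in the plane
X_rec = X_proj @ W + X.mean(axis=0)          # reconstruction back in 3-D
print(np.sum((X - X_rec) ** 2))              # squared reconstruction error (small here)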

Principal Component Analysis

Properties:
PCA vectors originate from the center of mass.
Principal component #1 points in the direction of the largest variance.
Each subsequent principal component is orthogonal to the previous ones and points in the direction of the largest variance of the residual subspace.

2D Gaussian dataset

1st PCA axis

2nd PCA axis

PCA algorithm I (sequential)

Given the centered data {x_1, ..., x_m}, compute the principal vectors:

1st PCA vector: to find w_1, maximize the variance of the projection of x:
w_1 = arg max_{||w||=1} (1/m) sum_{i=1..m} (w^T x_i)^2

2nd PCA vector: to find w_2, maximize the variance of the projection in the residual subspace:
w_2 = arg max_{||w||=1} (1/m) sum_{i=1..m} [w^T (x_i - w_1 (w_1^T x_i))]^2

[Figure: a data point x, its PCA reconstruction x' = w_1 (w_1^T x) along w_1, and the residual x - x' along w_2.]

PCA algorithm I (sequential)

Given w_1, ..., w_{k-1}, we compute the k-th principal vector w_k as before, maximizing the variance of the projection in the residual subspace:

k-th PCA vector:
w_k = arg max_{||w||=1} (1/m) sum_{i=1..m} [w^T (x_i - sum_{j=1..k-1} w_j (w_j^T x_i))]^2

[Figure: a data point x and its PCA reconstruction x' = w_1 (w_1^T x) + w_2 (w_2^T x) in the plane spanned by w_1 and w_2.]
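A minimal sketch (mine, with hypothetical data) of this sequential procedure: each w_k is the direction of largest variance of the current residuals, after which the data is deflated along w_k; the result is compared against np.linalg.svd:

import numpy as np

def sequential_pca(X, k):
    """Top-k principal vectors of the data X (m x d), found one at a time."""
    Xres = X - X.mean(axis=0)                 # center, then deflate step by step
    W = []
    for _ in range(k):
        C = Xres.T @ Xres / len(Xres)         # covariance of the current residuals
        lam, V = np.linalg.eigh(C)            # eigh: eigenvalues in ascending order
        w = V[:, -1]                          # direction of largest residual variance
        W.append(w)
        Xres = Xres - np.outer(Xres @ w, w)   # remove the component along w
    return np.array(W)                        # (k, d)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5., 3., 2., 1., 0.5])   # hypothetical data
W = sequential_pca(X, 2)

# Should match the top right singular vectors of the centered data, up to sign.
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
assert np.allclose(np.abs(W), np.abs(Vt[:2]), atol=1e-6)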

PCA algorithm II (sample covariance matrix)

Given data {x_1, ..., x_m}, compute the covariance matrix Σ:

Σ = (1/m) sum_{i=1..m} (x_i - x̄)(x_i - x̄)^T, where x̄ = (1/m) sum_{i=1..m} x_i

PCA basis vectors = the eigenvectors of Σ.
The larger the eigenvalue, the more important the eigenvector.

PCA algorithm II (sample covariance matrix)

PCA algorithm(X, k): top k eigenvalues/eigenvectors
% X = N x m data matrix, each data point x_i = column vector, i = 1..m
x̄ = (1/m) sum_{i=1..m} x_i
X <- subtract the mean x̄ from each column vector x_i in X
Σ = X X^T    % covariance matrix of X
{λ_i, u_i}_{i=1..N} = eigenvalues/eigenvectors of Σ, with λ_1 >= λ_2 >= ... >= λ_N
Return {λ_i, u_i}_{i=1..k}    % top k PCA components
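A minimal NumPy sketch of this pseudocode (my own illustration, with hypothetical data); note that numpy.linalg.eigh returns eigenvalues in ascending order, so they are flipped to put the top k first:

import numpy as np

def pca_cov(X, k):
    """PCA algorithm II: X is an N x m data matrix, one column per data point."""
    xbar = X.mean(axis=1, keepdims=True)   # mean column vector
    Xc = X - xbar                          # subtract the mean from each column
    Sigma = Xc @ Xc.T / X.shape[1]         # sample covariance matrix (N x N)
    lam, U = np.linalg.eigh(Sigma)         # ascending eigenvalues
    lam, U = lam[::-1], U[:, ::-1]         # reorder so lam[0] >= lam[1] >= ...
    return lam[:k], U[:, :k]               # top k eigenvalues / eigenvectors

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 100))              # hypothetical: 3 features, 100 samples
lam, U = pca_cov(X, 2)
print(lam)         # the two largest eigenvalues
print(U.shape)     # (3, 2): the top-2 principal vectors as columns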

PCA algorithm III (SVD of the data matrix)

Singular Value Decomposition of the centered data matrix X:
X_{features x samples} = U S V^T

[Figure: block picture of X = U S V^T, splitting the factors into a significant part (the top singular values) and a noise part (the remaining singular values).]

PCA algorithm III

Columns of U: the principal vectors {u^(1), ..., u^(k)}; they are orthogonal and have unit norm, so U^T U = I. The data can be reconstructed using linear combinations of {u^(1), ..., u^(k)}.

Matrix S: diagonal; shows the importance of each eigenvector.

Columns of V^T: the coefficients for reconstructing the samples.
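A minimal sketch (mine, with hypothetical data) of PCA algorithm III, plus a check that its principal vectors agree, up to sign, with the eigenvectors of the sample covariance matrix from algorithm II:

import numpy as np

def pca_svd(X, k):
    """PCA algorithm III: X is a features x samples data matrix."""
    Xc = X - X.mean(axis=1, keepdims=True)        # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k]                # principal vectors, singular values, coefficients

rng = np.random.default_rng(0)
X = np.array([[3.0], [2.0], [1.0]]) * rng.normal(size=(3, 100))   # hypothetical data
U, s, Vt = pca_svd(X, 2)

# Algorithm II on the same data: eigenvectors match up to sign, and the
# eigenvalues equal the squared singular values divided by m.
Xc = X - X.mean(axis=1, keepdims=True)
lam, W = np.linalg.eigh(Xc @ Xc.T / X.shape[1])
lam, W = lam[::-1], W[:, ::-1]
assert np.allclose(np.abs(U), np.abs(W[:, :2]))
assert np.allclose(s**2 / X.shape[1], lam[:2])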

Applications

Face Recognition

We want to identify a specific person based on a facial image, robust to glasses, lighting, ...
We can't just use the given 256 x 256 pixels directly.

Applying PCA: Eigenfaces

Example data set: images of faces.
Eigenface approach [Turk & Pentland], [Sirovich & Kirby].

Each face x is 256 x 256 values (luminance at each location), i.e. x in R^(256 x 256), viewed as a 64K-dimensional vector.

Form X = [x_1, ..., x_m], the centered data matrix.

Compute Σ = X X^T.

Problem: Σ is 64K x 64K. HUGE!

[Figure: X = [x_1, ..., x_m], the data matrix with one column per face; m faces, each 256 x 256 real values.]
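The slides stop at the problem; as an aside, the SVD route of PCA algorithm III sidesteps it, since the principal vectors (the left singular vectors of X) can be computed without ever forming the 64K x 64K matrix Σ. A minimal sketch with a small random stand-in for the face matrix:

import numpy as np

rng = np.random.default_rng(0)
d, m = 64 * 64, 50                  # hypothetical: 64 x 64 "faces" and 50 images
X = rng.normal(size=(d, m))         # stand-in for the face matrix [x_1, ..., x_m]
Xc = X - X.mean(axis=1, keepdims=True)

# Left singular vectors of the centered data = eigenvectors of Sigma = Xc Xc^T,
# obtained without building the d x d matrix Sigma.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigenfaces = U[:, :10]              # top 10 "eigenfaces", each a d-dimensional vector
print(eigenfaces.shape)             # (4096, 10)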

Happiness subspace

Disgust subspace

Facial Expression Recognition Movies

Image Compression

Original Image: divide the original 372 x 492 image into patches. Each patch is an instance that contains 12 x 12 pixels on a grid; consider each patch as a 144-D vector.

L2 error versus PCA dimension.

PCA compression: 144-D -> 60-D (see the sketch below).

60 most important eigenvectors: they look like the discrete cosine bases of JPEG!
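A minimal sketch (mine, using a random stand-in image, so the error will not be as small as for a natural image) of the patch-based PCA compression described above: 12 x 12 patches treated as 144-D vectors, projected down to 60 dimensions and reconstructed:

import numpy as np

rng = np.random.default_rng(0)
img = rng.random((372, 492))                         # stand-in for the original image
P = 12                                               # patch size

# Cut the image into non-overlapping 12 x 12 patches -> one 144-D vector per patch.
patches = np.array([img[i:i + P, j:j + P].ravel()
                    for i in range(0, img.shape[0], P)
                    for j in range(0, img.shape[1], P)])   # (31 * 41, 144)

mean = patches.mean(axis=0)
Xc = patches - mean

# Top-60 principal directions via SVD of the centered patch matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:60]                                          # (60, 144)

codes = Xc @ W.T                                     # 60-D code per patch (the compression)
recon = codes @ W + mean                             # 144-D reconstruction per patch
print(patches.shape, codes.shape, np.sum((patches - recon) ** 2))   # L2 error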

PCA Shortcomings

Problematic Data Set for PCA: PCA doesn't know about class labels!

PCA vs Fisher Linear Discriminant

PCA maximizes variance, independent of class (the magenta line in the slide figure).
FLD attempts to separate the classes (the green line in the slide figure).

PCA versus linear classifiers

A linear classifier attempts to separate the classes (see the sketch below).
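An illustrative sketch (mine, with hypothetical two-class data): the top PCA direction follows the overall variance and mixes the classes, while the Fisher linear discriminant direction, w = S_W^{-1} (mu_1 - mu_0), separates them:

import numpy as np

rng = np.random.default_rng(0)
# Two elongated classes, offset along the short axis.
cov = np.array([[9.0, 0.0], [0.0, 0.3]])
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X1 = rng.multivariate_normal([0.0, 2.0], cov, size=200)
X = np.vstack([X0, X1])

# Top PCA direction: follows the large within-class spread (the x-axis here).
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
w_pca = Vt[0]

# Fisher linear discriminant direction: w = S_W^{-1} (mu_1 - mu_0).
Sw = np.cov(X0.T) + np.cov(X1.T)                     # within-class scatter (up to scaling)
w_fld = np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))
w_fld /= np.linalg.norm(w_fld)

print("PCA direction:", np.round(w_pca, 2))          # roughly +/-[1, 0]: mixes the classes
print("FLD direction:", np.round(w_fld, 2))          # roughly +/-[0, 1]: separates the classes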

Feature Selection and Cross-Validation

Incorrect way: do NOT do feature selection (or dimensionality reduction) on the whole dataset and then run cross-validation.

Feature selection or dimensionality reduction on the whole dataset destroys cross-validation: the reduced training set would depend on the validation set, so training is looking at the supposedly unseen data!

[Figure, incorrect way: feature selection / dimensionality reduction is applied to the training set and the validation set together, producing a reduced training set and a reduced validation set.]

Feature Selection and Cross-Validation

Correct way: feature selection (or dimensionality reduction) inside cross-validation, applied only to the training set.

[Figure, correct way: feature selection / dimensionality reduction is fit on the training set only; the validation set is kept out of that step.]
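A minimal sketch (mine) of the correct procedure in plain NumPy: within each cross-validation fold, the PCA projection is fit on the training split only and then applied to the held-out split:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))                 # hypothetical data: 100 samples, 20 features
y = rng.integers(0, 2, size=100)               # hypothetical labels
k_folds, k_dims = 5, 5
folds = np.array_split(rng.permutation(100), k_folds)

for val_idx in folds:
    train_idx = np.setdiff1d(np.arange(100), val_idx)
    X_train, X_val = X[train_idx], X[val_idx]

    # Correct: fit the dimensionality reduction on the training split only.
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    W = Vt[:k_dims]

    # Apply the same training-derived projection to both splits.
    Z_train = (X_train - mean) @ W.T
    Z_val = (X_val - mean) @ W.T
    # ...train a classifier on (Z_train, y[train_idx]) and evaluate on (Z_val, y[val_idx]).
    print(Z_train.shape, Z_val.shape)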
