Machine Learning 10-601
Tom M. Mitchell
Machine Learning Department Carnegie Mellon University
April 15, 2015
Today:
Artificial neural networks
Backpropagation
Recurrent networks
Convolutional networks
Deep belief networks
Deep Boltzmann machines
Reading:
Mitchell: Chapter 4
Bishop: Chapter 5
Quoc Le tutorial:
Ruslan Salakhutdinov tutorial: 
Artificial Neural Networks to learn f: X → Y
f might be non-linear function
X (vector of) continuous and/or discrete vars
Y (vector of) continuous and/or discrete vars
Represent f by network of logistic units
Each unit is a logistic function
MLE: train weights of all units to minimize sum of squared errors of predicted network outputs
MAP: train to minimize sum of squared errors plus weight magnitudes
ALVINN [Pomerleau 1993]
M(C)LE Training for Neural Networks
Consider regression problem f: X → Y, for scalar Y
y = f(x) + ε, where f is deterministic
assume noise ε ~ N(0, σ_ε), iid
Let's maximize the conditional data likelihood
Learned neural network 
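Under these assumptions (y = f(x) + ε with iid Gaussian noise of standard deviation σ_ε), maximizing the conditional likelihood is the same as minimizing the summed squared error. A short derivation, writing $\hat f(x; W)$ for the learned network (our notation):

```latex
\begin{aligned}
W_{\text{MLE}} &= \arg\max_W \ln \prod_d P(y_d \mid x_d, W) \\
  &= \arg\max_W \sum_d \left[ -\ln\!\big(\sqrt{2\pi}\,\sigma_\varepsilon\big)
     - \frac{\big(y_d - \hat f(x_d; W)\big)^2}{2\sigma_\varepsilon^2} \right] \\
  &= \arg\min_W \sum_d \big(y_d - \hat f(x_d; W)\big)^2
\end{aligned}
```

The first term and the positive factor $1/(2\sigma_\varepsilon^2)$ do not depend on W, so they drop out of the optimization.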
MAP Training for Neural Networks
Consider regression problem f: X → Y, for scalar Y
y = f(x) + ε, where f is deterministic
noise ε ~ N(0, σ_ε)
Gaussian prior P(W) = N(0, σI)
$\ln P(W) = -c \sum_i w_i^2 + \text{const}$
xd = input
td = target output
od = observed unit output
wi = weight i 
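Adding the Gaussian prior turns the objective into squared error plus a weight penalty. A sketch of the derivation, writing σ_w for the prior's standard deviation per weight and λ (our name) for the resulting trade-off constant:

```latex
\begin{aligned}
W_{\text{MAP}} &= \arg\max_W \Big[ \ln P(W) + \sum_d \ln P(y_d \mid x_d, W) \Big] \\
  &= \arg\max_W \Big[ -\frac{1}{2\sigma_w^2} \sum_i w_i^2
     - \frac{1}{2\sigma_\varepsilon^2} \sum_d \big(y_d - \hat f(x_d; W)\big)^2 \Big] \\
  &= \arg\min_W \Big[ \sum_d \big(y_d - \hat f(x_d; W)\big)^2 + \lambda \sum_i w_i^2 \Big],
  \qquad \lambda = \frac{\sigma_\varepsilon^2}{\sigma_w^2}
\end{aligned}
```

In gradient descent with step size η, the penalty shows up as "weight decay": $w_i \leftarrow w_i - \eta\big(\partial E_D / \partial w_i + 2\lambda w_i\big)$, where $E_D$ is the squared-error sum over the data.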
(MLE)
xd = input
td = target output
od = observed unit output
w_ij = weight from i to j
[Figure: learned weights (w0 = bias weight) for output units left, strt (straight), right, up]
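A minimal pure-Python sketch of stochastic-gradient backpropagation for one hidden layer of sigmoid units trained on squared error, following the MLE update rules above (t = target output, o = observed unit output). The layer sizes, learning rate, and helper names `forward` and `backprop_step` are our own illustrative choices; each unit's bias is stored as weight w0 on a constant input of 1.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W_hid, W_out):
    """Return (hidden, output) activations; each weight row starts with a bias weight w0."""
    xb = [1.0] + list(x)                      # constant input 1 for the bias weight
    h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in W_hid]
    hb = [1.0] + h
    o = [sigmoid(sum(w * hi for w, hi in zip(row, hb))) for row in W_out]
    return h, o

def backprop_step(x, t, W_hid, W_out, eta=0.5):
    """One stochastic-gradient step on E = 1/2 * sum_k (t_k - o_k)^2.

    Updates the weight lists in place and returns E measured before the update.
    """
    h, o = forward(x, W_hid, W_out)
    xb, hb = [1.0] + list(x), [1.0] + h
    # Output units: delta_k = o_k (1 - o_k) (t_k - o_k)
    d_out = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
    # Hidden units: delta_j = h_j (1 - h_j) * sum_k w_kj delta_k (uses pre-update W_out)
    d_hid = [hj * (1 - hj) * sum(W_out[k][j + 1] * d_out[k] for k in range(len(o)))
             for j, hj in enumerate(h)]
    # Weight updates: w_ij <- w_ij + eta * delta_j * (input from i to j)
    for k, dk in enumerate(d_out):
        for i, hi in enumerate(hb):
            W_out[k][i] += eta * dk * hi
    for j, dj in enumerate(d_hid):
        for i, xi in enumerate(xb):
            W_hid[j][i] += eta * dj * xi
    return 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, o))

# Usage: XOR needs the hidden layer; whether training escapes a local minimum
# depends on the random initial weights.
random.seed(0)
n_in, n_hid, n_out = 2, 3, 1
W_hid = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hid)]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(n_hid + 1)] for _ in range(n_out)]
xor_data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
for epoch in range(5000):
    epoch_error = sum(backprop_step(x, t, W_hid, W_out) for x, t in xor_data)
```

Small random initial weights (rather than all zeros) break the symmetry between hidden units, which is why the sketch draws them uniformly from a small interval.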
Semantic Memory Model Based on ANNs [McClelland & Rogers, Nature 2003]
No hierarchy given.
Train with assertions, e.g., Can(Canary,Fly) 
Training Networks on Time Series
Suppose we want to predict next state of world
and it depends on history of unknown length
e.g., robot with forward-facing sensors trying to predict next sensor reading as it moves and turns 
Recurrent Networks: Time Series
Idea: use hidden layer in network to capture state history
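A minimal pure-Python sketch of this idea: an Elman-style recurrent net in which the hidden vector from step t−1 is fed back as extra input at step t, so each prediction can depend on an input history of any length. The weight names (`W_xh`, `W_hh`, `W_hy`), tanh hidden units, and the absence of biases are our simplifying assumptions.

```python
import math

def rnn_predict(inputs, W_xh, W_hh, W_hy):
    """Run a simple recurrent net over a sequence of input vectors.

    h_t = tanh(W_xh x_t + W_hh h_{t-1});  y_t = W_hy h_t.
    Returns the list of predictions y_1..y_T (one per time step).
    """
    n_hid = len(W_hh)
    h = [0.0] * n_hid  # initial state: no history yet
    outputs = []
    for x in inputs:
        # The comprehension reads the previous h before h is rebound.
        h = [math.tanh(sum(W_xh[j][i] * x[i] for i in range(len(x)))
                       + sum(W_hh[j][k] * h[k] for k in range(n_hid)))
             for j in range(n_hid)]
        outputs.append([sum(W_hy[m][j] * h[j] for j in range(n_hid))
                        for m in range(len(W_hy))])
    return outputs

# Usage: a pulse at the first step still affects the prediction two steps later.
W_xh = [[0.5], [-0.3]]
W_hh = [[0.2, 0.4], [-0.1, 0.3]]
W_hy = [[1.0, -1.0]]
with_pulse = rnn_predict([[1.0], [0.0], [0.0]], W_xh, W_hh, W_hy)
no_pulse = rnn_predict([[0.0], [0.0], [0.0]], W_xh, W_hh, W_hy)
```

Training such a net usually means unrolling it over the sequence and backpropagating through the unrolled copies, which all share the same weights.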
Recurrent Networks on Time Series
How can we train a recurrent net?
Convolutional Neural Nets for Image Recognition
[Le Cun, 1992]
specialized architecture: mix different types of units, not completely connected, motivated by primate visual cortex
many shared parameters, stochastic gradient training
very successful! now many specialized architectures for vision, speech, translation,  
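The "many shared parameters" point is easiest to see in one dimension: a convolutional layer slides one small kernel across the input, so every output position reuses the same few weights. A pure-Python sketch (valid padding, stride 1); `conv1d` and `max_pool1d` are hypothetical helper names, and real image CNNs stack 2-D versions of these with nonlinearities.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid 1-D cross-correlation: out[t] = bias + sum_k kernel[k] * signal[t + k].

    The same kernel weights are used at every position t (weight sharing),
    so the layer has len(kernel) + 1 parameters regardless of input length.
    """
    n, m = len(signal), len(kernel)
    return [bias + sum(kernel[k] * signal[t + k] for k in range(m))
            for t in range(n - m + 1)]

def max_pool1d(values, width):
    """Non-overlapping max pooling, another unit type commonly mixed into CNNs."""
    return [max(values[i:i + width]) for i in range(0, len(values) - width + 1, width)]

# An edge-detector-like kernel: +1 where the signal steps up, -1 where it steps down.
signal = [0, 0, 0, 1, 1, 1, 0, 0]
features = conv1d(signal, [-1, 1])   # [0.0, 0.0, 1.0, 0.0, 0.0, -1.0, 0.0]
pooled = max_pool1d(features, 2)     # [0.0, 1.0, 0.0]
```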
Deep Belief Networks
[Hinton & Salakhutdinov, 2006]
Problem: training networks with many hidden layers doesn't work very well
local minima, very slow training if initialized with zero weights
Deep belief networks
autoencoder networks to learn low dimensional encodings
but more layers, to learn better encodings 
Deep Belief Networks
[Hinton & Salakhutdinov, 2006]
[Figure: original images versus images reconstructed from a 2000-1000-500-30 DBN versus images reconstructed from 2000-300, linear PCA]
Deep Belief Networks: Training
[Hinton & Salakhutdinov, 2006] 
Encoding of digit images in two dimensions
[Hinton & Salakhutdinov, 2006]
[Figure: 784-2 linear encoding (PCA) versus 784-1000-500-250-2 DBNet]
Very Large Scale Use of DBNs
[Quoc Le, et al., ICML, 2012]
Data: 10 million 200×200 unlabeled images, sampled from YouTube
Training: use 1000 machines (16000 cores) for 1 week
Learned network: 3 multi-stage layers, 1.15 billion parameters
Achieves 15.8% (was 9.5%) accuracy classifying 1 of 20k ImageNet items
Real images that most excite the feature:
Image synthesized to most excite the feature: 
Restricted Boltzmann Machine
Bipartite graph, logistic activation
Inference: fill in any nodes, estimate other nodes
consider v_i, h_j to be boolean variables
[Figure: bipartite graph with hidden units h1, h2, h3 above visible units v1, v2, ..., vn]
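Because the graph is bipartite, the hidden units are conditionally independent given the visible units (and vice versa), and each conditional is a logistic function. A pure-Python sketch for boolean v_i, h_j, assuming the common parameterization with weights W[i][j], visible biases b and hidden biases a (these names are ours):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_h_given_v(v, W, a):
    """P(h_j = 1 | v) = sigmoid(a_j + sum_i v_i W_ij) for each hidden unit j."""
    return [sigmoid(a[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(a))]

def p_v_given_h(h, W, b):
    """P(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j) for each visible unit i."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]

def sample(probs, rng=random):
    """Draw independent boolean states from per-unit probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]

# "Fill in" the visible units, estimate the hidden ones, then go back down.
W = [[2.0, -1.0], [2.0, -1.0], [-1.0, 2.0]]   # 3 visible x 2 hidden
a, b = [0.0, 0.0], [0.0, 0.0, 0.0]
ph = p_h_given_v([1, 1, 0], W, a)   # hidden unit 0 is very likely on
pv = p_v_given_h(sample(ph), W, b)
```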
Impact of Deep Learning
• Speech Recognition
• Computer Vision
• Recommender Systems
• Language Understanding
• Drug Discovery and Medical Image Analysis
[Courtesy of R. Salakhutdinov] 
Feature Representations: Traditionally
Data → Feature extraction → Learning algorithm
Image → vision features → Object detection, Recognition
Audio → audio features → Audio classification, Speaker identification
Computer Vision Features
SIFT
Textons
HoG
GIST
RIFT
Audio Features
Flux
Spectrogram MFCC
ZCR
Rolloff
Representation Learning: Can we automatically learn these representations?
Restricted Boltzmann Machines
Graphical Models: Powerful framework for representing dependency structure between random variables.
[Figure: hidden variables (feature detectors) above visible variables (image), with pair-wise and unary potentials]
RBM is a Markov Random Field with:
• Stochastic binary visible variables
• Stochastic binary hidden variables
• Bipartite connections.
Markov random fields, Boltzmann machines, log-linear models.
Learning Features
Observed Data: subset of 25,000 characters
Learned W: "edges" (subset of 1000 features)
[Figure: a new image expressed as a combination of a few learned features]
Sparse representations
Logistic Function: Suitable for modeling binary images
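Model Learning
Given a set of i.i.d. training examples $\mathcal{D} = \{\mathbf{v}^{(1)}, \ldots, \mathbf{v}^{(N)}\}$, we want to learn model parameters $\theta = \{W, a, b\}$. Maximize the log-likelihood objective:
$$L(\theta) = \frac{1}{N} \sum_{n=1}^{N} \log P_\theta(\mathbf{v}^{(n)})$$
Derivative of the log-likelihood:
$$\frac{\partial L(\theta)}{\partial W_{ij}} = \mathbb{E}_{P_{\text{data}}}[v_i h_j] - \mathbb{E}_{P_\theta}[v_i h_j]$$
[Figure: hidden units above visible units (image)]
The model expectation $\mathbb{E}_{P_\theta}[v_i h_j]$ is difficult to compute: it sums over exponentially many configurations.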
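In practice the intractable model expectation is approximated. A widely used choice, not shown on this slide, is contrastive divergence with one Gibbs step (CD-1, Hinton 2002): the model statistics are replaced by statistics of a one-step reconstruction of the data. A pure-Python sketch for a binary RBM; the names `W`, `a`, `b`, the learning rate, and using probabilities rather than samples in the negative phase are our assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cd1_update(v0, W, a, b, lr=0.1, rng=random):
    """One CD-1 step on one binary vector v0; updates W (n_vis x n_hid), a, b in place.

    dW_ij is proportional to <v_i h_j>_data - <v_i h_j>_recon, an approximation
    to the log-likelihood gradient. Returns the squared reconstruction error.
    """
    n_vis, n_hid = len(v0), len(a)
    # Positive phase: hidden probabilities given the data, plus a sampled state.
    ph0 = [sigmoid(a[j] + sum(v0[i] * W[i][j] for i in range(n_vis))) for j in range(n_hid)]
    h0 = [1 if rng.random() < p else 0 for p in ph0]
    # Negative phase: one Gibbs step down to the visibles and back up.
    pv1 = [sigmoid(b[i] + sum(W[i][j] * h0[j] for j in range(n_hid))) for i in range(n_vis)]
    ph1 = [sigmoid(a[j] + sum(pv1[i] * W[i][j] for i in range(n_vis))) for j in range(n_hid)]
    # Parameter updates (all statistics were computed before any change).
    for i in range(n_vis):
        for j in range(n_hid):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
    for i in range(n_vis):
        b[i] += lr * (v0[i] - pv1[i])
    for j in range(n_hid):
        a[j] += lr * (ph0[j] - ph1[j])
    return sum((v0[i] - pv1[i]) ** 2 for i in range(n_vis))

# Usage: two binary patterns, 3 hidden units, small random initial weights.
rng = random.Random(0)
W = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(4)]
a, b = [0.0] * 3, [0.0] * 4
for epoch in range(2000):
    for v in ([1, 1, 0, 0], [0, 0, 1, 1]):
        cd1_update(v, W, a, b, lr=0.1, rng=rng)
```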
RBMs for Real-valued Data
[Figure: hidden variables connected to image visible variables, with pair-wise and unary potentials]
Gaussian-Bernoulli RBM:
• Stochastic real-valued visible variables
• Stochastic binary hidden variables
• Bipartite connections.
(Salakhutdinov & Hinton, NIPS 2007; Salakhutdinov & Murray, ICML 2008)
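One common way to write the Gaussian-Bernoulli RBM, assuming per-pixel standard deviations σ_i (often fixed to 1 after standardizing the data) and writing sigm for the logistic function: the hidden conditional stays logistic, while each visible unit becomes a Gaussian whose mean is shifted by the active hidden units.

```latex
\begin{aligned}
P_\theta(\mathbf{v}, \mathbf{h}) &\propto \exp\Big( \sum_{i,j} \frac{v_i}{\sigma_i} W_{ij} h_j
   + \sum_j a_j h_j - \sum_i \frac{(v_i - b_i)^2}{2\sigma_i^2} \Big) \\
P_\theta(h_j = 1 \mid \mathbf{v}) &= \mathrm{sigm}\Big( a_j + \sum_i W_{ij} \frac{v_i}{\sigma_i} \Big) \\
P_\theta(v_i \mid \mathbf{h}) &= \mathcal{N}\Big( b_i + \sigma_i \sum_j W_{ij} h_j,\; \sigma_i^2 \Big)
\end{aligned}
```

The Gaussian conditional follows by completing the square in v_i in the exponent of the joint.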
RBMs for Real-valued Data
Trained on 4 million unlabelled images
Learned features (out of 10,000)
[Figure: a new image approximated as 0.9 × feature + 0.8 × feature + 0.6 × feature]
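RBMs for Word Counts
[Figure: hidden units connected to 1-of-K visible word units, with pair-wise and unary potentials]
$$P_\theta(\mathbf{v}, \mathbf{h}) = \frac{1}{Z(\theta)} \exp\Big( \sum_{i=1}^{D} \sum_{k=1}^{K} \sum_{j=1}^{F} W_{ij}^{k} v_i^{k} h_j + \sum_{i=1}^{D} \sum_{k=1}^{K} v_i^{k} b_i^{k} + \sum_{j=1}^{F} h_j a_j \Big)$$
$$P_\theta(v_i^k = 1 \mid \mathbf{h}) = \frac{\exp\big(b_i^k + \sum_{j=1}^{F} h_j W_{ij}^k\big)}{\sum_{q=1}^{K} \exp\big(b_i^q + \sum_{j=1}^{F} h_j W_{ij}^q\big)}$$
Replicated Softmax Model: undirected topic model:
• Stochastic 1-of-K visible variables.
• Stochastic binary hidden variables.
• Bipartite connections.
(Salakhutdinov & Hinton, NIPS 2010, Srivastava & Salakhutdinov, NIPS 2012)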
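The softmax conditional for one word position can be computed directly: each of the K vocabulary entries gets a score from its bias plus the weights of the active hidden units, and the scores are normalized with a softmax. A pure-Python sketch for a single position i; `W_i[k][j]` and `b_i[k]` stand for $W_{ij}^k$ and $b_i^k$, and the function names are ours.

```python
import math

def softmax(scores):
    """Numerically stable softmax (subtracting the max does not change the result)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def p_word_given_h(h, W_i, b_i):
    """P(v_i^k = 1 | h) for k = 1..K at one word position i.

    W_i[k][j] plays the role of W_ij^k and b_i[k] of b_i^k.
    """
    scores = [b_i[k] + sum(h[j] * W_i[k][j] for j in range(len(h)))
              for k in range(len(b_i))]
    return softmax(scores)

# Usage: a 3-word vocabulary and 2 hidden "topic" units; only topic 0 is on.
h = [1, 0]
W_i = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]   # K = 3 words x F = 2 hidden units
b_i = [0.0, 0.0, 0.0]
probs = p_word_given_h(h, W_i, b_i)          # word 0 gets e^2 / (e^2 + 2)
```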
RBMs for Word Counts
Reuters dataset: 804,414 unlabeled newswire stories (bag-of-words)
Learned features: topics
russian russia moscow yeltsin soviet
clinton house president bill congress
computer system product software develop
trade country import world economy
stock wall street point dow
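Different Data Modalities
• Binary/Gaussian/Softmax RBMs: All have binary hidden variables but use them to model different kinds of data (binary, real-valued, 1-of-K).
• It is easy to infer the states of the hidden variables:
$$P_\theta(\mathbf{h} \mid \mathbf{v}) = \prod_{j} P_\theta(h_j \mid \mathbf{v}), \qquad P_\theta(h_j = 1 \mid \mathbf{v}) = \frac{1}{1 + \exp\big(-a_j - \sum_i W_{ij} v_i\big)}$$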
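Product of Experts
The joint distribution is given by:
$$P_\theta(\mathbf{v}, \mathbf{h}) = \frac{1}{Z(\theta)} \exp\Big( \sum_{ij} W_{ij} v_i h_j + \sum_i b_i v_i + \sum_j a_j h_j \Big)$$
Marginalizing over hidden variables:
$$P_\theta(\mathbf{v}) = \sum_{\mathbf{h}} P_\theta(\mathbf{v}, \mathbf{h}) = \frac{1}{Z(\theta)} \exp\Big( \sum_i b_i v_i \Big) \prod_j \Big( 1 + \exp\big( a_j + \sum_i W_{ij} v_i \big) \Big)$$
Each factor in the product is an "expert" contributed by one hidden unit.
[Figure: topic experts combining to predict the word "Putin"]
government authority power empire putin
clinton house president bill congress
bribery corruption dishonesty putin fraud
oil barrel exxon putin drill
stock wall street point dow
Topics "government", "corruption" and "oil" can combine to give very high probability to the word "Putin".
(Srivastava & Salakhutdinov, NIPS 2012)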
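The product-of-experts form of the marginal can be checked numerically on a tiny binary RBM: summing the joint over all hidden configurations gives the same number as the product of one factor per hidden unit. A pure-Python sketch; both functions return Z·P(v), so the partition function Z never has to be computed. The weights are arbitrary illustrative values.

```python
import itertools
import math

def unnormalized_joint(v, h, W, a, b):
    """exp(sum_ij W_ij v_i h_j + sum_i b_i v_i + sum_j a_j h_j), i.e. Z * P(v, h)."""
    s = sum(W[i][j] * v[i] * h[j] for i in range(len(v)) for j in range(len(h)))
    s += sum(b[i] * v[i] for i in range(len(v))) + sum(a[j] * h[j] for j in range(len(h)))
    return math.exp(s)

def marginal_brute_force(v, W, a, b):
    """Z * P(v) by summing over all 2^F hidden configurations."""
    return sum(unnormalized_joint(v, h, W, a, b)
               for h in itertools.product([0, 1], repeat=len(a)))

def marginal_product_of_experts(v, W, a, b):
    """Z * P(v) = exp(sum_i b_i v_i) * prod_j (1 + exp(a_j + sum_i W_ij v_i))."""
    result = math.exp(sum(bi * vi for bi, vi in zip(b, v)))
    for j in range(len(a)):
        result *= 1.0 + math.exp(a[j] + sum(W[i][j] * v[i] for i in range(len(v))))
    return result

# Usage: 3 visible and 2 hidden units; the two computations agree for every v.
W = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
a = [0.1, -0.4]
b = [0.2, 0.0, -0.5]
v = (1, 0, 1)
brute, poe = marginal_brute_force(v, W, a, b), marginal_product_of_experts(v, W, a, b)
```

The brute-force sum grows as 2^F in the number of hidden units, while the product needs only one term per hidden unit, which is why the factorized form is the one used in practice.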
Deep Boltzmann Machines
Learn simpler representations, then compose more complex ones
• Higher-level features: Combinations of edges
• Low-level features: Edges
• Input: Pixels (image)
Built from unlabeled inputs.
(Salakhutdinov 2008; Salakhutdinov & Hinton, Neural Computation 2012)
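Model Formulation
$$P_\theta(\mathbf{v}) = \sum_{\mathbf{h}^1, \mathbf{h}^2, \mathbf{h}^3} \frac{1}{Z(\theta)} \exp\Big( \mathbf{v}^\top W^1 \mathbf{h}^1 + {\mathbf{h}^1}^\top W^2 \mathbf{h}^2 + {\mathbf{h}^2}^\top W^3 \mathbf{h}^3 \Big)$$
model parameters θ = {W^1, W^2, W^3}
• Same as RBMs, but with dependencies between hidden variables.
• All connections are undirected.
• Bottom-up and Top-down: each hidden layer receives input from the layer below and the layer above, e.g.
$$P(h^1_j = 1 \mid \mathbf{v}, \mathbf{h}^2) = \mathrm{sigm}\Big( \sum_i W^1_{ij} v_i + \sum_m W^2_{jm} h^2_m \Big)$$
• Requires approximate inference to train, but it can be done and scales to millions of examples.
[Figure: DBM with input v and hidden layers h1, h2, h3, connected by weights W1, W2, W3]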
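Because the hidden layers depend on each other, the exact posterior over hidden units is intractable. One standard approximate-inference scheme for DBMs (assumed here, not spelled out on the slide) is mean-field: repeatedly set each layer's expected activations from a bottom-up term plus a top-down term until they stop changing. A pure-Python sketch without biases, matching the model formulation above; `mean_field`, `matvec`, and `matvec_t` are our names.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, x):
    """Bottom-up input: out_j = sum_i x_i W[i][j]."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def matvec_t(W, y):
    """Top-down input: out_i = sum_j W[i][j] y_j."""
    return [sum(W[i][j] * y[j] for j in range(len(y))) for i in range(len(W))]

def mean_field(v, weights, n_iters=20):
    """Approximate posterior means of each hidden layer of a DBM given visible v.

    weights[l] connects layer l (v for l = 0) to hidden layer l + 1, with rows
    indexed by units of the lower layer. Every hidden layer combines bottom-up
    input from below with (except the top layer) top-down input from above.
    """
    mus = [[0.5] * len(W[0]) for W in weights]  # initial guesses
    for _ in range(n_iters):
        for l in range(len(weights)):
            below = v if l == 0 else mus[l - 1]
            total = matvec(weights[l], below)
            if l + 1 < len(weights):
                top_down = matvec_t(weights[l + 1], mus[l + 1])
                total = [t + d for t, d in zip(total, top_down)]
            mus[l] = [sigmoid(z) for z in total]
    return mus

# Usage: 2 visible units, a first hidden layer of 2 units, a top layer of 1 unit.
v = [1.0, 0.0]
W1 = [[1.0, -1.0], [0.5, 0.5]]
W2 = [[2.0], [-2.0]]
mus = mean_field(v, [W1, W2], n_iters=50)
```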
Samples Generated by the Model
[Figure: training data versus model-generated samples]
Handwriting Recognition
MNIST Dataset: 60,000 examples of 10 digits
Learning Algorithm                          Error
Logistic regression                         12.0%
K-NN                                        3.09%
Neural Net (Platt 2005)                     1.53%
SVM (Decoste et al. 2002)                   1.40%
Deep Autoencoder (Bengio et al. 2007)       1.40%
Deep Belief Net (Hinton et al. 2006)        1.20%
DBM                                         0.95%
Optical Character Recognition: 42,152 examples of 26 English letters
Learning Algorithm                          Error
Logistic regression                         22.14%
K-NN                                        18.92%
Neural Net                                  14.62%
SVM (Larochelle et al. 2009)                9.70%
Deep Autoencoder (Bengio et al. 2007)       10.05%
Deep Belief Net (Larochelle et al. 2009)    9.68%
DBM                                         8.40%
Permutation-invariant version.
3-D Object Recognition
NORB Dataset: 24,000 examples
Learning Algorithm                          Error
Logistic regression                         22.5%
K-NN (LeCun 2004)                           18.92%
SVM (Bengio & LeCun 2007)                   11.6%
Deep Belief Net (Nair & Hinton 2009)        9.0%
DBM                                         7.2%
[Figure: pattern completion]
Learning Shared Representations Across Sensory Modalities
[Figure: one concept expressed as an image and as text tags: sunset, pacific ocean, baker beach, seashore, ocean]
A Simple Multimodal Model
• Use a joint binary hidden layer.
• Problem: Inputs have very different statistical properties.
• Difficult to learn cross-modal features.
[Figure: real-valued image input and 1-of-K text input connected to one joint hidden layer]
Multimodal DBM
[Figure: a Gaussian model over dense, real-valued image features and a Replicated Softmax model over word counts, joined by shared hidden layers; inference runs bottom-up + top-down]
(Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014)
Text Generated from Images
[Figure: given images (not shown) and the tags the model generated for each]
dog, cat, pet, kitten, puppy, ginger, tongue, kitty, dogs, furry
sea, france, boat, mer, beach, river, bretagne, plage, brittany
portrait, child, kid, ritratto, kids, children, boy, cute, boys, italy
insect, butterfly, insects, bug, butterflies, lepidoptera
graffiti, streetart, stencil, sticker, urbanart, graff, sanfrancisco
canada, nature, sunrise, ontario, fog, mist, bc, morning
portrait, women, army, soldier, mother, postcard, soldiers
obama, barackobama, election, politics, president, hope, change, sanfrancisco, convention, rally
water, glass, beer, bottle, drink, wine, bubbles, splash, drops, drop
Images Generated from Text
[Figure: images retrieved for each given set of tags]
Given: water, red, sunset
Given: nature, flower, red, green
Given: blue, green, yellow, colors
Given: chocolate, cake
MIR-Flickr Dataset
• 1 million images along with user-assigned tags.
[Figure: example images with their user-assigned tags]
sculpture, beauty, stone
d80
nikon, green, light, photoshop, apple, d70
nikon, abigfave, goldstaraward, d80, nikond80
white, yellow, abstract, lines, bus, graphic
food, cupcake, vegan
sky, geotagged, reflection, cielo, bilbao, reflejo
anawesomeshot, theperfectphotographer, flash, damniwishidtakenthat, spiritofphotography
(Huiskes et al.)
Results
• Logistic regression on top-level representation.
• Multimodal inputs.
Mean Average Precision (MAP); trained on 25K labeled examples + 1 million unlabelled
Learning Algorithm          MAP      Precision@50
Random                      0.124    0.124
LDA [Huiskes et al.]        0.492    0.754
SVM [Huiskes et al.]        0.475    0.758
DBM-Labelled                0.526    0.791
Deep Belief Net             0.638    0.867
Autoencoder                 0.638    0.875
DBM                         0.641    0.873
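For reference, a sketch of the two metrics in this table under their common definitions. We assume each label's test images are ranked by model score, relevance is 0/1, and every relevant image appears in the ranking; the exact evaluation protocol of the cited work may differ, and the function names are ours.

```python
def precision_at_k(ranked_relevance, k):
    """Fraction of the top-k ranked items that are relevant (1) rather than not (0)."""
    return sum(ranked_relevance[:k]) / k

def average_precision(ranked_relevance):
    """Mean of precision@k over the ranks k at which a relevant item appears."""
    hits, total = 0, 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    """MAP: average precision averaged over queries (here, one ranking per label)."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

# Usage: relevant items at ranks 1 and 3 give AP = (1/1 + 2/3) / 2.
ap = average_precision([1, 0, 1, 0])
p2 = precision_at_k([1, 0, 1, 0], 2)   # 0.5
```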
Artificial Neural Networks: Summary
Highly non-linear regression/classification
Hidden layers learn intermediate representations
Potentially millions of parameters to estimate
Stochastic gradient descent, local minima problems
Deep networks have produced real progress in many fields:
computer vision
speech recognition
mapping images to text
recommender systems
They learn very useful non-linear representations
