CS 4610/5335
Deep Learning and Computer Vision
Robert Platt Northeastern University
Material adapted from:
1. Lawson Wong, CS 5100
Use features (x) to predict targets (y)
Classification
Targets y are now either:
Binary: {0, 1}
Multi-class: {1, 2, …, K}
We will focus on binary case (Ex5 Q6 covers multi-class)
Classification
Focus: Supervised learning (e.g., regression, classification)
Use features (x) to predict targets (y)
Input: Dataset of n samples: {x^(i), y^(i)}, i = 1, …, n
Each x^(i) is a p-dimensional vector of feature values
Output: Hypothesis h(x) in some hypothesis class H
H is parameterized by a d-dimensional parameter vector θ
Goal: Find the best hypothesis θ* within H
What does best mean? Optimizes objective function:
J(θ): Error fn. L(pred, y): Loss fn.
A learning algorithm is the procedure for optimizing J(θ)
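Concretely (a standard formulation consistent with the definitions above, not spelled out on the slide), the error function is usually the average loss over the training set:

J(θ) = (1/n) Σ_{i=1..n} L(h_θ(x^(i)), y^(i))

For regression, L is typically squared error; for binary classification, a common choice is the log loss L(pred, y) = -[y log(pred) + (1 - y) log(1 - pred)].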
Biological neuron
[Diagram of a biological neuron: dendrites, cell body, nucleus, axon with myelin sheath (Schwann cells) and nodes of Ranvier, axon terminal]
Artificial neuron
McCulloch-Pitts model (1943): fixed weights
Rosenblatt (1957): learnable weights + bias term
Learning algorithm: Perceptron
Artificial neuron
Artificial neuron can represent basic logic gates (assume threshold fires when weighted sum ≥ 0)
Artificial neural networks
Artificial neuron can represent basic logic gates (assume threshold fires when weighted sum ≥ 0)
Artificial neural network (ANN) can represent any logical circuit / function!
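As a small illustration (the weights below are one hand-picked choice; any weights satisfying the threshold inequalities work), a single threshold neuron can implement AND and OR:

import numpy as np

def threshold_neuron(w0, w, x):
    # fires (outputs 1) when the weighted sum plus bias is >= 0
    return 1 if w0 + np.dot(w, x) >= 0 else 0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array(x, dtype=float)
    and_out = threshold_neuron(-1.5, np.array([1.0, 1.0]), x)  # AND: fires only for (1, 1)
    or_out = threshold_neuron(-0.5, np.array([1.0, 1.0]), x)   # OR: fires unless (0, 0)
    print(x, and_out, or_out)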
Artificial neural networks
How do we train a neuron?
Artificial neural networks
How do we train a neuron?
Parameters: w (on input links)
Hypothesis: Output = g(w0 + w1 a1 + … + wp ap)
Objective: Error / loss function between output and target y
Artificial neural networks
Objective: Error / loss function between output and target y
g = hard threshold: Perceptron algorithm
Works well for single neurons, but not for networks
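One common form of the perceptron update (a sketch; the exact presentation in lecture may differ) looks like this:

import numpy as np

def perceptron_step(w, x, y, lr=1.0):
    # x includes a leading 1 for the bias term; y is 0 or 1
    pred = 1 if np.dot(w, x) >= 0 else 0
    # weights change only when the hard-threshold prediction is wrong
    return w + lr * (y - pred) * x

Repeating this step over the training examples converges when the data are linearly separable.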
Artificial neural networks
Objective: Error / loss function between output and target y
g = hard threshold: Perceptron algorithm
Works well for single neurons, but not for networks
How about gradient descent? Need smooth g
Artificial neural networks
How about gradient descent?
Need smooth g
Many choices of activation functions!
Artificial neural networks
How about gradient descent?
Need smooth g
Many choices of activation functions!
Most popular: Rectified linear unit (ReLU)
Artificial neural networks
How about gradient descent?
Need smooth g
Many choices of activation functions!
Most popular: Rectified linear unit (ReLU)
We will consider sigmoid (logistic)
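For reference, a minimal sketch of the two activation functions mentioned here:

import numpy as np

def sigmoid(z):
    # smooth, squashes any input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # rectified linear unit: zero for negative inputs, identity otherwise
    return np.maximum(0.0, z)

Both are differentiable almost everywhere, which is what gradient descent needs.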
Artificial neural networks
Parameters: w (on input links)
Hypothesis: Output = σ(w0 + w1 a1 + … + wp ap)
Seem familiar?
Artificial neural networks
Parameters: w (on input links)
Hypothesis: Output = σ(w0 + w1 a1 + … + wp ap)
Seem familiar? Logistic regression = learning single neuron
Artificial neural networks
Input: x (x0 = bias); Parameters: w
Weighted input: z^1_1 = Σ_j w_j x_j
Activation function: σ (sigmoid)
Activation: a^1_1 = σ(z^1_1)
Prediction = a^1_1
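In code, the forward computation for this single neuron is just (a sketch; variable names are illustrative):

import numpy as np

def forward_single_neuron(w, x):
    # x[0] is the bias input (fixed to 1); w has the same length as x
    z = np.dot(w, x)                  # weighted input z^1_1
    a = 1.0 / (1.0 + np.exp(-z))      # activation a^1_1 = sigmoid(z^1_1)
    return a                          # the prediction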
Artificial neural networks
Assume squared-error loss
Compute gradient, perform SGD
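For a single sigmoid neuron with squared-error loss, the chain rule gives the gradient directly, and one SGD update looks like this (a sketch under those assumptions):

import numpy as np

def sgd_step(w, x, y, lr=0.1):
    # squared-error loss: L = 0.5 * (a - y)^2, with a = sigmoid(w . x)
    a = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    # chain rule: dL/dw_j = (a - y) * a * (1 - a) * x_j
    grad = (a - y) * a * (1 - a) * x
    return w - lr * grad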
Artificial neural networks
Input: x (x0 = bias); Parameters: v (layer 1), w (layer 2)
Weighted input: z^2_1 = Σ_j w_j a^1_j (weighted sum of layer-1 activations)
Activation function: σ (sigmoid)
Activation: a^2_1 = σ(z^2_1)
Prediction = a^2_1
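The forward pass now composes two layers (a sketch; for clarity it assumes x carries the bias input and omits a separate hidden-layer bias):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_two_layer(V, w, x):
    # V: layer-1 weights (one row per hidden unit); w: layer-2 weights; x includes the bias input
    z1 = V @ x                 # weighted inputs of the hidden layer
    a1 = sigmoid(z1)           # hidden activations a^1
    z2 = np.dot(w, a1)         # weighted input z^2_1
    return sigmoid(z2)         # prediction a^2_1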
Artificial neural networks
Assume squared-error loss
Compute gradient, perform SGD
Artificial neural networks
Underlined terms are the same!
Artificial neural networks
Underlined terms are the same!
They will appear in every gradient term in all layers
Avoid recomputing this term
Artificial neural networks
Underlined terms are the same!
They will appear in every gradient term in all layers
Avoid recomputing this term: Key idea of backpropagation
Bryson & Ho (1969)
Linnainmaa (1970)
Werbos (1974)
Rumelhart, Hinton, Williams (1986)
Artificial neural networks
Underlined terms are the same!
They will appear in every gradient term in all layers
Avoid recomputing this term: Key idea of backpropagation
Learning with backprop
= using gradient descent to learn neural networks,
where gradients are computed efficiently
Artificial neural networks
Backpropagation
Forward pass: compute activations (a)
Backward pass: compute errors (δ), adjust weights
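Putting the two passes together for the two-layer network above, one backprop/SGD step might look like this (a sketch with squared-error loss and no hidden-layer bias, for clarity):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(V, w, x, y, lr=0.1):
    # Forward pass: compute and cache all activations
    z1 = V @ x
    a1 = sigmoid(z1)
    z2 = np.dot(w, a1)
    a2 = sigmoid(z2)
    # Backward pass: compute errors (deltas); the shared output-error term is computed once and reused
    delta2 = (a2 - y) * a2 * (1 - a2)       # output-layer error
    delta1 = delta2 * w * a1 * (1 - a1)     # hidden-layer errors (one per hidden unit)
    # Adjust weights by gradient descent
    w_new = w - lr * delta2 * a1
    V_new = V - lr * np.outer(delta1, x)
    return V_new, w_new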
Convolutional layers
Deep multi-layer perceptron networks are general purpose
but involve huge numbers of weights
We want:
a special-purpose network for image and NLP data
fewer parameters
fewer local minima
Answer: convolutional layers!
Convolutional layers
[Diagram: a filter of a given size slides across the image pixels with a given stride]
Convolutional layers
All of these weight groupings are tied to each other
Because the weights are tied together in this way, a convolutional layer:
reduces the number of parameters (dramatically)
encodes a prior on the structure of the data
In practice, convolutional layers are essential to computer vision
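A direct (slow but explicit) implementation of what a convolutional layer computes, with the same filter weights reused at every position, might look like this (a sketch; real layers add padding, multiple channels, and many filters):

import numpy as np

def conv2d(image, kernel, stride=1):
    # Slide one small filter over the image; the same weights are "tied" across every location.
    H, W = image.shape
    kH, kW = kernel.shape
    out_h = (H - kH) // stride + 1
    out_w = (W - kW) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kH, j * stride:j * stride + kW]
            out[i, j] = np.sum(patch * kernel)   # dot product of filter and image patch
    return out

(Strictly this slides the filter without flipping it, i.e. cross-correlation, which is what deep learning libraries call convolution.)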
Convolutional layers
Two-dimensional example:
Why do you think they call this convolution?
Think-pair-share
What would the convolved feature map be for this kernel?
Example: MNIST digit classification with LeNet
MNIST dataset: images of 10,000 handwritten digits
Objective: classify each image as the corresponding digit
Example: MNIST digit classification with LeNet
LeNet:
two convolutional layers (conv, relu, pooling) and two fully connected layers (relu)
last layer has a logistic activation function
Example: MNIST digit classification with LeNet
Load dataset, create train/test splits
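One way to do this in Python/PyTorch (a sketch; the lecture demo uses Matlab and its own MNIST loading):

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Download MNIST and build train/test splits (the test split is the holdout set)
to_tensor = transforms.ToTensor()
train_set = datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
test_set = datasets.MNIST(root="data", train=False, download=True, transform=to_tensor)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=256)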
Example: MNIST digit classification with LeNet
Define the neural network structure:
Input → Conv1 → Conv2 → FC1 → FC2
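A PyTorch sketch of a LeNet-style network with this Input → Conv1 → Conv2 → FC1 → FC2 structure (the lecture demo uses Matlab; the layer sizes here are illustrative assumptions, not the exact lecture configuration):

import torch.nn as nn

lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),    # Conv1: conv, relu, pooling
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),   # Conv2: conv, relu, pooling
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 120), nn.ReLU(),                         # FC1: fully connected, relu
    nn.Linear(120, 10), nn.Sigmoid(),                              # FC2: logistic output layer
)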
Example: MNIST digit classification with LeNet
Train network, classify test set, measure accuracy
notice we test on a different set (a holdout set) than we trained on
Using the GPU makes a huge difference
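A minimal train-then-evaluate loop in PyTorch (again a sketch, not the lecture's Matlab code; model is assumed to produce one score per digit class, e.g. the LeNet sketch above with its final Sigmoid removed since CrossEntropyLoss expects unnormalized scores, and the loaders come from the earlier data-loading sketch):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"   # training is far faster on a GPU
model = model.to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

for images, labels in train_loader:                        # fit on the training split only
    images, labels = images.to(device), labels.to(device)
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()                                        # gradients via backpropagation
    opt.step()

correct, total = 0, 0
with torch.no_grad():
    for images, labels in test_loader:                     # measure accuracy on the holdout split
        preds = model(images.to(device)).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
print("holdout accuracy:", correct / total)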
Deep learning packages
You don't need to use Matlab (obviously)
TensorFlow is probably the most popular platform; Caffe and Theano are also big
Another example: image classification w/ AlexNet
ImageNet dataset: millions of images of objects
Objective: classify each image as the corresponding object (1k categories in ILSVRC)
Another example: image classification w/ AlexNet
AlexNet has 8 layers: five conv followed by three fully connected
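For reference, the torchvision re-implementation makes the layer structure easy to inspect (this is not the original 2012 training code):

from torchvision import models

alexnet = models.alexnet()   # five convolutional layers followed by three fully connected layers
print(alexnet)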
Another example: image classification w/ AlexNet
AlexNet won the 2012 ILSVRC challenge and sparked the deep learning craze