Computer Vision
Machine learning basics and recognition
Semester 1
Changjae Oh
Objectives
To understand machine learning basics for high-level vision problems
Machine learning problems
Slide credit: J. Hays
Dimensionality Reduction: Principal Component Analysis (PCA)
PCA takes advantage of correlations in data dimensions to produce the best possible lower-dimensional representation based on linear projections (it minimizes reconstruction error).
PCA should be used for dimensionality reduction, not for discovering patterns or making predictions. Don't try to assign semantic meaning to the bases. (See the sketch below.)
Locally Linear Embedding, Isomap, Autoencoder, etc.
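A minimal NumPy sketch of the PCA projection described above; the function name pca_project, the use of SVD, and the toy data are illustrative additions, not part of the lecture material.

# Sketch: PCA as a linear projection onto the top-k principal directions.
import numpy as np

def pca_project(X, k):
    """X: (N, D) data matrix, k: target dimensionality. Returns (N, k) projection."""
    X_centred = X - X.mean(axis=0)               # remove the mean of each dimension
    # SVD of the centred data; rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    return X_centred @ Vt[:k].T                  # linear projection onto the top-k basis

# toy usage: 100 correlated 3-D points reduced to 2-D
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1)) @ np.array([[1.0, 2.0, 0.5]]) + 0.1 * rng.normal(size=(100, 3))
Z = pca_project(X, 2)
print(Z.shape)   # (100, 2)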
Machine learning problems
K-means clustering
(Figure: an input image, its clusters on intensity, and its clusters on color)
Mean shift algorithm
Spectral clustering
Group points based on links in a graph
Visual PageRank
Determining importance by random walk
What's the probability that you will randomly walk to a given node?
Create an adjacency matrix based on visual similarity
Edge weights determine probability of transition
C. Oh et al., Probabilistic Correspondence Matching using Random Walk with Restart, BMVC 2012
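A rough sketch of the random-walk idea just described (plain power iteration with a uniform restart term); the function name, the restart probability, and the toy similarity matrix are illustrative, and this is not the method of the cited BMVC paper.

# Sketch: node importance by random walk with restart (power iteration).
import numpy as np

def random_walk_scores(similarity, restart_prob=0.15, n_iters=100):
    """similarity: (N, N) non-negative matrix of pairwise visual similarities."""
    # Edge weights determine the transition probabilities (rows sum to 1)
    P = similarity / similarity.sum(axis=1, keepdims=True)
    N = similarity.shape[0]
    scores = np.full(N, 1.0 / N)                  # start from the uniform distribution
    for _ in range(n_iters):
        # with prob. (1 - restart) follow an edge, otherwise jump to a random node
        scores = (1 - restart_prob) * scores @ P + restart_prob / N
    return scores                                  # higher score = more "important" node

W = np.array([[0.0, 1.0, 0.2],
              [1.0, 0.0, 0.5],
              [0.2, 0.5, 0.0]])
print(random_walk_scores(W))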
Machine learning problems
The machine learning framework
Apply a prediction function to a feature representation of the image to
get the desired output:
f(image) = apple
f(image) = tomato     f(image) = cow
Slide credit: L. Lazebnik
Machine learning framework
y = f(x): output = prediction function(image feature)
Training: given a training set of labeled examples {(x1, y1), ..., (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set
Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
Slide credit: L. Lazebnik
Machine learning framework
Training stage: training images → image features, combined with training labels → classifier training → trained classifier
Testing stage: test image → image features → trained classifier → prediction
Raw pixels
Histograms
GIST descriptors
CNNs
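As a concrete example of the histogram option above, a small sketch that turns an RGB image into a fixed-length colour-histogram feature; the bin count and normalization are arbitrary illustrative choices.

# Sketch: a simple colour-histogram feature computed from raw pixels.
import numpy as np

def colour_histogram(image, bins_per_channel=8):
    """image: (H, W, 3) uint8 array. Returns a normalized feature of length 3*bins."""
    feats = []
    for c in range(3):                                   # one histogram per RGB channel
        hist, _ = np.histogram(image[:, :, c], bins=bins_per_channel, range=(0, 256))
        feats.append(hist)
    feat = np.concatenate(feats).astype(np.float64)
    return feat / feat.sum()                             # normalize to sum to 1

dummy = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
print(colour_histogram(dummy).shape)    # (24,)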
Learning a classifier
Given some set of features with corresponding labels, learn a function to predict the labels from the features
Many classifiers to choose from
Neural networks
Naive Bayes
K-nearest neighbour
Bayesian network
Logistic regression
Randomized Forests
Boosted Decision Trees
Deep Convolutional Network
Classifiers: Nearest neighbor
Training examples from class 1
Test example
Training examples from class 2
f(x) = label of the training example nearest to x
All we need is a distance function for our inputs
No training required!
Slide credit: S. Lazebnik
Classifiers: Linear
Find a linear function to separate the classes: f(x) = sgn(w·x + b)
Slide credit: L. Lazebnik
Example: Image Classification by K-NN Classifier
Pipeline: image → feature extraction (HOG) → image feature → K-NN classifier → prediction
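A hedged sketch of this pipeline, assuming scikit-image (for hog) and scikit-learn (for KNeighborsClassifier) are available; the HOG parameters, image size, and random placeholder data are illustrative only.

# Sketch: HOG features + K-NN classifier, mirroring the pipeline above.
import numpy as np
from skimage.feature import hog                    # HOG feature extraction
from sklearn.neighbors import KNeighborsClassifier # K-NN classifier

def extract_hog(images):
    """images: list of equally sized grayscale arrays -> (N, D) HOG feature matrix."""
    return np.array([hog(img, orientations=9,
                         pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for img in images])

# train_images / train_labels / test_images are placeholders for real data
train_images = [np.random.rand(64, 64) for _ in range(10)]
train_labels = np.array([0, 1] * 5)
test_images = [np.random.rand(64, 64) for _ in range(2)]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(extract_hog(train_images), train_labels)   # training: features + labels
print(knn.predict(extract_hog(test_images)))       # prediction for the test features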
Recognition task and supervision
Images in the training set must be annotated with the correct answer that the model is expected to produce
Contains a motorbike
Slide credit: S. Lazebnik
Spectrum of supervision
Computer vision
Supervised learning · Semi-supervised learning · Unsupervised learning · Reinforcement learning
Spectrum of supervision
Slide credit: S. Lazebnik
Generalisation
How well does a learned model generalise from the data it was trained on to a new test set?
Training set (labels known) Test set (labels unknown)
Generalisation
How well does a learned model generalise from the data it was trained on to a new test set?
EBU7240 Computer Vision
Changjae Oh
Classification
Semester 1, 2021
Overview of recognition tasks
A statistical learning approach
Classic or shallow classification pipeline
Bag of features representation
Classifiers: nearest neighbor, linear, SVM
Verification/Classification
Is this a building?
Adapted from Fei-Fei Li
Where are the people?
Adapted from Fei-Fei Li
Identification
Is this ?
Adapted from Fei-Fei Li
Semantic Segmentation
Adapted from Fei-Fei Li
Object recognition
A collection of related tasks for identifying objects in digital photographs.
Consists of recognizing, identifying, and locating objects within a picture with a given degree of confidence.
image classification · object detection · semantic segmentation · instance segmentation
Image source
Image classification vs Object detection
Image classification
Identifying what is in the image and the associated level of confidence
Can be binary-label or multi-label classification
Object detection
Localising and classifying one or more objects in an image
Object localisation + image classification
Semantic segmentation vs Instance segmentation
Semantic segmentation
Assigning a label to every pixel in the image.
Treating multiple objects of the same class as a single entity
Instance segmentation
Similar process to semantic segmentation, but identifies, for each pixel, the object instance it belongs to.
Treating multiple objects of the same class as distinct individual objects (or instances)
Typically, instance segmentation is harder than semantic segmentation
Image classification
The machine learning framework
Apply a prediction function to a feature representation of the image to
get the desired output:
f(image) = apple
f(image) = tomato     f(image) = cow
Slide credit: L. Lazebnik
Machine learning framework
y = f(x): output = prediction function(image feature)
Training: given a training set of labeled examples {(x1, y1), ..., (xN, yN)}, estimate the prediction function f by minimizing the prediction error on the training set
Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
Slide credit: S. Lazebnik
Machine learning framework
Training stage: training images → image features, combined with training labels → classifier training → trained classifier
Testing stage: test image → image features → trained classifier → prediction
Classic recognition pipeline
Hand-crafted feature representation
Off-the-shelf trainable classifier
Pipeline: image pixels → feature representation → trainable classifier → class label
Classic representation: Bag of features
Representing images as orderless collections of local features
Motivation 1: Part-based models
Various parts of the image are used separately to determine if and where an object of interest exists
Weber, Welling & Perona (2000), Fergus, Perona & Zisserman (2003)
Motivation 2: Texture models
Texture is characterised by the repetition of basic elements or textons
Texton histogram
Texton dictionary
Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Motivation 3: Bags of words
Orderless document representation:
Frequencies of words from a dictionary Salton & McGill (1983)
Bag of features: Outline
1. Extract local features
2. Learn visual vocabulary
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of visual words
1. Local feature extraction
Sample patches and extract descriptors
2. Learning the visual vocabulary
Extracted descriptors from the training set
2. Learning the visual vocabulary
Clustering
2. Learning the visual vocabulary
Visual vocabulary
Clustering
K-means clustering
Want to minimize the sum of squared Euclidean distances between features x_i and their nearest cluster centers m_k:
    D(X, M) = Σ_{clusters k} Σ_{points i in cluster k} (x_i − m_k)²
Algorithm:
Randomly initialize K cluster centers
Iterate until convergence:
    Assign each feature to the nearest center
    Recompute each cluster center as the mean of all features assigned to it
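A minimal NumPy sketch of the algorithm above; random initialization from the data points and a fixed iteration count stand in for a proper convergence test.

# Sketch: K-means clustering following the two alternating steps above.
import numpy as np

def kmeans(X, K, n_iters=50, seed=0):
    """X: (N, D) features. Returns (centers (K, D), assignments (N,))."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]   # random initialization
    for _ in range(n_iters):
        # 1) assign each feature to the nearest center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # 2) recompute each center as the mean of the features assigned to it
        for k in range(K):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    return centers, assign

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centers, assign = kmeans(X, K=2)
print(centers)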
Visual vocabularies
Appearance codebook
Source: B. Leibe
Bag of features: Outline
1. Extract local features
2. Learn visual vocabulary
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of visual words
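A small sketch of steps 3 and 4 above: quantize each local descriptor to its nearest visual word and build a normalized histogram of word frequencies; the vocabulary and descriptors here are random placeholders for the K-means codebook and the local features.

# Sketch: bag-of-features representation (quantization + word frequencies).
import numpy as np

def bag_of_words(descriptors, vocabulary):
    """descriptors: (M, D) local features of one image; vocabulary: (K, D) visual words.
    Returns a normalized K-bin histogram of visual-word frequencies."""
    # quantization: nearest visual word for every descriptor
    dists = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(np.float64)
    return hist / hist.sum()                      # frequencies of visual words

vocabulary = np.random.rand(100, 128)             # e.g. 100 words over 128-D descriptors
descriptors = np.random.rand(500, 128)            # e.g. 500 local descriptors of one image
print(bag_of_words(descriptors, vocabulary).shape)   # (100,)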
Classic recognition pipeline
Hand-crafted feature representation
Trainable classifier
Nearest neighbor classifiers, support vector machines
Pipeline: image pixels → feature representation → trainable classifier → class label
Classifiers: Nearest neighbor
Training examples from class 1
Test example
Training examples from class 2
f(x) = label of the training example nearest to x
All we need is a distance or similarity function for our inputs
No training required!
Functions for comparing histograms
L1 distance:  D(h1, h2) = Σ_{i=1..N} |h1(i) − h2(i)|
χ² distance:  D(h1, h2) = Σ_{i=1..N} (h1(i) − h2(i))² / (h1(i) + h2(i))
Quadratic distance (cross-bin distance):  D(h1, h2) = Σ_{i,j} A_ij (h1(i) − h2(j))²
Histogram intersection (similarity function):  I(h1, h2) = Σ_{i=1..N} min(h1(i), h2(i))
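The comparison functions above written out directly in NumPy; the eps term and the example histograms are illustrative additions (A in the quadratic distance is a user-supplied bin-similarity matrix).

# Sketch: the histogram comparison functions listed above.
import numpy as np

def l1_distance(h1, h2):
    return np.abs(h1 - h2).sum()

def chi2_distance(h1, h2, eps=1e-10):
    return (((h1 - h2) ** 2) / (h1 + h2 + eps)).sum()   # eps avoids division by zero

def quadratic_distance(h1, h2, A):
    # A[i, j] weights the cross-bin term (h1(i) - h2(j))^2
    diff = h1[:, None] - h2[None, :]
    return (A * diff ** 2).sum()

def histogram_intersection(h1, h2):
    return np.minimum(h1, h2).sum()                      # a similarity, not a distance

h1 = np.array([0.2, 0.5, 0.3])
h2 = np.array([0.3, 0.3, 0.4])
print(l1_distance(h1, h2), chi2_distance(h1, h2), histogram_intersection(h1, h2))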
K-nearest neighbor classifier
For a new point, find the k closest points from the training data
Vote for the class label with the labels of the k points
Which classifier is more robust to outliers: 1-NN or k-NN?
Credit: http://cs231n.github.io/classification/
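A minimal NumPy sketch of the k-NN rule just described, using squared Euclidean distance and a simple majority vote (tie-breaking is left to argmax order); the toy data are illustrative.

# Sketch: k-nearest-neighbor classification by majority vote.
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Return the majority label among the k training points nearest to x."""
    dists = ((X_train - x) ** 2).sum(axis=1)       # squared Euclidean distances
    nearest = np.argsort(dists)[:k]                # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[counts.argmax()]                 # vote for the most frequent label

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3))   # -> 0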
Linear classifiers
Find a linear function to separate the classes: f(x) = sgn(w·x + b)
Visualizing linear classifiers
Example learned weights at the end of learning for CIFAR-10.
Credit: http://cs231n.github.io/classification/
Nearest neighbor vs. linear classifiers
NN pros:
Simple to implement
Decision boundaries not necessarily linear
Works for any number of classes
Nonparametric method
NN cons:
Need good distance function
Slow at test time
Linear pros:
Low-dimensional parametric representation
Very fast at test time
Linear cons:
Works for two classes
How to train the linear function?
What if data is not linearly separable?
Linear classifiers
When the data is linearly separable, there may be more than one separator (hyperplane)
Which separator is the best?
Support vector machines
Find a hyperplane that maximizes the margin between the positive and negative examples
For positive examples (y_i = 1):  x_i·w + b ≥ 1
For negative examples (y_i = −1):  x_i·w + b ≤ −1
For support vectors:  x_i·w + b = ±1
Distance between a point and the hyperplane:  |x_i·w + b| / ||w||
Therefore, the margin is 2 / ||w||
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Finding the maximum margin hyperplane
1. Maximize margin 2 / ||w||
2. Correctly classify all training data:
x_i positive (y_i = 1):  x_i·w + b ≥ 1
x_i negative (y_i = −1):  x_i·w + b ≤ −1
Quadratic optimization problem:
    min_{w,b} (1/2)||w||²   subject to   y_i(x_i·w + b) ≥ 1
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
SVM parameter learning
Separable data:
    min_{w,b} (1/2)||w||²   subject to   y_i(x_i·w + b) ≥ 1
    (maximize margin; classify training data correctly)
Non-separable data:
    min_{w,b} (1/2)||w||² + C Σ_i max(0, 1 − y_i(x_i·w + b))
    (maximize margin; minimize classification mistakes)
SVM parameter learning
min_{w,b} (1/2)||w||² + C Σ_i max(0, 1 − y_i(x_i·w + b))
Demo: http://cs.stanford.edu/people/karpathy/svmjs/demo
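A sketch of minimizing the non-separable objective above by plain (sub)gradient descent; the learning rate, iteration count, and toy data are illustrative choices, and this is not the implementation behind the linked demo.

# Sketch: linear SVM trained by (sub)gradient descent on the hinge-loss objective
#   (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b))
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, n_iters=1000):
    """X: (N, D) features, y: (N,) labels in {-1, +1}. Returns (w, b)."""
    N, D = X.shape
    w, b = np.zeros(D), 0.0
    for _ in range(n_iters):
        margins = y * (X @ w + b)
        active = margins < 1                       # points inside the margin or misclassified
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X = np.vstack([np.random.randn(20, 2) + 2, np.random.randn(20, 2) - 2])
y = np.array([1] * 20 + [-1] * 20)
w, b = train_linear_svm(X, y)
print(np.mean(np.sign(X @ w + b) == y))           # training accuracy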
Nonlinear SVMs
General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable
Nonlinear SVMs
Linearly separable dataset in 1D:
Non-separable dataset in 1D:
We can map the data to a higher-dimensional space, e.g. x → (x, x²):
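A tiny sketch of the 1-D example: after the assumed mapping x → (x, x²), a horizontal line in the lifted space separates the two classes; the data and the separating parameters are illustrative.

# Sketch: lifting 1-D data with the map x -> (x, x^2) makes it linearly separable.
import numpy as np

x = np.array([-3.0, -2.5, 2.5, 3.0,   # class +1: large |x|
              -0.5,  0.0, 0.5])       # class -1: small |x|
y = np.array([1, 1, 1, 1, -1, -1, -1])

phi = np.stack([x, x ** 2], axis=1)   # mapped 2-D features
# in the lifted space, the line  x^2 = 2  separates the classes
w, b = np.array([0.0, 1.0]), -2.0
print(np.sign(phi @ w + b) == y)      # all True: linearly separable after the mapping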
SVMs: Pros and cons
Pros:
Non-linear SVM framework is very powerful and flexible
Training is convex optimization; a globally optimal solution can be found
SVMs work very well in practice, even with very small training sample sizes
Cons:
No direct multi-class SVM; must combine two-class SVMs (e.g., with one-vs-others)
Computation and memory cost (especially for nonlinear SVMs)