LECTURE 5 TERM 2:
MSIN0097
Predictive Analytics
A P MOORE
Individual coursework
The Individual Coursework assignment has been extended by one week, to Friday 5th March 2021 at 10:00 am.
USING OTHER PEOPLE'S CODE
Wojciech Zaremba (@woj_zaremba) February 4, 2021
MACHINE LEARNING JARGON
Model
Interpolating / Extrapolating
Data bias
Noise / Outliers
Learning algorithm
Inference algorithm
Supervised learning
Unsupervised learning
Classification
Regression
Clustering
Decomposition
Parameters
Optimisation
Training data
Testing data
Error metric
Linear model
Parametric model
Model variance
Model bias
Model generalization
Overfitting
Goodness-of-fit
Hyper-parameters
Failure modes
Confusion matrix
True Positive
False Negative
Partition
Margin
Data density
Hidden parameter
High dimensional space
Low dimensional space
Separable data
Manifold / Decision surface
Hypercube / hypervolume / hyperplane
ALGORITHMIC APPROACHES
Supervised
A. Classification
B. Regression
Unsupervised
C. Clustering
D. Decomposition
Hidden variables
Density estimation
Manifolds
Subspaces
QUESTIONS
How would I know whether my data would benefit from a transformation to a higher- or lower-dimensional space?
CURSE OF DIMENSIONALITY
https://www.nature.com/articles/s41592-018-0019-x
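One symptom of the curse of dimensionality is that distances "concentrate": the nearest and farthest points from a query end up almost equally far away. The sketch below illustrates this on uniform random data. The function name `distance_contrast`, the sample size and the chosen dimensions are illustrative assumptions, not taken from the slides or the linked article.

```python
import math
import random

def distance_contrast(dim, n_points=200, seed=0):
    """Relative contrast (max - min) / min of the distances from one
    random query point to n_points uniform random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# Hypothetical dimensions chosen only to show the trend.
contrasts = {d: distance_contrast(d) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"dim={d:5d}  relative contrast={c:.3f}")
```

When the contrast is close to zero, distance-based methods (nearest neighbours, K-means) have little signal to work with, which is one reason to consider projecting to a lower-dimensional space.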
QUESTIONS
Would I always have to visualize the data in 2D or 3D to judge whether it can be separated more easily? (That would seem to defeat the purpose of moving to a higher-dimensional space, which cannot be visualized.)
SUMMARY STATISTICS
Anscombe's quartet
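Anscombe's quartet is four small datasets with nearly identical summary statistics but very different shapes when plotted. As a minimal sketch, the code below computes the statistics with the standard library only. The helper `pearson_r` and the dictionary layout are illustrative choices; the values are the standard published quartet.

```python
from statistics import mean, variance

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I":   (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson_r(xs, ys):
    """Pearson correlation coefficient, written out explicitly."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

summary = {}
for name, (xs, ys) in ANSCOMBE.items():
    summary[name] = (mean(xs), variance(xs), mean(ys), variance(ys), pearson_r(xs, ys))
    mx, vx, my, vy, r = summary[name]
    print(f"{name:>3}: mean_x={mx:.2f} var_x={vx:.2f} mean_y={my:.2f} var_y={vy:.2f} r={r:.3f}")
```

The four rows print almost the same numbers, which is the point: summary statistics alone cannot distinguish these datasets, so plot the data as well.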
SUMMARY STATISTICS
https://seaborn.pydata.org/examples/scatterplot_matrix.html
QUESTIONS?
Do I have to go all the way through modelling (e.g. classification), evaluate a metric such as the Gini coefficient, and then go back and compare the Gini scores obtained after adding extra dimensions?
QUESTIONS?
Is it right that in some cases it is better to go up a dimension, and in other cases better to go down a dimension?
MULTIPLE MODELS
K-means
K-MEANS: LLOYD-FORGY ALGORITHM
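As a rough sketch of the iteration, the code below alternates an assignment step and a centroid update step until the centroids stop moving. It assumes Forgy initialisation (k data points picked at random as starting centroids) and Euclidean distance. The function name `kmeans` and the synthetic two-blob data are hypothetical, for illustration only.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means on a list of equal-length tuples (requires n_iter >= 1)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # Forgy initialisation
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, labels

# Synthetic data: two well-separated blobs (illustrative only).
rng = random.Random(42)
blob_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
blob_b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]
points = blob_a + blob_b

centroids, labels = kmeans(points, k=2)
print(sorted(centroids))
```

Because the result depends on the random starting centroids, it is common to rerun with several seeds and keep the solution with the lowest within-cluster sum of squares.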
K-MEANS
Advantages Disadvantages
ELLIPSOIDALLY DISTRIBUTED DATA
Gaussian mixtures
PARTITIONAL
MIXTURE OF GAUSSIANS (1D)
HIDDEN (LATENT) VARIABLES
MIXTURE OF GAUSSIANS (2D)
GRAPHICAL MODELS GAUSSIAN MIXTURES
PLATE NOTATION
The diagram represents the structure of the model, including its parameters (squares, solid circles, bullet), random variables (circles), and conditional dependencies (solid arrows).
FAMILIES OF MODELS
Gaussian mixture
T-distribution mixture
Factor analysis
TWO STEP EM ALGORITHM
EM ALGORITHM
EXPECTATION MAXIMIZATION
MIXTURE OF GAUSSIANS AS MARGINALIZATION
E-STEP
M-STEP
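The two steps can be sketched for a one-dimensional mixture of Gaussians. The E-step computes each component's responsibility for each data point. The M-step re-estimates the weights, means and variances from those responsibilities. This is a minimal illustration under simple assumptions (fixed number of iterations, a small variance floor). The names `em_gmm_1d` and `normal_pdf`, the starting values and the synthetic data are all hypothetical.

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, mus, variances, weights, n_iter=50):
    """EM for a 1D Gaussian mixture; returns parameters and the log-likelihood trace."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    k, n = len(mus), len(xs)
    log_liks = []
    for _ in range(n_iter):
        # E-step: responsibilities r[i][j] = Pr(component j | x_i).
        resp, log_lik = [], 0.0
        for x in xs:
            joint = [weights[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
            total = sum(joint)
            log_lik += math.log(total)
            resp.append([p / total for p in joint])
        log_liks.append(log_lik)
        # M-step: responsibility-weighted maximum likelihood updates.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor guards against collapse
    return mus, variances, weights, log_liks

# Synthetic data from two Gaussians (illustrative only).
rng = random.Random(0)
xs = [rng.gauss(-2.0, 0.5) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]

mus, variances, weights, log_liks = em_gmm_1d(
    xs, mus=[-1.0, 1.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
print("means:", mus, "weights:", weights)
```

The log-likelihood trace should never decrease from one iteration to the next, which is a useful sanity check on any EM implementation.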
MANIPULATING THE LOWER BOUND
LOCAL MAXIMA
Repeated fitting of mixture of Gaussians model with different starting points results in different models as the fit converges to different local maxima.
Log likelihoods are (a) 98.76, (b) 96.97 and (c) 94.35, respectively, indicating that (a) is the best fit.
COVARIANCE COMPONENTS
a) Full covariances.
b) Diagonal covariances.
c) Identical diagonal covariances.
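One way to compare the three options is to count the free covariance parameters each one requires. The sketch below assumes that "identical diagonal" means a single diagonal covariance shared by every component. That reading, the function name `covariance_param_count` and the example sizes are assumptions made for illustration.

```python
def covariance_param_count(n_components, n_dims, kind):
    """Number of free covariance parameters in a Gaussian mixture."""
    if kind == "full":
        # One symmetric D x D matrix per component.
        return n_components * n_dims * (n_dims + 1) // 2
    if kind == "diagonal":
        # One variance per dimension per component.
        return n_components * n_dims
    if kind == "identical_diagonal":
        # ASSUMPTION: one diagonal covariance shared by all components.
        return n_dims
    raise ValueError(f"unknown covariance kind: {kind}")

# Hypothetical model size: 3 components in 10 dimensions.
for kind in ("full", "diagonal", "identical_diagonal"):
    print(kind, covariance_param_count(3, 10, kind))
```

Fewer parameters means less flexibility but also less data needed to fit the model reliably, which matters most in high dimensions.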
LEARNING GMM PSEUDO CODE
ANOMALY DETECTION
BIC AND AIC
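Both criteria trade goodness of fit against model size: AIC = 2p - 2 ln L and BIC = p ln(n) - 2 ln L, where p is the number of free parameters, n the number of samples and L the maximised likelihood. Lower is better. The sketch below applies them to choosing the number of mixture components. The log-likelihood values are made up for illustration, and the helper names are hypothetical.

```python
import math

def gmm_param_count(n_components, n_dims):
    """Free parameters of a full-covariance Gaussian mixture."""
    weights = n_components - 1                             # weights sum to one
    means = n_components * n_dims
    covariances = n_components * n_dims * (n_dims + 1) // 2
    return weights + means + covariances

def aic(log_lik, n_params):
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_samples):
    return n_params * math.log(n_samples) - 2 * log_lik

# MADE-UP log-likelihoods for k = 1..4 components, 2D data, 1000 samples.
n_samples, n_dims = 1000, 2
log_liks_by_k = {1: -2500.0, 2: -2300.0, 3: -2290.0, 4: -2288.0}

scores = {}
for k, ll in log_liks_by_k.items():
    p = gmm_param_count(k, n_dims)
    scores[k] = (aic(ll, p), bic(ll, p, n_samples))
    print(f"k={k}  params={p:2d}  AIC={scores[k][0]:.1f}  BIC={scores[k][1]:.1f}")

best_aic = min(scores, key=lambda k: scores[k][0])
best_bic = min(scores, key=lambda k: scores[k][1])
print("best k by AIC:", best_aic, " best k by BIC:", best_bic)
```

BIC penalises extra parameters more heavily than AIC once n exceeds about 7 samples (ln n > 2), so it tends to select the simpler model when the two disagree.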
GAUSSIAN MIXTURES
BAYESIAN GMMS
CONCENTRATION PRIORS
The more data we have, however, the less the priors matter. In fact, to plot diagrams with such large differences, you must use very strong priors and little data.
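The effect of data size on the prior can be illustrated with a simplified stand-in: a symmetric Dirichlet prior with concentration alpha on the mixture weights, combined with hard cluster counts. The posterior mean weight of component j is then (alpha + n_j) / (K * alpha + N). This is not the full Bayesian GMM, only a sketch of why strong priors are visible with little data and fade with more. The function name and the counts are hypothetical.

```python
def posterior_mean_weights(counts, alpha):
    """Posterior mean of mixture weights under a symmetric Dirichlet(alpha) prior."""
    k, n = len(counts), sum(counts)
    return [(alpha + c) / (k * alpha + n) for c in counts]

# Same 80/20 split, small versus large dataset (made-up counts).
small, large = [8, 2], [8000, 2000]
for alpha in (0.01, 100.0):
    print(f"alpha={alpha:6.2f}  small={posterior_mean_weights(small, alpha)}"
          f"  large={posterior_mean_weights(large, alpha)}")
```

With ten points, the strong prior pulls the weights close to uniform. With ten thousand points, both priors give nearly the same answer.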
TWO MOONS DATA
PROBLEMS WITH MULTI-VARIATE NORMAL DENSITY
Types of models
GENERATIVE VS DISCRIMINATIVE
CLASSIFICATION (DISCRIMINATIVE)
LOGISTIC REGRESSION REVISITED
MODEL CONTINGENCY OF THE WORLD ON DATA
World state: Linear model Bernoulli distribution
Probability / Decision surface
CLASSIFICATION (GENERATIVE)
GAUSSIAN MIXTURE
MODEL CONTINGENCY OF DATA ON THE WORLD
WHAT SORT OF MODEL SHOULD WE USE?
WHAT SORT OF MODEL SHOULD WE USE? TL;DR NO DEFINITIVE ANSWER
Inference is generally simpler with discriminative models.
Generative models compute the posterior probability of the world state via Bayes' rule, and sometimes this requires a computationally expensive algorithm.
Generative models might waste modelling power.
The data are generally of much higher dimension than the world, and modelling it is costly. Moreover, there may be many aspects of the data which do not influence the state.
Knowledge of how the data were generated can be built into a generative model. Using discriminative approaches, it is harder to exploit this knowledge: essentially we have to re-learn these phenomena from the data.
Sometimes parts of the training or test data vector x may be missing. Here, generative models are preferred.
It is harder to impose prior knowledge in a principled way in discriminative models.
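The inference step for a generative classifier can be sketched in one dimension. The model learns the likelihood of the data given each world state, Pr(x|w), here as one Gaussian per class, together with a prior Pr(w). It then applies Bayes' rule to obtain Pr(w|x). The class-conditional Gaussian choice, the function names and the synthetic data are assumptions made for illustration.

```python
import math
import random
from statistics import mean, variance

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_generative(xs, ws):
    """Learn Pr(w) and a Gaussian Pr(x | w) for each world state w."""
    model = {}
    for w in set(ws):
        members = [x for x, label in zip(xs, ws) if label == w]
        model[w] = {"prior": len(members) / len(xs),
                    "mu": mean(members),
                    "var": variance(members)}
    return model

def posterior(model, x):
    """Bayes' rule: Pr(w | x) = Pr(x | w) Pr(w) / sum over w' of Pr(x | w') Pr(w')."""
    joint = {w: p["prior"] * normal_pdf(x, p["mu"], p["var"])
             for w, p in model.items()}
    evidence = sum(joint.values())
    return {w: j / evidence for w, j in joint.items()}

# Synthetic training data: state 0 centred at 0, state 1 centred at 4.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(4.0, 1.0) for _ in range(100)]
ws = [0] * 100 + [1] * 100

model = fit_generative(xs, ws)
for x in (0.0, 2.0, 4.0):
    print(x, posterior(model, x))
```

A discriminative model such as logistic regression would instead parameterise Pr(w|x) directly and skip the Bayes' rule step, which is why its inference is generally simpler.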
SUMMARY OF APPROACHES
Best practice
BEST PRACTICE
Source: https://www.marekrei.com/blog/ml-and-nlp-publications-in-2019/
Percentage of papers mentioning GitHub (indicating that the code is made available):
ACL 70%, EMNLP 69%, NAACL 68%, ICLR 56%, NeurIPS 46%, ICML 45%, AAAI 31%.
It seems the NLP papers are releasing their code much more freely.
PAPERS WITH CODE
https://paperswithcode.com/
PERCEPTIONS OF PROBABILITY
DEPLOYMENT
@SOCIAL
@chipro @random_forests @zacharylipton @yudapearl @svpino @jackclarkSF
TEACHING TEAM
Dr Alastair Moore Senior Teaching Fellow
@latticecut
Kamil Tylinski Teaching Assistant
Jiangbo Shangguan Teaching Assistant
Individual Coursework workshop
to Thursday 11th Feb 2021 at 12:00 am
LECTURE 3 TERM 2:
MSIN0097
Predictive Analytics
A P MOORE