
LECTURE 5 TERM 2:
MSIN0097
Predictive Analytics
A P MOORE

MSIN0097
Individual coursework

MSIN0097
Individual Coursework assignment has been extended by one week
to Friday 5th March 2021 at 10:00 am

USING OTHER PEOPLE'S CODE

pic.twitter.com/4q4IbLgEB8

Wojciech Zaremba (@woj_zaremba) February 4, 2021

MACHINE LEARNING JARGON
Model
Interpolating / Extrapolating
Data bias
Noise / Outliers
Learning algorithm
Inference algorithm
Supervised learning
Unsupervised learning
Classification
Regression
Clustering
Decomposition
Parameters
Optimisation
Training data
Testing data
Error metric
Linear model
Parametric model
Model variance
Model bias
Model generalization
Overfitting
Goodness-of-fit
Hyper-parameters
Failure modes
Confusion matrix
True Positive
False Negative
Partition
Margin
Data density
Hidden parameter
High dimensional space
Low dimensional space
Separable data
Manifold / Decision surface
Hypercube / hypervolume / hyperplane

A B C D ALGORITHMIC APPROACHES
Supervised: A. Classification, B. Regression
Unsupervised: C. Clustering, D. Decomposition
Related concepts: hidden variables, density estimation, manifolds, subspaces

QUESTIONS
How would I know whether my data would benefit from a transformation to a higher- or lower-dimensional space?

CURSE OF DIMENSIONALITY
https://www.nature.com/articles/s41592-018-0019-x

QUESTIONS
Would I always have to visualize the data in 2D or 3D to see whether it becomes more separable? (That would seem to defeat the purpose of moving to a higher-dimensional space, which can't be visualized.)

SUMMARY STATISTICS
Anscombe's quartet
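Anscombe's quartet is the classic warning that summary statistics can hide very different data. A minimal standard-library sketch, using the four published datasets (Anscombe, 1973), computes the usual summaries; they agree to about two decimal places, even though plotted the datasets look nothing alike (a noisy line, a curve, a tight line with one outlier, and a vertical stack with one high-leverage point). The helper name `summarise` is illustrative.

```python
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarise(x, y):
    """Mean and sample variance of x and y, Pearson r, and the least-squares line."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return {
        "mean_x": mx, "var_x": variance(x),
        "mean_y": my, "var_y": variance(y),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope, "intercept": my - slope * mx,
    }

stats = {name: summarise(x, y) for name, (x, y) in quartet.items()}
for name, s in stats.items():
    print(name, {k: round(v, 2) for k, v in s.items()})
```

The near-identical rows are exactly why the next slide's scatterplot matrix matters: only a plot reveals the differences.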

SUMMARY STATISTICS
https://seaborn.pydata.org/examples/scatterplot_matrix.html

QUESTIONS?
Do I have to go all the way through modelling (e.g. classification), evaluate a metric such as the Gini coefficient, and then compare the Gini scores obtained with and without the extra dimensions?
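One common way to act on this is to score the same held-out data with and without the extra dimensions and compare the resulting Gini coefficients. A minimal sketch, assuming the credit-scoring convention Gini = 2 × AUC − 1, with the AUC computed directly as the probability that a randomly chosen positive outranks a randomly chosen negative (ties count one half). The labels and both score vectors are made-up illustrations, not output from any real model.

```python
def auc(labels, scores):
    """ROC AUC as P(score of a positive > score of a negative), ties counting 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def gini(labels, scores):
    """Gini coefficient in the credit-scoring sense: 2 * AUC - 1."""
    return 2.0 * auc(labels, scores) - 1.0

# Hypothetical held-out labels and scores from two hypothetical models
y_test = [0, 0, 1, 1, 0, 1]
scores_base = [0.1, 0.4, 0.35, 0.8, 0.5, 0.45]   # baseline features only
scores_extra = [0.1, 0.3, 0.6, 0.8, 0.35, 0.7]   # baseline + extra dimensions

print("Gini (baseline):  ", round(gini(y_test, scores_base), 3))
print("Gini (extra dims):", round(gini(y_test, scores_extra), 3))
```

With a single small test split a difference like this could easily be noise; comparing cross-validated Gini scores is the safer version of the same idea.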

QUESTIONS?
I understand that in some cases it is better to move up a dimension and in others it is better to move down a dimension. Is that right?

MULTIPLE MODELS

MSIN0097
K-means

K-MEANS LLOYD-FORGY ALGORITHM
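A minimal NumPy sketch of the Lloyd iteration with Forgy initialisation (k distinct data points chosen at random as the starting centroids), alternating an assignment step and an update step until the centroids stop moving. The function name `kmeans_lloyd_forgy` and the synthetic two-blob data are illustrative assumptions; in practice scikit-learn's `KMeans` is the usual choice, and it initialises with k-means++ by default rather than Forgy.

```python
import numpy as np

def kmeans_lloyd_forgy(X, k, n_iter=100, seed=0):
    """Hypothetical helper: Lloyd's k-means with Forgy initialisation."""
    rng = np.random.default_rng(seed)
    # Forgy: pick k distinct observations as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    def assign(C):
        # Squared Euclidean distance from every point to every centroid, shape (n, k)
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1), d2

    for _ in range(n_iter):
        labels, _ = assign(centroids)  # assignment step
        # Update step: each centroid moves to the mean of its points (kept if empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break

    labels, d2 = assign(centroids)
    inertia = d2[np.arange(len(X)), labels].sum()  # within-cluster sum of squares
    return centroids, labels, inertia

# Synthetic example: two well-separated 2-D blobs of 50 points each
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
centroids, labels, inertia = kmeans_lloyd_forgy(X, k=2)
print(centroids.round(2), inertia.round(2))
```

Because the result depends on the random initial centroids, running from several seeds and keeping the lowest inertia is the usual safeguard.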

K-MEANS
Advantages Disadvantages

ELLIPSOIDALLY DISTRIBUTED DATA

MSIN0097
Gaussian mixtures

PARTITIONAL

MIXTURE OF GAUSSIANS (1D)

HIDDEN (LATENT) VARIABLES

MIXTURE OF GAUSSIANS (2D)

GRAPHICAL MODELS GAUSSIAN MIXTURES

PLATE NOTATION
Plate notation shows a model including its parameters (squares, solid circles, bullets), its random variables (circles) and its conditional dependencies (solid arrows).

FAMILIES OF MODELS
Gaussian mixture, t-distribution mixture, factor analysis

TWO STEP EM ALGORITHM

EM ALGORITHM

EXPECTATION MAXIMIZATION

MIXTURE OF GAUSSIANS AS MARGINALIZATION
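Written out, the mixture density comes from marginalising a hidden variable h ∈ {1, …, K} out of the joint distribution. The symbols λ_k (weight), μ_k (mean) and Σ_k (covariance) for component k are assumed notation, since the slide's own equations did not survive extraction:

```latex
\begin{aligned}
Pr(h = k) &= \lambda_k, \qquad \textstyle\sum_{k=1}^{K} \lambda_k = 1,\\
Pr(\mathbf{x} \mid h = k) &= \text{Norm}_{\mathbf{x}}[\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k],\\
Pr(\mathbf{x}) &= \sum_{k=1}^{K} Pr(\mathbf{x}, h = k)
  = \sum_{k=1}^{K} \lambda_k \, \text{Norm}_{\mathbf{x}}[\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k].
\end{aligned}
```

Each component is simple on its own; the non-Gaussian shape of Pr(x) comes entirely from summing over the hidden variable.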

E-STEP

M-STEP
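Writing λ_k, μ_k and Σ_k for the weight, mean and covariance of component k, I for the number of training points x_i, and r_ik for the responsibility of component k for point i (all assumed notation), one EM iteration for the mixture of Gaussians can be sketched as:

```latex
\begin{aligned}
\text{E-step:}\quad r_{ik} &= Pr(h_i = k \mid \mathbf{x}_i, \boldsymbol{\theta}^{[t]})
  = \frac{\lambda_k \, \text{Norm}_{\mathbf{x}_i}[\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k]}
         {\sum_{j=1}^{K} \lambda_j \, \text{Norm}_{\mathbf{x}_i}[\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j]}\\[4pt]
\text{M-step:}\quad \lambda_k^{[t+1]} &= \frac{1}{I}\sum_{i=1}^{I} r_{ik}, \qquad
  \boldsymbol{\mu}_k^{[t+1]} = \frac{\sum_{i=1}^{I} r_{ik} \, \mathbf{x}_i}{\sum_{i=1}^{I} r_{ik}},\\
\boldsymbol{\Sigma}_k^{[t+1]} &= \frac{\sum_{i=1}^{I} r_{ik} \,
  (\mathbf{x}_i - \boldsymbol{\mu}_k^{[t+1]})(\mathbf{x}_i - \boldsymbol{\mu}_k^{[t+1]})^{\mathsf T}}
  {\sum_{i=1}^{I} r_{ik}}
\end{aligned}
```

In the E-step the parameters on the right-hand side are the current ones, θ^[t]; the M-step is a responsibility-weighted version of the ordinary maximum-likelihood estimates.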

EM ALGORITHM

EXPECTATION MAXIMIZATION

MANIPULATING THE LOWER BOUND
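In the standard derivation (assumed here to be what this slide depicted), the lower bound B follows from Jensen's inequality for any distributions q_i(h_i) over the hidden variables:

```latex
\begin{aligned}
\log Pr(\mathbf{x}_{1\ldots I} \mid \boldsymbol{\theta})
  &= \sum_{i=1}^{I} \log \sum_{k=1}^{K} q_i(h_i = k) \,
     \frac{Pr(\mathbf{x}_i, h_i = k \mid \boldsymbol{\theta})}{q_i(h_i = k)}\\
  &\ge \sum_{i=1}^{I} \sum_{k=1}^{K} q_i(h_i = k) \,
     \log \frac{Pr(\mathbf{x}_i, h_i = k \mid \boldsymbol{\theta})}{q_i(h_i = k)}
  \;=\; \mathcal{B}\big[\{q_i\}, \boldsymbol{\theta}\big].
\end{aligned}
```

The E-step maximises B with respect to each q_i by setting q_i(h_i) = Pr(h_i | x_i, θ), which makes the bound tight; the M-step maximises B with respect to θ with the q_i held fixed. Neither step can decrease the log likelihood, so EM converges, but only to a local maximum.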

LOCAL MAXIMA
Repeated fitting of mixture of Gaussians model with different starting points results in different models as the fit converges to different local maxima.
Log likelihoods are a) 98.76 b) 96.97 c) 94.35, respectively, indicating that (a) is the best fit.
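The usual remedy is to run EM from several starting points and keep the fit with the highest log likelihood. A sketch with scikit-learn's `GaussianMixture`, assuming synthetic three-cluster data and random initialisation (`init_params="random"`) so that different seeds genuinely start in different places; the built-in `n_init` argument performs the same restart-and-keep-best loop internally.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 2-D data: three Gaussian clusters (an illustrative assumption)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.6, size=(150, 2)) for c in ([0, 0], [4, 0], [2, 3])])

log_liks = {}
for seed in range(5):
    gm = GaussianMixture(n_components=3, init_params="random", n_init=1,
                         random_state=seed).fit(X)
    # score() is the mean log-likelihood per sample, so multiply by n for the total
    log_liks[seed] = gm.score(X) * len(X)

best_seed = max(log_liks, key=log_liks.get)
print({s: round(ll, 2) for s, ll in log_liks.items()}, "best seed:", best_seed)
```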

COVARIANCE COMPONENTS
a) Full covariances.
b) Diagonal covariances.
c) Identical diagonal covariances.

LEARNING GMM PSEUDO CODE
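The pseudo code itself did not survive extraction. As a stand-in, a minimal NumPy sketch of EM for a one-dimensional mixture of K Gaussians: the E-step computes responsibilities and the M-step re-estimates weights, means and variances from them. The helper name `fit_gmm_1d`, the quantile-based initialisation, the small variance floor and the fixed iteration count are all assumptions made for brevity.

```python
import numpy as np

def fit_gmm_1d(x, K, n_iter=200):
    """Hypothetical helper: EM for a 1-D Gaussian mixture.

    Returns (weights, means, variances, log-likelihood history).
    """
    x = np.asarray(x, dtype=float)
    # Initialisation (assumed): equal weights, means spread over data quantiles, global variance
    lam = np.full(K, 1.0 / K)
    mu = np.quantile(x, np.linspace(0.1, 0.9, K))
    var = np.full(K, x.var())
    history = []
    for _ in range(n_iter):
        # Joint density lam_k * N(x_i | mu_k, var_k), shape (n, K)
        dens = lam * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        total = dens.sum(axis=1, keepdims=True)
        history.append(np.log(total).sum())      # log-likelihood under current parameters
        r = dens / total                          # E-step: responsibilities r_ik
        Nk = r.sum(axis=0)                        # effective number of points per component
        lam = Nk / len(x)                         # M-step: weights
        mu = (r * x[:, None]).sum(axis=0) / Nk    # M-step: means
        # M-step: variances, plus a tiny floor so a component cannot collapse to zero width
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk + 1e-6
    return lam, mu, var, history

# Synthetic data: 30% from N(-2, 0.5^2), 70% from N(3, 1^2)
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
lam, mu, var, history = fit_gmm_1d(x, K=2)
print(lam.round(2), mu.round(2), np.sqrt(var).round(2))
```

The recorded `history` should never decrease (up to the tiny effect of the variance floor), which is a useful sanity check on any EM implementation.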

ANOMALY DETECTION
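A common recipe treats points lying in the lowest-density regions of a fitted GMM as anomalies. A sketch with scikit-learn's `GaussianMixture.score_samples`, which returns the log density of each point; the synthetic data, the three injected outliers and the 4% density threshold are assumptions, and the threshold is a judgment call that trades false positives against false negatives.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic data: three clusters plus a handful of far-away points (assumed for illustration)
X = np.vstack([rng.normal(c, 0.6, size=(150, 2)) for c in ([0, 0], [4, 0], [2, 3])])
outliers = np.array([[10.0, 10.0], [-8.0, 6.0], [9.0, -7.0]])
X_all = np.vstack([X, outliers])

gm = GaussianMixture(n_components=3, n_init=5, random_state=0).fit(X_all)
log_dens = gm.score_samples(X_all)            # log p(x) for every point
threshold = np.percentile(log_dens, 4)        # lowest 4% of densities count as anomalies
anomalies = X_all[log_dens < threshold]
print(len(anomalies), "points flagged")
```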

BIC AND AIC
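For a model with p free parameters, maximised likelihood L̂ and m training points, BIC = p log m − 2 log L̂ and AIC = 2p − 2 log L̂; lower is better for both. A sketch using scikit-learn's `GaussianMixture.bic` and `.aic` to choose the number of components on synthetic three-cluster data, which also checks `bic()` against the formula with a hand count of parameters for full covariances: (K − 1) weights, K·d means and K·d(d + 1)/2 covariance entries. The data, the range of K and the helper name `n_params_full` are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.6, size=(200, 2)) for c in ([0, 0], [4, 0], [2, 3])])
m, d = X.shape

def n_params_full(K, d):
    """Hypothetical helper: free parameters of a full-covariance GMM."""
    return (K - 1) + K * d + K * d * (d + 1) // 2

results = {}
for K in range(1, 7):
    gm = GaussianMixture(n_components=K, covariance_type="full", n_init=3,
                         random_state=0).fit(X)
    log_L = gm.score(X) * m  # total log-likelihood
    bic_by_hand = n_params_full(K, d) * np.log(m) - 2 * log_L
    results[K] = (gm.bic(X), gm.aic(X), bic_by_hand)

best_K = min(results, key=lambda K: results[K][0])
for K, (bic, aic, _) in results.items():
    print(K, round(bic, 1), round(aic, 1))
print("K with lowest BIC:", best_K)
```

`covariance_type` also accepts "tied", "diag" and "spherical", so the same loop can compare covariance structures as well as numbers of components.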

GAUSSIAN MIXTURES

BAYESIAN GMMS

CONCENTRATION PRIORS
The more data we have, however, the less the priors matter. In fact, to plot diagrams with such large differences, you must use very strong priors and little data.
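scikit-learn's `BayesianGaussianMixture` places a concentration prior on the mixture weights, so it can be given more components than needed and will push the weights of the unnecessary ones towards zero. A sketch on synthetic three-cluster data; `n_components=10`, `weight_concentration_prior=0.01`, `max_iter=500` and `n_init=3` are assumed values, and with 600 points the prior's influence is already modest, in line with the point above that more data makes the prior matter less.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.6, size=(200, 2)) for c in ([0, 0], [4, 0], [2, 3])])

# Deliberately too many components; a small concentration prior favours few active ones
bgm = BayesianGaussianMixture(n_components=10, weight_concentration_prior=0.01,
                              max_iter=500, n_init=3, random_state=0).fit(X)
weights = bgm.weights_
print(np.round(weights, 2))  # most of the 10 weights should end up close to 0
print("effective components:", int((weights > 0.01).sum()))
```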

TWO MOONS DATA

PROBLEMS WITH MULTI-VARIATE NORMAL DENSITY

MSIN0097
Types of models

GENERATIVE VS DISCRIMINATIVE

CLASSIFICATION (DISCRIMINATIVE)
LOGISTIC REGRESSION REVISITED
MODEL CONTINGENCY OF THE WORLD ON DATA
World state: Bernoulli distribution whose parameter is a linear model of the data
Probability / decision surface
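Written out, logistic regression models the world state w ∈ {0, 1} directly given the data x: a Bernoulli distribution whose parameter is a linear function of x passed through the logistic sigmoid. The symbols φ_0 and φ are assumed notation, since the slide's own equations were lost in extraction:

```latex
\begin{aligned}
Pr(w \mid \mathbf{x}) &= \text{Bern}_w\!\left[\operatorname{sig}(\phi_0 + \boldsymbol{\phi}^{\mathsf T}\mathbf{x})\right],\\
\operatorname{sig}(a) &= \frac{1}{1 + \exp(-a)}.
\end{aligned}
```

The decision surface Pr(w = 1 | x) = 0.5 is the hyperplane φ_0 + φᵀx = 0.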

CLASSIFICATION (GENERATIVE)
GAUSSIAN MIXTURE
MODEL CONTINGENCY OF DATA ON THE WORLD
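In the generative direction each class w has its own density Pr(x | w), and the posterior Pr(w | x) is recovered with Bayes' rule. To keep the sketch short, each class density is assumed to be a single 1-D Gaussian rather than a full mixture; the class means, standard deviations and priors are made-up illustrative values.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a 1-D normal distribution N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical class-conditional models Pr(x | w) and priors Pr(w)
classes = {0: {"mu": 0.0, "sigma": 1.0, "prior": 0.5},
           1: {"mu": 2.0, "sigma": 1.0, "prior": 0.5}}

def posterior(x):
    """Pr(w | x) via Bayes' rule: Pr(x | w) Pr(w) / sum over w' of Pr(x | w') Pr(w')."""
    joint = {w: normal_pdf(x, c["mu"], c["sigma"]) * c["prior"] for w, c in classes.items()}
    evidence = sum(joint.values())
    return {w: j / evidence for w, j in joint.items()}

for x in (-1.0, 1.0, 3.0):
    print(x, {w: round(p, 3) for w, p in posterior(x).items()})
```

With equal variances the log posterior odds are linear in x (here 2x − 2), which is one way to see why this generative model and logistic regression produce the same kind of decision surface.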

WHAT SORT OF MODEL SHOULD WE USE?

WHAT SORT OF MODEL SHOULD WE USE? TL;DR NO DEFINITIVE ANSWER
Inference is generally simpler with discriminative models, which describe the probability of the world state given the data directly.
Generative models calculate this probability via Bayes' rule, and sometimes this requires a computationally expensive algorithm.
Generative models might waste modelling power.
The data are generally of much higher dimension than the world, and modelling them is costly. Moreover, there may be many aspects of the data that do not influence the state, yet a generative model still spends effort describing them.
Generative models can build in knowledge about how the data were created. Using discriminative approaches, it is harder to exploit this knowledge: essentially we have to re-learn these phenomena from the data.
Sometimes parts of the training or test data vector x may be missing. Here, generative models are preferred: they model the joint distribution over all data dimensions and can interpolate the missing elements.
It is harder to impose prior knowledge in a principled way in discriminative models.

SUMMARY OF APPROACHES

MSIN0097
Best practice

BEST PRACTICE

BEST PRACTICE
Source: https://www.marekrei.com/blog/ml-and-nlp-publications-in-2019/
Percentage of papers mentioning GitHub (indicating that the code is made available):
ACL 70%, EMNLP 69%, NAACL 68%, ICLR 56%, NeurIPS 46%, ICML 45%, AAAI 31%.
The NLP venues (ACL, EMNLP, NAACL) appear to release their code much more freely than the general ML venues.

PAPERS WITH CODE
https://paperswithcode.com/

PERCEPTIONS OF PROBABILITY

DEPLOYMENT

@SOCIAL
@chipro @random_forests @zacharylipton @yudapearl @svpino @jackclarkSF

TEACHING TEAM
Dr Alastair Moore Senior Teaching Fellow
[email protected]
@latticecut
Kamil Tylinski Teaching Assistant
[email protected]
Jiangbo Shangguan Teaching Assistant
[email protected]
Individual Coursework workshop
to Thursday 11th Feb 2021 at 12:00 am

