Exercise 1: Bayes' rule (6 points). Suppose that 5% of competitive athletes use performance-enhancing drugs and that a particular drug test has a 2% false positive rate and a 1.5% false negative rate.
- (3 points) Athlete A tests positive for drug use. What is the probability that Athlete A is using drugs?
- (3 points) Athlete B tests negative for drug use. What is the probability that Athlete B is not using drugs?
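A minimal numerical sketch of the two Bayes' rule computations, using the rates given above (the variable names are illustrative, not part of the exercise):

```python
# Bayes' rule for the drug-test setting: P(use) = 0.05,
# P(+ | clean) = 0.02 (false positive), P(- | use) = 0.015 (false negative).
p_use = 0.05
p_pos_given_clean = 0.02   # false positive rate
p_neg_given_use = 0.015    # false negative rate

p_pos_given_use = 1 - p_neg_given_use
p_neg_given_clean = 1 - p_pos_given_clean

# P(+) by the law of total probability
p_pos = p_pos_given_use * p_use + p_pos_given_clean * (1 - p_use)

# Athlete A: P(use | +)
p_use_given_pos = p_pos_given_use * p_use / p_pos

# Athlete B: P(clean | -)
p_clean_given_neg = p_neg_given_clean * (1 - p_use) / (1 - p_pos)

print(p_use_given_pos, p_clean_given_neg)
```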
Exercise 2: Bayesian decision theory: losses and risks. Consider a classification problem with K classes, using a loss λik ≥ 0 if we choose class i when the input actually belongs to class k, for i, k ∈ {1, …, K}.
- (2 points) Write the expression for the expected risk Ri(x) for choosing class i as the class for a pattern x, and the rule for choosing the class for x.
Consider a two-class problem with losses given by the matrix (λ11 λ12; λ21 λ22) = (0 1; λ21 0), i.e., correct decisions cost nothing, choosing class 1 for a class-2 input costs 1, and choosing class 2 for a class-1 input costs λ21 ≥ 0.
- (3 points) Give the optimal decision rule in the form p(C1|x) > θ, where the threshold θ is a function of λ21.
- (3 points) Imagine we consider both misclassification errors as equally costly. When is class 1 chosen (for what values of p(C1|x))?
- (3 points) Imagine we want to be very conservative when choosing class 2 and we seek a rule of the form p(C2|x) > 0.99 (i.e., choose class 2 when its posterior probability exceeds 99%). What should λ21 be?
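As a sanity check on this exercise, expected risks can be computed directly from the loss matrix; the sketch below assumes the matrix described above, and the λ21 value is just an example:

```python
import numpy as np

def bayes_decision(posteriors, loss):
    """Choose the class minimizing the expected risk R_i(x) = sum_k loss[i, k] * p(C_k | x)."""
    risks = loss @ posteriors          # R_i(x) for every i
    return int(np.argmin(risks)) + 1   # classes numbered 1..K

# Two-class loss matrix with lambda_11 = lambda_22 = 0, lambda_12 = 1
lam21 = 99.0                           # example value: very conservative about class 2
loss = np.array([[0.0, 1.0],
                 [lam21, 0.0]])

# The printed decisions reveal the threshold behaviour the exercise asks about.
for p1 in (0.005, 0.02, 0.5, 0.99):
    posteriors = np.array([p1, 1 - p1])
    print(f"p(C1|x) = {p1}: choose class {bayes_decision(posteriors, loss)}")
```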
Exercise 3: association rules. Given the following data of transactions at a supermarket, calculate the support and confidence values of the following association rules: meat → avocado, avocado → meat, yogurt → avocado, avocado → yogurt, meat → yogurt, yogurt → meat. What is the best rule to use in practice?
| transaction # | items in basket |
| --- | --- |
| 1 | meat, avocado |
| 2 | yogurt, avocado |
| 3 | meat |
| 4 | yogurt, meat |
| 5 | avocado, meat, yogurt |
| 6 | meat, avocado |
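A short sketch that computes the support and confidence of the six rules directly from the table, using nothing beyond the definitions (the set-based helpers are illustrative):

```python
transactions = [
    {"meat", "avocado"},
    {"yogurt", "avocado"},
    {"meat"},
    {"yogurt", "meat"},
    {"avocado", "meat", "yogurt"},
    {"meat", "avocado"},
]
N = len(transactions)

def support(itemset):
    # fraction of transactions containing every item in the itemset
    return sum(itemset <= t for t in transactions) / N

def confidence(antecedent, consequent):
    # support of the joint itemset divided by support of the antecedent
    return support(antecedent | consequent) / support(antecedent)

for a, c in [("meat", "avocado"), ("avocado", "meat"),
             ("yogurt", "avocado"), ("avocado", "yogurt"),
             ("meat", "yogurt"), ("yogurt", "meat")]:
    s = support({a, c})
    conf = confidence({a}, {c})
    print(f"{a} -> {c}: support={s:.2f}, confidence={conf:.2f}")
```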
Exercise 4: true- and false-positive rates. Consider the following table, where xn is a pattern, yn its ground-truth label (1 = positive class, 2 = negative class) and p(C1|xn) the posterior probability produced by some probabilistic classification algorithm:
| n | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| yn | 1 | 2 | 2 | 1 | 2 |
| p(C1\|xn) | 0.6 | 0.7 | 0.5 | 0.9 | 0.2 |
We use a classification rule of the form p(C1|x) > θ, where θ ∈ [0, 1] is a threshold.
- (8 points) Give, for all possible values of θ ∈ [0, 1], the predicted labels and the corresponding confusion matrix and classification error.
- (2 points) Plot the corresponding pairs (fp,tp) as an ROC curve.
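A minimal sketch of the threshold sweep on the table above; since only thresholds between consecutive posterior values change the predictions, a handful of representative θ values covers all of [0, 1]:

```python
import numpy as np

y = np.array([1, 2, 2, 1, 2])                 # ground truth (1 = positive, 2 = negative)
p1 = np.array([0.6, 0.7, 0.5, 0.9, 0.2])      # p(C1 | x_n)

for theta in (0.0, 0.3, 0.55, 0.65, 0.8, 1.0):
    pred = np.where(p1 > theta, 1, 2)
    tp = np.sum((pred == 1) & (y == 1)) / np.sum(y == 1)   # true-positive rate
    fp = np.sum((pred == 1) & (y == 2)) / np.sum(y == 2)   # false-positive rate
    err = np.mean(pred != y)                               # classification error
    print(f"theta={theta:.2f}  tp={tp:.2f}  fp={fp:.2f}  error={err:.2f}")
```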
Exercise 5: ROC curves. Imagine we have a classifier A with false-positive and true-positive rates fpA, tpA ∈ [0, 1] such that fpA > tpA (that is, this classifier is below the diagonal in ROC space). Now consider a classifier B that negates the decision of A; that is, whenever A predicts the positive class, B predicts the negative class, and vice versa. Compute the false-positive and true-positive rates fpB, tpB of classifier B. Where is this point in ROC space?
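One way to build intuition before deriving the answer is to negate a concrete classifier's predictions and recompute the rates; the labels and predictions below are arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)          # 1 = positive, 0 = negative
pred_A = rng.integers(0, 2, size=1000)     # some (poor) classifier A
pred_B = 1 - pred_A                        # B negates every decision of A

def rates(pred, y):
    tp = np.mean(pred[y == 1] == 1)        # true-positive rate
    fp = np.mean(pred[y == 0] == 1)        # false-positive rate
    return fp, tp

fp_A, tp_A = rates(pred_A, y)
fp_B, tp_B = rates(pred_B, y)
print(fp_A, tp_A)
print(fp_B, tp_B)   # compare the two pairs and look for the relation
```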
Exercise 6: least-squares regression (14 points). Consider the following model, with parameters Θ = {θ1, θ2, θ3} ⊂ R and an input x ∈ R:
h(x; Θ) = θ1 + θ2 sin 2x + θ3 sin 4x ∈ R.
- (2 points) Write the general expression of the least-squares error function of a model h(x; Θ) with parameters Θ, given a sample {(x1, y1), …, (xN, yN)}.
- (2 points) Apply it to the above model, simplifying it as much as possible.
- (6 points) Find the least-squares estimate for the parameters.
- (4 points) Assume the xn values are uniformly distributed in the interval [0, 2π]. Can you find a simpler, approximate way to obtain the least-squares estimate Θ̂? Hint: approximate the sums over the sample by integrals.
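For reference, the exact estimate can be checked numerically on synthetic data; everything below (sample size, noise level, generating parameters) is a made-up example, not part of the exercise:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
x = rng.uniform(0, 2 * np.pi, size=N)            # inputs uniform on [0, 2*pi]
theta_true = np.array([1.0, -0.5, 2.0])          # made-up ground truth
Phi = np.column_stack([np.ones(N), np.sin(2 * x), np.sin(4 * x)])
y = Phi @ theta_true + 0.1 * rng.standard_normal(N)

# Least-squares estimate: solve the normal equations Phi^T Phi theta = Phi^T y
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(theta_hat)

# For x uniform on [0, 2*pi], Phi^T Phi / N approaches a diagonal matrix
# (orthogonality of 1, sin 2x and sin 4x over one period), which is the
# observation behind the approximation asked for in the last part.
print(Phi.T @ Phi / N)
```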
Exercise 7: maximum likelihood estimation. A discrete random variable x ∈ {0, 1, 2, …} follows a Poisson distribution if it has the following probability mass function:
p(x; λ) = λ^x e^{−λ} / x!, for x = 0, 1, 2, …,
where the parameter λ > 0.
- (2 points) Verify that ∑_{x=0}^{∞} p(x; λ) = 1.
- (2 points) Write the general expression of the log-likelihood of a probability mass function p(x; Θ) with parameters Θ for an iid sample x1, …, xN.
- (5 points) Apply it to the above distribution, simplifying it as much as possible.
- (6 points) Find the maximum likelihood estimate for the parameter λ.
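A quick numerical cross-check of the derivation on a synthetic sample (the rate 3.2 and the use of scipy's bounded scalar minimizer are illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.poisson(lam=3.2, size=1000)     # synthetic iid Poisson sample

# Negative log-likelihood of a Poisson sample, dropping the lambda-independent
# constant sum(log(x_n!)):
def nll(lam):
    return lam * len(x) - np.sum(x) * np.log(lam)

res = minimize_scalar(nll, bounds=(1e-6, 20), method="bounded")
print(res.x, x.mean())   # the numeric optimum should match your closed form
```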
Exercise 8: multivariate Bernoulli distribution. Consider a multivariate Bernoulli distribution, where θ = (θ1, …, θD)ᵀ ∈ [0, 1]^D are the parameters and x ∈ {0, 1}^D the binary random vector:
p(x; θ) = ∏_{d=1}^{D} θd^{xd} (1 − θd)^{1 − xd}.
- (5 points) Compute the maximum likelihood estimate for θ given a sample X = {x1, …, xN}.
Let us do document classification using a D-word dictionary (element d of xn is 1 if word d appears in document n and 0 otherwise), with a multivariate Bernoulli model for each class. Assume we have K document classes for which we have already obtained, by maximum likelihood, the optimal parameters θk = (θk1, …, θkD)ᵀ and prior probabilities p(Ck) = πk, for k = 1, …, K.
- (2 points) Write the discriminant function gk(x) for a probabilistic classifier in general (not necessarily Bernoulli), and the rule to make a decision.
- (5 points) Apply it to the multivariate Bernoulli case with K classes. Show that gk(x) is linear in x, i.e., it can be written as gk(x) = wkᵀ x + wk0, and give the expressions for wk and wk0.
- (3 points) Consider K = 2 classes. Show that the decision rule can be written as "if wᵀ x + w0 > 0 then choose class 1", and give the expressions for w and w0.
- (5 points) Compute the numerical values of w and w0 for a two-word dictionary (D = 2) where π1 = 0.7 and θ1, θ2 take the given values. Plot in 2D all the possible values of x ∈ {0, 1}² and the boundary corresponding to this classifier.
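Since the concrete values of θ1 and θ2 are not legible in this copy, the sketch below uses made-up parameters purely to illustrate how w and w0 fall out of the derivation in the previous parts:

```python
import numpy as np

# Made-up example parameters (the exercise's original values were lost):
pi1, pi2 = 0.7, 0.3
theta1 = np.array([0.8, 0.2])    # p(word d present | class 1)
theta2 = np.array([0.3, 0.6])    # p(word d present | class 2)

# g_k(x) = sum_d [x_d log theta_kd + (1 - x_d) log(1 - theta_kd)] + log pi_k
#        = w_k^T x + w_k0
def linear_params(theta, pi):
    w = np.log(theta / (1 - theta))               # w_k
    w0 = np.sum(np.log(1 - theta)) + np.log(pi)   # w_k0
    return w, w0

w1, w10 = linear_params(theta1, pi1)
w2, w20 = linear_params(theta2, pi2)
w, w0 = w1 - w2, w10 - w20                        # choose class 1 iff w^T x + w0 > 0
print(w, w0)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array(x)
    print(x, "class 1" if w @ x + w0 > 0 else "class 2")
```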
Exercise 9: Gaussian classifiers. Consider a binary classification problem for x ∈ R^D where we use Gaussian class-conditional probabilities p(x|C1) = N(x; μ, σ1² I) and p(x|C2) = N(x; μ, σ2² I) with σ1 ≠ σ2. That is, they have the same mean and the covariance matrices are isotropic but different. Compute the expression for the class boundary. What shape is it?
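As intuition for the last question, one can evaluate the discriminant g(x) = log p(x|C1)p(C1) − log p(x|C2)p(C2) on a 2D grid and draw its zero level set; all numerical values below (μ, σ1, σ2, the prior) are made-up examples:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up example: same mean, isotropic covariances with sigma1 != sigma2
mu = np.array([0.0, 0.0])
s1, s2 = 1.0, 2.0
p1 = 0.5                        # prior p(C1); p(C2) = 1 - p1

xx, yy = np.meshgrid(np.linspace(-5, 5, 400), np.linspace(-5, 5, 400))
d2 = (xx - mu[0]) ** 2 + (yy - mu[1]) ** 2      # squared distance to the mean

# log N(x; mu, s^2 I) = -D log s - d2 / (2 s^2) + const, with D = 2 here;
# the constant cancels in the difference of the two log-posteriors.
g = (-d2 / (2 * s1**2) - 2 * np.log(s1) + np.log(p1)) \
    - (-d2 / (2 * s2**2) - 2 * np.log(s2) + np.log(1 - p1))

plt.contour(xx, yy, g, levels=[0.0])            # the class boundary g(x) = 0
plt.gca().set_aspect("equal")
plt.show()
```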