Exercise 1: Bayes' rule. Suppose that 5% of competitive athletes use performance-enhancing drugs and that a particular drug test has a 2% false-positive rate and a 1.5% false-negative rate.
- (3 points) Athlete A tests positive for drug use. What is the probability that Athlete A is using drugs?
- (3 points) Athlete B tests negative for drug use. What is the probability that Athlete B is not using drugs?
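As a sanity check, here is a short numerical sketch of the Bayes'-rule computation using the rates given above (variable names are ours):

```python
# Bayes' rule sanity check for Exercise 1.
p_user = 0.05   # prior: P(user)
p_fp = 0.02     # false-positive rate: P(test+ | not user)
p_fn = 0.015    # false-negative rate: P(test- | user)

p_pos = (1 - p_fn) * p_user + p_fp * (1 - p_user)            # P(test+)
p_user_given_pos = (1 - p_fn) * p_user / p_pos               # Athlete A
p_clean_given_neg = (1 - p_fp) * (1 - p_user) / (1 - p_pos)  # Athlete B

print(f"P(user | +)  = {p_user_given_pos:.4f}")   # ~0.7216
print(f"P(clean | -) = {p_clean_given_neg:.4f}")  # ~0.9992
```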
Exercise 2: Bayesian decision theory: losses and risks. Consider a classification problem with $K$ classes, using a loss $\lambda_{ik} \ge 0$ if we choose class $i$ when the input actually belongs to class $k$, for $i, k \in \{1, \dots, K\}$.
- (2 points) Write the expression for the expected risk $R_i(\mathbf{x})$ of choosing class $i$ for a pattern $\mathbf{x}$, and the rule for choosing the class for $\mathbf{x}$.
Consider a two-class problem with losses given by the matrix $\Lambda = \begin{pmatrix} 0 & \lambda_{12} \\ \lambda_{21} & 0 \end{pmatrix}$, where $\lambda_{ik}$ is the loss of choosing class $i$ when the true class is $k$ (so correct decisions incur no loss).
- (3 points) Give the optimal decision rule in the form $p(C_1|\mathbf{x}) > \theta$, with the threshold $\theta$ a function of $\lambda_{12}$ and $\lambda_{21}$.
- (3 points) Now suppose both misclassification errors are equally costly. For what values of $p(C_1|\mathbf{x})$ is class 1 chosen?
- (3 points) Imagine we want to be very conservative when choosing class 2 and we seek a rule of the form $p(C_2|\mathbf{x}) > 0.99$ (i.e., choose class 2 only when its posterior probability exceeds 99%). What should $\lambda_{21}$ be, in terms of $\lambda_{12}$?
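For reference, a worked sketch of the risk comparison implied by this loss matrix (standard Bayesian decision theory):

```latex
\begin{align*}
  R_1(\mathbf{x}) &= \lambda_{12}\, p(C_2|\mathbf{x}), \qquad
  R_2(\mathbf{x}) = \lambda_{21}\, p(C_1|\mathbf{x}), \\
  \text{choose } C_1 &\iff R_1(\mathbf{x}) < R_2(\mathbf{x})
  \iff p(C_1|\mathbf{x}) > \frac{\lambda_{12}}{\lambda_{12}+\lambda_{21}} \,.
\end{align*}
```

Equal losses $\lambda_{12} = \lambda_{21}$ give a threshold of $1/2$, and a threshold of $0.99$ on $p(C_2|\mathbf{x})$ corresponds to $\lambda_{21} = 99\,\lambda_{12}$.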
Exercise 3: association rules. Given the following data of transactions at a supermarket, calculate the support and confidence values of the following association rules: meat → avocado, avocado → meat, yogurt → avocado, avocado → yogurt, meat → yogurt, yogurt → meat. What is the best rule to use in practice?
| transaction # | items in basket |
| --- | --- |
| 1 | meat, avocado |
| 2 | yogurt, avocado |
| 3 | meat |
| 4 | yogurt, meat |
| 5 | avocado, meat, yogurt |
| 6 | meat, avocado |
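A small script to compute these values (a sketch; the transaction data and rule list are taken directly from the exercise):

```python
# Support and confidence for the association rules in Exercise 3.
transactions = [
    {"meat", "avocado"},
    {"yogurt", "avocado"},
    {"meat"},
    {"yogurt", "meat"},
    {"avocado", "meat", "yogurt"},
    {"meat", "avocado"},
]
N = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / N

def rule_stats(antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    both = support({antecedent, consequent})
    return both, both / support({antecedent})

for x, y in [("meat", "avocado"), ("avocado", "meat"),
             ("yogurt", "avocado"), ("avocado", "yogurt"),
             ("meat", "yogurt"), ("yogurt", "meat")]:
    s, c = rule_stats(x, y)
    print(f"{x} -> {y}: support {s:.2f}, confidence {c:.2f}")
```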
Exercise 4: true- and false-positive rates. Consider the following table, where $\mathbf{x}_n$ is a pattern, $y_n$ its ground-truth label (1 = positive class, 2 = negative class) and $p(C_1|\mathbf{x}_n)$ the posterior probability produced by some probabilistic classification algorithm:
| $n$ | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| $y_n$ | 1 | 2 | 2 | 1 | 2 |
| $p(C_1 \mid \mathbf{x}_n)$ | 0.6 | 0.7 | 0.5 | 0.9 | 0.2 |
We use a classification rule of the form $p(C_1|\mathbf{x}) > \theta$, where $\theta \in [0,1]$ is a threshold.
- (8 points) Give, for all possible values of $\theta \in [0,1]$, the predicted labels and the corresponding confusion matrix and classification error.
- (2 points) Plot the corresponding pairs $(fp, tp)$ as an ROC curve.
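A quick threshold sweep over the table's values (a sketch; only the five posteriors are distinct cut points, so a handful of $\theta$ values covers every possible rule):

```python
import numpy as np

# Data from the Exercise 4 table (label 1 = positive, 2 = negative).
y = np.array([1, 2, 2, 1, 2])
p = np.array([0.6, 0.7, 0.5, 0.9, 0.2])

# One theta per interval between consecutive posterior values.
for theta in [0.0, 0.2, 0.5, 0.6, 0.7, 0.9]:
    pred = np.where(p > theta, 1, 2)
    tp = np.sum((pred == 1) & (y == 1)) / np.sum(y == 1)  # true-positive rate
    fp = np.sum((pred == 1) & (y == 2)) / np.sum(y == 2)  # false-positive rate
    err = np.mean(pred != y)                              # classification error
    print(f"theta={theta:.1f}: tp={tp:.2f}, fp={fp:.2f}, error={err:.2f}")
```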
Exercise 5: ROC curves. Imagine we have a classifier A with false-positive and true-positive rates $fp_A, tp_A \in [0,1]$ such that $fp_A > tp_A$ (that is, this classifier is below the diagonal in ROC space). Now consider a classifier B that negates the decision of A: whenever A predicts the positive class, B predicts the negative class, and vice versa. Compute the false-positive and true-positive rates $fp_B, tp_B$ of classifier B. Where is this point in ROC space?
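A short derivation sketch: B predicts positive exactly when A predicts negative, so, conditioning on the true class,

```latex
\begin{align*}
  tp_B &= p(\hat{y}_B = + \mid y = +) = 1 - tp_A, \\
  fp_B &= p(\hat{y}_B = + \mid y = -) = 1 - fp_A .
\end{align*}
```

The point $(fp_B, tp_B)$ is $(fp_A, tp_A)$ reflected through the center $(1/2, 1/2)$ of the ROC square; since $fp_A > tp_A$, we get $fp_B < tp_B$, so B lies above the diagonal.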
Exercise 6: least-squares regression (14 points). Consider the following model, with parameters $\Theta = \{\theta_1, \theta_2, \theta_3\} \subset \mathbb{R}$ and an input $x \in \mathbb{R}$:

$$h(x;\Theta) = \theta_1 + \theta_2 \sin 2x + \theta_3 \sin 4x \;\in \mathbb{R}.$$
- (2 points) Write the general expression of the least-squares error function of a model $h(x;\Theta)$ with parameters $\Theta$ given a sample $\{(x_n, y_n)\}_{n=1}^{N}$.
- (2 points) Apply it to the above model, simplifying it as much as possible.
- (6 points) Find the least-squares estimate for the parameters.
- (4 points) Assume the $x_n$ values are uniformly distributed in the interval $[0, 2\pi]$. Can you find a simpler, approximate way to find the least-squares estimate $\Theta$? Hint: approximate the sum over the sample by an integral.
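A numerical sketch contrasting the exact solution with the integral approximation, on synthetic data (the closed-form coefficients follow from the orthogonality of $1$, $\sin 2x$, $\sin 4x$ over $[0, 2\pi]$):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
x = rng.uniform(0, 2 * np.pi, N)
y = 1.5 + 0.8 * np.sin(2 * x) - 0.3 * np.sin(4 * x) + 0.1 * rng.standard_normal(N)

# Exact least squares via the design matrix [1, sin 2x, sin 4x].
X = np.column_stack([np.ones(N), np.sin(2 * x), np.sin(4 * x)])
theta_exact, *_ = np.linalg.lstsq(X, y, rcond=None)

# Integral approximation: for uniform x on [0, 2pi] the basis is orthogonal,
# so X.T @ X / N ~ diag(1, 1/2, 1/2) and the normal equations decouple.
theta_approx = np.array([y.mean(),
                         2 * np.mean(y * np.sin(2 * x)),
                         2 * np.mean(y * np.sin(4 * x))])

print(theta_exact)   # close to [1.5, 0.8, -0.3]
print(theta_approx)  # nearly identical for large N
```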
Exercise 7: maximum likelihood estimate. A discrete random variable $x \in \{0, 1, 2, \dots\}$ follows a Poisson distribution if it has the following probability mass function:

$$p(x;\lambda) = \frac{\lambda^x e^{-\lambda}}{x!},$$

where the parameter $\lambda > 0$.
- (2 points) Verify that $\sum_{x=0}^{\infty} p(x;\lambda) = 1$.
- (2 points) Write the general expression of the log-likelihood of a probability mass function $p(x;\Theta)$ with parameters $\Theta$ for an iid sample $x_1, \dots, x_N$.
- (5 points) Apply it to the above distribution, simplifying it as much as possible.
- (6 points) Find the maximum likelihood estimate for the parameter $\lambda$.
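For reference, a sketch of the standard derivation under the pmf above:

```latex
\begin{align*}
  L(\lambda) &= \sum_{n=1}^{N} \log p(x_n;\lambda)
             = \log\lambda \sum_{n=1}^{N} x_n \;-\; N\lambda \;-\; \sum_{n=1}^{N} \log x_n! \,, \\
  \frac{dL}{d\lambda} &= \frac{1}{\lambda}\sum_{n=1}^{N} x_n - N = 0
  \quad\Longrightarrow\quad \hat\lambda = \frac{1}{N}\sum_{n=1}^{N} x_n \,.
\end{align*}
```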
Exercise 8: multivariate Bernoulli distribution. Consider a multivariate Bernoulli distribution, where $\boldsymbol\theta \in [0,1]^D$ are the parameters and $\mathbf{x} \in \{0,1\}^D$ the binary random vector:

$$p(\mathbf{x};\boldsymbol\theta) = \prod_{d=1}^{D} \theta_d^{x_d} (1-\theta_d)^{1-x_d}.$$

- (5 points) Compute the maximum likelihood estimate for $\boldsymbol\theta$ given a sample $X = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$.
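As a sketch (the log-likelihood decouples across dimensions):

```latex
\begin{align*}
  L(\boldsymbol\theta) &= \sum_{n=1}^{N}\sum_{d=1}^{D}
    \bigl( x_{nd}\log\theta_d + (1-x_{nd})\log(1-\theta_d) \bigr), \\
  \frac{\partial L}{\partial\theta_d} = 0
  &\;\Longrightarrow\; \hat\theta_d = \frac{1}{N}\sum_{n=1}^{N} x_{nd}
  \quad (\text{the fraction of vectors with bit } d \text{ on}).
\end{align*}
```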
Let us do document classification using a $D$-word dictionary (element $d$ of $\mathbf{x}_n$ is 1 if word $d$ appears in document $n$ and 0 otherwise), using a multivariate Bernoulli model for each class. Assume we have $K$ document classes for which we have already obtained, by maximum likelihood, the optimal parameters $\boldsymbol\theta_k = (\theta_{k1}, \dots, \theta_{kD})^T$ and prior probabilities $p(C_k) = \pi_k$, for $k = 1, \dots, K$.
- (2 points) Write the discriminant function $g_k(\mathbf{x})$ for a probabilistic classifier in general (not necessarily Bernoulli), and the rule to make a decision.
- (5 points) Apply it to the multivariate Bernoulli case with $K$ classes. Show that $g_k(\mathbf{x})$ is linear in $\mathbf{x}$, i.e., it can be written as $g_k(\mathbf{x}) = \mathbf{w}_k^T \mathbf{x} + w_{k0}$, and give the expressions for $\mathbf{w}_k$ and $w_{k0}$.
- (3 points) Consider $K = 2$ classes. Show that the decision rule can be written as "if $\mathbf{w}^T \mathbf{x} + w_0 > 0$ then choose class 1", and give the expressions for $\mathbf{w}$ and $w_0$.
- (5 points) Compute the numerical values of $\mathbf{w}$ and $w_0$ for a two-word dictionary ($D = 2$) where $\pi_1 = 0.7$, $\boldsymbol\theta_1 = (\dots)$ and $\boldsymbol\theta_2 = (\dots)$. Plot in 2D all the possible values of $\mathbf{x} \in \{0,1\}^2$ and the boundary corresponding to this classifier.
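A sketch of the linear-discriminant computation; since the exercise's actual $\boldsymbol\theta_1$, $\boldsymbol\theta_2$ values are missing from the statement, the values below are hypothetical placeholders (only $\pi_1 = 0.7$ is given):

```python
import numpy as np

pi1, pi2 = 0.7, 0.3
theta1 = np.array([0.8, 0.3])   # hypothetical placeholder
theta2 = np.array([0.2, 0.6])   # hypothetical placeholder

# Per-class linear discriminant g_k(x) = w_k.x + w_k0, obtained by expanding
# log p(x|C_k) + log pi_k for the multivariate Bernoulli model.
def linear_params(theta, pi):
    w = np.log(theta / (1 - theta))               # w_kd = log(theta_kd/(1-theta_kd))
    w0 = np.sum(np.log(1 - theta)) + np.log(pi)   # bias term
    return w, w0

w1, w10 = linear_params(theta1, pi1)
w2, w20 = linear_params(theta2, pi2)
w, w0 = w1 - w2, w10 - w20   # two-class rule: w.x + w0 > 0  =>  choose class 1

for x in [np.array(b) for b in [(0, 0), (0, 1), (1, 0), (1, 1)]]:
    print(x, "-> class", 1 if w @ x + w0 > 0 else 2)
```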
Exercise 9: Gaussian classifiers. Consider a binary classification problem for $\mathbf{x} \in \mathbb{R}^D$ where we use Gaussian class-conditional probabilities $p(\mathbf{x}|C_1) = \mathcal{N}(\boldsymbol\mu, \sigma_1^2 I)$ and $p(\mathbf{x}|C_2) = \mathcal{N}(\boldsymbol\mu, \sigma_2^2 I)$ with $\sigma_1 \ne \sigma_2$. That is, they have the same mean and the covariance matrices are isotropic but different. Compute the expression for the class boundary. What shape is it?
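A derivation sketch, assuming equal priors (the statement does not give them): setting $\log p(\mathbf{x}|C_1) = \log p(\mathbf{x}|C_2)$ and cancelling the common $2\pi$ constant,

```latex
\begin{align*}
  -\frac{\|\mathbf{x}-\boldsymbol\mu\|^2}{2\sigma_1^2} - \frac{D}{2}\log\sigma_1^2
  &= -\frac{\|\mathbf{x}-\boldsymbol\mu\|^2}{2\sigma_2^2} - \frac{D}{2}\log\sigma_2^2 \\
  \Longrightarrow\quad \|\mathbf{x}-\boldsymbol\mu\|^2
  &= \frac{D\,\log(\sigma_1^2/\sigma_2^2)}{1/\sigma_2^2 - 1/\sigma_1^2}\,,
\end{align*}
```

a constant, so the boundary is a hypersphere centered at $\boldsymbol\mu$ (unequal priors would only shift the constant).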
