ML Homework 2: Perceptrons


  1. Which of the following sets of x ∈ R³ can be shattered by the 3D perceptron hypothesis set? The set contains all hyperplanes of the form

h(x) = sign(w0x0 + w1x1 + w2x2 + w3x3),

with our usual notation of x0 = 1. Choose the correct answer; explain your answer. (A brute-force shattering check is sketched after the choices.)

  • {(7,8,9),(17,18,19),(27,28,29)}
  • {(1,1,1),(7,8,9),(15,16,17),(21,23,25)}
  • {(1,1,3),(7,8,9),(15,16,17),(21,23,25)}
  • {(1,3,5),(7,8,9),(15,16,17),(21,23,25)}
  • {(1,2,3),(4,5,6),(7,8,9),(15,16,17),(21,23,25)}
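
Since shattering involves only finitely many dichotomies, each candidate set can be checked mechanically: a dichotomy is realizable by a perceptron iff the corresponding linear program is feasible. Below is a minimal sketch of such a check, assuming numpy and scipy are available; the helper names are mine, not part of the homework.

    from itertools import product
    import numpy as np
    from scipy.optimize import linprog

    def separable(X, y):
        # Feasibility LP: find w with y_n * (w . [1, x_n]) >= 1 for all n.
        Xb = np.hstack([np.ones((len(X), 1)), np.asarray(X, float)])  # prepend x0 = 1
        A_ub = -np.asarray(y, float)[:, None] * Xb                    # -y_n (x_n . w) <= -1
        res = linprog(c=np.zeros(Xb.shape[1]), A_ub=A_ub, b_ub=-np.ones(len(X)),
                      bounds=[(None, None)] * Xb.shape[1], method="highs")
        return res.status == 0  # status 0 means a feasible w was found

    def shattered(X):
        # Shattered iff every one of the 2^N dichotomies is separable.
        return all(separable(X, y) for y in product([-1, +1], repeat=len(X)))

    print(shattered([(7, 8, 9), (17, 18, 19), (27, 28, 29)]))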
  2. What is the growth function of axis-aligned perceptrons in 2D for N ≥ 4? Those perceptrons are all perceptrons with w1w2 = 0. That is, they are vertical or horizontal lines on the 2D plane. Choose the correct answer; explain your answer.
    • 4N + 4
    • 4N + 2
    • 4N
    • 4N − 2
    • 4N − 4

  3. What is the VC dimension of positively-biased perceptrons in 2D? The positively-biased perceptrons are all perceptrons with w0 > 0. Choose the correct answer; explain your answer.
    • 0
    • 1
    • 2
    • 3
    • 4

Ring Hypothesis Set

  4. The ring hypothesis set in R³ contains hypotheses parameterized by two positive numbers a and b, where

h(x) = +1 if a ≤ x1² + x2² + x3² ≤ b, and −1 otherwise.

What is the growth function of the hypothesis set? Choose the correct answer; explain your answer.

[e] none of the other choices

  5. Following the previous problem, what is the VC dimension of the ring hypothesis set? Choose the correct answer; explain your answer.
    • 1
    • 2
    • 3
    • 6
    • none of the other choices

Deviation from Optimal Hypothesis

  6. In Lecture 7, the VC bound was stated from the perspective of g, the hypothesis picked by the learning algorithm. The bound itself actually quantifies the BAD probability from any hypothesis h in the hypothesis set. That is,

P[∃ h ∈ H s.t. |Ein(h) − Eout(h)| > ε] ≤ 4 mH(2N) exp(−ε²N/8).

Define the best-Ein hypothesis g = argmin_{h∈H} Ein(h)

and the best-Eout hypothesis (which is optimal but can only be obtained by a cheating algorithm)

g* = argmin_{h∈H} Eout(h).

Using the VC bound above, with probability more than 1 − δ, which of the following is an upper bound of Eout(g) − Eout(g*)? Choose the correct answer; explain your answer.
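
For intuition, the bound can be evaluated numerically: setting the right-hand side equal to δ and solving for ε gives the generalization radius that holds with probability at least 1 − δ. A small sketch of mine (the decision-stump growth function mH(N) = 2N is used purely as an example, not as part of this problem):

    import math

    def vc_epsilon(N, delta, m_H=lambda n: 2 * n):
        # Solve 4 * m_H(2N) * exp(-eps^2 * N / 8) = delta for eps.
        return math.sqrt(8.0 / N * math.log(4.0 * m_H(2 * N) / delta))

    for N in (1000, 10000, 100000):
        print(N, round(vc_epsilon(N, delta=0.1), 4))

Applying the same ε simultaneously to g and g* is the key step for relating Eout(g) to Eout(g*).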

The VC Dimension

  7. For a finite hypothesis set H = {h1, h2, …, hM}, where each hypothesis is a binary classifier from X to {−1, +1}, what is the largest possible value of dvc(H)? Choose the correct answer; explain your answer.
    • M
    • 2M
    • M²
    • ⌊log2 M⌋
    • 2^M
  8. A boolean function h: {−1,+1}^k → {−1,+1} is called symmetric if its value does not depend on the permutation of its inputs, i.e., its value only depends on the number of ones in the input. What is the VC dimension of the set of all symmetric boolean functions? Choose the correct answer; explain your answer.
    • k − 2
    • k − 1
    • k
    • k + 1
    • k + 2
  9. How many of the following are necessary conditions for dvc(H) = d? Choose the correct answer; state which conditions correspond to your answer and explain them.
    • some set of d distinct inputs is shattered by H
    • some set of d distinct inputs is not shattered by H
    • any set of d distinct inputs is shattered by H
    • any set of d distinct inputs is not shattered by H
    • some set of d + 1 distinct inputs is shattered by H
    • some set of d + 1 distinct inputs is not shattered by H
    • any set of d + 1 distinct inputs is shattered by H
    • any set of d + 1 distinct inputs is not shattered by H
    • 1
    • 2
    • 3
    • 4
    • 5
  10. Which of the following hypothesis sets is of VC dimension ∞? Choose the correct answer; explain your answer.
    • the rectangle family: the infinite number of hypotheses where the boundary between the ±1 regions of each hypothesis looks like a rectangle (including axis-aligned ones and rotated ones) for x ∈ R²
    • the intersected-interval family: the infinite number of hypotheses where the positive region of each hypothesis can be represented as an intersection of any finite number of positive intervals for x ∈ R
    • the sine family: the infinite number of hypotheses {h: h(x) = sign(sin(α · x))} for x ∈ R
    • the scaling family: the infinite number of hypotheses for x ∈ R^d
    • none of the other choices

Noise and Error

  11. Consider a binary classification problem where we sample (x, y) from a distribution P with y ∈ {−1, +1}. Now we define a distribution Pτ to be a noisy version of P. That is, to sample from Pτ, we first sample (x, y) from P and flip y to −y with probability τ. Note that P0 = P. The distribution Pτ models a situation where our training data is labeled by an unreliable human, who mislabels with probability τ.

Define Eout(h, τ) to be the out-of-sample error of h with respect to Pτ. That is,

Eout(h, τ) = E_{(x,y)∼Pτ} ⟦h(x) ≠ y⟧.

Which of the following relates Eout(h, τ) to Eout(h, 0)? Choose the correct answer; explain your answer. (A Monte Carlo check is sketched below.)
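
A quick Monte Carlo experiment can reveal the relation empirically before you derive it. The distribution and hypothesis below are arbitrary choices of mine, used purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    def estimate_eout(h, f, tau, n=1_000_000):
        # Estimate Eout(h, tau) with x ~ Uniform[-1, 1] and clean labels y = f(x).
        x = rng.uniform(-1, 1, size=n)
        y = np.where(rng.random(n) < tau, -f(x), f(x))  # flip with probability tau
        return np.mean(h(x) != y)

    f = lambda x: np.where(x >= 0, 1, -1)       # an arbitrary target
    h = lambda x: np.where(x >= 0.3, 1, -1)     # an arbitrary fixed hypothesis

    for tau in (0.0, 0.1, 0.2):
        print(tau, estimate_eout(h, f, tau))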

  12. Consider x ∈ R³ and a target function f(x) = argmax_{i=1,2,3} x_i, with ties broken, if any, by choosing the smallest i. Then, assume a process that generates (x, y) by a uniform P(x) within [0,1]³ and

P(y | x) = 0.7 if y = f(x),
           0.1 if y = f(x) mod 3 + 1,
           0.2 if y = (f(x) + 1) mod 3 + 1.

The operation a mod 3 returns the residual when the integer a is divided by 3. When using the squared error, what is Eout(f) subject to the process above? Choose the correct answer; explain your answer. (Note: this is in some sense the price of noise; a simulation sketch follows the choices.)

  • 0.3
  • 0.6
  • 0.9
  • 1.2
  • 1.5
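
The expectation can be checked by simulating the generating process directly. A sketch of mine, assuming numpy is available (squared error, labels in {1, 2, 3}):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000
    x = rng.uniform(0, 1, size=(n, 3))
    fx = np.argmax(x, axis=1) + 1               # argmax breaks ties toward smaller i
    u = rng.random(n)
    y = np.where(u < 0.7, fx,
        np.where(u < 0.8, fx % 3 + 1, (fx + 1) % 3 + 1))
    print(np.mean((y - fx) ** 2))               # Monte Carlo estimate of Eout(f)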
  13. Following Problem 12, the squared error defines an ideal target function

f*(x) = E_{y∼P(y|x)} [y],

as shown on page 11 of the Lecture 8 slides. Unlike the slides, however, we denote this function as f* to avoid being confused with the target function f used for generating the data. Define the squared difference between f and f* to be

Δ(f, f*) = E_{x∼P(x)} (f(x) − f*(x))².

What is the value of Δ(f, f*)? Choose the correct answer; explain your answer. (Note: this means how much the original target function f was dragged by the noise in P(y|x) to the new target function f*. A computation sketch follows the choices.)

  • 0.01
  • 0.14
  • 0.16
  • 0.25
  • 0.42
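
Since f*(x) = E[y | x] depends only on f(x), and by symmetry f(x) is uniform over {1, 2, 3} under the uniform P(x), Δ(f, f*) reduces to a three-term average. A sketch of mine:

    def f_star(fx):
        # E[y | x] for the noisy process of Problem 12, as a function of f(x).
        return 0.7 * fx + 0.1 * (fx % 3 + 1) + 0.2 * ((fx + 1) % 3 + 1)

    print(sum((fx - f_star(fx)) ** 2 for fx in (1, 2, 3)) / 3)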

Decision Stump

In page 22 of the Lecture 5 slides (the Fun Time that you should play by yourself), we taught about the learning model of positive and negative rays (which is simply the one-dimensional perceptron). The model contains hypotheses of the form:

hs,θ(x) = s · sign(x − θ),

where s ∈ {−1, +1} is the direction of the ray and θ ∈ R is the threshold. You can take sign(0) = 1 for simplicity. The model is frequently named the decision stump model and is one of the simplest learning models. As shown in class, the growth function of the model is 2N and the VC dimension is 2.

  14. When using the decision stump model, given ε = 0.1 and δ = 0.1, among the five choices, what is the smallest N such that the BAD probability of the VC bound (as given in the beginning of Problem 6) is ≤ δ? Choose the correct answer; explain your answer. (A numerical scan is sketched after the choices.)
    • 6000
    • 8000
    • 10000
    • 12000
    • 14000
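
The scan itself is a one-liner once the bound from Problem 6 is plugged in with the decision-stump growth function mH(N) = 2N. A sketch of mine:

    import math

    def bad_prob_bound(N, eps=0.1):
        # 4 * mH(2N) * exp(-eps^2 * N / 8), with mH(N) = 2N, i.e. mH(2N) = 4N.
        return 4 * (4 * N) * math.exp(-eps ** 2 * N / 8)

    for N in (6000, 8000, 10000, 12000, 14000):
        print(N, bad_prob_bound(N))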

In fact, the decision stump model is one of the few models for which we can minimize Ein efficiently by enumerating all possible thresholds. In particular, for N examples, there are at most 2N dichotomies (see page 22 of the Lecture 5 slides), and thus at most 2N different Ein values. We can then easily choose the hypothesis that leads to the lowest Ein by the following decision stump learning algorithm.

(1) Sort all N examples x_n into a sorted sequence x'_1, x'_2, …, x'_N such that x'_1 ≤ x'_2 ≤ … ≤ x'_N.
(2) For each θ ∈ {−1} ∪ {(x'_i + x'_{i+1})/2 : 1 ≤ i ≤ N − 1 and x'_i ≠ x'_{i+1}} and s ∈ {−1, +1}, calculate Ein(hs,θ).
(3) Return the hs,θ with the minimum Ein as g; if multiple hypotheses reach the minimum Ein, return the one with the smallest s + θ.

(Hint: CS-majored students are encouraged to think about whether the second step can be carried out efficiently, i.e. O(N), using dxxxxxc pxxxxxxxxxg instead of the naive implementation of O(N²).)
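
A minimal sketch of the algorithm above (my own illustration, using the plain O(N²) enumeration rather than the hinted O(N) version; numpy assumed):

    import numpy as np

    def decision_stump_learn(x, y):
        # Return (s, theta, E_in) minimizing E_in; ties broken by smallest s + theta.
        xs = np.sort(x)
        thetas = [-1.0] + [(a + b) / 2
                           for a, b in zip(xs[:-1], xs[1:]) if a != b]
        best_key, best_h = None, None
        for s in (-1, +1):
            for theta in thetas:
                pred = s * np.where(x >= theta, 1, -1)   # sign(0) taken as +1
                e_in = np.mean(pred != y)
                key = (e_in, s + theta)                  # tie-break on s + theta
                if best_key is None or key < best_key:
                    best_key, best_h = key, (s, theta, e_in)
        return best_h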

Next, you are asked to implement such an algorithm and run your program on an artificial data set. We shall start by generating (x, y) with the following procedure, taking the target function f(x) = sign(x):

  • Generate x by a uniform distribution in [−1, +1].
  • Generate y from x by y = f(x), and then flip y to −y with probability τ, independently. (A generation sketch follows.)
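
A sketch of mine for the generating procedure, matching the decision stump learner above:

    import numpy as np

    rng = np.random.default_rng()

    def generate_data(N, tau):
        # x ~ Uniform[-1, +1]; y = sign(x) with sign(0) = +1, flipped w.p. tau.
        x = rng.uniform(-1, 1, size=N)
        y = np.where(x >= 0, 1, -1)
        return x, np.where(rng.random(N) < tau, -y, y)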
  15. For θ ∈ [−1, +1], what is Eout(h+1,θ, 0), where Eout(h, τ) is defined in Problem 11? Choose the correct answer; explain your answer.
  16. (*) For τ = 0, which means that your data is noiseless, generate a data set of size 2 by the procedure above and run the decision stump algorithm on the data set to get g. Repeat the experiment 10000 times, each with a different data set. What is the mean of Eout(g, τ) − Ein(g) within the 10000 results? Choose the closest value. (By extending the results in Problem 11 and Problem 15, you can actually compute any Eout(hs,θ, τ). But if you do not trust your math derivation, you can get a very accurate estimate of Eout(g) by evaluating g on a separate test data set of size 100000, as guaranteed by Hoeffding's inequality. A simulation sketch for Problems 16-20 is given after this list.)
    • 0.00
    • 0.02
    • 0.05
    • 0.30
    • 0.40
  17. (*) For τ = 0, generate a data set of size 20 by the procedure above and run the decision stump algorithm on the data set to get g. Repeat the experiment 10000 times, each with a different data set. What is the mean of Eout(g, τ) − Ein(g) within the 10000 results? Choose the closest value.
    • 0.00
    • 0.02
    • 0.05
    • 0.30
    • 0.40
  18. (*) For τ = 0.1, generate a data set of size 2 by the procedure above and run the decision stump algorithm on the data set to get g. Repeat the experiment 10000 times, each with a different data set. What is the mean of Eout(g, τ) − Ein(g) within the 10000 results? Choose the closest value.
    • 0.00
    • 0.02
    • 0.05
    • 0.30
    • 0.40
  19. (*) For τ = 0.1, generate a data set of size 20 by the procedure above and run the decision stump algorithm on the data set to get g. Repeat the experiment 10000 times, each with a different data set. What is the mean of Eout(g, τ) − Ein(g) within the 10000 results? Choose the closest value.
    • 0.00
    • 0.02
    • 0.05
    • 0.30
    • 0.40
  20. (*) For τ = 0.1, generate a data set of size 200 by the procedure above and run the decision stump algorithm on the data set to get g. Repeat the experiment 10000 times, each with a different data set. What is the mean of Eout(g, τ) − Ein(g) within the 10000 results? Choose the closest value.
    • 0.00
    • 0.02
    • 0.05
    • 0.30
    • 0.40
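
A driver for the experiments in Problems 16-20 (my own sketch), reusing the two helpers above and estimating Eout(g, τ) on a large separate test set as the statement of Problem 16 suggests; reduce trials or test_size for a quicker run:

    import numpy as np

    def run_trials(N, tau, trials=10000, test_size=100000):
        diffs = []
        for _ in range(trials):
            x, y = generate_data(N, tau)             # training set of size N
            s, theta, e_in = decision_stump_learn(x, y)
            xt, yt = generate_data(test_size, tau)   # fresh test set from P_tau
            e_out = np.mean(s * np.where(xt >= theta, 1, -1) != yt)
            diffs.append(e_out - e_in)
        return np.mean(diffs)

    # e.g. Problem 16: print(run_trials(N=2, tau=0.0))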
