CSE 802 - Pattern Recognition and Analysis - Homework 1


Note:

  1. You are permitted to discuss the following questions with others in the class. However, you must write up your own solutions to these questions. Any indication to the contrary will be considered an act of academic dishonesty. Copying from any source constitutes academic dishonesty.
  2. A hard copy of the homework must be submitted by February 3, 12:40 pm. Late submissions will not be graded. In your submission, please include the names of the individuals you discussed this homework with and a list of the external resources (e.g., websites, other books, articles, etc.) that you used to complete the assignment (if any).
  3. Code developed as part of this assignment must be included as an appendix to your submission or inline with your solution.
  4. The IMOX dataset consists of 192 8-dimensional patterns pertaining to four classes (digital characters I, M, O and X). There are 48 patterns per class. The 8 features correspond to the distance of a character to the (a) upper left boundary (x1), (b) lower right boundary (x2), (c) upper right boundary (x3), (d) lower left boundary (x4), (e) middle left boundary (x5), (f) middle right boundary (x6), (g) middle upper boundary (x7), and (h) middle lower boundary (x8). Note that the class labels (1, 2, 3 or 4) are indicated at the end of every pattern.
    • [4 points] Compute and report the mean pattern vector, i.e., the centroid, of each class.
    • [4 points] For each class, determine the pattern (i.e., vector) from that class which is the farthest from the class mean. You can use the Euclidean distance metric for this problem.
    • [8 points] For each feature, plot the histograms pertaining to the 4 classes. Your output should contain 8 graphs corresponding to the 8 features; each graph should contain 4 histograms corresponding to the 4 classes (choose a bin size of your choice for the histograms). Based on these plots, indicate (a) the features that are likely to be useful for distinguishing the 4 classes, and (b) the classes that are likely to overlap with each other to a great extent. Provide an explanation for your answer.
    • [5 points] Assume that each pattern can be represented by features x1 and x2. This means, each pattern can be viewed as a point in 2-dimensional space. Draw a scatter plot showing all 192 patterns (use different labels/markers to distinguish between classes). Draw another scatter plot based on features x3 and x4. Based on these scatter plots, explain which of the two feature subsets ((x1, x2) or (x3, x4)) is likely to be useful for separating the 4 classes.
    • [4 points] Assume that each pattern can be represented by features (x1, x2, x4). Draw a 3-dimensional scatter plot showing all 192 patterns. Based on this scatter plot, explain which classes overlap with each other to a great extent.
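
The centroid and farthest-pattern computations described in item 4 can be sketched as follows. This is a minimal illustration in plain Python, not a reference solution: the function names `class_centroids` and `farthest_from_centroid` and the `toy` rows are made up for the example, and the only assumption carried over from the text is that each row lists the features followed by the class label in the last position. The Euclidean distance is computed with `math.dist` (Python 3.8 or later).

```python
import math
from collections import defaultdict


def class_centroids(patterns):
    """Return {label: mean feature vector} for rows shaped (features..., label)."""
    grouped = defaultdict(list)
    for *features, label in patterns:
        grouped[label].append(features)
    # Average each feature column separately within a class.
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def farthest_from_centroid(patterns):
    """Return {label: (pattern, distance)} for the pattern farthest from its class mean."""
    centroids = class_centroids(patterns)
    farthest = {}
    for *features, label in patterns:
        distance = math.dist(features, centroids[label])  # Euclidean distance
        # Strict comparison: on a tie, the first pattern encountered is kept.
        if label not in farthest or distance > farthest[label][1]:
            farthest[label] = (features, distance)
    return farthest


# Made-up toy rows (NOT the IMOX data): two features, label in the last column.
toy = [
    (1.0, 2.0, 1),
    (3.0, 2.0, 1),
    (5.0, 5.0, 1),
    (0.0, 0.0, 2),
    (4.0, 0.0, 2),
]

print(class_centroids(toy))
print(farthest_from_centroid(toy))
```

The same two functions would apply unchanged to rows with 8 features and labels 1 to 4, provided the file is first parsed into tuples of numbers with the label last.
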
  5. [10 points] What type of learning scheme (supervised, unsupervised, or reinforcement) can be used to address each of the following problems? You must justify your answer.
    • Teaching a self-driving rover to navigate the terrain of Mars.
    • Given a large set of unknown flowers, discover categories of flowers based on color, geometry, texture and scent of the flowers.
    • Determining the identity of a person in a video based on their voice.
    • Predicting whether it would rain or not in the next 24 hours based on current weather conditions such as precipitation, humidity, temperature, wind, pressure, etc.
    • Given a large set of photos, group all similar looking faces together.
  6. [15 points] Describe each of the following terms with an example: (a) overfitting, (b) loss function, (c) decision boundary, (d) segmentation, (e) invariant representation.
  7. [20 points] The paper "Classification Accuracies of Physical Activities Using Smartphone Motion Sensors" by Wu et al. discusses a pattern classification system that determines the physical activity of an individual based on data gleaned from the iPod Touch.
    • Briefly describe this system based on the pattern recognition terminology developed in class: (i) sensors used; (ii) pre-processing of raw data; (iii) features extracted; and (iv) classification scheme. How many features (i.e., d) and classes (i.e., c) are present?
    • How was classifier training accomplished? How many patterns were available in the training set? How were the training patterns labeled?
    • How was the performance of the pattern recognition system evaluated? What metrics were used to evaluate classifier performance?
    • In your opinion, did the proposed pattern recognition system perform well? Why or why not?
  8. [5 points] Consider the following probability density function, which is non-zero only in the range $0 \le x \le 10$: $p(x) = K x^3 (10 - x)$.

Here, K is a constant. Determine the value of the constant K.
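
The constant follows from requiring the density to integrate to one. The derivation below is a sketch that assumes the density reads $p(x) = K x^3 (10 - x)$ on $0 \le x \le 10$ and is zero elsewhere; if the intended density differs, the same normalization step applies with a different integrand.

```latex
\begin{aligned}
1 &= \int_{0}^{10} K x^{3} (10 - x)\, dx
   = K \left[ \frac{10 x^{4}}{4} - \frac{x^{5}}{5} \right]_{0}^{10} \\
  &= K \left( \frac{10 \cdot 10^{4}}{4} - \frac{10^{5}}{5} \right)
   = K \,(25000 - 20000) = 5000\,K, \\
K &= \frac{1}{5000} = 2 \times 10^{-4}.
\end{aligned}
```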

  9. Consider the problem of classifying two-dimensional patterns of the form x = (x1, x2)^t into one of two categories, ω1 or ω2. Using the labeled patterns presented in this data set[1], do the following.
    • [8 points] Plot the histograms (bin size = 1) corresponding to (x1|ω1) and (x1|ω2) in a graph. Also, plot the histograms (bin size = 1) corresponding to (x2|ω1) and (x2|ω2) in a separate graph. Is x1 more discriminatory than x2?
    • [7 points] Plot the two-dimensional patterns in a graph. Use markers to distinguish the patterns according to their class labels. Suppose you have the following decision rule (classifier) to classify a novel pattern x = (x1, x2)^t:

If x1 + x2 - 15 < 0, x ∈ ω1; else x ∈ ω2.

In the same graph, plot the decision boundary corresponding to this rule. What is the error rate (i.e., the percentage of patterns that are misclassified) when this decision rule is used to classify the patterns in the given data set?

    • [7 points] Repeat the above after modifying the decision rule (classifier) as follows:

If x1 + x2 - 12 < 0, x ∈ ω1; else x ∈ ω2.

    • [3 points] Which of the two classifiers performs better on this dataset?
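
The error-rate comparison between the two decision rules can be sketched as follows. This is an illustration under stated assumptions rather than a solution: the names `classify` and `error_rate` and the `toy` rows are invented for the example, and the rule is read as "assign class 1 if x1 + x2 - threshold < 0, otherwise class 2", with the threshold set to 15 or 12. Rows follow the three-column layout of the data file (x1, x2, class label).

```python
def classify(x1, x2, threshold):
    """Assign class 1 if x1 + x2 - threshold < 0, otherwise class 2."""
    return 1 if x1 + x2 - threshold < 0 else 2


def error_rate(patterns, threshold):
    """Percentage of (x1, x2, label) rows that the rule misclassifies."""
    wrong = sum(
        1 for x1, x2, label in patterns
        if classify(x1, x2, threshold) != label
    )
    return 100.0 * wrong / len(patterns)


# Made-up toy rows (NOT the assignment's data set): x1, x2, class label.
toy = [
    (3.0, 4.0, 1),
    (6.0, 7.0, 1),
    (8.0, 6.0, 1),
    (7.0, 9.0, 2),
    (6.0, 7.5, 2),
]

print(error_rate(toy, 15))  # 20.0
print(error_rate(toy, 12))  # 40.0
```

In a plot, the decision boundary for each rule is the straight line x2 = threshold - x1; patterns below the line are assigned to class 1 and patterns on or above it to class 2.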

[1] The text file has 3 columns. The first two columns correspond to the feature vector of a pattern and the third column corresponds to its class label.
