CSCI567 Project 4, Part 1: Hidden Markov Models

In this programming assignment, we will implement Hidden Markov Models (HMMs) and apply them to the Part-of-Speech Tagging problem.

You first need to implement 6 core functions that carry out the main steps of working with an HMM. You will then estimate HMM parameters from data and use them to solve the Part-of-Speech Tagging problem.

After finishing the implementation, you can use hmm_test_script.py to test the correctness of your functions.

There are 5 sets of grading data. Each set includes the parameters (pi, A, B, obs_dict, state_dict, Osequence), which we will use to initialize the HMM class and test your functions. To receive full credit, the outputs of functions 1-5 must be within an error of 10^{-8} of ours, and the output of your viterbi function must be identical to ours.

For the definitions of the parameters pi, A, B etc., please refer to the code documentation.

1.2 Application to Part-of-Speech Tagging

  1. model_training – 10 points: score = 10 × (your_correct_pred_cnt / our_correct_pred_cnt)
  2. speech_tagging – 10 points: score = 10 × (your_correct_pred_cnt / our_correct_pred_cnt)

We will use the dataset given to you for grading this part (with a different random seed). We will train your model and our model on the same train_data. The model_training function and the speech_tagging function will be tested separately.

In order to check your model_training function, we will use 50 sentences from train_data to do Part-of-Speech Tagging. To receive full credit, your prediction accuracy should be equal to or better than ours.

In order to check your speech_tagging function, we will use 50 sentences from test_data to do Part-of-Speech Tagging. To receive full credit, your prediction accuracy should be equal to or better than ours.

What to submit

1.1 Implementation (30 points)

In 1.1, you are given the parameters of an HMM and you will implement two procedures.

  1. The Evaluation Problem: Given an HMM and a sequence of observations, what is the probability that the observations are generated by the model? Two algorithms are commonly used for this: the forward algorithm and the backward algorithm. Based on their results, you will be asked to calculate the probability of the sequence and the posterior probability of each state.
  2. The Decoding Problem: Given a model and a sequence of observations, what is the most likely state sequence in the model that produced the observation sequence? For decoding you will implement the Viterbi algorithm.

HMM Class

In this project, we abstract the Hidden Markov Model as a class. Each HMM is initialized with pi, A, B, obs_dict and state_dict. The HMM class has 6 member functions: forward, backward, sequence_prob, posterior_prob, likelihood_prob and viterbi.

### You can add your own functions or variables in the HMM class, but you shouldn't change the existing API. ###
class HMM:
    def __init__(self, pi, A, B, obs_dict, state_dict):
        """
        - pi: (1 * num_state) A numpy array of initial probabilities. pi[i] = P(X_1 = s_i)
        - A: (num_state * num_state) A numpy array of transition probabilities. A[i, j] = P(X_t = s_j | X_{t-1} = s_i)
        - B: (num_state * num_obs_symbol) A numpy array of observation probabilities. B[i, o] = P(Z_t = z_o | X_t = s_i)
        - obs_dict: A dictionary mapping each observation symbol to its index in B
        - state_dict: A dictionary mapping each state to its index in pi and A
        """

    # TODO:
    def forward(self, Osequence):
        ...

    # TODO:
    def backward(self, Osequence):
        ...

    # TODO:
    def sequence_prob(self, Osequence):
        ...

    # TODO:
    def posterior_prob(self, Osequence):
        ...

    # TODO:
    def likelihood_prob(self, Osequence):
        ...

    # TODO:
    def viterbi(self, Osequence):
        ...

1.1.1 Evaluation problem

(a) Forward algorithm and backward algorithm (10 points)

Here \lambda denotes the model. Please finish the implementation of the forward() and backward() functions in hmm.py:

  • alpha[i, t] = P(X_t = s_i, Z_{1:t} | lambda).
def forward(self, Osequence):
    """
    Inputs:
    - self.pi: (1 * num_state) A numpy array of initial probabilities. pi[i] = P(X_1 = s_i)
    - self.A: (num_state * num_state) A numpy array of transition probabilities. A[i, j] = P(X_t = s_j | X_{t-1} = s_i)
    - self.B: (num_state * num_obs_symbol) A numpy array of observation probabilities. B[i, o] = P(Z_t = z_o | X_t = s_i)
    - Osequence: (1 * L) A numpy array of observation sequence with length L
    Returns:
    - alpha: (num_state * L) A numpy array where alpha[i, t] = P(X_t = s_i, Z_{1:t} | λ)
    """
  • beta[i, t] = P(Z_{t+1: T}|X_t = s_i, lambda).
def backward(self, Osequence):
    """
    Inputs:
    - self.pi: (1 * num_state) A numpy array of initial probabilities. pi[i] = P(X_1 = s_i)
    - self.A: (num_state * num_state) A numpy array of transition probabilities. A[i, j] = P(X_t = s_j | X_{t-1} = s_i)
    - self.B: (num_state * num_obs_symbol) A numpy array of observation probabilities. B[i, o] = P(Z_t = z_o | X_t = s_i)
    - Osequence: (1 * L) A numpy array of observation sequence with length L
    Returns:
    - beta: (num_state * L) A numpy array where beta[i, t] = P(Z_{t+1:T} | X_t = s_i, λ)
    """
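
As a reference point, here is a minimal sketch of the two recursions, assuming pi, A and B are laid out as in the docstrings above and that every symbol in Osequence appears in obs_dict. This is only one possible implementation, not the reference solution.

import numpy as np   # assumed to be imported at the top of hmm.py

def forward(self, Osequence):
    O = [self.obs_dict[o] for o in Osequence]   # map observation symbols to column indices of B
    S, L = self.B.shape[0], len(O)
    alpha = np.zeros([S, L])
    # base case: alpha[i, 0] = pi[i] * B[i, o_1]
    alpha[:, 0] = np.ravel(self.pi) * self.B[:, O[0]]
    for t in range(1, L):
        # alpha[i, t] = B[i, o_t] * sum_j alpha[j, t-1] * A[j, i]
        alpha[:, t] = self.B[:, O[t]] * (self.A.T @ alpha[:, t - 1])
    return alpha

def backward(self, Osequence):
    O = [self.obs_dict[o] for o in Osequence]
    S, L = self.B.shape[0], len(O)
    beta = np.zeros([S, L])
    beta[:, L - 1] = 1.0   # base case: beta[i, T] = 1
    for t in range(L - 2, -1, -1):
        # beta[i, t] = sum_j A[i, j] * B[j, o_{t+1}] * beta[j, t+1]
        beta[:, t] = self.A @ (self.B[:, O[t + 1]] * beta[:, t + 1])
    return beta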

(b) Sequence probability

Based on your forward and backward functions, you will calculate the sequence probability. (You can call the forward function or the backward function inside sequence_prob.)

P(Z_1, \ldots, Z_T = O \mid \lambda) = \sum_{i=1}^{N} P(X_T = s_i, Z_{1:T} \mid \lambda) = \sum_{i=1}^{N} \alpha[i, T]

def sequence_prob(self, Osequence):
    """
    Inputs:
    - Osequence: (1 * L) A numpy array of observation sequence with length L
    Returns:
    - prob: A float number of P(Z_{1:T} | λ)
    """
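
A minimal sketch, continuing the forward/backward sketch above:

def sequence_prob(self, Osequence):
    # P(O | lambda) is the sum over states of the last column of alpha
    alpha = self.forward(Osequence)
    return float(np.sum(alpha[:, -1]))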

(c) Posterior probability

The forward variable alpha[i, t] and the backward variable beta[i, t] are used to calculate the posterior probability of a specific state. For t = 1, \ldots, T and i = 1, \ldots, N, we define the posterior probability \gamma_t(i) = P(X_t = s_i \mid O, \lambda), the probability of being in state s_i at time t given the observation sequence O and the model \lambda.

\gamma_t(i) = \frac{P(X_t = s_i, O \mid \lambda)}{P(O \mid \lambda)} = \frac{P(X_t = s_i, Z_{1:T} \mid \lambda)}{P(O \mid \lambda)}

P(X_t = s_i, Z_{1:T} \mid \lambda) = P(Z_{1:t} \mid X_t = s_i, \lambda) \cdot P(Z_{t+1:T} \mid X_t = s_i, \lambda) \cdot P(X_t = s_i \mid \lambda) = \alpha[i, t] \cdot \beta[i, t]

Thus

\gamma_t(i) = \frac{\alpha[i, t] \cdot \beta[i, t]}{P(O \mid \lambda)}

where

P(O \mid \lambda) = \sum_{i=1}^{N} \alpha[i, T]

Signature:

def posterior_prob(self, Osequence):
    """
    Inputs:
    - Osequence: (1 * L) A numpy array of observation sequence with length L
    Returns:
    - prob: (num_state * L) A numpy array of P(X_t = i | O, λ)
    """
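
A minimal sketch following the formula above, again continuing the earlier sketch (one possible implementation only):

def posterior_prob(self, Osequence):
    alpha = self.forward(Osequence)
    beta = self.backward(Osequence)
    # gamma[i, t] = alpha[i, t] * beta[i, t] / P(O | lambda)
    return alpha * beta / np.sum(alpha[:, -1])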

You can use \gamma_t(i) to find the most likely state at time t, namely the state X_t = s_i for which \gamma_t(i) is maximum. This works fine when the HMM is ergodic, i.e. there is a transition from any state to any other state. If applied to an HMM of another architecture, this approach could give a sequence that is not a legitimate path, because some of its transitions are not permitted. To avoid this problem, the Viterbi algorithm is the most commonly used decoding algorithm.

(d) Likelihood of two consecutive states at a given time (5 points)

You are required to calculate the likelihood of a transition from state s at time t to state s' at time t+1. That is, you are required to calculate

\xi_{s, s'}(t) = P(X_t = s, X_{t+1} = s' \mid Z_{1:T} = z_{1:T})

Signature:

def likelihood_prob(self, Osequence):
    """
    Inputs:
    - Osequence: (1 * L) A numpy array of observation sequence with length L
    Returns:
    - prob: (num_state * num_state * (L - 1)) A numpy array of P(X_t = i, X_{t+1} = j | O, λ)
    """
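
A minimal sketch, continuing the earlier one. Since the quantity depends on the state pair (i, j) and on t, this version returns a (num_state, num_state, L-1) array; the exact output layout expected by the grader is not fully specified in the handout, so treat that choice as an assumption and check the starter code.

def likelihood_prob(self, Osequence):
    alpha = self.forward(Osequence)
    beta = self.backward(Osequence)
    O = [self.obs_dict[o] for o in Osequence]
    S, L = alpha.shape
    seq_prob = np.sum(alpha[:, -1])
    prob = np.zeros([S, S, L - 1])
    for t in range(L - 1):
        # xi[i, j, t] = alpha[i, t] * A[i, j] * B[j, o_{t+1}] * beta[j, t+1] / P(O | lambda)
        prob[:, :, t] = (alpha[:, t][:, None] * self.A *
                         self.B[:, O[t + 1]][None, :] * beta[:, t + 1][None, :]) / seq_prob
    return prob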

(e) Viterbi algorithm (7.5 points)

The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states. We want to compute the most likely state path that corresponds to the observation sequence O under the given HMM, namely

k^* = (k^*_1, k^*_2, \cdots, k^*_T) = \arg\max_k P(s_{k_1}, s_{k_2}, \cdots, s_{k_T} \mid Z_1, Z_2, \cdots, Z_T = O, \lambda)

Signature:

def viterbi(self, Osequence):
    """
    Inputs:
    - Osequence: (1 * L) A numpy array of observation sequence with length L
    Returns:
    - path: A list of the most likely hidden state path k* (return states instead of indices)
    """
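
A minimal sketch of the standard Viterbi recursion with back-pointers, continuing the earlier sketch and assuming the same parameter layout; ties are broken arbitrarily by np.argmax.

def viterbi(self, Osequence):
    O = [self.obs_dict[o] for o in Osequence]
    S, L = self.B.shape[0], len(O)
    delta = np.zeros([S, L])               # delta[i, t]: probability of the best path ending in state i at time t
    backptr = np.zeros([S, L], dtype=int)  # back-pointers for path recovery
    delta[:, 0] = np.ravel(self.pi) * self.B[:, O[0]]
    for t in range(1, L):
        # cand[j, i] = delta[j, t-1] * A[j, i]
        cand = delta[:, t - 1][:, None] * self.A
        backptr[:, t] = np.argmax(cand, axis=0)
        delta[:, t] = self.B[:, O[t]] * np.max(cand, axis=0)
    # backtrack from the most probable final state
    idx_path = [int(np.argmax(delta[:, -1]))]
    for t in range(L - 1, 0, -1):
        idx_path.append(int(backptr[idx_path[-1], t]))
    idx_path.reverse()
    # the returned path should contain state names, not indices
    idx2state = {idx: state for state, idx in self.state_dict.items()}
    return [idx2state[i] for i in idx_path]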

1.2 Application to Part-of-Speech Tagging

A Part-of-Speech (POS) is a category of words (or, more generally, of lexical items) that have similar grammatical properties. (Examples: noun, verb, adjective, adverb, pronoun, preposition, conjunction, interjection, and sometimes numeral, article, or determiner.)

Part-of-Speech Tagging (POST) is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context.

Here you will use an HMM to do POS tagging. You will first need to calculate the parameters (pi, A, B, obs_dict, state_dict) of the HMM and then apply the Viterbi algorithm to tag sentences.

Dataset

tags.txt: Universal Part-of-Speech Tagset

Tag    Meaning               English Examples
ADJ    adjective             new, good, high, special, big, local
ADP    adposition            on, of, at, with, by, into, under
ADV    adverb                really, already, still, early, now
CONJ   conjunction           and, or, but, if, while, although
DET    determiner, article   the, a, some, most, every, no, which
NOUN   noun                  year, home, costs, time, Africa
NUM    numeral               twenty-four, fourth, 1991, 14:24
PRT    particle              at, on, out, over, per, that, up, with
PRON   pronoun               he, their, her, its, my, I, us
VERB   verb                  is, say, told, given, playing, would
.      punctuation marks     . , ; !
X      other                 ersatz, esprit, dunno, gr8, univeristy

sentences.txt: Contains 57340 sentences that have already been tagged.

Word Tag
b100-48585
She PRON
had VERB
to PRT
move VERB
in ADP
some DET
direction NOUN
.
any DET
direction NOUN
that PRON
would VERB
take VERB
her PRON
away ADV
from ADP
this DET
evil ADJ
place NOUN
. .

Part-of-Speech Tagging

In this part, we collect our dataset and tags with the Dataset class. The Dataset class includes tags, train_data and test_data. Both train_data and test_data are lists of sentences, and each sentence is an object of the Line class.

You only need to implement the model_training function and the speech_tagging function. We have provided the accuracy function, which you can use to compare your predicted tagging with the true tagging of a sentence. You can find the definitions below.

### You can add your own functions or variables in the Dataset class, but you shouldn't change the functions that already exist. ###
class Dataset:
    def __init__(self, tagfile, datafile, train_test_split=0.8, seed=112890):
        self.tags
        self.train_data
        self.test_data

    def read_data(self, filename):
        ...

    def read_tags(self, filename):
        ...

class Line:
    def __init__(self, line, type):
        self.id
        self.words
        self.tags
        self.length

    def show(self):
        ...

# TODO:
def model_training(train_data, tags):
    ...

# TODO:
def speech_tagging(model, test_data, tags):
    ...

def accuracy(predict_tagging, true_tagging):
    ...

1.2.1 Model training (10 points)

In this part, you will need to calculate the parameters of the HMM based on train_data.

Signature:

def model_training(train_data, tags):
    """
    Inputs:
    - train_data: a list of sentences, each sentence is an object of Line class
    - tags: a list of POS tags
    Returns:
    - model: an object of HMM class initialized with the parameters (pi, A, B, obs_dict, state_dict) you calculated based on the training dataset. We will use the object you return to do speech tagging.
    """
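
A minimal sketch of a maximum-likelihood (counting) estimate, assuming each Line object exposes words and tags as listed in the Dataset skeleton above and that the HMM class is importable in this file. Smoothing and edge cases (for example, tags never observed at the start of a sentence) are left to you; this is not the reference solution.

import numpy as np

def model_training(train_data, tags):
    state_dict = {tag: i for i, tag in enumerate(tags)}
    # build the observation vocabulary from the training words
    obs_dict = {}
    for line in train_data:
        for word in line.words:
            if word not in obs_dict:
                obs_dict[word] = len(obs_dict)
    S = len(tags)
    pi = np.zeros(S)
    A = np.zeros([S, S])
    B = np.zeros([S, len(obs_dict)])
    # count initial states, transitions, and emissions
    for line in train_data:
        prev = None
        for word, tag in zip(line.words, line.tags):
            s = state_dict[tag]
            B[s, obs_dict[word]] += 1
            if prev is None:
                pi[s] += 1
            else:
                A[prev, s] += 1
            prev = s
    # normalize counts into probabilities, guarding against all-zero rows
    pi = pi / pi.sum()
    A = A / np.maximum(A.sum(axis=1, keepdims=True), 1)
    B = B / np.maximum(B.sum(axis=1, keepdims=True), 1)
    return HMM(pi, A, B, obs_dict, state_dict)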

1.2.2 Speech_tagging (10 points)

Based on the HMM from 1.2.1, do speech tagging for each sentence in the test data. Note that you may meet a word that is unseen in the training dataset. You need to modify the emission matrix and obs_dict of your current model to handle this case: assume the emission probability from each state to a new, unseen word is 10^{-6} (a very low probability). A sketch of this handling is given after the signature below.

For example, in hmm_model.json, we use the following parameters to initialize the HMM:

S = ["1", "2"]
pi = [0.7, 0.3]
A = [[0.8, 0.2],
     [0.4, 0.6]]
B = [[0.5, 0, 0.4, 0.1],
     [0.5, 0.1, 0.2, 0.2]]
Observations = ["A", "C", "G", "T"]

If we find another observation symbol "X" in the observation sequence, we modify the parameters of the HMM as follows:

S = ["1", "2"]
pi = [0.7, 0.3]
A = [[0.8, 0.2],
     [0.4, 0.6]]
B = [[0.5, 0, 0.4, 0.1, 1e-6],
     [0.5, 0.1, 0.2, 0.2, 1e-6]]
Observations = ["A", "C", "G", "T", "X"]

You do not get access to test_data in the model_training function; you need to implement the logic for tagging a new sequence in the speech_tagging function.

Signature:

def speech_tagging(test_data, model):
    """
    Inputs:
    - test_data: (1 * num_sentence) a list of sentences, each sentence is an object of Line class
    - model: an object of HMM class
    Returns:
    - tagging: (num_sentence * num_tagging) a 2D list of output tagging for each sentence in test_data
    """
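
A minimal sketch of the unseen-word handling described above, assuming Line objects expose words and that it is acceptable to extend model.B and model.obs_dict in place before running Viterbi:

import numpy as np

def speech_tagging(test_data, model):
    tagging = []
    for line in test_data:
        for word in line.words:
            if word not in model.obs_dict:
                # unseen word: append a column of emission probability 1e-6 for every state
                model.obs_dict[word] = len(model.obs_dict)
                new_col = np.full((model.B.shape[0], 1), 1e-6)
                model.B = np.hstack([model.B, new_col])
        tagging.append(model.viterbi(line.words))
    return tagging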

1.2.3 Suggestion (0 points)

This part won’t be graded. In order to get a better understanding of HMMs, come up with one sentence by yourself and tag it manually. Then run your forward, backward, sequence_prob, posterior_prob and viterbi functions on the model from 1.2.1. Print the result of each function and see whether you can explain your results.
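
For a quick sanity check you can also use the toy hmm_model.json parameters shown earlier (rather than the trained POS model); the observation sequence below is made up purely for illustration:

import numpy as np

pi = np.array([0.7, 0.3])
A = np.array([[0.8, 0.2],
              [0.4, 0.6]])
B = np.array([[0.5, 0.0, 0.4, 0.1],
              [0.5, 0.1, 0.2, 0.2]])
obs_dict = {"A": 0, "C": 1, "G": 2, "T": 3}
state_dict = {"1": 0, "2": 1}

model = HMM(pi, A, B, obs_dict, state_dict)
Osequence = np.array(["A", "G", "T"])

print(model.forward(Osequence))          # alpha, shape (2, 3)
print(model.backward(Osequence))         # beta, shape (2, 3)
print(model.sequence_prob(Osequence))    # a single float, P(O | lambda)
print(model.posterior_prob(Osequence))   # gamma, shape (2, 3)
print(model.likelihood_prob(Osequence))  # xi, covering the L - 1 = 2 transitions
print(model.viterbi(Osequence))          # the most likely state path, as state names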
