CSCI567 Project 4, Part 1: Hidden Markov Models

In this programming assignment, we will implement Hidden Markov Models (HMMs) and apply them to the Part-of-Speech Tagging problem.

You first need to implement 6 core functions that carry out the main steps of working with an HMM. You will then estimate HMM parameters from data and use them to solve the Part-of-Speech Tagging problem.

After finishing the implementation, you can use hmm_test_script.py to test the correctness of your functions.

There are 5 sets of grading data. Each set includes the parameters (pi, A, B, obs_dict, state_dict, Osequence), which we will use to initialize the HMM class and test your functions. To receive full credit, the outputs of functions 1-5 must be within an error of 10^{-8} of ours, and the output of your viterbi function must be identical to ours.

For the definitions of the parameters pi, A, B etc., please refer to the code documentation.

1.2 Application to Part-of-Speech Tagging

  1. model_training – 10 points: score = 10 × (your_correct_pred_cnt / our_correct_pred_cnt)
  2. speech_tagging – 10 points: score = 10 × (your_correct_pred_cnt / our_correct_pred_cnt)

We will use the dataset given to you for grading this part (with a different random seed). We will train your model and our model on the same train_data. The model_training function and the speech_tagging function will be tested separately.

In order to check your model_training function, we will use 50 sentences from train_data to do Part-of-Speech Tagging. To receive full credit, your prediction accuracy should be equal to or better than ours.

In order to check your speech_tagging function, we will use 50 sentences from test_data to do Part-of-Speech Tagging. To receive full credit, your prediction accuracy should be equal to or better than ours.

What to submit

1.1 Implementation (30 points)

In 1.1, you are given the parameters of an HMM and you will implement two procedures.

  1. The Evaluation Problem: Given an HMM and a sequence of observations, what is the probability that the observations are generated by the model? Two algorithms are commonly used for this: the forward algorithm and the backward algorithm. Based on their results, you will be asked to calculate the probability of the sequence and the posterior probability of each state.
  2. The Decoding Problem: Given a model and a sequence of observations, what is the most likely state sequence in the model that produced the observation sequence? For decoding you will implement the Viterbi algorithm.

HMM Class

In this project, we abstract the Hidden Markov Model as a class. Each HMM is initialized with pi, A, B, obs_dict and state_dict. The HMM class has 6 member functions: forward, backward, sequence_prob, posterior_prob, likelihood_prob and viterbi.

### You can add your own functions or variables in the HMM class, but you shouldn't change the existing API. ###
class HMM:
    def __init__(self, pi, A, B, obs_dict, state_dict):
        """
        - pi: (1 * num_state) A numpy array of initial probabilities. pi[i] = P(X_1 = s_i)
        - A: (num_state * num_state) A numpy array of transition probabilities. A[i, j] = P(X_t = s_j | X_{t-1} = s_i)
        - B: (num_state * num_obs_symbol) A numpy array of observation probabilities. B[i, o] = P(Z_t = z_o | X_t = s_i)
        - obs_dict: A dictionary mapping each observation symbol to its index in B
        - state_dict: A dictionary mapping each state to its index in pi and A
        """

    # TODO:
    def forward(self, Osequence):
        ...

    # TODO:
    def backward(self, Osequence):
        ...

    # TODO:
    def sequence_prob(self, Osequence):
        ...

    # TODO:
    def posterior_prob(self, Osequence):
        ...

    # TODO:
    def likelihood_prob(self, Osequence):
        ...

    # TODO:
    def viterbi(self, Osequence):
        ...

1.1.1 Evaluation problem

(a) Forward algorithm and backward algorithm (10 points)

Here \lambda denotes the model. Please finish the implementation of the forward() and backward() functions in hmm.py:

  • alpha[i, t] = P(X_t = s_i, Z_{1:t} | lambda).
def forward(self, Osequence):
    """
    Inputs:
    - self.pi: (1 * num_state) A numpy array of initial probabilities. pi[i] = P(X_1 = s_i)
    - self.A: (num_state * num_state) A numpy array of transition probabilities. A[i, j] = P(X_t = s_j | X_{t-1} = s_i)
    - self.B: (num_state * num_obs_symbol) A numpy array of observation probabilities. B[i, o] = P(Z_t = z_o | X_t = s_i)
    - Osequence: (1 * L) A numpy array of observation sequence with length L
    Returns:
    - alpha: (num_state * L) A numpy array where alpha[i, t] = P(X_t = s_i, Z_{1:t} | λ)
    """
  • beta[i, t] = P(Z_{t+1: T}|X_t = s_i, lambda).
def backward(self, Osequence):
    """
    Inputs:
    - self.pi: (1 * num_state) A numpy array of initial probabilities. pi[i] = P(X_1 = s_i)
    - self.A: (num_state * num_state) A numpy array of transition probabilities. A[i, j] = P(X_t = s_j | X_{t-1} = s_i)
    - self.B: (num_state * num_obs_symbol) A numpy array of observation probabilities. B[i, o] = P(Z_t = z_o | X_t = s_i)
    - Osequence: (1 * L) A numpy array of observation sequence with length L
    Returns:
    - beta: (num_state * L) A numpy array where beta[i, t] = P(Z_{t+1:T} | X_t = s_i, λ)
    """
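
As a reference point, here is a minimal sketch of the two recursions, assuming pi, A and B are laid out as in the docstrings above and that every symbol in Osequence appears in obs_dict. This is only one possible implementation, not the reference solution.

import numpy as np   # assumed to be imported at the top of hmm.py

def forward(self, Osequence):
    O = [self.obs_dict[o] for o in Osequence]   # map observation symbols to column indices of B
    S, L = self.B.shape[0], len(O)
    alpha = np.zeros([S, L])
    # base case: alpha[i, 0] = pi[i] * B[i, o_1]
    alpha[:, 0] = np.ravel(self.pi) * self.B[:, O[0]]
    for t in range(1, L):
        # alpha[i, t] = B[i, o_t] * sum_j alpha[j, t-1] * A[j, i]
        alpha[:, t] = self.B[:, O[t]] * (self.A.T @ alpha[:, t - 1])
    return alpha

def backward(self, Osequence):
    O = [self.obs_dict[o] for o in Osequence]
    S, L = self.B.shape[0], len(O)
    beta = np.zeros([S, L])
    beta[:, L - 1] = 1.0   # base case: beta[i, T] = 1
    for t in range(L - 2, -1, -1):
        # beta[i, t] = sum_j A[i, j] * B[j, o_{t+1}] * beta[j, t+1]
        beta[:, t] = self.A @ (self.B[:, O[t + 1]] * beta[:, t + 1])
    return beta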

(b) Sequence probability

Based on your forward and backward functions, you will calculate the sequence probability. (You can call the forward function or the backward function inside sequence_prob.)

P(Z_1, \ldots, Z_T = O \mid \lambda) = \sum_{i=1}^{N} P(X_T = s_i, Z_{1:T} \mid \lambda) = \sum_{i=1}^{N} \alpha[i, T]

def sequence_prob(self, Osequence):
    """
    Inputs:
    - Osequence: (1 * L) A numpy array of observation sequence with length L
    Returns:
    - prob: A float number of P(Z_{1:T} | λ)
    """
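
A minimal sketch, continuing the forward/backward sketch above:

def sequence_prob(self, Osequence):
    # P(O | lambda) is the sum over states of the last column of alpha
    alpha = self.forward(Osequence)
    return float(np.sum(alpha[:, -1]))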

(c) Posterior probability

The forward variable alpha[i, t] and the backward variable beta[i, t] are used to calculate the posterior probability of a specific state. For t = 1, \ldots, T and i = 1, \ldots, N, we define the posterior probability \gamma_t(i) = P(X_t = s_i \mid O, \lambda), the probability of being in state s_i at time t given the observation sequence O and the model \lambda.

\gamma_t(i) = \frac{P(X_t = s_i, O \mid \lambda)}{P(O \mid \lambda)} = \frac{P(X_t = s_i, Z_{1:T} \mid \lambda)}{P(O \mid \lambda)}

P(X_t = s_i, Z_{1:T} \mid \lambda) = P(Z_{1:t} \mid X_t = s_i, \lambda) \cdot P(Z_{t+1:T} \mid X_t = s_i, \lambda) \cdot P(X_t = s_i \mid \lambda) = \alpha[i, t] \cdot \beta[i, t]

Thus

\gamma_t(i) = \frac{\alpha[i, t] \cdot \beta[i, t]}{P(O \mid \lambda)}

where

P(O \mid \lambda) = \sum_{i=1}^{N} \alpha[i, T]

Signature:

def posterior_prob(self, Osequence):
    """
    Inputs:
    - Osequence: (1 * L) A numpy array of observation sequence with length L
    Returns:
    - prob: (num_state * L) A numpy array of P(X_t = i | O, λ)
    """
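
A minimal sketch following the formula above, again continuing the earlier sketch (one possible implementation only):

def posterior_prob(self, Osequence):
    alpha = self.forward(Osequence)
    beta = self.backward(Osequence)
    # gamma[i, t] = alpha[i, t] * beta[i, t] / P(O | lambda)
    return alpha * beta / np.sum(alpha[:, -1])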

You can use \gamma_t(i) to find the most likely state at time t, namely the state X_t = s_i for which \gamma_t(i) is maximum. This works fine when the HMM is ergodic, i.e. there is a transition from any state to any other state. If applied to an HMM of another architecture, this approach could give a sequence that is not a legitimate path, because some of its transitions are not permitted. To avoid this problem, the Viterbi algorithm is the most commonly used decoding algorithm.

(d) Likelihood of two consecutive states at a given time (5 points)

You are required to calculate the likelihood of a transition from state s at time t to state s' at time t+1. That is, you are required to calculate

\xi_{s, s'}(t) = P(X_t = s, X_{t+1} = s' \mid Z_{1:T} = z_{1:T})

Signature:

def likelihood_prob(self, Osequence):
    """
    Inputs:
    - Osequence: (1 * L) A numpy array of observation sequence with length L
    Returns:
    - prob: (num_state * num_state * (L - 1)) A numpy array of P(X_t = i, X_{t+1} = j | O, λ)
    """
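
A minimal sketch, continuing the earlier one. Since the quantity depends on the state pair (i, j) and on t, this version returns a (num_state, num_state, L-1) array; the exact output layout expected by the grader is not fully specified in the handout, so treat that choice as an assumption and check the starter code.

def likelihood_prob(self, Osequence):
    alpha = self.forward(Osequence)
    beta = self.backward(Osequence)
    O = [self.obs_dict[o] for o in Osequence]
    S, L = alpha.shape
    seq_prob = np.sum(alpha[:, -1])
    prob = np.zeros([S, S, L - 1])
    for t in range(L - 1):
        # xi[i, j, t] = alpha[i, t] * A[i, j] * B[j, o_{t+1}] * beta[j, t+1] / P(O | lambda)
        prob[:, :, t] = (alpha[:, t][:, None] * self.A *
                         self.B[:, O[t + 1]][None, :] * beta[:, t + 1][None, :]) / seq_prob
    return prob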

(e) Viterbi algorithm (7.5 points)

The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states. We want to compute the most likely state path that corresponds to the observation sequence O under the given HMM, namely

k^* = (k^*_1, k^*_2, \cdots, k^*_T) = \arg\max_k P(s_{k_1}, s_{k_2}, \cdots, s_{k_T} \mid Z_1, Z_2, \cdots, Z_T = O, \lambda)

Signature:

def viterbi(self, Osequence):
    """
    Inputs:
    - Osequence: (1 * L) A numpy array of observation sequence with length L
    Returns:
    - path: A list of the most likely hidden state path k* (return states instead of indices)
    """
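
A minimal sketch of the standard Viterbi recursion with back-pointers, continuing the earlier sketch and assuming the same parameter layout; ties are broken arbitrarily by np.argmax.

def viterbi(self, Osequence):
    O = [self.obs_dict[o] for o in Osequence]
    S, L = self.B.shape[0], len(O)
    delta = np.zeros([S, L])               # delta[i, t]: probability of the best path ending in state i at time t
    backptr = np.zeros([S, L], dtype=int)  # back-pointers for path recovery
    delta[:, 0] = np.ravel(self.pi) * self.B[:, O[0]]
    for t in range(1, L):
        # cand[j, i] = delta[j, t-1] * A[j, i]
        cand = delta[:, t - 1][:, None] * self.A
        backptr[:, t] = np.argmax(cand, axis=0)
        delta[:, t] = self.B[:, O[t]] * np.max(cand, axis=0)
    # backtrack from the most probable final state
    idx_path = [int(np.argmax(delta[:, -1]))]
    for t in range(L - 1, 0, -1):
        idx_path.append(int(backptr[idx_path[-1], t]))
    idx_path.reverse()
    # the returned path should contain state names, not indices
    idx2state = {idx: state for state, idx in self.state_dict.items()}
    return [idx2state[i] for i in idx_path]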

1.2 Application to Part-of-Speech Tagging

A Part-of-Speech (POS) is a category of words (or, more generally, of lexical items) that have similar grammatical properties. (Examples: noun, verb, adjective, adverb, pronoun, preposition, conjunction, interjection, and sometimes numeral, article, or determiner.)

Part-of-Speech Tagging (POST) is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context.

Here you will use an HMM to do POS tagging. You will first need to calculate the parameters (pi, A, B, obs_dict, state_dict) of the HMM and then apply the Viterbi algorithm to tag sentences.

Dataset

tags.txt: Universal Part-of-Speech Tagset

Tag    Meaning               English Examples
ADJ    adjective             new, good, high, special, big, local
ADP    adposition            on, of, at, with, by, into, under
ADV    adverb                really, already, still, early, now
CONJ   conjunction           and, or, but, if, while, although
DET    determiner, article   the, a, some, most, every, no, which
NOUN   noun                  year, home, costs, time, Africa
NUM    numeral               twenty-four, fourth, 1991, 14:24
PRT    particle              at, on, out, over, per, that, up, with
PRON   pronoun               he, their, her, its, my, I, us
VERB   verb                  is, say, told, given, playing, would
.      punctuation marks     . , ; !
X      other                 ersatz, esprit, dunno, gr8, univeristy

sentences.txt: Contains 57340 sentences that have already been tagged.

Word Tag
b100-48585
She PRON
had VERB
to PRT
move VERB
in ADP
some DET
direction NOUN
.
any DET
direction NOUN
that PRON
would VERB
take VERB
her PRON
away ADV
from ADP
this DET
evil ADJ
place NOUN
. .

Part-of-Speech Tagging

In this part, we collect our dataset and tags with the Dataset class. The Dataset class includes tags, train_data and test_data. Both train_data and test_data are lists of sentences, and each sentence is an object of the Line class.

You only need to implement the model_training function and the speech_tagging function. We have provided the accuracy function, which you can use to compare your predicted tagging with the true tagging of a sentence. You can find the definitions below.

### You can add your own functions or variables in the Dataset class, but you shouldn't change the functions that already exist. ###
class Dataset:
    def __init__(self, tagfile, datafile, train_test_split=0.8, seed=112890):
        self.tags
        self.train_data
        self.test_data

    def read_data(self, filename):
        ...

    def read_tags(self, filename):
        ...

class Line:
    def __init__(self, line, type):
        self.id
        self.words
        self.tags
        self.length

    def show(self):
        ...

# TODO:
def model_training(train_data, tags):
    ...

# TODO:
def speech_tagging(model, test_data, tags):
    ...

def accuracy(predict_tagging, true_tagging):
    ...

1.2.1 Model training (10 points)

In this part, you will need to calculate the parameters of the HMM based on train_data.

Signature:

def model_training(train_data, tags):
    """
    Inputs:
    - train_data: a list of sentences, each sentence is an object of Line class
    - tags: a list of POS tags
    Returns:
    - model: an object of HMM class initialized with the parameters (pi, A, B, obs_dict, state_dict) you calculated based on the training dataset. We will use the object you return to do speech tagging.
    """
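
A minimal sketch of a maximum-likelihood (counting) estimate, assuming each Line object exposes words and tags as listed in the Dataset skeleton above and that the HMM class is importable in this file. Smoothing and edge cases (for example, tags never observed at the start of a sentence) are left to you; this is not the reference solution.

import numpy as np

def model_training(train_data, tags):
    state_dict = {tag: i for i, tag in enumerate(tags)}
    # build the observation vocabulary from the training words
    obs_dict = {}
    for line in train_data:
        for word in line.words:
            if word not in obs_dict:
                obs_dict[word] = len(obs_dict)
    S = len(tags)
    pi = np.zeros(S)
    A = np.zeros([S, S])
    B = np.zeros([S, len(obs_dict)])
    # count initial states, transitions, and emissions
    for line in train_data:
        prev = None
        for word, tag in zip(line.words, line.tags):
            s = state_dict[tag]
            B[s, obs_dict[word]] += 1
            if prev is None:
                pi[s] += 1
            else:
                A[prev, s] += 1
            prev = s
    # normalize counts into probabilities, guarding against all-zero rows
    pi = pi / pi.sum()
    A = A / np.maximum(A.sum(axis=1, keepdims=True), 1)
    B = B / np.maximum(B.sum(axis=1, keepdims=True), 1)
    return HMM(pi, A, B, obs_dict, state_dict)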

1.2.2 Speech_tagging (10 points)

Based on the HMM from 1.2.1, do speech tagging for each sentence in the test data. Note that you may meet a word that is unseen in the training dataset. You need to modify the emission matrix and obs_dict of your current model to handle this case: assume the emission probability from each state to a new, unseen word is 10^{-6} (a very low probability). A sketch of this handling is given after the signature below.

For example, in hmm_model.json, we use the following parameters to initialize the HMM:

S = ["1", "2"]
pi = [0.7, 0.3]
A = [[0.8, 0.2],
     [0.4, 0.6]]
B = [[0.5, 0, 0.4, 0.1],
     [0.5, 0.1, 0.2, 0.2]]
Observations = ["A", "C", "G", "T"]

If we find another observation symbol "X" in the observation sequence, we modify the parameters of the HMM as follows:

S = ["1", "2"]
pi = [0.7, 0.3]
A = [[0.8, 0.2],
     [0.4, 0.6]]
B = [[0.5, 0, 0.4, 0.1, 1e-6],
     [0.5, 0.1, 0.2, 0.2, 1e-6]]
Observations = ["A", "C", "G", "T", "X"]

You do not get access to test_data in the model_training function; you need to implement the logic for tagging a new sequence in the speech_tagging function.

Signature:

def speech_tagging(test_data, model):
    """
    Inputs:
    - test_data: (1 * num_sentence) a list of sentences, each sentence is an object of Line class
    - model: an object of HMM class
    Returns:
    - tagging: (num_sentence * num_tagging) a 2D list of output tagging for each sentence in test_data
    """
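
A minimal sketch of the unseen-word handling described above, assuming Line objects expose words and that it is acceptable to extend model.B and model.obs_dict in place before running Viterbi:

import numpy as np

def speech_tagging(test_data, model):
    tagging = []
    for line in test_data:
        for word in line.words:
            if word not in model.obs_dict:
                # unseen word: append a column of emission probability 1e-6 for every state
                model.obs_dict[word] = len(model.obs_dict)
                new_col = np.full((model.B.shape[0], 1), 1e-6)
                model.B = np.hstack([model.B, new_col])
        tagging.append(model.viterbi(line.words))
    return tagging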

1.2.3 Suggestion (0 points)

This part won’t be graded. In order to get a better understanding of HMMs, come up with one sentence by yourself and tag it manually. Then run your forward, backward, sequence_prob, posterior_prob and viterbi functions on the model from 1.2.1. Print the result of each function and see whether you can explain your results.
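
For a quick sanity check you can also use the toy hmm_model.json parameters shown earlier (rather than the trained POS model); the observation sequence below is made up purely for illustration:

import numpy as np

pi = np.array([0.7, 0.3])
A = np.array([[0.8, 0.2],
              [0.4, 0.6]])
B = np.array([[0.5, 0.0, 0.4, 0.1],
              [0.5, 0.1, 0.2, 0.2]])
obs_dict = {"A": 0, "C": 1, "G": 2, "T": 3}
state_dict = {"1": 0, "2": 1}

model = HMM(pi, A, B, obs_dict, state_dict)
Osequence = np.array(["A", "G", "T"])

print(model.forward(Osequence))          # alpha, shape (2, 3)
print(model.backward(Osequence))         # beta, shape (2, 3)
print(model.sequence_prob(Osequence))    # a single float, P(O | lambda)
print(model.posterior_prob(Osequence))   # gamma, shape (2, 3)
print(model.likelihood_prob(Osequence))  # xi, covering the L - 1 = 2 transitions
print(model.viterbi(Osequence))          # the most likely state path, as state names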
