Hidden Markov Models
In this programming assignment, we will implement Hidden Markov Models (HMMs) and apply them to the Part-of-Speech Tagging problem.
You first need to implement 6 functions that carry out the main steps of working with an HMM. You will then estimate the HMM parameters from data and use them to solve the Part-of-Speech Tagging problem.
After finishing the implementation, you can use hmm_test_script.py to test the correctness of your functions.
There are 5 sets of grading data. Each set includes the parameters (pi, A, B, obs_dict, state_dict, Osequence), which we will use to initialize the HMM class and test your functions. To receive full credit, the output of functions 1-5 must be within an error of 10^{-8} of ours, and the output of your viterbi function must be identical to ours.
For the definitions of the parameters pi, A, B, etc., please refer to the code documentation.
1.2 Application to Part-of-Speech Tagging
- model_training: 10 points = 10 × (your_correct_pred_cnt / our_correct_pred_cnt)
- speech_tagging: 10 points = 10 × (your_correct_pred_cnt / our_correct_pred_cnt)
We will use the dataset given to you for grading this part (with a different random seed). We will train your model and our model on the same train_data. The model_training function and the speech_tagging function will be tested separately.
To check your model_training function, we will use 50 sentences from train_data to do Part-of-Speech Tagging. To receive full credit, your prediction accuracy must be equal to or better than ours.
To check your speech_tagging function, we will use 50 sentences from test_data to do Part-of-Speech Tagging. To receive full credit, your prediction accuracy must be equal to or better than ours.
What to submit
1.1 Implementation (30 points)
In 1.1, you are given the parameters of an HMM and will implement two procedures.
- The Evaluation Problem: Given an HMM and a sequence of observations, what is the probability that the observations are generated by the model? Two algorithms are commonly used for the evaluation problem: the forward algorithm and the backward algorithm. Based on the results of the forward and backward algorithms, you will be asked to calculate the probability of the sequence and the posterior probability of a state.
- The Decoding Problem: Given a model and a sequence of observations, what is the most likely state sequence in the model that produced the observation sequence? For decoding you will implement the Viterbi algorithm.
HMM Class
In this project, we abstract the Hidden Markov Model as a class. Each Hidden Markov Model is initialized with pi, A, B, obs_dict and state_dict. The HMM class has 6 member functions: forward, backward, sequence_prob, posterior_prob, likelihood_prob and viterbi.
```python
### You can add your own functions or variables in the HMM class,
### but you shouldn't change the existing API.
class HMM:
    def __init__(self, pi, A, B, obs_dict, state_dict):
        """
        - pi: (1 * num_state) A numpy array of initial probabilities. pi[i] = P(X_1 = s_i)
        - A: (num_state * num_state) A numpy array of transition probabilities. A[i, j] = P(X_t = s_j | X_{t-1} = s_i)
        - B: (num_state * num_obs_symbol) A numpy array of observation probabilities. B[i, o] = P(Z_t = z_o | X_t = s_i)
        - obs_dict: A dictionary mapping each observation symbol to its index in B
        - state_dict: A dictionary mapping each state to its index in pi and A
        """

    # TODO:
    def forward(self, Osequence):
        ...

    # TODO:
    def backward(self, Osequence):
        ...

    # TODO:
    def sequence_prob(self, Osequence):
        ...

    # TODO:
    def posterior_prob(self, Osequence):
        ...

    # TODO:
    def likelihood_prob(self, Osequence):
        ...

    # TODO:
    def viterbi(self, Osequence):
        ...
```
1.1.1 Evaluation problem
(a) Forward algorithm and backward algorithm (10 points)
Here $\lambda$ denotes the model. Please finish the implementation of the forward() and backward() functions in hmm.py:
- $\alpha[i, t] = P(X_t = s_i, Z_{1:t} \mid \lambda)$
```python
def forward(self, Osequence):
    """
    Inputs:
    - self.pi: (1 * num_state) A numpy array of initial probabilities. pi[i] = P(X_1 = s_i)
    - self.A: (num_state * num_state) A numpy array of transition probabilities. A[i, j] = P(X_t = s_j | X_{t-1} = s_i)
    - self.B: (num_state * num_obs_symbol) A numpy array of observation probabilities. B[i, o] = P(Z_t = z_o | X_t = s_i)
    - Osequence: (1 * L) A numpy array of the observation sequence with length L
    Returns:
    - alpha: (num_state * L) A numpy array where alpha[i, t] = P(X_t = s_i, Z_{1:t} | lambda)
    """
```
- $\beta[i, t] = P(Z_{t+1:T} \mid X_t = s_i, \lambda)$
```python
def backward(self, Osequence):
    """
    Inputs:
    - self.pi: (1 * num_state) A numpy array of initial probabilities. pi[i] = P(X_1 = s_i)
    - self.A: (num_state * num_state) A numpy array of transition probabilities. A[i, j] = P(X_t = s_j | X_{t-1} = s_i)
    - self.B: (num_state * num_obs_symbol) A numpy array of observation probabilities. B[i, o] = P(Z_t = z_o | X_t = s_i)
    - Osequence: (1 * L) A numpy array of the observation sequence with length L
    Returns:
    - beta: (num_state * L) A numpy array where beta[i, t] = P(Z_{t+1:T} | X_t = s_i, lambda)
    """
```
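A minimal sketch of how these recursions might look, assuming self.pi is stored as a 1-D array of length num_state and every symbol in Osequence appears in obs_dict (an illustration, not the official solution):

```python
import numpy as np

def forward(self, Osequence):
    S, L = len(self.pi), len(Osequence)
    O = [self.obs_dict[o] for o in Osequence]      # map symbols to column indices of B
    alpha = np.zeros([S, L])
    alpha[:, 0] = self.pi * self.B[:, O[0]]        # base case: pi_i * B[i, o_1]
    for t in range(1, L):
        # alpha[j, t] = B[j, o_t] * sum_i alpha[i, t-1] * A[i, j]
        alpha[:, t] = self.B[:, O[t]] * (alpha[:, t - 1] @ self.A)
    return alpha

def backward(self, Osequence):
    S, L = len(self.pi), len(Osequence)
    O = [self.obs_dict[o] for o in Osequence]
    beta = np.zeros([S, L])
    beta[:, L - 1] = 1.0                           # base case: beta[i, T] = 1
    for t in range(L - 2, -1, -1):
        # beta[i, t] = sum_j A[i, j] * B[j, o_{t+1}] * beta[j, t+1]
        beta[:, t] = self.A @ (self.B[:, O[t + 1]] * beta[:, t + 1])
    return beta
```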
(b) Sequence probability
Based on your forward and backward functions, you will calculate the sequence probability. (You can call the forward or backward function inside sequence_prob.)

$$P(Z_1, \ldots, Z_T = O \mid \lambda) = \sum_{i=1}^{N} P(X_T = s_i, Z_{1:T} \mid \lambda) = \sum_{i=1}^{N} \alpha[i, T]$$
```python
def sequence_prob(self, Osequence):
    """
    Inputs:
    - Osequence: (1 * L) A numpy array of the observation sequence with length L
    Returns:
    - prob: A float, P(Z_{1:T} | lambda)
    """
```
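Given the forward table, sequence_prob reduces to summing the last column of alpha; a sketch under the same assumptions as above:

```python
def sequence_prob(self, Osequence):
    # P(O | lambda) = sum_i alpha[i, T]
    alpha = self.forward(Osequence)
    return float(np.sum(alpha[:, -1]))
```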
(c) Posterior probability
The forward variable $\alpha[i, t]$ and the backward variable $\beta[i, t]$ are used to calculate the posterior probability of a specific state. For $t = 1, \ldots, T$ and $i = 1, \ldots, N$, we define the posterior probability $\gamma_t(i) = P(X_t = s_i \mid O, \lambda)$ as the probability of being in state $s_i$ at time $t$ given the observation sequence $O$ and the model $\lambda$.
$$\gamma_t(i) = \frac{P(X_t = s_i, O \mid \lambda)}{P(O \mid \lambda)} = \frac{P(X_t = s_i, Z_{1:T} \mid \lambda)}{P(O \mid \lambda)}$$

Since
$$P(X_t = s_i, Z_{1:T} \mid \lambda) = P(Z_{1:t} \mid X_t = s_i, \lambda) \cdot P(Z_{t+1:T} \mid X_t = s_i, \lambda) \cdot P(X_t = s_i \mid \lambda) = \alpha[i, t] \cdot \beta[i, t],$$
we have
$$\gamma_t(i) = \frac{\alpha[i, t] \cdot \beta[i, t]}{P(O \mid \lambda)}, \quad \text{where} \quad P(O \mid \lambda) = \sum_{i=1}^{N} \alpha[i, T].$$

Signature:
```python
def posterior_prob(self, Osequence):
    """
    Inputs:
    - Osequence: (1 * L) A numpy array of the observation sequence with length L
    Returns:
    - prob: (num_state * L) A numpy array of P(X_t = s_i | O, lambda)
    """
```
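A sketch of posterior_prob that follows the formulas above directly (reusing the forward/backward sketches):

```python
def posterior_prob(self, Osequence):
    # gamma[i, t] = alpha[i, t] * beta[i, t] / P(O | lambda)
    alpha = self.forward(Osequence)
    beta = self.backward(Osequence)
    return alpha * beta / np.sum(alpha[:, -1])
```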
You can use $\gamma_t(i)$ to find the most likely state at time $t$, which is the state $X_t = s_i$ for which $\gamma_t(i)$ is maximum. This approach works fine when the HMM is ergodic, i.e. there is a transition from any state to any other state. If applied to an HMM of another architecture, it could produce a sequence that is not a legitimate path because some transitions are not permitted. To avoid this problem, the Viterbi algorithm is the most commonly used decoding algorithm.
(d) Likelihood of two consecutive states at a given time (5 points)
You are required to calculate the likelihood of a transition from state $s$ at time $t$ to state $s'$ at time $t+1$. That is, you are required to calculate

$$\xi_{s, s'}(t) = P(X_t = s, X_{t+1} = s' \mid Z_{1:T} = z_{1:T}).$$

Signature:
```python
def likelihood_prob(self, Osequence):
    """
    Inputs:
    - Osequence: (1 * L) A numpy array of the observation sequence with length L
    Returns:
    - prob: (num_state * num_state * (L - 1)) A numpy array of P(X_t = s_i, X_{t+1} = s_j | O, lambda)
    """
```
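One way to compute these pairwise likelihoods, sketched under the same assumptions as the earlier snippets (the result is indexed as prob[i, j, t]):

```python
def likelihood_prob(self, Osequence):
    # xi[i, j, t] = alpha[i, t] * A[i, j] * B[j, o_{t+1}] * beta[j, t+1] / P(O | lambda)
    S, L = len(self.pi), len(Osequence)
    O = [self.obs_dict[o] for o in Osequence]
    alpha, beta = self.forward(Osequence), self.backward(Osequence)
    seq_prob = np.sum(alpha[:, -1])
    prob = np.zeros([S, S, L - 1])
    for t in range(L - 1):
        prob[:, :, t] = (alpha[:, t:t + 1] * self.A
                         * self.B[:, O[t + 1]] * beta[:, t + 1]) / seq_prob
    return prob
```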
(e) Viterbi algorithm (7.5 points)
The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states. We want to compute the most likely state path that corresponds to the observation sequence $O$ under the HMM. Namely,

$$k^* = (k^*_1, k^*_2, \ldots, k^*_T) = \arg\max_k P(s_{k_1}, s_{k_2}, \ldots, s_{k_T} \mid Z_1, Z_2, \ldots, Z_T = O, \lambda).$$
Signature:
```python
def viterbi(self, Osequence):
    """
    Inputs:
    - Osequence: (1 * L) A numpy array of the observation sequence with length L
    Returns:
    - path: A list of the most likely hidden state path k* (return the states themselves instead of their indices)
    """
```
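A sketch of the Viterbi recursion with backtracking, under the same assumptions as the earlier snippets; it maps state indices back to state names by inverting state_dict:

```python
def viterbi(self, Osequence):
    S, L = len(self.pi), len(Osequence)
    O = [self.obs_dict[o] for o in Osequence]
    delta = np.zeros([S, L])                 # delta[j, t]: best path score ending in state j at time t
    backpointer = np.zeros([S, L], dtype=int)
    delta[:, 0] = self.pi * self.B[:, O[0]]
    for t in range(1, L):
        scores = delta[:, t - 1][:, None] * self.A   # scores[i, j] = delta[i, t-1] * A[i, j]
        backpointer[:, t] = np.argmax(scores, axis=0)
        delta[:, t] = self.B[:, O[t]] * np.max(scores, axis=0)
    # backtrack from the best final state, then convert indices to state names
    idx_to_state = {v: k for k, v in self.state_dict.items()}
    best = [int(np.argmax(delta[:, L - 1]))]
    for t in range(L - 1, 0, -1):
        best.append(int(backpointer[best[-1], t]))
    return [idx_to_state[i] for i in reversed(best)]
```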
1.2 Application to Part-of-Speech Tagging
Part-of-Speech (POS) is a category of words (or, more generally, of lexical items) which have similar grammatical properties. (Example: noun, verb, adjective, adverb, pronoun, preposition, conjunction, interjection, and sometimes numeral, article, or determiner.)
Part-of-Speech Tagging (POST) is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context.
Here you will use an HMM to do POST. You will need to calculate the parameters (pi, A, B, obs_dict, state_dict) of the HMM first and then apply the Viterbi algorithm to do speech tagging.
Dataset
tags.txt: Universal Part-of-Speech Tagset
Tag | Meaning | English Examples |
---|---|---|
ADJ | adjective | new, good, high, special, big, local |
ADP | adposition | on, of, at, with, by, into, under |
ADV | adverb | really, already, still, early, now |
CONJ | conjunction | and, or, but, if, while, although |
DET | determiner, article | the, a, some, most, every, no, which |
NOUN | noun | year, home, costs, time, Africa |
NUM | numeral | twenty-four, fourth, 1991, 14:24 |
PRT | particle | at, on, out, over, per, that, up, with |
PRON | pronoun | he, their, her, its, my, I, us |
VERB | verb | is, say, told, given, playing, would |
. | punctuation marks | . , ; ! |
X | other | ersatz, esprit, dunno, gr8, univeristy |
sentences.txt: Contains 57340 sentences which have already been tagged.
Word | Tag |
---|---|
b100-48585 | |
She | PRON |
had | VERB |
to | PRT |
move | VERB |
in | ADP |
some | DET |
direction | NOUN |
. | . |
any | DET |
direction | NOUN |
that | PRON |
would | VERB |
take | VERB |
her | PRON |
away | ADV |
from | ADP |
this | DET |
evil | ADJ |
place | NOUN |
. | . |
Part-of-Speech Tagging
In this part, we collect our dataset and tags with the Dataset class. The Dataset class includes tags, train_data and test_data. Both datasets contain a list of sentences, and each sentence is an object of the Line class.
You only need to implement the model_training function and the speech_tagging function. We have provided the accuracy function, which you can use to compare the predict_tagging and true_tagging of a sentence. You can find the definitions below.
```python
### You can add your own functions or variables in the Dataset class,
### but you shouldn't change the functions that already exist.
class Dataset:
    def __init__(self, tagfile, datafile, train_test_split=0.8, seed=112890):
        self.tags        # list of POS tags
        self.train_data  # list of Line objects
        self.test_data   # list of Line objects

    def read_data(self, filename):
        ...

    def read_tags(self, filename):
        ...

class Line:
    def __init__(self, line, type):
        self.id
        self.words
        self.tags
        self.length

    def show(self):
        ...

# TODO:
def model_training(train_data, tags):
    ...

# TODO:
def speech_tagging(model, test_data, tags):
    ...

def accuracy(predict_tagging, true_tagging):
    ...
```
1.2.1 Model training (10 points)
In this part, you will need to calculate the parameters of the HMM based on train_data.
Signature:
```python
def model_training(train_data, tags):
    """
    Inputs:
    - train_data: a list of sentences, each sentence is an object of the Line class
    - tags: a list of POS tags
    Returns:
    - model: an object of the HMM class, initialized with the parameters
      (pi, A, B, obs_dict, state_dict) you calculated from the training dataset.
      We will use the object you return to do speech tagging.
    """
```
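A counting-based sketch of what model_training could look like, assuming each Line object exposes words and tags as equal-length lists (as suggested by the Line class above); the normalization details may differ from the reference solution:

```python
import numpy as np

def model_training(train_data, tags):
    state_dict = {tag: i for i, tag in enumerate(tags)}
    obs_dict = {}
    for line in train_data:                      # build word -> column index of B
        for word in line.words:
            if word not in obs_dict:
                obs_dict[word] = len(obs_dict)

    S, V = len(tags), len(obs_dict)
    pi, A, B = np.zeros(S), np.zeros([S, S]), np.zeros([S, V])
    for line in train_data:
        idx = [state_dict[t] for t in line.tags]
        pi[idx[0]] += 1                          # first tag of each sentence
        for k, word in enumerate(line.words):
            B[idx[k], obs_dict[word]] += 1       # emission counts
            if k > 0:
                A[idx[k - 1], idx[k]] += 1       # transition counts

    # turn counts into probabilities, guarding against all-zero rows
    pi = pi / pi.sum()
    A = A / np.maximum(A.sum(axis=1, keepdims=True), 1)
    B = B / np.maximum(B.sum(axis=1, keepdims=True), 1)
    return HMM(pi, A, B, obs_dict, state_dict)
```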
1.2.2 Speech_tagging (10 points)
Based on the HMM from 1.2.1, do speech tagging for each sentence in the test data. Note that you may encounter words that are unseen in the training dataset. You need to modify the emission matrix and obs_dict of your current model to handle this case. Assume the emission probability from each state to a new unseen word is 10^{-6} (a very low probability).
For example, in hmm_model.json, we use the following parameters to initialize the HMM:

```
S = ["1", "2"]
pi = [0.7, 0.3]
A = [[0.8, 0.2], [0.4, 0.6]]
B = [[0.5, 0, 0.4, 0.1], [0.5, 0.1, 0.2, 0.2]]
Observations = ["A", "C", "G", "T"]
```

If we find another observation symbol "X" in the observation sequence, we modify the parameters of the HMM as follows:

```
S = ["1", "2"]
pi = [0.7, 0.3]
A = [[0.8, 0.2], [0.4, 0.6]]
B = [[0.5, 0, 0.4, 0.1, 1e-6], [0.5, 0.1, 0.2, 0.2, 1e-6]]
Observations = ["A", "C", "G", "T", "X"]
```
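In code, that modification amounts to appending a column of 10^{-6} to B and registering the new symbol in obs_dict. A hypothetical helper (add_unseen_word is not part of the provided skeleton) might look like:

```python
import numpy as np

def add_unseen_word(model, word):
    # Give every state a tiny emission probability for the previously unseen symbol.
    if word not in model.obs_dict:
        model.obs_dict[word] = model.B.shape[1]      # index of the new column
        model.B = np.hstack([model.B, np.full((model.B.shape[0], 1), 1e-6)])
```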
You do not get access to test_data in the model_training function, so you need to implement the logic for tagging a new sequence in the speech_tagging function.
Signature:
```python
def speech_tagging(test_data, model):
    """
    Inputs:
    - test_data: (1 * num_sentence) a list of sentences, each sentence is an object of the Line class
    - model: an object of the HMM class
    Returns:
    - tagging: (num_sentence * num_tagging) a 2D list of the output tagging for each sentence in test_data
    """
```
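Putting the pieces together, a sketch of speech_tagging that follows the signature above, extends the model for unseen words (using the hypothetical add_unseen_word helper sketched earlier) and then runs Viterbi on each sentence:

```python
def speech_tagging(test_data, model):
    tagging = []
    for line in test_data:
        for word in line.words:
            add_unseen_word(model, word)       # handle words not seen during training
        tagging.append(model.viterbi(line.words))
    return tagging
```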
1.2.3 Suggestion (0 points)
This part won't be graded. To gain a better understanding of HMMs, come up with one sentence yourself and tag it manually. Then run your forward, backward, sequence_prob, posterior_prob and viterbi functions on the model from 1.2.1. Print the result of each function and see if you can explain your results.