- CSE 447 and CSE 517 Students: Implementation Problem
This tarball includes two files:
- One of the .txt files contains single sentences, one per line (i.e., newline characters delimit sentences), each with some of its characters masked; characters are delimited by whitespace, masked characters are replaced with <mask>, spaces are denoted <s>, and the end-of-sequence symbol is <eos>
- The other .txt file contains a bigram language model over characters; each line lists a, then a space, then b, then a tab, then p(b | a), with the special space and end-of-sequence characters denoted as above.
Your job is to decode the sentences, replacing each masked character with a character from the bigram model's vocabulary. For input x, your output y should be the sequence that maximizes p(y) under the bigram language model, subject to the constraint that, for every position i where x_i is not masked, y_i = x_i. (In plain English, we want you to find the most likely unmasked sequence consistent with the input, under the model we gave you.) Hint: you should adapt the Viterbi algorithm to solve this problem exactly in polynomial time and space.
Your output file should be in the same format as the input, but with every mask token replaced as described above. Your writeup should explain how you generate a score for each possible choice and how you recover the best sequence. You may use existing libraries and tools for this problem (be sure to acknowledge them in your solution), but we do not expect they will be very helpful.
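For intuition, here is a minimal sketch of the kind of dynamic program the hint points toward, assuming the bigram model has already been loaded into a nested dict with bigram[a][b] = p(b | a) and each sentence into a list of symbols (with '<mask>' at masked positions); the function name and data layout are illustrative, not part of the assignment.

```python
import math

def viterbi_unmask(tokens, bigram, vocab):
    """Fill in '<mask>' tokens so that the completed sequence maximizes the
    bigram log-probability, keeping all observed tokens fixed."""
    def options(tok):                       # candidate symbols at a position
        return list(vocab) if tok == '<mask>' else [tok]

    def logp(a, b):                         # log p(b | a); -inf if unseen
        p = bigram.get(a, {}).get(b, 0.0)
        return math.log(p) if p > 0 else float('-inf')

    # best[v]: best log-prob of any completion of the prefix ending in v
    # backs[i][v]: the symbol at position i-1 achieving that score
    best = {v: 0.0 for v in options(tokens[0])}   # no preceding context is
    backs = []                                    # scored in this sketch
    for i in range(1, len(tokens)):
        new_best, back = {}, {}
        for v in options(tokens[i]):
            score, prev = max((best[u] + logp(u, v), u) for u in best)
            new_best[v], back[v] = score, prev
        best, backs = new_best, backs + [back]

    # Follow backpointers right to left to recover the argmax sequence.
    seq = [max(best, key=best.get)]
    for back in reversed(backs):
        seq.append(back[seq[-1]])
    return list(reversed(seq))
```

A program along these lines takes O(n · |V|²) time and O(n · |V|) space per sentence of length n, which is the polynomial bound the hint asks for.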
- CSE 447 and CSE 517 Students: Based on Eisenstein 9.3 (p. 213)
Construct (i.e., draw on the page; a TA recommends this tool) a finite-state automaton in the style of figures 9.2 and 9.3 (pp. 187–188), which handles the following examples:
- nation/N, national/ADJ, nationalize/V, nationalizer/N
- America/N, American/ADJ, Americanize/V, Americanizer/N
Be sure that your FSA does not accept any further derivations, such as *nationalizeral and *Americanizern.
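Although the deliverable is a drawing, it can help to sanity-check a candidate design by encoding it as a transition table before drawing it. The fragment below is only a toy illustration over morphemes, with made-up state names, and covers just two of the required forms; it is not a solution to the exercise.

```python
# Toy FSA fragment: accepts "nation" (tag N) and "nation"+"al" (tag ADJ).
START, NOUN, ADJ = "q0", "q_noun", "q_adj"

transitions = {
    (START, "nation"): NOUN,
    (NOUN, "al"): ADJ,
}
accepting = {NOUN: "N", ADJ: "ADJ"}     # accepting states and their tags

def accepts(morphemes):
    """Return the tag if the morpheme sequence is accepted, else None."""
    state = START
    for m in morphemes:
        state = transitions.get((state, m))
        if state is None:
            return None
    return accepting.get(state)

print(accepts(["nation"]))              # N
print(accepts(["nation", "al"]))        # ADJ
print(accepts(["nation", "al", "al"]))  # None: over-derivation is rejected
```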
- CSE 447 and CSE 517 Students: Viterbi for Second-Order Models
At the end of the CRF lecture (slide 106), we proposed a second-order sequence model that defines scores of triplets of labels, s(x, i, y_i, y_{i+1}, y_{i+2}). The problem to be solved when we decode with this model is:
y* = argmax_{y ∈ L^ℓ} Σ_{i=0}^{ℓ−1} s(x, i, y_i, y_{i+1}, y_{i+2})    (1)
Note that there are ℓ terms in the sum. y_0 (which is a special START label) appears only in the first term (i = 0), and y_{ℓ+1} (which is a special STOP label) appears only in the last term (i = ℓ − 1). In a conventional trigram-style model, we might also have s(x, −1, y_{−1} = START, y_0 = START, y_1) in the summation, but including such a term doesn't add any expressive power to what is written above.
The Viterbi algorithm can be adapted for this model. The form of the algorithm is nearly identical to what we saw in lecture. First, prefix scores are computed left to right, each using a formula with a local max, and along with each prefix score we also store a backpointer that stores the argmax. Then the backpointers are followed right to left to select a label for each word in turn. Here is most of the algorithm:
Input: scores s(x, i, y'', y', y), for i ∈ ⟨0, …, ℓ⟩, y'' ∈ L, y' ∈ L, y ∈ L
Output: y*
- Base case: ♥_2(y', y) = s(x, 0, START, y', y)
(Note that the subscript in ♥_2 refers to the last position in the triple, here the word labeled y, while the 0 argument to s refers to the first position in the triple.)
- For i ∈ ⟨3, …, ℓ − 1⟩:
- Solve for ♥_i(·, ·) and b_i(·, ·).
- Special case for the end (note a small correction in this line relative to an earlier version of the assignment, which included an equality on the left to ♥_{ℓ−1}(y', y) that should not have been there):
♥_ℓ(y', y) = s(x, ℓ − 1, y', y, STOP) + max_{y'' ∈ L} [ s(x, ℓ − 2, y'', y', y) + ♥_{ℓ−1}(y'', y') ]
where b_ℓ(y', y) is the argmax (over y'') of the bracketed expression.
The above looks a little different from what you saw in lecture, where we calculated ♥_{ℓ+1}(STOP). Writing it as we've done here is just a slightly different choice that gives you a little more of a hint. To see why, take a close look at slide 70.
- (y_{ℓ−1}, y_ℓ) ← argmax_{(y', y) ∈ L^2} ♥_ℓ(y', y)
- For i ∈ ⟨ℓ − 2, …, 1⟩:
- y_i ← b_{i+2}(y_{i+1}, y_{i+2})
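For comparison, here is a rough sketch of a standard first-order Viterbi decoder of the kind shown in lecture, with the prefix-score table written as heart[y] and a backpointer stored alongside each entry; the score function, label set, and index conventions are placeholders and may differ from the slides. The second-order algorithm above has the same shape, with the tables indexed by label pairs instead of single labels (working out that recurrence is the deliverable below).

```python
def viterbi_first_order(score, labels, length, START="START", STOP="STOP"):
    """Sketch of first-order Viterbi. score(i, y_prev, y) is the local score
    for labeling position i+1 with y when the previous label is y_prev;
    returns the highest-scoring label sequence y_1 ... y_length."""
    # Base case: the first label follows the special START label.
    heart = {y: score(0, START, y) for y in labels}
    backs = []

    # Prefix scores left to right, each with a local max and a backpointer.
    for i in range(1, length):
        new_heart, back = {}, {}
        for y in labels:
            s, prev = max((heart[p] + score(i, p, y), p) for p in labels)
            new_heart[y], back[y] = s, prev
        heart, backs = new_heart, backs + [back]

    # Account for the transition into STOP, then follow backpointers.
    _, last = max((heart[y] + score(length, y, STOP), y) for y in labels)
    seq = [last]
    for back in reversed(backs):
        seq.append(back[seq[-1]])
    return list(reversed(seq))
```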
Deliverables:
- Give the recurrence for ♥_i(y', y) and for b_i(y', y) (the boxed bit of the algorithm above). We've given you the base case and the special case for the end of the sequence.
- Analyze the runtime and space requirements of this algorithm as functions of |L| (the number of labels) and ℓ (the length of the input sequence x and the output sequence y).
- CSE 517 Students: Introducing PCFGs
In this problem, you'll become acquainted with probabilistic context-free grammars (PCFGs), a more powerful family of models that can be used to parse sequences of words into trees. Chapters 9 and 10 in the textbook give extensive motivation for these kinds of models, as well as algorithms for their use. This problem will reveal a connection between PCFGs and HMMs. A PCFG is defined as:
- A finite set of nonterminal symbols N
- A start symbol S ∈ N
- A finite alphabet Σ, called terminal symbols, distinct from N
- A production rule set R, each rule of the form N → α, where
- The left-hand side N is a nonterminal from N
- The right-hand side α is a sequence of zero or more terminals and/or nonterminals: α ∈ (Σ ∪ N)*
Special case: Chomsky normal form constrains α to be either a single terminal symbol or two nonterminals
- For each N ∈ N, a probability distribution over the rules where N is the left-hand side, p(α | N). We will use p_r to denote the conditional probability corresponding to rule r
A PCFG defines the probability of a tree t as:
p(t) = Π_{r ∈ R} p_r^{count_t(r)}    (2)
where count_t(r) is the number of times rule r is used in the tree t.
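As a concrete illustration of equation (2), here is a tiny hand-made example; the grammar, rule probabilities, and tree are invented for illustration and are not from the assignment.

```python
from collections import Counter
from math import prod

# Hypothetical toy PCFG: (left-hand side, right-hand side) -> probability.
# For each nonterminal, the probabilities of its rules sum to 1.
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("they",)): 0.4,
    ("NP", ("fish",)): 0.6,
    ("VP", ("sleep",)): 0.7,
    ("VP", ("fish",)): 0.3,
}

def tree_probability(rules_used):
    """Equation (2): multiply p_r once for every use of rule r in the tree."""
    counts = Counter(rules_used)                   # count_t(r)
    return prod(rule_prob[r] ** c for r, c in counts.items())

# The tree (S (NP they) (VP sleep)) uses three rules, each exactly once.
print(tree_probability([("S", ("NP", "VP")),
                        ("NP", ("they",)),
                        ("VP", ("sleep",))]))      # 1.0 * 0.4 * 0.7 ≈ 0.28
```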
The definition of an HMM was given in class; we spell it out more carefully here.
- A finite set of states L (often referred to as labels)
- A probability distribution over initial states, p_start
- One state, denoted 8, is the special stopping state
- A finite set of observable symbols V
- For each state y ∈ L:
- An emission probability distribution over V, p_emission
- A transition probability distribution over next states, p_transition
An HMM defines the probability distribution of a state sequence y and an emission sequence x (length n, excluding the stop symbol) as shown on slide 36 in the lecture.
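In case the slide is not at hand, the usual factorization consistent with the components listed above is (up to notational differences from the slide):

p(x, y) = p_start(y_1) · Π_{i=1}^{n} p_emission(x_i | y_i) · Π_{i=2}^{n} p_transition(y_i | y_{i−1}) · p_transition(8 | y_n)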
- Demonstrate how to implement any HMM using a PCFG. Do this by showing formally how to construct each element of the PCFG (N, S, Σ, R, and p) from the HMM. Hint: think of a state sequence as a right-branching tree; the nonterminals will correspond to HMM states/labels, and you will probably want to add a new nonterminal to serve as the start symbol S (a tiny illustration of the right-branching idea appears after this list).
- Transform the grammar you defined above into Chomsky normal form. Every tree derived by the original grammar must have a corresponding tree in the new grammar, with the same probability. Hint: this might be easier if you add a special start word to the beginning of every sequence recognized by the new grammar, and you will likely need to add some new nonterminals as well. On page 202 of your textbook, the transformation is sketched out for arbitrary context-free grammars that are not probabilistic; to fully complete this exercise, you need to handle the probabilities as well as the rules, but you only need to show how to do it for your HMM-derived PCFG. Try to give a more precise definition of the transformation than is given in the textbook.
- Explain how to transform your new PCFG's derivations back into derivations under the original PCFG.
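As a toy illustration of the right-branching hint in the first deliverable (the labels A, B and words x_1, x_2 are made up), an HMM run with state sequence y = (A, B) and emissions x = (x_1, x_2) can be drawn as a tree in which each nonterminal emits one word and then expands to a nonterminal for the next state:

```
S
└── A
    ├── x_1
    └── B
        ├── x_2
        └── (stop)
```

Choosing rule probabilities so that this tree's probability under equation (2) equals the HMM's p(x, y) is the substance of the exercise.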