[Solved] Assignment 7: Hash Tables and Markov Text Generation

$25

File Name: Assignment_7:_Hash_Tables_and_Markov_Text_Generation.zip
File Size: 489.84 KB

SKU: [Solved] Assignment 7: Hash Tables and Markov Text Generation Category: Tag:
5/5 - (1 vote)

The Infinite Monkey Theorem (IFT) says that if a monkey hits keys at random on a typewriter it will almost surely, given an infinite amount of time, produce a chosen text (like the Declaration of Independence, Hamlet, or a script for Star Trek the Next Generation). The probability of this actually happening is, ofcourse, very small but the IFT claims that it is still possible. Some people intent on testing this hypothesis have set up virtual monkey sweatshops where poor cyber-monkeys type relentlessly in less than humane conditions. After billions and billions of simulated years, one monkey in the group was able to type out a sequence of 19 letters that can be found in Shakespeares The Two Gentlemen of Verona. (See the April 9 2007 edition of The New Yorker if youre interested.)The IFT might lead to some interesting philosophical discussions, but from a practical point of view who cares? A more promising line of inquiry might go like this: Could a monkey with prior knowledge of Claude Shannons information theory and Markov chains produce a reasonable imitation of a known text in a reasonable length of time? Now that might actually be interesting. Its so interesting, in fact, that we should give it a namethe Markov Monkey Question (MMQ)and we should spend some timethinking about it. It turns out that the answer to the MMQ is maybe. Heres an excerpt from a Markov Monkey version of the script for the 1967 Planet of the Apes.PLANET OF THE APESScreenplay by Michael WilsonBased on Novel By Pierre BoulleDISSOLVE TO: 138 EXT. GROVE OF FRUIT TREES ESTABLISHING SHOT DAYZira run back to the front of Taylor.The President, I believe the prosecutors charge of this man. ZIRA Well, whoever owned them was in pretty bad shape.He picks up two of the strain.You got what you wanted, kid. How does it taste? Silence. Taylor and cuffs him.Over this we HEAR from a distance is a crude horse-drawn wagon is silhouetted-against the trunks and branches of great trees and bushes on the horses rump. Taylor lifts his right arm toward off the blow, and the room and lands at the feet of Cornelius and Lucius are sorting out equipment falls to his knees, burieshis head silently at the Ranch).DISSOLVE TO: 197 INT. CAGES CLOSE SHOT FEATURING LANDON FROM TAYLORS VOICE (o.s.)Ive got a fine veternary surgeons under my direction?1COMP 2210ZIRA Taylor!ZIRA There is a small lake, looking like a politician.TAYLOR Dodge takes a pen and notebook from the half-open doorof a guard room. Taylor bursts suddenly confronted by his original pursuer (the dismounted cop coming up with a cigar butt and places it in the drawer beside them.TAYLOR Whats the best there is a. loud RAP at the doll was found beside the building. Zira waits at the third table.TAYLOR Good question. Is he a man?CORNELIUS (impatiently. DODGE Blessed are the vegetation.These SHOTS are INTERCUT with: 94 WHAT THE ASTRONAUTSThey examine the remnants of the cage. ZIRA (plunging on) Their speech organs are adequate.The flaw lies not in anatomy but in the back of his left sleeve.TAYLOR (taking off his shirt. 80 DODGE AND LANDONYou dont sound happy in your work.GALEN (defensively) Gorilla hunter stands over a dead man, one foBesides a few spelling errors and some rather odd things that make you wonder about the author,this passage is surprisingly human-like. It turns out that Markov monkeys can apply a basic idea first discussed by Claude Shannon to produce pretty good imitations of real text. (And by the way, this basic idea has very important applications in image processing and the automatic generation of realistic terrainsand surfaces in computer graphics.) ApproachSo, heres the basic idea: Imagine taking a book (say, Tom Sawyer) and determining the probability with which each character occurs. You would probably find that spaces are the most common, that the character e is fairly common, and that the character q is rather uncommon. After completing this level0 analysis, you would be able to produce random Tom Sawyer text based on character probabilities. It wouldnt have much in common with the real thing, but at least the characters would tend to occur in the proper proportion. In fact, heres an example of what you might produce:Level 0rla bsht eS ststofo hhfosdsdewno oe wee h .mr ae irii ela iad o r te u t mnyto onmalysnce, ifu en c fDwn oee iteoNow imagine doing a slightly more sophisticated level 1 analysis by determining the probability with which each character follows every other character. You would probably discover that h follows t more frequently than x does, and you would probably discover that a space follows . more frequently than ,does. You could now produce some randomly generated Tom Sawyer text by picking a character to begin with and then always choosing the next character based on the previous one and the probabilities revealed by the analysis. Heres an example:Level 1Shand tucthiney m? le ollds mind Theybooure He, he s whit Pereg lenigabo Jodind alllld ashanthe ainofevids tre lin-p asto oun theanthadomoereNow imagine doing a level k analysis by determining the probability with which each character followsPage 2 of 6COMP 2210every possible sequence of characters of length k. A level 5 analysis of Tom Sawyer for example, would reveal that r follows Sawye more frequently than any other character. After a level k analysis, you would be able to produce random Tom Sawyer by always choosing the next character based on the previous k characters (the seed) and the probabilities revealed by the analysis.At only a moderate level of analysis (say, levels 5-7), the randomly generated text begins to take on many of the characteristics of the source text. It probably wont make complete sense, but youll be able to tell that it was derived from Tom Sawyer as opposed to, say, The Sound and the Fury.Here are some more examples of text that is generated from increasing levels of analysis of Tom Sawyer.(These levels of analysis are called order k Markov models.)Level 2Yess been. for gothin, Tome oso; ing, in to weliss of ante cle armit. Papper a comeasione, and smomenty, fropeck hinticer, sid, a was Tom, be suck tied. He sis tred a youck to themenLevel 4en themself, Mr. Welshman, but him awoke, the balmy shore. Ill give him that he couple overy because in the slated snufflindeed structures kind was rath. She said that the wound the door a fever eyes that WITH him.Level 6people had eaten, leaving. Come didnt stand it better judgment; His hands and bury it again, tramped herself! Shed never would be. He found her spite of anything the one was a prime feature sunset, and hit upon that of the forever.Level 8look-a-here I told you before, Joe. Ive heard a pin drop. The stillness was complete, how- ever, this is awful crime, beyond the village was sufficient. He would be a good enough to get that night, Tom and Becky.Level 10you understanding that they dont come around in the cave should get the word beauteous was over-fondled, and that together and decided that he might as we used to do its nobby fun. Ill learn you.Once you have performed a given level of analysis, heres the basic process of generating text from an order k Markov model of some sample text.1. Pick k consecutive characters that appear in the sample text and use them as the initial seed value.2. Append the seed to the output text being generated.3. Repeat the following steps until the output text is sufficiently long.(a) Select a character c that appears in the sample text based on the probability of that character following the current seed.(b) Append this character to the output text.(c) Update the seed by removing its first character and adding the character just chosen (c) as its last character.

COMP 2210If this process encounters a situation in which there are no characters to choose from (which can happen if the only occurrence of the current seed is at the exact end of the source), simply pick a new seed atrandom and continue. For example, suppose that k = 2 and the sample text is:the three pirates charted that course the other dayHere is how the first three characters might be chosen: A two-character seed is chosen at random to become the initial seed. Lets suppose that th ischosen. So, seed = th and output = th. The first character must be chosen based on the probability that it follows the seed (currently th)in the source. The source contains five occurrences of th. Three times it is followed by e, onceit is followed by r, and once it is followed by a. Thus, the next character must be chosen so thatthere is a 3/5 chance that an e will be chosen, a 1/5 chance that an r will be chosen, and a 1/5chance that an a will be chosen. Lets suppose that we choose an e this time. So, seed = he andoutput = the. The next character must be chosen based on the probability that it follows the seed (currently he)in the source. The source contains three occurrences of he. Twice it is followed by a space andonce it is followed by r. Thus, the next character must be chosen so that there is a 2/3 chance thata space will be chosen and a 1/3 chance that an r will be chosen. Lets suppose that we choose anr this time. So, seed = er and output = ther. The next character must be chosen based on the probability that it follows the seed (currently er)in the source. The source contains only one occurrence of er, and it is followed by a space. Thus,the next character must be a space. So, seed = r_ and output = ther_, where _ represents a blankspace.As another example, suppose that k = 2 and the sample text is:agggcagcgggcgHere are four different output text strings of length 10 that could have been the result of the processdescribed above, using the first two characters (ag) as the initial seed.agcggcagcgaggcaggcggagggcaggcgagcggcggcaImplementation DetailsYou are provided with two Java files that you must use to develop your solution: MarkovMonkey.javaand LanguageModeler.java.public class MarkovMonkey {public static void main(String[] args) {}}Page 4 of 6COMP 2210 Fall 2014public class LanguageModeler {// create an order K model for textpublic LanguageModeler(int K, File text)// create an order K model for textpublic LanguageModeler(int K, String text)// return the first K characters of the sample textpublic String firstSeed()// return K consecutive characters from the sample text,// selected uniformly at randompublic String randomSeed()// return a character that follows seed in the sample text,// selected according to the probability distribution of all// characters that follow seed in the sample textpublic char nextChar(String seed)}The main method of MarkovMonkey must process the following four command line arguments (in theargs array): A non-negative integer k A non-negative integer length. The name of an input file source that contains more than k characters. The name of an output file result.Your program must validate the command line arguments by making sure that k and length are nonnegativeand that source contains more than k characters and can be opened for reading. If result does notexist, your program creates it. If result already exists, your program overwrites it. If any of the commandline arguments are invalid, your program must write an informative error message to System.out andterminate. If there are not enough command line arguments, your program must write an informativeerror message to System.out and terminate.With valid command line arguments, your program must use the methods of the LanguageModelerclass to create an order k Markov model of the sample text, select the initial seed, and make each characterselection. You must implement the LanguageModeler methods according to description of the Markovmodeling process in the section above.A few sample texts have been provided, but Project Gutenberg (http://www.gutenberg.org) maintainsa large collection of public domain literary works that you can use as source texts for fun andpractice.SubmissionSubmit your solution via WebCAT no later than the date and time specified. Turn in MarkovMonkey.javaand LanguageModeler.java. Do not turn in any input or output text files.Page 5 of 6COMP 2210 Fall 2014AcknowledgmentsThis assignment is based on the ideas of many people, but Owen Astrachan in particular.Page 6 of 6

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[Solved] Assignment 7: Hash Tables and Markov Text Generation
$25