[Solved] Programming Assignment #4: Tries Solution

$25

File Name: Programming_Assignment_#4:_Tries_Solution.zip
File Size: 386.22 KB

SKU: [Solved] Programming Assignment #4: Tries Solution Category: Tag:
5/5 - (1 vote)

AbstractIn this programming assignment, you will gain experience with an advanced tree data structure that is used to store strings, and which allows for efficient insertion and lookup: a trie! By completing this assignment and reflecting upon the awesomeness of tries, you will fortify your knowledge of algorithms and data structures and solidify your mastery of many C programming topics you have been practicing all semester: dynamic memory allocation, file I/O, processing command line arguments, dealing with structs and pointers to structs, andso much more.In the end, you will have implemented a tremendously useful data structure that has many applications in text processing and corpus linguistics.AttachmentsTrie.h, corpus{01-05}.txt, input{01-05}.txt, output{01-05}.txt, printTrie.txtDeliverablesTrie.c1. Overview: TriesWe have seen in class that the trie data structure can be used to store strings. It provides for efficient string insertion and lookup; insertion into a trie is O(k) (where k is the length of the string being inserted), and searching for a string is an O(k) operation (worst-case). In a trie, a node does not store the string it represents; rather, the edges taken to reach that node from the root indicate the string it represents. Each node contains a flag (or count) variable that indicates whether the string represented by that node has been inserted into the trie (or how many times it has been inserted). For example:Figure 1:This is a trie that codifies the words and, cherry, chocolate, like, and pie. The string pie is represented (or counted) twice. All other strings are counted only once.

rootpie (2)like (1)chocolate (1)cherry (1)and (1)1.1. TrieNode Struct (Trie.h)In this assignment, you will insert words from a corpus (that is, a body of text from an input file) into atrie. The struct you will use for your trie nodes is as follows:typedef struct TrieNode{// number of times this string occurs in the corpusint count;// 26 TrieNode pointers, one for each letter of the alphabetstruct TrieNode *children[26];// the co-occurrence subtrie for this stringstruct TrieNode *subtrie;} TrieNode;You must use this trie node struct, which is specified in Trie.h without any modifications. You should#include the header file from Trie.c like so:#include Trie.hNotice that the trie node, because it only has 26 children, represents strings in a case insensitive way(i.e., apple and AppLE are treated the same in this trie).1.2. Co-occurrence SubtriesWords that appear together in the same sentence are said to co-occur. Consider, for example, thefollowing sentence:I like cherry pie and chocolate pie .In the example sentence, the word chocolate co-occurs with the following terms (with counts inparentheses): and (1), cherry (1), i (1), like (1), pie (2).For the purposes of this assignment, we will never consider a word to co-occur with itself. So, in theexample sentence above, pie co-occurs with: and (1), cherry (1), chocolate (1), i (1), like (1). Noticethat pie itself is not in the list of words co-occurring with pie.If we place these terms (and their associated counts) into their own tries, those tries will look like this:Figure 2:Co-occcurrence subtries for chocolate (left) and pie (right), based onthe sentence, I like cherry pie and chocolate pie.In Figure 2, the trie on the left is what we will call the co-occurrence subtrie for the word chocolate,based on the example sentence above (I like cherry pie and chocolate pie). Its root should be stored inthe subtrie pointer field of the node marked chocolate in the original trie diagram in Figure 1 (seepage 2 above).Similarly, the trie on the right in Figure 2 is the co-occurrence subtrie for the word pie. Its rootshould be stored in the subtrie pointer field of the node marked pie in the original trie diagram inFigure 1 (see page 2, above).Within these subtries, all subtrie pointers should be initialized to NULL, because we will NOT producesub-subtries in this assignment!

2. Input Files and Output Format2.1. Command Line ArgumentsYour program will take two command line arguments, specifying two files to be read at runtime:./a.out corpus01.txt input01.txtThe first filename specifies a corpus that will be used to construct your trie and subtries. The secondfilename specifies an input file with strings to be processed based on the contents of your trie.For more information on processing command line arguments with argc and argv, see the instructionsfrom the Program #3 PDF. You can assume that I will always specify valid input files when I run yourprogram.2.2. Corpus File2.2.1 FormatThe corpus file contains a series of sentences. Each line contains a single sentence with at least 1 wordand no more than 20 words. Each word contains fewer than 1024 characters. Each sentence isterminated with a single period, and that period is always preceded by a space. For example:corpus01.txt:I like cherry pie and chocolate pie .2.2.2 Building the Main TrieFirst and foremost, each word from the corpus file should be inserted into your trie. If a word occursmultiple times in the corpus, you should increment count variables accordingly. For example, the trie inFigure 1 (see page 2, above) corresponds to the text given in the corpus01.txt file above.2.2.3 Building SubtriesFor each sentence in the corpus, update the co-occurrence subtrie for each word in that sentence. Thestructure of the co-occurrence subtries is described above in Section 1.2, Co-occurrence Subtries.If a string in the main trie does not co-occur with any other words, its subtrie pointer should be NULL.2.3. Input FileThe input file (the second filename specified as a command line argument) will contain any number oflines of text. Each line will contain either a single word, or an exclamation point (!). If you encounteran exclamation point, you should print the strings represented in your main trie in alphabetical order,with the occurrence count in parentheses.Otherwise, if you encounter a string, it will consist strictly of alphabetical characters. They may beupper- or lowercase, but they will not contain punctuation, spaces, numbers, or any other nonalphabeticalcharacters. The string will contain fewer than 1024 characters. Process these strings asfollows: If the string you encounter is in the trie, you should print the string, followed by a printout of itssubtrie contents, using the output format shown below. If the string you encounter is in the trie, but its subtrie is empty, you should print that string,followed on the next line by (EMPTY). If the string you encounter is not in the trie at all, you should print that string, followed on thenext line by (INVALID STRING).For example, the following corpus and input files would produce the following output:corpus01.txt:I like cherry pie and chocolate pie .input01.txt:!chocolaTEapricotprogram output:and (1)cherry (1)chocolate (1)i (1)like (1)pie (2)chocolaTE and (1) cherry (1) i (1) like (1) pie (2)apricot(INVALID STRING)For an example of what would cause an empty co-occurrence subtrie, consider the following inputfiles:corpus02.txt:spin straw to gold .spin all night long .spin spin spin .spindle .input02.txt:spinspindlenikstlitslepmurprogram output:spin all (1) gold (1) long (1) night (1) straw (1) to (1)spindle(EMPTY)nikstlitslepmur(INVALID STRING)You must follow the output format above precisely. Be sure to consult the included text files for furtherexamples.I have included some functions that will help you print the contents of your trie(s) in the requiredformat, because I think those functions are a bit too tricky to expect you to write them on your own.See printTrie.txt (attachment) for those functions. Youre welcome to copy and paste them intoyour Trie.c source file if you want to use them!2.4. Final Thoughts on Test Cases, Input, and OutputI have attached a few sample input and output files so you can check that your program is working asintended. Be sure to use diff on Eustis to make sure your output matches ours exactly. I alsoencourage you to develop your own test cases; ours are by no means comprehensive.3. Function Requirements3.1. Required FunctionYou have a lot of leeway with how to approach this assignment. Other than main(), there is only onerequired function. How you structure the rest of your program is up to you.TrieNode *buildTrie(char *filename);Description: filename is the name of a corpus text file to process. Open the file and create atrie (including all its appropriate subtries) as described above.Returns: The root of the new trie.3.2. Suggested FunctionsThese functions are not required, but I think they will simplify your task immensely if you implementthem properly and call them when processing your corpus/input files. Think of these functiondescriptions as hints at how to proceed. If you want, you can even implement these functions withdifferent parameters, return types, and so on.TrieNode *createTrieNode(void);Description: Dynamically allocate space for a new TrieNode struct. Initialize all the structmembers appropriately.Returns: A pointer to the new node.TrieNode *getNode(TrieNode *root, char *str);Description: Searches the trie for the specified string, str.Returns: If the string is represented in the trie, return its terminal node (the last node in thesequence, which represents that string). Otherwise, return NULL.void insertString(TrieNode *root, char *str);Description: Inserts the string str into the trie. Since it has no return value, it assumes the rootalready exists (i.e., root is not NULL). If str is already represented in the trie, simply incrementits count member.4. Compilation and Testing (Linux/Mac Command Line)To compile at the command line:gcc Trie.cBy default, this will produce an executable file called a.out that you can run by typing, e.g.:./a.out corpus01.txt input01.txtIf you want to name the executable something else, use:gcc Trie.c -o Trie.exeand then run the program using:./Trie.exe corpus01.txt input01.txtRunning the program could potentially dump a lot of output to the screen. If you want to redirect youroutput to a text file in Linux, its easy. Just run the program using the following:./Trie.exe corpus01.txt input01.txt whatever.txtThis will create a file called whatever.txt that contains the output from your program.Linux has a helpful command called diff for comparing the contents of two files, which is reallyhelpful here since weve provided sample output files. You can see whether your output matches oursexactly by typing, e.g.:diff whatever.txt output01.txtIf the contents of whatever.txt and output01.txt are exactly the same, diff wont have any output.It will just look like this:[email protected]:~$ diff whatever.txt output01.txt[email protected]:~$ _If the files differ, it will spit out some information about the lines that arent the same. For example:[email protected]:~$ diff whatever.txt output01.txt14c14< INVALID STRING(INVALID STRING)[email protected]:~$ _5. Grading Criteria and Submission5.1. DeliverablesSubmit a single source file, named Trie.c, through [email protected] The source file should containdefinitions for the required function (listed above), as well as any auxiliary functions you decide toimplement. Dont forget to #include Trie.h in your source code. We will compile your programusing:gcc Trie.cBe sure to include your name and PID as a comment at the top of your source file.5.2. Additional Restrictions: Use Tries; Do Not Use Global VariablesYou must use tries to receive credit for this assignment. Also, please do not use global variables in thisprogram. Doing so may result in a huge loss of points.5.3. GradingThe expected scoring breakdown for this programming assignment is:80% Correct Output for Test Cases20% Unit Testing of buildTrie() FunctionYour program must compile and run on Eustis to receive credit. Programs that do not compilewill receive an automatic zero.Your grade will be based primarily on your programs ability to compile and produce the exact outputexpected. Even minor deviations (such as capitalization, punctuation, or whitespace errors) in youroutput will cause your programs output to be marked as incorrect, resulting in severe point deductions.The same is true of how you name your functions and their parameters. Please be sure to follow allrequirements carefully and test your program thoroughly.For this program, we will also be unit testing your code. That means we will devise tests that determinenot only whether your programs overall output is correct, but also whether your buildTrie() functiondoes exactly what it is required to do. So, for example, if your program produces correct output butyour buildTrie() function is simply a skeleton that returns NULL no matter what parameters you passto it, or if it produces a totally funky, malformed trie, your program will fail the unit tests.5.4. Closing RemarksStart early. Work hard. Ask questions. Good luck!

5/5 – (1 vote)

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[Solved] Programming Assignment #4: Tries Solution
$25