You must write a program which reads, processes and reports on the contents of a text file.
Your program should:
- Read the name of the text file from the console.
- Read in a text file, not all at once. (This can be line by line, word by word or character by character.)
- Convert the file content to a sequence of words, discarding punctuation and folding all letters to lower case.
- Store the unique words and maintain a count of each different word.
- The words should be ordered by decreasing count and, if there are multiple words with the same count, alphabetically. (This ordering may be achieved as the words are read in, partially as the words are read or at the end of all input processing.)
- Output the first ten words in the sorted list, along with their counts.
- Output the last ten words in the list, along with their counts.
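The ordering and output rules above could be sketched as follows. This is only an illustrative outline, not a required design: the names comes_before, merge_sort and report are hypothetical, the (word, count) pair layout is an assumption, and merge sort is just one possible choice that avoids library sorting routines.

```python
def comes_before(a, b):
    """True if entry a = (word, count) should be listed before entry b."""
    if a[1] != b[1]:
        return a[1] > b[1]   # higher count comes first
    return a[0] < b[0]       # equal counts: alphabetical order


def merge_sort(entries):
    """Return a new list of (word, count) entries in report order."""
    if len(entries) <= 1:
        return list(entries)
    mid = len(entries) // 2
    left = merge_sort(entries[:mid])
    right = merge_sort(entries[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Take from the right half only when it strictly precedes the left.
        if comes_before(right[j], left[i]):
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def report(entries, n=10):
    """Return the first n and the last n entries of the sorted list."""
    ordered = merge_sort(entries)
    first = ordered[:n]
    last = ordered[max(0, len(ordered) - n):]
    return first, last
```

If the file holds fewer than twenty distinct words, the two lists returned by report will overlap; how that case should be printed is not specified here.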
You must choose appropriate data structures and algorithms to accomplish this task.
Note: in the context of this assignment, appropriate choices will be efficient and will not use excessive instructions or data.
Note: where a punctuation mark appears between two letters, the sequence is to be treated as a single word. Thus, it's will become its, you'll will become youll and loop-hole will become loophole.
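One way to apply this rule is to drop every non-letter character from a whitespace-delimited token and fold what remains to lower case. The sketch below uses a hypothetical helper named normalise and assumes that only the ASCII letters A to Z count as letters, so digits are discarded along with punctuation; the assignment does not say how digits should be treated.

```python
def normalise(token):
    """Keep only ASCII letters from token, folded to lower case.

    Returns an empty string when the token contains no letters,
    in which case the caller should skip it.
    """
    letters = []
    for ch in token:
        if "a" <= ch <= "z":
            letters.append(ch)
        elif "A" <= ch <= "Z":
            letters.append(ch.lower())
        # anything else (punctuation, digits) is dropped
    return "".join(letters)
```

For example, normalise("Loop-hole,") gives "loophole", and a token such as "--" gives an empty string.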
Note: you can assume that the input file contains no more than 50,000 different words.
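A known upper bound on the number of different words means a fixed-size structure can be allocated once. As one possibility only, the sketch below counts words in a hand-written hash table with linear probing. The table size, the hash function and the names make_table, hash_word, add_word and entries are all assumptions made for illustration, and whether a hash table is permitted depends on what has been presented in class.

```python
TABLE_SIZE = 100003  # assumed: a prime roughly twice the 50,000-word bound


def make_table(size=TABLE_SIZE):
    """Create an empty table; each slot is None or a [word, count] pair."""
    return [None] * size


def hash_word(word, size):
    """Simple polynomial hash of the characters, reduced modulo size."""
    h = 0
    for ch in word:
        h = (h * 31 + ord(ch)) % size
    return h


def add_word(table, word):
    """Insert word with count 1, or increment its count if already present.

    Assumes the table never becomes full, which holds while the number
    of different words stays below the table size.
    """
    size = len(table)
    i = hash_word(word, size)
    while table[i] is not None:
        if table[i][0] == word:
            table[i][1] += 1
            return
        i = (i + 1) % size   # linear probing: try the next slot
    table[i] = [word, 1]


def entries(table):
    """Collect the occupied slots as (word, count) pairs, unordered."""
    return [(slot[0], slot[1]) for slot in table if slot is not None]
```

Keeping the table about half empty keeps probe sequences short, at the cost of extra memory; a smaller table would trade the other way.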
Note: a small sample input file sample.txt is provided for you to test your program.
A larger text file will be used for final assessment.
Note: you may use any data structures or algorithms that have been presented in class up to the end of week 4. If you use other data structures or algorithms, appropriate references must be provided.
Programs must compile and run under gcc (C programs), g++ (C++ programs), java or python. Programs which do not compile and run will receive no marks.
Programs should be appropriately documented with comments.
All coding must be your own work.
Standard libraries of data structures and algorithms such as STL may not be used.
Code sourced from textbooks, the internet, etc. may also not be used unless it is correctly credited. In the event that you use code sourced in this way, you will not receive marks for that part of the program.