The objective of this homework is to:
- Develop a multithreaded C++ program (synchronization is not needed)
- Continue to gain familiarity with use of the HTTP protocol for communication
- Continue to gain familiarity with I/O streams & string processing
Problem summary
Develop a multithreaded C++ program that uses multiple threads to process a list of URLs specified as command-line arguments. For each URL, the program must print (in the same order in which URLs are specified) the number of words, and number of valid English words. A dictionary is supplied to determine valid English words.
Starter code
Starter code is provided to streamline the following operations for you. You should study these methods and be able to explain what they do:
- Load valid English words from a given dictionary file (see: loadDictionary)
- Determine if a given word is a valid English word (see: isValidWord)
- A simple method to change punctuation and special characters to spaces to ease extracting/processing words in a line of data (see: changePunct)
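As a rough illustration of how these three operations might fit together, here is a minimal sketch. The starter code's actual signatures are not shown in this handout, so `punctToSpaces`, `countWords`, and the `Dictionary` alias below are hypothetical stand-ins, not the provided methods. The sketch also assumes words are looked up exactly as they appear (no case conversion), which may differ from what isValidWord does.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <unordered_set>
#include <utility>

// Assumed dictionary representation (the starter code may use another).
using Dictionary = std::unordered_set<std::string>;

// Hypothetical stand-in for changePunct: returns a copy of the line
// with every punctuation character replaced by a space.
std::string punctToSpaces(std::string line) {
    for (char& c : line) {
        if (std::ispunct(static_cast<unsigned char>(c))) {
            c = ' ';
        }
    }
    return line;
}

// Returns {total words, words found in dict} for one line of text.
// The set lookup is a stand-in for isValidWord.
std::pair<int, int> countWords(const std::string& line,
                               const Dictionary& dict) {
    std::istringstream is(punctToSpaces(line));
    int words = 0, english = 0;
    for (std::string word; is >> word;) {
        words++;
        if (dict.count(word) > 0) {
            english++;
        }
    }
    return {words, english};
}
```

Calling `countWords` on each line of a response and summing the two counters would give the per-URL totals.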
Program Inputs
The program must use command-line arguments for obtaining inputs. The command line arguments will be in the following format:
- The first command-line argument will specify the number of threads to use. For base case requirements, the number of threads will be 1. For other cases the number of threads will vary, but will always be lower than the number of URLs
- The rest of the command-line arguments will be 1 or more files to retrieve. Each file name needs to be appended to the base URL http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=. For example, given a file ex3.html, the full URL will be: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=ex3.html.
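The argument handling described above could be sketched as follows. The helper names `makeURL` and `getURLs` are illustrative assumptions and are not part of the starter code; only the base URL comes from this handout.

```cpp
#include <string>
#include <vector>

// Base URL as given in the assignment text.
const std::string BaseURL =
    "http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=";

// Hypothetical helper: builds the full URL for one file name.
std::string makeURL(const std::string& file) {
    return BaseURL + file;
}

// Hypothetical helper: argv[1] is the thread count, so the file
// names start at argv[2]. Returns the full URLs in the same order.
std::vector<std::string> getURLs(int argc, char* argv[]) {
    std::vector<std::string> urls;
    for (int i = 2; i < argc; i++) {
        urls.push_back(makeURL(argv[i]));
    }
    return urls;
}
```

The thread count itself could then be read in main with something like `std::stoi(argv[1])`.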
Program Outputs
The result of processing URLs should be displayed in exactly the same order in which the URLs were specified. For each URL, the program should generate 1 line of output, in the following format:
URL: <URL>, words: <#Words>, English words: <#EngWords>
Each colon and comma in the format is followed by exactly 1 blank space. The strings URL:, words:, and English words: are literal constants. See sample outputs for example.
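One way to produce a line matching the sample outputs is sketched below. The function name `formatResult` is an assumption made for illustration; the spacing is copied from the sample outputs in this handout.

```cpp
#include <string>

// Hypothetical helper: builds one output line for a URL, with a
// single blank after each colon and comma as in the sample outputs.
std::string formatResult(const std::string& url, int words,
                         int engWords) {
    return "URL: " + url + ", words: " + std::to_string(words) +
           ", English words: " + std::to_string(engWords);
}
```

Returning a string rather than printing directly makes it simple to store each result and print them all in order later.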
Sample inputs and outputs
The command typed in is shown in bold. The commands and outputs are long because of URLs and have been wrapped in this document; however, they are on one single line. Video on setting command-line arguments in NetBeans is available on Canvas.
Base cases [12 points]
$ ./homework6 1 ex3.html
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=ex3.html, words: 39, English words: 16
$ ./homework6 1 ex3.html cpp.txt
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=ex3.html, words: 39, English words: 16
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=cpp.txt, words: 6983, English words: 5810
$ ./homework6 1 ex3.html cpp.txt us_constitution.txt miami_university.txt
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=ex3.html, words: 39, English words: 16
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=cpp.txt, words: 6983, English words: 5810
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=us_constitution.txt, words: 7667, English words: 7422
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=miami_university.txt, words: 32331, English words: 15348
The above 3 base cases are relatively straightforward. Consequently, if the base cases do not operate as expected, as per the course policy, the program will be assigned a zero score.
Multi threading tests (change number of threads, but exact same outputs) [18 points]
$ ./homework6 2 ex3.html cpp.txt us_constitution.txt miami_university.txt
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=ex3.html, words: 39, English words: 16
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=cpp.txt, words: 6983, English words: 5810
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=us_constitution.txt, words: 7667, English words: 7422
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=miami_university.txt, words: 32331, English words: 15348
$ ./homework6 3 ex3.html cpp.txt us_constitution.txt miami_university.txt
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=ex3.html, words: 39, English words: 16
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=cpp.txt, words: 6983, English words: 5810
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=us_constitution.txt, words: 7667, English words: 7422
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=miami_university.txt, words: 32331, English words: 15348
$ ./homework6 4 ex3.html cpp.txt us_constitution.txt miami_university.txt
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=ex3.html, words: 39, English words: 16
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=cpp.txt, words: 6983, English words: 5810
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=us_constitution.txt, words: 7667, English words: 7422
URL: http://ceclnx01.cec.miamioh.edu/~raodm/SlowGet.cgi?file=miami_university.txt, words: 32331, English words: 15348
Tips
- It is easiest to store the files specified as command-line arguments into a vector to further process them.
- Use the simple HTTP client example from lecture slides to obtain response from the server.
- Extracting words and counting valid English words is straightforward using an istringstream. Don't overcomplicate this part.
- Approach this problem as a data parallel application in which a subset of URLs is assigned to a thread to process. The logic of assigning threads to URLs is almost a copy-paste from lecture slides.
- In continuation with examples from lecture slides, it would be easiest to store results as a string in another vector and then finally print the vector after all the threads have finished running.
- If you find your program running too slowly in NetBeans, you can use the Release mode (instead of Debug).
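The data parallel approach from the tips can be sketched roughly as below. This is not the lecture-slide code: `process`, `worker`, and `processAll` are hypothetical names, and `process` is only a placeholder for the real per-URL work (download the data and count words). Each thread writes only to its own slots of the results vector, which is why no synchronization is needed.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Placeholder for the real per-URL work (HTTP GET + word counting).
std::string process(const std::string& url) {
    return "URL: " + url;
}

// Thread body: handles URLs with index in [start, end) and stores
// each result at the same index, so threads never share a slot.
void worker(const std::vector<std::string>& urls,
            std::vector<std::string>& results,
            std::size_t start, std::size_t end) {
    for (std::size_t i = start; i < end; i++) {
        results[i] = process(urls[i]);
    }
}

// Splits the URLs into numThreads contiguous ranges, runs one thread
// per range, and returns results in the original URL order.
std::vector<std::string> processAll(const std::vector<std::string>& urls,
                                    std::size_t numThreads) {
    if (numThreads == 0) {
        numThreads = 1;  // Assumption: treat 0 as a single thread.
    }
    std::vector<std::string> results(urls.size());
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < numThreads; t++) {
        const std::size_t start = t * urls.size() / numThreads;
        const std::size_t end = (t + 1) * urls.size() / numThreads;
        threads.emplace_back(worker, std::cref(urls), std::ref(results),
                             start, end);
    }
    for (auto& thr : threads) {
        thr.join();  // Wait for every thread before using results.
    }
    return results;
}
```

Printing the returned vector from first to last entry after all threads have joined keeps the output in the order the URLs were specified, regardless of which thread finishes first.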