1 Histogram of Word Lengths
Write a C program that reads from stdin until EOF and analyzes the lengths of the words in the input. Let's treat every maximal run of alphanumeric characters as a word, and every non-alphanumeric character as a delimiter. For example, each of the following is a word.
- Homework
- CS240
- 2
The following strings should be broken into multiple words.
- you'll has two words: you and ll.
- ALL'S has two words: ALL and S.
- UTF-8 has two words: UTF and 8.
- 2/19/2019 17:00 has five words: 2, 19, 2019, 17, and 00.
- www.gutenberg.org has three words: www, gutenberg, and org.
You can build upon the word counting code on page 20 of K&R. As you read a word one character at a time, keep track of the number of characters you have read. When you reach the end of a word, you have its length. Then increment a counter that keeps track of the number of words of this particular length. Use an array of these counters. I will test your code with CompleteShakespeare.txt. The longest word you will encounter has 27 characters.
For output, you should print 27 lines. In each line, print the length (width 2), a space, the number of words of that length (width 6), a space, and several asterisks. Use one asterisk for every 4,000 words. If there are fewer than 4,000, you still print one asterisk for them, because we cannot print a fractional asterisk. For example, print one asterisk for 1 to 4,000 words, two asterisks for 4,001 to 8,000 words, and so on. The asterisks constitute the histogram of word lengths. Histograms are usually drawn vertically, but here we print it horizontally because that is easier. On CompleteShakespeare.txt, your output should match Figure 1 exactly.
 1  63691 ****************
 2 166375 ******************************************
 3 204211 ****************************************************
 4 223161 ********************************************************
 5 121472 *******************************
 6  80386 *********************
 7  59379 ***************
 8  35083 *********
 9  20351 ******
10  10067 ***
11   3771 *
12   1353 *
13    454 *
14    247 *
15     77 *
16      3 *
17      4 *
18      0
19      0
20      0
21      0
22      0
23      0
24      0
25      0
26      0
27      1 *
Figure 1: Output for CompleteShakespeare.txt
2 Lastly
Compile and run as follows:
$ gcc -Wall histo.c -o histo
$ ./histo < CompleteShakespeare.txt
Write plenty of comments to explain your code: how you determine whether a character is alphanumeric, how you extract one word, and how you convert a count to the number of asterisks to print.
Write a Report.txt that discusses what you found difficult about this assignment, how you planned your approach to it, and what you learned completing it.
Send me only:
- histo.c
- Report.txt