CSE 216
Assignment 4: Concurrent Sorting, CREW, and Unit Testing
In this assignment, you will be using Java JDK 1.8 with multithreading. There are three parts to this assignment.
I. Concurrent Sorting (45 points)
The first part of this assignment is to write a concurrent mergesort algorithm. The idea here is simple: use the standard mergesort algorithm, but after splitting the input array into two halves, sort each half recursively in its own thread. To do this, the Runnable implementation is given to you here as the Sorting class. DO NOT change this class.
public class Sorting implements Runnable {

    private int[] arr;
    private int   threadCount;

    public Sorting(int[] arr, int threadCount) {
        this.arr         = arr;
        this.threadCount = threadCount;
    }

    public void run() {
        MergeSort.concurrentMergeSort(arr, threadCount);
    }
}
In the implementation, you should use divide-and-conquer concurrency. That is, each recursive call on one half of the array spawns its own thread, and each thread waits for the two child threads it spawned to finish before merging their sorted halves. The algorithm must be an in-place mergesort algorithm. For this, you may have to write helper functions like swap, etc. The details of such choices are left to the programmer. Some code is given below to help you test your own code. This includes a method to randomly generate very large integer arrays, and the main method that runs the static concurrentMergeSort(int[]) method and prints out the amount of time it takes to run the algorithm. Your final code must run with this main method!
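One way to meet these requirements is sketched below, assuming the thread budget is split between the two halves and that no new threads are spawned once the budget reaches one. The class name ConcurrentMergeSortSketch and every helper in it are illustrative, not part of the given code; it uses lambdas instead of the given Sorting class so that it compiles on its own. For the in-place merge it swaps in a common buffer-free technique (split one run, binary-search the matching cut in the other, rotate the middle blocks, recurse), which avoids an auxiliary array at the cost of extra comparisons and swaps.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only: not the assignment's MergeSort class.
public class ConcurrentMergeSortSketch {

    // Sorts arr in place; threads is the budget of worker threads (at least 1).
    public static void concurrentMergeSort(int[] arr, int threads) {
        sort(arr, 0, arr.length, Math.max(1, threads));
    }

    // Sorts arr[lo, hi). While the budget exceeds 1, each half gets its own
    // thread; the parent joins both children, then merges the two halves.
    private static void sort(int[] arr, int lo, int hi, int threads) {
        if (hi - lo < 2) return;
        int mid = (lo + hi) >>> 1;
        if (threads > 1) {
            Thread left = new Thread(() -> sort(arr, lo, mid, threads / 2));
            Thread right = new Thread(() -> sort(arr, mid, hi, threads - threads / 2));
            left.start();
            right.start();
            try {
                left.join(); // join() also makes the children's writes visible here
                right.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while sorting", e);
            }
        } else {
            sort(arr, lo, mid, 1);
            sort(arr, mid, hi, 1);
        }
        mergeInPlace(arr, lo, mid, hi);
    }

    // Merges sorted a[lo, mid) and a[mid, hi) without an auxiliary array.
    private static void mergeInPlace(int[] a, int lo, int mid, int hi) {
        int len1 = mid - lo, len2 = hi - mid;
        if (len1 == 0 || len2 == 0) return;
        if (len1 + len2 == 2) {
            if (a[mid] < a[lo]) swap(a, lo, mid);
            return;
        }
        int cut1, cut2;
        if (len1 > len2) {
            cut1 = lo + len1 / 2;
            cut2 = firstIndex(a, mid, hi, a[cut1], false); // first a[i] >= a[cut1]
        } else {
            cut2 = mid + len2 / 2;
            cut1 = firstIndex(a, lo, mid, a[cut2], true);  // first a[i] > a[cut2]
        }
        rotate(a, cut1, mid, cut2); // a[mid, cut2) now precedes a[cut1, mid)
        int newMid = cut1 + (cut2 - mid);
        mergeInPlace(a, lo, cut1, newMid);
        mergeInPlace(a, newMid, cut2, hi);
    }

    // Binary search in sorted a[from, to): first index with a[i] > key when
    // strict is true (upper bound), or a[i] >= key when false (lower bound).
    private static int firstIndex(int[] a, int from, int to, int key, boolean strict) {
        while (from < to) {
            int m = (from + to) >>> 1;
            boolean goRight = strict ? a[m] <= key : a[m] < key;
            if (goRight) from = m + 1; else to = m;
        }
        return from;
    }

    // Turns [first, middle)[middle, last) into [middle, last)[first, middle).
    private static void rotate(int[] a, int first, int middle, int last) {
        reverse(a, first, middle);
        reverse(a, middle, last);
        reverse(a, first, last);
    }

    private static void reverse(int[] a, int from, int to) {
        for (int i = from, j = to - 1; i < j; i++, j--) swap(a, i, j);
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] arr = new Random(42).ints(100_000, 0, 1_000_000).toArray();
        int[] expected = arr.clone();
        Arrays.sort(expected);
        concurrentMergeSort(arr, Runtime.getRuntime().availableProcessors());
        System.out.println(Arrays.equals(arr, expected)); // true
    }
}
```

Because the buffer-free merge does more work than a merge with a scratch array, whether a sketch like this reaches the speed-up targets below depends on the machine; it is meant to show the thread structure, not to be tuned.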
Note that there are some differences as well as some similarities between the methods used in Sorting and in MergeSort. This is intentional, and you must use various aspects of object-oriented programming to understand why these differences/similarities exist, and how to work with them. You may also find the following method useful for determining how many threads are likely to be the most useful when running a parallel algorithm on your machine:
Runtime.getRuntime().availableProcessors()
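A small sketch of how that value might be used when benchmarking: the helper candidateThreadCounts (a hypothetical name) lists thread counts worth timing, doubling from one and ending at the reported processor count. The doubling policy is an assumption, not a requirement of the assignment.

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadCountProbe {

    // Thread counts to benchmark: 1, 2, 4, ... below the core count,
    // followed by the core count itself (even if it is not a power of two).
    static List<Integer> candidateThreadCounts(int cores) {
        List<Integer> counts = new ArrayList<>();
        for (int t = 1; t < cores; t *= 2) counts.add(t);
        counts.add(Math.max(cores, 1));
        return counts;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Available processors: " + cores);
        System.out.println("Thread counts to try: " + candidateThreadCounts(cores));
    }
}
```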
You should check that increasing the number of threads (up to a certain point, depending on the number of processor cores in your machine) gives your concurrent mergesort program a significant speed boost. If not, chances are that your code either has a bug, or you are not really running it with enough concurrency. For example, some numbers that you should approximately expect when you test your code with the given main method and different LENGTH values are shown below.
import java.util.Random;
import java.util.stream.IntStream;

public class MergeSort {

    private static final Random RNG    = new Random(10982755L);
    private static final int    LENGTH = 524288;

    public static void main(String[] args) {
        int[] arr = randomIntArray();
        long start = System.currentTimeMillis();
        concurrentMergeSort(arr);
        long end = System.currentTimeMillis();
        if (!sorted(arr)) {
            System.err.println("The final array is not sorted");
            System.exit(0);
        }
        System.out.printf("%10d numbers: %6d ms%n", LENGTH, end - start);
    }

    private static int[] randomIntArray() {
        int[] arr = new int[LENGTH];
        for (int i = 0; i < arr.length; i++)
            arr[i] = RNG.nextInt(LENGTH * 10);
        return arr;
    }

    public static boolean sorted(int[] arr) {
        return !IntStream.range(1, arr.length)
                         .mapToObj(i -> arr[i - 1] > arr[i])
                         .findFirst().orElse(false);
    }

    // concurrentMergeSort(int[]) and concurrentMergeSort(int[], int) are to be implemented by you.
}
One thread:
   1024000 numbers:    191 ms
   2048000 numbers:    380 ms
   8192000 numbers:   1638 ms
Two threads:
   1024000 numbers:    112 ms
   2048000 numbers:    252 ms
   8192000 numbers:    953 ms
Four threads:
   1024000 numbers:     83 ms
   2048000 numbers:    167 ms
   8192000 numbers:    672 ms
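The rubric below states its targets as a percentage "speed boost" without saying how it is computed. Two common readings are the ratio of running times and the percentage reduction in running time; the sketch below (class and method names hypothetical) computes both from the numbers listed above. For example, going from 191 ms to 112 ms is roughly a 1.71x speedup, or about 41% less time.

```java
public class SpeedupFromTable {

    // Speedup as a ratio: how many times faster the multithreaded run is.
    static double speedup(long oneThreadMs, long nThreadMs) {
        return (double) oneThreadMs / nThreadMs;
    }

    // Relative reduction in running time, as a percentage of the 1-thread time.
    static double percentReduction(long oneThreadMs, long nThreadMs) {
        return 100.0 * (oneThreadMs - nThreadMs) / oneThreadMs;
    }

    public static void main(String[] args) {
        long[] sizes = {1024000, 2048000, 8192000};
        long[] one   = {191, 380, 1638};
        long[] two   = {112, 252, 953};
        long[] four  = {83, 167, 672};
        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("%8d: 2 threads %.2fx (%.0f%% less time), 4 threads %.2fx (%.0f%% less time)%n",
                    sizes[i],
                    speedup(one[i], two[i]), percentReduction(one[i], two[i]),
                    speedup(one[i], four[i]), percentReduction(one[i], four[i]));
        }
    }
}
```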
II. Unit Testing (10 points)
The helper method sorted(int[]) given above is not tested. Part of your task is to create a class called MergeSortTest. In this class, you have to test only the sorted(int[]) method, using JUnit 5. If your initial tests reveal that this method has a bug, you are free to fix the bug and/or completely change the internal workings of the method. The final submission must include a bug-free sorted(int[]) method and a test suite that runs successfully.
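To illustrate the kinds of cases the rubric asks for, here is a plain-Java sketch (no JUnit dependency, so it runs anywhere) of one sorted(int[]) formulation that compares every adjacent pair with IntStream.allMatch, together with boundary, true, and false cases. The class name SortedCheckSketch and the check helper are hypothetical; in an actual JUnit 5 suite, each check would become its own @Test method using assertTrue or assertFalse from org.junit.jupiter.api.Assertions.

```java
import java.util.stream.IntStream;

public class SortedCheckSketch {

    // True iff arr is in non-decreasing order. Every adjacent pair is compared,
    // so one out-of-order pair anywhere makes the result false.
    // Empty and one-element arrays have no pairs, so allMatch returns true.
    public static boolean sorted(int[] arr) {
        return IntStream.range(1, arr.length).allMatch(i -> arr[i - 1] <= arr[i]);
    }

    public static void main(String[] args) {
        // Boundary values
        check(sorted(new int[0]), "empty array is sorted");
        check(sorted(new int[]{7}), "single element is sorted");
        // Sorted inputs (expected true)
        check(sorted(new int[]{1, 2, 2, 3}), "duplicates in order");
        check(sorted(new int[]{-5, 0, Integer.MAX_VALUE}), "negatives and extremes");
        // Unsorted inputs (expected false)
        check(!sorted(new int[]{1, 2, 4, 3}), "only the last pair out of order");
        check(!sorted(new int[]{3, 2, 1}), "descending");
        System.out.println("all checks passed");
    }

    // Hypothetical stand-in for a JUnit assertion.
    private static void check(boolean condition, String description) {
        if (!condition) throw new AssertionError(description);
    }
}
```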
III. Concurrent Read Exclusive Write (45 points)
Concurrent-read exclusive-write (CREW) is a fundamental concurrent programming paradigm. As the name suggests, it is about implementing a system that allows multiple threads or processes to read a shared object, but allows only one thread or process to write to it at any given moment. In this part of the assignment, your task is to write a class called WordCounter, whose work is described below:
1. This class will read a list of plain-text files, all of which are in a given folder.
2. It will count the words in each file and, based on these word counts, generate a table that shows which word appears how many times and in which file.
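In Java, the CREW discipline maps directly onto java.util.concurrent.locks.ReentrantReadWriteLock: any number of threads may hold the read lock at once, but the write lock is exclusive. The sketch below (class name hypothetical) guards a shared word-to-count map this way; it shows only the locking pattern, not a full WordCounter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CrewCounterSketch {

    private final Map<String, Integer> counts = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Exclusive write: only one thread at a time may modify the map.
    public void add(String word, int n) {
        lock.writeLock().lock();
        try {
            counts.merge(word, n, Integer::sum);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Concurrent read: many threads may read at once, but never during a write.
    public int get(String word) {
        lock.readLock().lock();
        try {
            return counts.getOrDefault(word, 0);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CrewCounterSketch shared = new CrewCounterSketch();
        Thread[] writers = new Thread[4];
        for (int i = 0; i < writers.length; i++) {
            writers[i] = new Thread(() -> {
                for (int k = 0; k < 1000; k++) shared.add("so", 1);
            });
            writers[i].start();
        }
        for (Thread t : writers) t.join();
        System.out.println(shared.get("so")); // 4000
    }
}
```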
Let us look at a worked-out example. Suppose there are three text files:
texta.txt: There are so many words in this file. Well, actually not that many if you really start counting them.
textb.txt: The words are scattered everywhere, and in no order whatsoever. What. So. Ever. How will counting them help?
textc.txt: So, what can you do?
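Splitting these sentences into words could look like the following sketch. It assumes that a word is a maximal run of letters or digits, so every other character (including apostrophes and hyphens) acts as a separator; the assignment does not say how such cases should be treated. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenizeSketch {

    // Lowercases the text and splits on every run of characters that are not
    // letters or digits, so punctuation never ends up inside a word.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) result.add(token); // split can yield a leading ""
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("What. So. Ever. How will counting them help?"));
        // prints [what, so, ever, how, will, counting, them, help]
    }
}
```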
The output table is given here:
           texta textb textc total
actually   1     0     0     1
and        0     1     0     1
are        1     1     0     2
can        0     0     1     1
counting   1     1     0     2
do         0     0     1     1
ever       0     1     0     1
everywhere 0     1     0     1
file       1     0     0     1
help       0     1     0     1
how        0     1     0     1
if         1     0     0     1
in         1     1     0     2
many       2     0     0     2
no         0     1     0     1
not        1     0     0     1
order      0     1     0     1
really     1     0     0     1
scattered  0     1     0     1
so         1     1     1     3
start      1     0     0     1
that       1     0     0     1
the        0     1     0     1
them       1     1     0     2
there      1     0     0     1
this       1     0     0     1
well       1     0     0     1
what       0     1     1     2
whatsoever 0     1     0     1
will       0     1     0     1
words      1     1     0     2
you        1     0     1     2
Note the following features of this output:
the words are alphabetically ordered
the first column's width is the longest word's length plus one character, so that there's a gap between the longest word and the first number
all the columns specific to a file have the same width, and count the number of occurrences of a word in that file
the last column counts the total number of occurrences of a word across all files
all words are lowercased
punctuation has been completely ignored
The WordCounter class must generate this output for the set of all files in a specific folder. To accomplish this, it must read each text file in a separate thread. Of course, all the words and word counts from individual files have to be consolidated to write the final table. This is where you must carefully put together the results obtained by the different threads.
As with the mergesort algorithm, you should test this code without multithreading (i.e., the number of threads is just one: the main thread) and then with multithreading, to see what kind of speed boost (if any) you are getting.
The basic structure of WordCounter is given below. Your class must have the constants FOLDER_OF_TEXT_FILES, WORD_COUNT_TABLE_FILE, and NUMBER_OF_THREADS. These are the only values we will change to run your code, so you have to be absolutely sure that your code is modular and robust enough not to crash if these details are changed and the code is run on another machine.
import java.nio.file.Path;
import java.nio.file.Paths;

public class WordCounter {

    // The following are the ONLY variables we will modify for grading.
    // The rest of your code must run with no changes.
    public static final Path FOLDER_OF_TEXT_FILES  = Paths.get(""); // path to the folder where input text files are located
    public static final Path WORD_COUNT_TABLE_FILE = Paths.get(""); // path to the output plain-text (.txt) file
    public static final int  NUMBER_OF_THREADS     = 2;             // max. number of threads to spawn

    public static void main(String[] args) {
        // your implementation of how to run the WordCounter as a standalone multithreaded program
    }
}
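The column rules listed for the output table can be expressed with String.format width specifiers. In the sketch below (names hypothetical), the input is a TreeMap from each word to an array of per-file counts, so iteration is already alphabetical. The first column is one character wider than the longest word, and every file column shares one width, here assumed to be the longest file name (or "total") plus one; counts with more digits than that would break the alignment, which a full solution would need to handle. Lines end with '\n'; System.lineSeparator() is an alternative if platform line endings are preferred.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TableFormatSketch {

    // counts maps each word to its per-file counts, in the same order as fileNames.
    static String format(List<String> fileNames, TreeMap<String, int[]> counts) {
        // First column: longest word length + 1, leaving a gap before the first number.
        int wordWidth = counts.keySet().stream().mapToInt(String::length).max().orElse(0) + 1;
        // All file columns share one width: the longest header plus one space.
        int colWidth = Math.max("total".length(),
                fileNames.stream().mapToInt(String::length).max().orElse(0)) + 1;

        StringBuilder sb = new StringBuilder(String.format("%-" + wordWidth + "s", ""));
        for (String f : fileNames) sb.append(String.format("%-" + colWidth + "s", f));
        sb.append("total").append('\n');

        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            sb.append(String.format("%-" + wordWidth + "s", e.getKey()));
            int total = 0;
            for (int c : e.getValue()) {
                sb.append(String.format("%-" + colWidth + "d", c));
                total += c;
            }
            sb.append(total).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> counts = new TreeMap<>();
        counts.put("so", new int[]{1, 1, 1});
        counts.put("everywhere", new int[]{0, 1, 0});
        counts.put("many", new int[]{2, 0, 0});
        System.out.print(format(Arrays.asList("texta", "textb", "textc"), counts));
    }
}
```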
NOTE:
You are not allowed to use the parallelStream() method for this assignment.
As with the previous assignment, you must use JDK 1.8.
For the unit testing portion, you must use JUnit 5.
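Since parallelStream() is ruled out, one option for the file-reading threads is to create Thread objects explicitly. The sketch below (names hypothetical) starts at most maxThreads threads at a time, one per file; each thread counts its file into a private HashMap and only then enters a synchronized block to add those counts to the shared table, so writes to the shared table are exclusive and the final read happens after join(). Reading with Files.readAllBytes, batching instead of a thread pool, and the tokenizing regex are all assumptions. In a full program the file list might come from Files.list on the input folder, sorted so the column order is stable; note also that an exception in one thread (for example, an unreadable file) only ends that thread, so a complete solution should report such failures.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class PerFileCountSketch {

    // Returns word -> per-file counts, where index j is the count in files.get(j).
    // The TreeMap keeps words alphabetical; all writes happen inside synchronized (table).
    static TreeMap<String, int[]> countAll(List<Path> files, int maxThreads)
            throws InterruptedException {
        int n = files.size();
        TreeMap<String, int[]> table = new TreeMap<>();
        // Start at most maxThreads threads at a time, one file per thread.
        for (int start = 0; start < n; start += maxThreads) {
            List<Thread> batch = new ArrayList<>();
            for (int j = start; j < Math.min(n, start + maxThreads); j++) {
                final int idx = j;
                Thread t = new Thread(() -> {
                    Map<String, Integer> local = countWords(files.get(idx)); // private to this thread
                    synchronized (table) { // exclusive write while consolidating
                        for (Map.Entry<String, Integer> e : local.entrySet()) {
                            table.computeIfAbsent(e.getKey(), k -> new int[n])[idx] += e.getValue();
                        }
                    }
                });
                batch.add(t);
                t.start();
            }
            for (Thread t : batch) t.join(); // results of this batch are now visible
        }
        return table;
    }

    // Reads one file and counts its lowercased, punctuation-free words.
    static Map<String, Integer> countWords(Path file) {
        String text;
        try {
            text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("wc");
        Path a = Files.write(dir.resolve("texta.txt"), "So many words. So!".getBytes(StandardCharsets.UTF_8));
        Path c = Files.write(dir.resolve("textc.txt"), "So, what can you do?".getBytes(StandardCharsets.UTF_8));
        TreeMap<String, int[]> table = countAll(Arrays.asList(a, c), 2);
        System.out.println(Arrays.toString(table.get("so"))); // [2, 1]
    }
}
```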
Rubric
1. Correctness of the concurrent mergesort algorithm implementation: 30 points
2. With an int[] of size 524288, going from one thread to two threads gives a speed boost of at least 20%: 5 points
3. Similarly, going from one thread to four threads gives a speed boost of at least 40%: 5 points
4. Even more significant speed improvement with larger arrays (e.g., of 1024000 elements): 5 points
5. Unit tests for the sorted(int[]) method follow proper naming conventions: 2 points
6. Unit tests include at least two distinct boundary value conditions: 2 points
7. Unit tests include at least two distinct test cases that are false (i.e., if the array is not sorted, the test should detect this): 3 points
8. Unit tests include at least two distinct test cases that are true (i.e., if the array is sorted, the test should detect this): 3 points
9. WordCounter table generation follows the format, and all columns are well-aligned: 5 points
10. Words are counted after discounting uppercase/lowercase differences: 5 points
11. Punctuation is filtered out and does not appear in the output table: 5 points
12. WordCounter results are correct in the single-threaded scenario: 5 points
13. WordCounter results are correct in multithreaded scenarios: 10 points
14. Results are correct with a large number of files (50) in the folder: 5 points
15. Results are correct with a large number of threads (10): 5 points
16. Speed improvement (even if not very significant) with multithreading: 5 points
What to submit?
A single .zip archive with the following files:
1. Sorting.java (even though you are not going to change it)
2. MergeSort.java
3. MergeSortTest.java
4. WordCounter.java
© 2019 Ritwik Banerjee