2016/8/8 ECS60 Assignment 2
ECS 60, Summer Session 2016
Assignment 2: Secret Decoder
You must do this assignment on your own. Hand in one submission per team. Submit your work using: handin cs60 p2 <filename>
The handout code for the assignment is in ~neff/60/p2 (on my CSIF Unix account).
Introduction
In your second assignment, you will implement a secret code generator and decoder. The system works by building an encoding tree based on two sets of input from the user: an ordered alphabet containing all the symbols that will be encoded and a set of mutations that are used to change the structure of the coding tree. Given these two pieces of information, the system can build an encoding tree. This tree can be used to convert plain text sentences, such as meet me at the clock tower at midnight., into a secret pattern of 1s and 0s that could be transmitted to someone else. Anyone who intercepts the message will have great difficulty figuring out what it says without the encoding rules. However, if the recipient has your software and has the two keys, the alphabet and the mutation list, she can build the encoding tree and use it to decode the string of 1s and 0s back to plain text. The rendezvous is on schedule!
This assignment will give you experience with trees and string manipulation. It will also give you practice with careful reference management, debugging and let you impress your friends with cool encryption software. (No claim is made about the sophistication of this encryption technique. If you study more CS, you may later want to evaluate the effectiveness of the encryption scheme.)
Background
The encoding/decoding algorithm builds on two ideas, Huffman trees and tree rotation, but uses them for different purposes than they are normally used. You may find it useful to read the section on Huffman codes and review the rotation material in the textbook. That said, there should be enough information in this handout to complete the assignment.
Huffman coding is a compression technique (It is part of the JPEG encoding process, among other uses.). It seeks to use fewer bits to represent the same amount of information. For example, if you analyze the frequency of letters in a piece of text, you will notice that some letters occur much more frequently than others. To transmit these letters in digital form, they need to be converted to a series of 1s and 0s. The standard way to do that is to use the same number of bits, say 7, for each character. This means the total number of bits needed for your message is the number of characters in it (n), times seven: bits = 7*n. This can be improved, though, if you can come up with an encoding that takes advantage of the varying frequency of different letters. Say that your text contains 15400 instances of the letter e, but only 670 instances of the letter q. In the default decoding the number of bits required will be:
bits = (15400 + 670) * 7 = 112490
Now imagine if you could come up with an encoding that used only 6 bits for the e, but 8 bits for the q. The total number of bits would be:
bits = 15400 * 6 + 670 * 8 = 97760
which is over 10% more efficient, with no loss of information. The tricky part is coming up with an encoding that will allow you to use different numbers of bits for different characters in an unambiguous way.
In this assignment, we are not interested in compression. In fact, our encoding may take more bits than the default encoding, but hopefully it will be harder to decode. What we do share with Huffman coding is the need
file:///C:/Users/Zeqing/Desktop/ECS60%20Assignment%202.html
1/7
2016/8/8 ECS60 Assignment 2
to represent characters in an unambiguous way and we will borrow a technique from the field in order to do this. If you have a string of 1s and 0s and you do not know how many bytes are used to encode a character, it can be difficult to tell when one character ends and the other begins. Consider for example if we encode our e using three bits as 011 and our q using four bits as 0110. If you are reading in a stream of bits and read 011, there is no way to tell if this is supposed to be the letter e or the first three bits of the letter q. The situation is ambiguous. We cannot guarantee that an ambiguous encoding will be decoded correctly, so we must avoid ambiguity. We can do this by introducing the following rule: The encoding of one character cannot be the prefix of the encoding of another character. This is the situation that led to the ambiguity above, the encoding of e was a prefix of the encoding of q, and we must avoid that.
One way to develop such a coding is to use a special kind of tree, called a Huffman tree. A Huffman tree is a binary tree in which a left branch represents a 0 and a right branch represents a 1. The letters are stored at the leaves of the tree. The encoding for any given letter is the pattern of 1s and 0s encountered by traversing from the root to the leaf that contains the given letter. An example is shown below (Figure 1). The encoding of the letter e is 011. The encoding for the letter b is 10. Satisfy yourself that the prefix rule is met using this method.
The second background concept this assignment relies on is the idea of tree rotation, which is normally used to help balance trees. This ensures that they are of minimum height and thus fast to search. We will be using rotation for a different purpose, to scramble our encoding, but the definition of rotation remains the same.
A tree rotation adjusts the tree around a particular node and must maintain the correct ordering of items required by a binary search tree. Here we will only use single rotations. There are two types of single rotations, a left rotation and a right rotation. The two types of rotation are symmetric i.e. the definition of the right rotation is the same as the left, you just switch all the left and right terms. You will need to implement both, but due to this symmetry, we will only develop the definition of a left rotation.
A simple example of a left rotation around node 265 is shown below (Figure 2):
Notice that 275 shifts up to the original position of 265 and 265 becomes the left child of 275. This hopefully illustrates why the transformation is called a rotation. A more complicated rotation is shown in the next two figures. This is the base tree (Figure 3):
file:///C:/Users/Zeqing/Desktop/ECS60%20Assignment%202.html
2/7
2016/8/8 ECS60 Assignment 2
And this is the tree after a rotation around node 265 (Figure 4):
The situation here is slightly more complicated as an additional manipulation is required to maintain the correct order for a binary search tree. The left subtree of node 275 before the rotation becomes the right subtree of node 265 after the rotation. This is the other transformation that must be preformed as part of a left rotation. If the rotation is around element x whose right child is element y, after the rotation the left subtree of y becomes the right subtree of x.
You can notice that the rotation changes the coding of the letters. For example, c was 001 before the rotation but is 0001 after the rotation. Some rotations, such as a second left rotation around 265, will make leaf nodes internal nodes. This violates our prefix rule and we will never specify such rotations when testing your software.
Notes:
- If a tree is a valid binary search tree before a rotation, it will be a valid BST after a rotation.
- No new nodes are being created and the data contained by the nodes does not change. We are simplychanging node pointers (which nodes point to which).
- Any nodes higher up in the tree than the point of rotation are not affected by the rotation.
- When implementing the rotation, be very careful to maintain the correct pointers. In particular, remember
that parent pointers must be set and rotations around the root will change the root of the tree.
file:///C:/Users/Zeqing/Desktop/ECS60%20Assignment%202.html
3/7
2016/8/8 ECS60 Assignment 2
The Implementation
An interface defines the public behavior of a class without revealing any of the underlying details. Java has a keyword interface that allows an interface to be directly defined. C++ does not. In this assignment, we provide an interface that is created as a class of pure abstract methods.
You only need to implement one class for this assignment! It is a bit of a complicated one though, so it may take you some time. You need to implement the class SecretDecoderTree which implements the interface DecoderTree. Copy the starter code for the assignment and have a look at the interface. You will notice that it contains quite a few methods. As noted in it, the first three methods define the core interface to your decoder. The idea is that one user could call BuildTree() with a given alphabet and mutation list and then call Encode() to scramble a plain text message. They then send the resultant string of ones and zeroes to a second user. That user, with a separate copy of your code, could call BuildTree() with the same alphabet and mutation list. They could then call Decode() on the string of ones and zeroes they were sent to decipher the message.
The interface should give you a good idea of what is expected and those details will not be repeated here. Some additional information is required though. First, a few general notes. The DecoderTree is a binary search tree. Note that the BTNode class also contains a reference to its parent. You will need this as you will want to work from a leaf node up to the root in order to determine the coding for a particular letter. There are also two types of data that you might want to store at a node: every node will need a label (a key) which the tree is sorted on and each leaf node will also hold the letter that is encoded by the path from the root to it. The last two figures above show this data, with the labels being shown in black and the letters in red. Well use integer labels. They are used simply for ordering the tree and the rules used for determining these labels are discussed below. Note: you do not need to store the 1s and 0s associated with the edges because you know left edges always correspond to 0 and right edges to 1. I have included a version of the BTNode class which includes all the data you will need at each node. Use this class to build your tree.
Building the tree
The most complicated part of the assignment is building the tree. There are two phases to this process. First, you need to build the tree for the given alphabet. You then mutate the tree by performing a series of rotate left and right operations on various nodes in it. Use helper methods in the BuildTree() to accomplish the two steps. Methods that are too long are difficult to understand and maintain, so it is a good design practice to divide complicated functionality like this into smaller problems that you can solve in separate methods.
Each letter in the alphabet will form one leaf in the tree. The initial tree is built from the bottom up, by first laying down the leaves and then connecting them up until you reach the root. For this decoder, the alphabet will always be a String containing 30 unique characters. Since this is a binary tree, the maximum number of nodes at any level will be 2^n. In order to include 30 characters, the tree will need to have depth 5 so that the bottom level can hold 32 leaves. The first 28 characters of the alphabet will form the first 28 leaves at this level. The last two letters will be stored as the last two leaves at the next level up, the level of 16.
To make this more concrete, consider the base tree shown above in Figure 3. This is a subtree of an encoding tree that shows the last 6 characters of the alphabet. a, c, f and e are at the lowest level of the tree, whereas b and d are at one level above. The tree is built by first laying these leaf nodes down. Then the pairs at the lowest level are connected (e.g. a and c are connected and f and e are connected). Then the nodes at the next level are connected and so on progressing up to the root. This process is shown below. Note that this is only part of the full tree.
file:///C:/Users/Zeqing/Desktop/ECS60%20Assignment%202.html
4/7
2016/8/8 ECS60 Assignment 2
Since we are implementing a binary search tree and we want to be able to refer to particular nodes to do mutations, we need to specify a label for each node that can be used for searching it. Well use a simple rule for determining these labels. For the first 28 leaves, the label will be the index of the alphabet entry, starting at 1, times 10, so these labels will form the sequence <10, 20, 30, , 270, 280>. The last two labels for the leaves at the level of 16 will be 300 and 310 respectively. The label for each parent will be the average of the labels of the two children.
Once the tree has been built, you need to mutate it by performing a series of rotations. The mutations are also specified in a string with each mutation consisting of 4 characters of the form xxxY where xxx is the number of the node to rotate and Y is either R for a right rotation or L for a left rotation. If the node label has less than three digits, it will be padded with zeroes. The string 265L045R specifies a Left rotation around node 265 followed by a Right rotation around node 45. Once all the rotations have been applied, the tree is complete and can be used for encoding and decoding.
file:///C:/Users/Zeqing/Desktop/ECS60%20Assignment%202.html
5/7
2016/8/8 ECS60 Assignment 2
You can assume that the alphabet string will be valid, without any errors. The mutation string might refer to nodes that do not exist. Your implementation should deal with this gracefully.
Encode
See the interface. The encode method should make use of the GetCode method.
Decode
See the interface. The sequence of 1s and 0s specify paths in the tree. Starting at the root, each one or zero indicates whether you go the left or right child. This continues until you reach a leaf. The character at the leaf should be concatenated to the decoded text and the method should return to the root, tracing down again to the next letter.
Rotate left and right
The RotateLeft and RotateRight methods implement the rotation operations described above. These methods should be used during tree building. Both of these methods have the precondition that the node label passed in is actually in the tree. If this precondition is violated in either method, the method should print out an error message and continue processing the rest of the rotations.
GetCode
See the interface. This method should be used in the encoding process. For a given letter, you can trace up from its leaf node to the root of the tree. This will give the encoding of the character in reverse order (youll need it in the correct order).
Hint: Keep a list of all the leaf nodes and the letters they contain. You can search the list to find the leaf node you need to start the encoding process.
GetNode
See the interface. This is a helper method that you can use to find the node you will be performing a rotation operation on.
Print routines
See the interface. Use recursion for these methods.
Note that both methods return the string as well as printing it out. This will make it easier to test your code. The formatting rule for the two functions are that you should print a space, then the data item you need to print and then another space. This means that when you have a series of data items, there will be two spaces between them. For example, the String returned by InorderPrintLeafLetters() for the example in the driver should be:
fjkghibalsmrnopqz .tuvwxy!?cde
InorderPrintLeafLetters() should print the letters at the leafs and InorderPrint should print the labels at each node.
Driver
A simple driver routine has been provided that shows some basic uses of the code (main.cpp). When your implementation is working, the driver will output meaningful text.
What you need to submit
You need to submit SecretDecoder.cpp, SecretDecoder.h and your test code, Test.cpp and Test.h.
file:///C:/Users/Zeqing/Desktop/ECS60%20Assignment%202.html
6/7
2016/8/8 ECS60 Assignment 2
Testing
You must prepare and submit test code for all the public methods in the class that you write. Testing is an important part of writing robust code. Be sure that your test cases are clearly labeled and understandable both to the TA and anyone else who may read your code. You should create a new class CTest (in the files Test.h and Test.cpp) that conains a public method: void RunTests(). This method should fully exercise your code.
Make sure that you use appropriate documentation throughout your code.
Marking breakdown
10% Documentation
15% Implementation and design 15% Test cases
60% Autotesting
file:///C:/Users/Zeqing/Desktop/ECS60%20Assignment%202.html
7/7
Reviews
There are no reviews yet.