In this lab were making hashtables in C. Weve been using C arrays for the past two labs and now that Lab 4 has given us a working linked list in C, we can implement a hash table.
You will be doing the following:
- TODO1 implement the hash table
- (EXTRA CREDIT) TODO2 No memory leaks
valgrind will display the following message:
All heap blocks were freed no leaks are possible
Hash Table Refresher
Put simply, a hash table is just an array that is able to store data associated with a key. In Assignment 4 we were storing data using a String for our keys and an Object for our value, which would eventually be a list. The reason we want to make an array of lists instead of just using an array is it is much more efficient with storage space since we dont need to make space in an array for every possible hash value. Additionally we dont have to worry about running out of space in the array to deal with collisions , two keys being turned into the same array index.
Recall that in Assignment 4 we needed to call the Java function hashCode() on our String key and then turned the hash code (named hashCode in the PDF) into an index into out hash table (making it between 0 and tableSize 1.
String key = hello; int hashCode = key.hashCode();
int arrayIndex = Math.abs(hashCode) % tableSize;
At a high level this code is converting a String into a valid array index. What hashCode() is doing behind the scenes involves, at least in C, getting a bit closer to the actual data representation in memory.
Hashing in C
Our hash() will actually be made up of three functions:
- int hash(Dictionary D, char* key)
- The function that will return an array index
- unsigned int pre_hash(char* input)
- This function converts a string into an unsigned int
unsigned int rotate_left(unsigned int value, int shift)
The function responsible for moving bits to semi-randomize the resulting key
Note: You will only need to call hash(Dictionary D, char* key) when you want to compute a hash value.
These functions, included in the Dictionary.c file are defined as follows:
unsigned int rotate_left(unsigned int value, int shift) { int sizeInBits = 8 * sizeof(unsigned int); shift = shift & (sizeInBits 1); if(shift == 0){return value;}return (value << shift) | (value >> (sizeInBits shift));} |
unsigned int pre_hash(char* input) { unsigned int result = 0xBAE86554 //this may correspond to anaddress while (*input) { result ^= *input++; result = rotate_left(result, 5);}return result;} |
int hash(Dictionary D, char* key) { return pre_hash(key ) % D->tableSize; }
Note: You will only need to call hash(Dictionary D, char* key) when you want to compute a hash value.
A few things to note about these functions are that they are manipulating the actual bits of our data in memory, specifically the &, <<, >>, and ^= operators which are defined as follows:
- & is bitwise AND
- Instead of looking at a truth value it looks at the binary representations of its arguments and creates a new binary number using a logical AND for each bit (if you have taken CE12 you will be familiar with this).
- << and >> are bit shifts
- All data is represented as some sequence of bits in memory. Doing a left or right shift moves the bit sequence stored in a given variable (on the left) the specified number of bits (on the right) and fills in 0s when there is no more of the original value to shift into a bit.
- ^= is a bitwise exclusive or assignment
- This operator performs the XOR operation between its two arguments and assigns the result to the left hand side.
Wikipedia has a more in depth explanations of bitwise operators if youre curious (https://en.wikipedia.org/wiki/Bitwise_operations_in_C)
We have also included a program called HashDemo.c to show how the above three functions work. To compile it, simply enter the following:
bash$ gcc -std=c99 -o HashDemo HashDemo.c
To run the program:
./HashDemo
It will show you a breakdown of strings and their indices as created by the above functions, the functionality of rotate_left() on a single unsigned int, and the number of bits a single character takes up.
A Note on typedefs
In this lab well also be seeing more of these statements and its one of the built in abstractions in C. Weve done typedef struct node_type {} Node; to allow us to refer to a Node without having to repeatedly say struct. Inside of Dictionary.h and Dictionary.c are two other typedefs, one for a struct DictionaryObj* and one for a struct EntryObj*. These allow you to refer to pointers by whatever name was given after the typedef, in this case Dictionary is another name for a pointer to a DictionaryObj and Entry is another name for a pointer to an EntryObj.
TODO1: Implementing the Hash Table
Similar to Assignment 4 youll be implementing something resembling the MyHashtable class but not quite because, once again, C does not have classes. The basic blueprint for our C version should look very familiar:
- A struct defined in the Dictionary.c called Entry (a lot like the Node struct you implemented in Lab 4). It will once again store Key/Value pairs:
char* key
char* value
As before it should have an associated Entry newEntry(char* key, char* value ) function
As were in C it should also have an associated void freeEntry(Entry* pE) function (more details about this in TODO2).
- Again like in Assignment 4 your hash table struct will have properties that arent directly accessible outside of the c file:
- int tableSize the size of the array being used by the hash table
int size the number of key/value entries currently in the hash table
List** table an array of pointers to Lists. As in Assignment 4, the reason for this is to handle cases where two keys have the same hash value.
- Youll be implementing the following functions in c:
- Dictionary newDictionary ( int tableSize)
Allocates space for a new Dictionary, initializes the new Dictionarys variables, and returns it
void freeDictionary(Dictionary* pD)
Frees all data associated with the Dictionary pointed at by pD (more detail about this in TODO2)
int isEmpty(Dictionary D)
Returns an int; booleans in C are integers. Traditionally a 1 represents true and 0 represents false
int size ( Dictionary D)
Returns the number of key/value pairs in D
void insert (Dictionary D, char* key, char* value)
Adds a new key/value pair into the dictionary using a linked list to deal with collisions (translating Assignment 4s put() is a good place to start)
char* lookup(Dictionary D, char* key)
Returns the value in Dictionary D associated with key
(translating Assignment 4s get() is a good place to start)
void delete(Dictionary D, char* key)
Removes the Entry associated with key in Dictionary D
(translating Assignment 4s remove() is a good place to start)
Remember that because were in C we need to free whatever was deleted here.
void makeEmpty(Dictionary D)
Removes all Entries from Dictionary D (translating Assignment 4s clear() is a good place to start)
Remember that because were in C we need to free memory were not using anymore ourselves
void printDictionary(FILE* out, Dictionary D)
Prints the contents of Dictionary D to a file. More on this below.
Each of the above functions has a brief description in Dictionary.h including specifics about return values and preconditions (what each function assumes to be true when its called).
Making an Array of Lists and Notes About calloc
Because were in C, we dont get to use the nice Java array syntax:
MyLinkedList[] table = new MyLinkedList[tableSize];
instead we need to do what we did in Lab 3 and use calloc(): First we need a pointer to a List:
List** table
- Next, inside newDictionary() well need to set our newly instantiated Dictionarys table pointer to point to something on the heap:
calloc(tableSize, sizeof(List*))
The reason we need table to be of type List** is because its an array of pointers to Lists. Even though table looks a little strange compared to a Java array declaration and initialization, we can still access whatever is in it using the same array indexing weve been working with (assume D is of type Dictionary):
D->table[0] = make_list();
To make a new instance of a DictionaryObj we still need to call malloc(), which requires a pointer:
Dictionary D = malloc(sizeof(DictionaryObj));
Note: One important thing to remember about sizeof() is you want to give it the struct not the pointer. So when we want to make a new dictionary, we dont want to give sizeof() Dictionary , we want to give it DictionaryObj because remember that Dictionary is really just another name for a DictionaryObj*.
A note on printDictionary()
The function void printDictionary(FILE* out, Dictionary D) is a little more flexible than what weve been doing in C up to this point since its able to take any arbitrary file and write to it. Instead of using printf() which only lets us print to stdout, we can use fprintf() which functions almost identically to printf () with one difference. A general call to printf looks like:
printf(<format string>, <argument1>, <argument2>,, <argumentn>);
while a general call to fprintf looks like:
fprintf(<file name>, <format string>, <argument1>,, <argumentn>);
Similar to Lab 2 in Java, C treats standard in (stdin) and standard out (stdout ) as files and youll notice that the calls to printDictionary() in DictionaryClient.c look like this:
printDictionary(stdout, A);
Compiling and Running
As in Lab 4, we have a few different files we need to compile at once to have our object file actually work as intended. We also now have two different files for you to use to test your hash table: DictionaryClient.c and DictionaryClient2.c . To compile them youll need to run the following:
bash$ gcc DictionaryClient. c Dictionary.c list.c std=c99 -o DictionaryClient
to make the object file for DictionaryClient and
bash$ gcc DictionaryClient2.c Dictionary.c list.c std=c99 -o DictionaryClient2
To make the object file for DictionaryClient2 .
DictionaryClient is fairly simple and should print the following to the console:
two bar five happy one foo three blah seven blue four galumph six sad
key=one value=foo key=two value=bar key=three value=blah key=four value=galumph key=five value=happy key=six value=sad key=seven value=blue two bar five happy four galumph six sad key=one not found (null) key=two value=bar key=three not found (null) key=four value=galumph key=five value=happy key=six value=sad key=seven not found (null) false 4 true
DictionaryClient2 does something a little more fun and loads each line of its own .c
file into the hash table and prints the hashtable to a file named DictionaryClient2-out, which should have the following contents:
line 53: | key = &keyBuffer[keyBufferOffset]; |
line 36: | value = &valBuffer[valBufferOffset]; |
line 63: | freeDictionary(&A); |
line 18: | char* value; |
line 45: | key = &keyBuffer[keyBufferOffset]; |
line 11: | #define MAX_LEN 180 |
line 28: | while( fgets(line, MAX_LEN, in)!=NULL ){ |
line 55: | insert(A, key, value); |
line 21: | int keyBufferOffset = 0, valBufferOffset = 0; |
line 38: | valBufferOffset = valBufferLength; |
line 9: | #includeDictionary.h |
line 65: free(valBuffer); line 47: keyBufferOffset = keyBufferLength; line 31: lineNumber++; line 2: // DictionaryClient2.cline 13: int main(int argc, char* argv[]){ line 40: // put label in keyBuffer line 57: valBufferOffset += (strlen(value) + 1); line 23: char line[MAX_LEN+1];line 50: // put keys and values in dictionary A line 67: fclose(out); line 49: line 33: line[lineLength] = ; // overwrite newline with null line 60: printDictionary(out, A);line 15: FILE* in = fopen(DictionaryClient2.c, r); line 42: labelLength = strlen(label); line 59: line 5: line 70: }line 25: int i, labelLength, lineLength, lineNumber = 0; line 52: for(i=0; i<lineNumber; i++){ line 69: return(EXIT_SUCCESS);line 35: valBuffer = realloc(valBuffer, valBufferLength*sizeof(char) );line 62: // free memory and close files line 17: char* key;line 44: keyBuffer = realloc(keyBuffer, keyBufferLength*sizeof(char) ); line 7: #include<stdlib.h> line 10:line 27: // read input files line 54: value = &valBuffer[valBufferOffset]; line 20: char* valBuffer = NULL; line 37: strcpy(value, line); line 8: #include<string.h> line 64: free(keyBuffer); line 19: char* keyBuffer = NULL; line 46: strcpy(key, label); |
line 30: // put line in valBuffer line 1://– line 12: line 29:line 56: keyBufferOffset += (strlen(key) + 1); line 22: int keyBufferLength = 0, valBufferLength = 0; line 39:line 66: fclose(in); line 48: }line 32: lineLength = strlen(line)-1; line 3: // Another test client for the Dictionary ADT line 14: Dictionary A = newDictionary(); line 41: sprintf(label, line %d:t, lineNumber); line 58: } line 4://–line 24: char label[MAX_LEN+1];line 51: keyBufferOffset = valBufferOffset = 0; line 68:line 34: valBufferLength += (lineLength+1); line 61:line 16: FILE* out = fopen(DictionaryClient2-out, w); line 43: keyBufferLength += (labelLength+1); line 6: #include<stdio.h> line 26: |
Memory Management
Weve now done two labs in C and have a bit of experience doing manual memory management so youll notice that our freeDictionary() function is asking for something a little odd, a Dictionary* (remember that Dictionary is another name for DictionaryObj*). The same is true of freeEntry() wanting an Entry* rather than an Entry (another name for EntryObj*) . The reason for this is to allow the function that is responsible for freeing memory to also be able to set the pointer that is pointing to the freed memory to NULL. Put more simply, its an emulation of the way deleting something in Java works since in Java you dont have to worry about a reference pointing to deallocated memory.
What does this mean for you? If you look at either of the DictionaryClient.c files youll notice that their calls to freeDictionary() use &A. &A means were getting the address of A, which is of type Dictionary (really a DictionaryObj*). So when we say freeDictionary(&A), were passing it the pointer to a pointer, which is what freeDictionary() wants.
When were actually writing our destructor method, instead of simply freeing what we were passed as we have done before with List:
void free_node(Node* node){free(node->data);free(node); } |
as an example, we need to dereference the pointer to the pointer. In freeDictionary()s case pD is a pointer to a pointer to an instance of a DictionaryObj:
&pD->pD->instance of DictionaryObj
If all we have is &pD, how do we get to our instance? We can use the * operator to dereference &pD like this:
(*pD)
Once we have (*pD), we can treat it just like a regular pointer! But because we also have &pd (the pointer to the pointer), this allows us to set pD to NULL inside the freeDictionary() function.
Note: Remember that the List we built in Lab 4 uses a void * for data and since well be storing an Entry in a Node struct, well need to loop through the List calling freeEntry() on each Nodes data pointer. Once we finish freeing each Entry, then were allowed to call free_list().
Reminders About valgrind
Make sure youre running valgrind as follows for both DictionaryClient and DictionaryClient2:
valgrind leak-check=full -v ./DictionaryClient
valgrind leak-check=full -v ./DictionaryClient2
The important things to look out for (after segmentation faults) is that the number of allocations and the number of frees should be the same. Generally when these are equal you wont have any memory leaks, though the in use report at exit part of the HEAP SUMMARY will give you a more detailed breakdown.
Reviews
There are no reviews yet.