In this project, we are creating an Abstract Data Type (ADT) to improve upon the standard C way of representing strings. We will use malloc and free, and get some practice with pointers, structs, and macros.
On Strings
Strings in C are stored simply as an array (buffer) of ASCII-encoded characters terminated by a zero byte (the null character, '\0'). This works well, but it has some drawbacks. Most importantly, strings are a common source of buffer overflow errors, a serious security vulnerability that happens when programmers forget how large a string's buffer actually is and write past its end.
Our new ADT will still store the actual characters in basically the same way as before. However, it will also store some extra information: the length (the number of characters) and the capacity (how many characters the buffer can hold before writing more would cause a buffer overflow). As an extra safety measure, in the four bytes immediately following the string's null character, we will store a not-so-random-looking byte sequence that we can check to confirm that nothing bad has happened (yet).
On UTStrings
You are to implement each of the functions declared inside String.h. You must use (without modifying) the UTString struct that is defined inside String.h. This struct consists of the following parts:
- length: The length of the string. This is the number of characters in the string; it does not include the null character or anything after it. Keep in mind that the length of a string may be shorter than the buffer in which it is stored. For example, for the string "hello" the length is 5 (one for each useful character), regardless of the size of the buffer.
- capacity: The length of the longest string that can be stored in the buffer. For example, if a string has a buffer of 20 bytes, its capacity is 15 (20 bytes, minus 1 for the null character and 4 for the check), regardless of the length of the string.
- string: A pointer to the buffer where the string is stored. It must be allocated separately from the UTString itself.
- check: The signature value (~0xdeadbeef) stored right after the string's null character. It is not an actual member of the struct, but it must be present in the buffer regardless. We will check this value every time we work with a UTString to make sure that no buffer overflow has occurred yet. If the value does not match, your program should fail an assert and crash immediately.
UTString struct:
field    | type and size                        | value
length   | int, 4 bytes                         | 11
capacity | int, 4 bytes                         | 11 or greater
string   | char*, 4 bytes (8 on 64-bit systems) | pointer to a heap buffer of at least capacity + 5 bytes
The contents of the string buffer for the string "Hello World" (characters shown as C character literals):
'H' | 'e' | 'l' | 'l' | 'o' | ' ' | 'W' | 'o' | 'r' | 'l' | 'd' | '\0' | check (~0xdeadbeef, 4 bytes)