- (15 pts) Bellatrix Lestrange is writing a secret message to Voldemort and wants to prevent it from being understood by meddlesome young wizards and Muggles. She decides to use Huffman encoding to encode the message. Magically, the symbol frequencies of the message are given by the Pell numbers, a famous sequence of integers known since antiquity and related to the Fibonacci numbers. The nth Pell number is defined as Pn = 2Pn1 + Pn2 for n > 1 with base cases P0 = 0 and P1 = 1.
- For an alphabet of = {a,b,c,d,e,f,g,h} with frequencies given by the first || non-zero Pell numbers, give an optimal Huffman code and the corresponding encoding tree for Bellatrix to use.
- Generalize your answer to (1a) and give the structure of an optimal code whenthe frequencies are the first n non-zero Pell numbers.
- (30 pts) A good hash function h(x) behaves in practice very close to the uniform hashing assumption analyzed in class, but is a deterministic function. That is, h(x) = k each time x is used as an argument to h(). Designing good hash functions is hard, and a bad hash function can cause a hash table to quickly exit the sparse loading regime by overloading some buckets and under loading others. Good hash functions often rely on beautiful and complicated insights from number theory, and have deep connections to pseudorandom number generators and cryptographic functions. In practice, most hash functions are moderate to poor approximations of uniform hashing.
Consider the following hash function. Let U be the universe of strings composed of the characters from the alphabet = [A,,Z], and let the function f(xi) return the index of a letter xi , e.g., f(A) = 1 and f(Z) = 26. Finally, for an m-character string x m, define ), where ` is the number of buckets in the hash table. That is, our hash function sums up the index values of the characters of a string x and maps that value onto one of the ` buckets.
- The following list contains US Census derived last names:
http://www2.census.gov/topics/genealogy/1990surnames/dist.all.last Using these names as input strings, first choose a uniformly random 50% of these name strings and then hash them using h(x).
Produce a histogram showing the corresponding distribution of hash locations when ` = 200. Label the axes of your figure. Briefly describe what the figure shows about h(x), and justify your results in terms of the behavior of h(x). Do not forget to append your code.
Hint: the raw file includes information other than name strings, which will need to be removed; and, think about how you can count hash locations without building or using a real hash table.
1
- Enumerate at least 4 reasons why h(x) is a bad hash function relative to the ideal behavior of uniform hashing.
- Produce a plot showing (i) the length of the longest chain (were we to use chainingfor resolving collisions under h(x)) as a function of the number n of these strings that we hash into a table with ` = 200 buckets, (ii) the exact upper bound on the depth of a red-black tree with n items stored, and (iii) the length of the longest chain were we to use a uniform hash instead of h(x). Include a guide of cn
Then, comment (i) on how much shorter the longest chain would be under a uniform hash than under h(x), and (ii) on the value of n at which the red-black tree becomes a more efficient data structure than h(x) and separately a uniform hash.
- (15 pts) Draco Malfoy is struggling with the problem of making change for n cents using the smallest number of coins. Malfoy has coin values of v1 < v2 < < vr for r coins types, where each coins value vi is a positive integer. His goal is to obtain a set of counts {di}, one for each coin type, such that and where k is minimized.
(a) A greedy algorithm for making change is the cashiers algorithm, which all young wizards learn. Malfoy writes the following pseudocode on the whiteboard to illustrate it, where n is the amount of money to make change for and v is a
vector of the coin denominations:
wizardChange(n,v,r) :
d[1 .. r] = 0 // initial histogram of coin types in solution while n > 0 {
k = 1 while ( k < r and v[k] > n ) { k++ } if k==r { return no solution } else { n = n v[k] }
} return d
Hermione snorts and says Malfoys code has bugs. Identify the bugs and explain why each would cause the algorithm to fail.
- Sometimes the goblins at Gringotts Wizarding Bank run out of coins,[1] and make change using whatever is left on hand. Identify a set of U.S. coin denominations for which the greedy algorithm does not yield an optimal solution. Justify your
2
answer in terms of optimal substructure and the greedy-choice property. (The set should include a penny so that there is a solution for every value of n.)
- (8 pts extra credit) On the advice of computer scientists, Gringotts has announcedthat they will be changing all wizard coin denominations into a new set of coins denominated in powers of c, i.e., denominations of c0,c1,,c` for some integers c > 1 and ` 1. (This will be done by a spell that will magically transmute old coins into new coins, before your very eyes.) Prove that the cashiers algorithm will always yield an optimal solution in this case.
Hint: first consider the special case of c = 2.
3
[1] Its a little known secret, but goblins like to eat the coins. It isnt pretty for the coins, in the end.
Reviews
There are no reviews yet.