Hash Functions
Last time we discussed two ways to store tables:
Direct (very memory inefficient) Hash table (2D, list of collisions)
Open addressing keeps things in one dimension (so dont need to grow/shrink lists) but has key limit
Open addressing
Much like direct addressing, we have a simple 1D array (starts empty):
Much like hash tables, when putting
data in we use a hash function:
234
insert 234 h(234) = 4
012345
Open addressing
If this is just a 1D array, where would you put a collision?
012345 insert 584
h(584) = 4
Where do you put the data 584?
234
Open addressing
Probably the most straight forward answer is: the next open spot
012345 insert 584
h(584) = 4
To do this we use a modified hash: g(k, i) = h(k) + i (mod m)
234
584
Open addressing
234
584
012345 insert 584
h(584) = 4
First try to put this data in: g(k,0)
If this is already occupied try: g(k,1) Next try: g(k, 2) and so on until you find an open spot
Open addressing
Using the next open spot what issues happen over the long term?
012345 insert 584
h(584) = 4
234
584
Open addressing
This tends to create long consecutive filled entries; 3 compounding issues:
If a key collides, it extends the length of filled consecutive filled Longer consecutive filled means more runtime to find an open spot More keys in array means more chance of collision on next added
Open addressing
You might be tempted to space out these keys by using:
g(k, i) = h(k) + 2i (mod m)
g(584,1) g(584,0)
584
234
012345 insert 584
h(584) = 4
(skip)
Open addressing
This actually makes the problem worse consider adding more keys:
g(74,0) g(74,1) g(74,2) g(74,3)
012345 insert 74
h(74) = 4
We g(k,i) cant insert key 74 despite
having open space
584
8884
234
Open addressing
This issue comes as the size of our array (6) and our space out (2) are not relatively prime
Relatively prime: if greatest common divisor (GCD) is one
6 divisors = {1, 2, 3, 6} 2 divisors = {1, 2}
GCD(6,2) = 2
2 1, so (6,2)
not relatively prime
Open addressing
One easy way to fix this is to have m a prime number and the space out anything smaller than m
This still doesnt fix the issues that the space out does not fix our consecutive filled issue
Open addressing
This can be fixed by changing the space out to be different for every input:
g(k, i) = h(k) + h2(k)*i (mod m)
Just need to ensure the possible outputs of h2() is less than m
Open addressing
Set m=7 (array size = prime number)
h(k) = k %10, h (k) = k%4+1 2
need
space out
to be at least 1 or g(k,0)=g(k,1)
6
g(10,0)
g(10,1)
584
8884
234
012345
insert(10) h(10) = 0 h2(10) = 3
Open addressing
Set m=7 (array size = prime number)
h(k) = k %10, h (k) = k%4+1 2
need
space out
to be at least 1 or g(k,0)=g(k,1)
6
g(10,0)
g(10,1)
584
8884
10
234
012345
insert(10) h(10) = 0 h2(10) = 3
Open addressing
Set m=7 (array size = prime number)
h(k) = k %10, h (k) = k%4+1 g(20,0) 2
need
space out
to be at least 1 or g(k,0)=g(k,1)
6
g(10,0) g(20,1)
g(10,1)
584
8884
10
234
012345
insert(10) h(10) = 0 h2(10) = 3
insert(20) h(20) = 0 h2(20) = 1
Open addressing
Set m=7 (array size = prime number)
h(k) = k %10, h (k) = k%4+1 g(20,0) 2
need
space out
to be at least 1 or g(k,0)=g(k,1)
6
g(10,0) g(20,1)
g(10,1)
584
20
8884
10
234
012345
insert(10) h(10) = 0 h2(10) = 3
insert(20) h(20) = 0 h2(20) = 1
Open addressing