11.2 Hash tables
11.2-1
Suppose we use a hash function to hash distinct keys into an array of length . Assuming simple uniform hashing, what is the expected number of collisions? More precisely, what is the expected cardinality of ?
Under the assumption of simple uniform hashing, we will use linearity of expectation to compute this.
Suppose that all the keys are totally ordered . Let be the number of 's such that and . So is the (expected) number of times that key is collided by those keys hashed afterward. Note, that this is the same thing as . Then, by linearity of expectation, the number of collisions is the sum of the number of collisions for each possible smallest element in the collision. The expected number of collisions is
11.2-2
Demonstrate what happens when we insert the keys into a hash table with collisions resolved by chaining. Let the table have slots, and let the hash function be .
Let us number our slots .
Then our resulting hash table will look like following:
11.2-3
Professor Marley hypothesizes that he can obtain substantial performance gains by modifying the chaining scheme to keep each list in sorted order. How does the professor's modification affect the running time for successful searches, unsuccessful searches, insertions, and deletions?
- Successful searches: no difference, .
- Unsuccessful searches: faster but still .
- Insertions: same as successful searches, .
- Deletions: same as before if we use doubly linked lists, .
11.2-4
Suggest how to allocate and deallocate storage for elements within the hash table itself by linking all unused slots into a free list. Assume that one slot can store a flag and either one element plus a pointer or two pointers. All dictionary and free-list operations should run in expected time. Does the free list need to be doubly linked, or does a singly linked free list suffice?
The flag in each slot of the hash table will be if the element contains a value, and if it is free. The free list must be doubly linked.
- Search is unmodified, so it has expected time .
- To insert an element , first check if is free. If it is, delete and change the flag of to . If it wasn't free to begin with, simply insert at the start of the list stored there.
- To delete, first check if and are . If they are, then the list will be empty upon deletion of , so insert into the free list, update the flag of to , and delete from the list it's stored in. Since deletion of an element from a singly linked list isn't , we must use a doubly linked list.
- All other operations are .
11.2-5
Suppose that we are storing a set of keys into a hash table of size . Show that if the keys are drawn from a universe with , then has a subset of size consisting of keys that all hash to the same slot, so that the worst-case searching time for hashing with chaining is .
Suppose the slots contains at most elements, then the remaining slot should have
elements, thus has a subset of size .
11.2-6
Suppose we have stored keys in a hash table of size , with collisions resolved by chaining, and that we know the length of each chain, including the length of the longest chain. Describe a procedure that selects a key uniformly at random from among the keys in the hash table and returns it in expected time .
Choose one of the spots in the hash table at random. Let denote the number of elements stored at . Next pick a number from to uniformly at random. If , then return the th element on the list. Otherwise, repeat this process. Any element in the hash table will be selected with probability , so we return any key with equal probability. Let be the random variable which counts the number of times we must repeat this process before we stop and be the probability that we return on a given attempt. Then since we'd expect to take steps to reach an element on the list, and since we know how many elements are on each list, if the element doesn't exist we'll know right away. Then we have . The probability of picking a particular element is . Therefore, we have
since .