Skip to content

11.2 Hash tables

11.2-1

Suppose we use a hash function hh to hash nn distinct keys into an array TT of length mm. Assuming simple uniform hashing, what is the expected number of collisions? More precisely, what is the expected cardinality of {{k,l}:kl and h(k)=h(l)}\{\{k, l\}: k \ne l \text{ and } h(k) = h(l)\}?

Under the assumption of simple uniform hashing, we will use linearity of expectation to compute this.

Suppose that all the keys are totally ordered {k1,,kn}\{k_1, \dots, k_n\}. Let XiX_i be the number of \ell's such that >ki\ell > k_i and h()=h(ki)h(\ell) = h(k_i). So XiX_i is the (expected) number of times that key kik_i is collided by those keys hashed afterward. Note, that this is the same thing as j>iPr(h(kj)=h(ki))=j>i1/m=(ni)/m\sum_{j > i} \Pr(h(k_j) = h(k_i)) = \sum_{j > i} 1 / m = (n - i) / m. Then, by linearity of expectation, the number of collisions is the sum of the number of collisions for each possible smallest element in the collision. The expected number of collisions is

i=1nnim=n2n(n+1)/2m=n2n2m.\sum_{i = 1}^n \frac{n - i}{m} = \frac{n^2 - n(n + 1) / 2}{m} = \frac{n^2 - n}{2m}.

11.2-2

Demonstrate what happens when we insert the keys 5,28,19,15,20,33,12,17,105, 28, 19, 15, 20, 33, 12, 17, 10 into a hash table with collisions resolved by chaining. Let the table have 99 slots, and let the hash function be h(k)=kmod  9h(k) = k \mod 9.

Let us number our slots 0,1,,80, 1, \dots, 8.

Then our resulting hash table will look like following:

h(k)keys0mod  91mod  91019282mod  9203mod  9124mod  95mod  956mod  933157mod  98mod  917 \begin{array}{c|l} h(k) & \text{keys} \\ \hline 0 \mod 9 & \\ 1 \mod 9 & 10 \to 19 \to 28 \\ 2 \mod 9 & 20 \\ 3 \mod 9 & 12 \\ 4 \mod 9 & \\ 5 \mod 9 & 5 \\ 6 \mod 9 & 33 \to 15 \\ 7 \mod 9 & \\ 8 \mod 9 & 17 \end{array}

11.2-3

Professor Marley hypothesizes that he can obtain substantial performance gains by modifying the chaining scheme to keep each list in sorted order. How does the professor's modification affect the running time for successful searches, unsuccessful searches, insertions, and deletions?

  • Successful searches: no difference, Θ(1+α)\Theta(1 + \alpha).
  • Unsuccessful searches: faster but still Θ(1+α)\Theta(1 + \alpha).
  • Insertions: same as successful searches, Θ(1+α)\Theta(1 + \alpha).
  • Deletions: same as before if we use doubly linked lists, Θ(1)\Theta(1).

11.2-4

Suggest how to allocate and deallocate storage for elements within the hash table itself by linking all unused slots into a free list. Assume that one slot can store a flag and either one element plus a pointer or two pointers. All dictionary and free-list operations should run in O(1)O(1) expected time. Does the free list need to be doubly linked, or does a singly linked free list suffice?

The flag in each slot of the hash table will be 11 if the element contains a value, and 00 if it is free. The free list must be doubly linked.

  • Search is unmodified, so it has expected time O(1)O(1).
  • To insert an element xx, first check if T[h(x.key)]T[h(x.key)] is free. If it is, delete T[h(x.key)]T[h(x.key)] and change the flag of T[h(x.key)]T[h(x.key)] to 11. If it wasn't free to begin with, simply insert x.keyx.key at the start of the list stored there.
  • To delete, first check if x.prevx.prev and x.nextx.next are NIL\text{NIL}. If they are, then the list will be empty upon deletion of xx, so insert T[h(x.key)]T[h(x.key)] into the free list, update the flag of T[h(x.key)]T[h(x.key)] to 00, and delete xx from the list it's stored in. Since deletion of an element from a singly linked list isn't O(1)O(1), we must use a doubly linked list.
  • All other operations are O(1)O(1).

11.2-5

Suppose that we are storing a set of nn keys into a hash table of size mm. Show that if the keys are drawn from a universe UU with U>nm|U| > nm, then UU has a subset of size nn consisting of keys that all hash to the same slot, so that the worst-case searching time for hashing with chaining is Θ(n)\Theta(n).

Suppose the m1m - 1 slots contains at most n1n - 1 elements, then the remaining slot should have

U(m1)(n1)>nm(m1)(n1)=n+m1n|U| - (m - 1)(n - 1) > nm - (m - 1)(n - 1) = n + m - 1 \ge n

elements, thus UU has a subset of size nn.

11.2-6

Suppose we have stored nn keys in a hash table of size mm, with collisions resolved by chaining, and that we know the length of each chain, including the length LL of the longest chain. Describe a procedure that selects a key uniformly at random from among the keys in the hash table and returns it in expected time O(L(1+1/α))O(L \cdot (1 + 1 / \alpha)).

Choose one of the mm spots in the hash table at random. Let nkn_k denote the number of elements stored at T[k]T[k]. Next pick a number xx from 11 to LL uniformly at random. If x<njx < n_j, then return the xxth element on the list. Otherwise, repeat this process. Any element in the hash table will be selected with probability 1/mL1 / mL, so we return any key with equal probability. Let XX be the random variable which counts the number of times we must repeat this process before we stop and pp be the probability that we return on a given attempt. Then E[X]=p(1+α)+(1p)(1+E[X])E[X] = p(1 + \alpha) + (1 − p)(1 + E[X]) since we'd expect to take 1+α1 + \alpha steps to reach an element on the list, and since we know how many elements are on each list, if the element doesn't exist we'll know right away. Then we have E[X]=α+1/pE[X] = \alpha + 1 / p. The probability of picking a particular element is n/mL=α/Ln / mL = \alpha / L. Therefore, we have

E[X]=α+L/α=L(α/L+1/α)=O(L(1+1/α)) \begin{aligned} E[X] & = \alpha + L / \alpha \\ & = L(\alpha/L + 1/\alpha) \\ & = O(L(1 + 1/\alpha)) \end{aligned}

since αL\alpha \le L.