20-2 yy-fast tries

This problem investigates D. Willard's "yy-fast tries" which, like van Emde Boas trees, perform each of the operations MEMBER\text{MEMBER}, MINIMUM\text{MINIMUM}, MAXIMUM\text{MAXIMUM}, PREDECESSOR\text{PREDECESSOR}, and SUCCESSOR\text{SUCCESSOR} on elements drawn from a universe with size uu in O(lglgu)O(\lg\lg u) worst-case time. The INSERT\text{INSERT} and DELETE\text{DELETE} operations take O(lglgu)O(\lg\lg u) amortized time. Like reduced-space van Emde Boas trees (see Problem 20-1), yy-fast tries use only O(n)O(n) space to store nn elements. The design of yy-fast tries relies on perfect hashing (see Section 11.5).

As a preliminary structure, suppose that we create a perfect hash table containing not only every element in the dynamic set, but every prefix of the binary representation of every element in the set. For example, if u=16u = 16, so that lgu=4\lg u = 4, and x=13x = 13 is in the set, then because the binary representation of 1313 is 11011101, the perfect hash table would contain the strings 11, 1111, 110110, and 11011101. In addition to the hash table, we create a doubly linked list of the elements currently in the set, in increasing order.

a. How much space does this structure require?

b. Show how to perform the MINIMUM\text{MINIMUM} and MAXIMUM\text{MAXIMUM} operations in O(1)O(1) time; the MEMBER\text{MEMBER}, PREDECESSOR\text{PREDECESSOR}, and SUCCESSOR\text{SUCCESSOR} operations in O(lglgu)O(\lg\lg u) time; and the INSERT\text{INSERT} and DELETE\text{DELETE} operations in O(lgu)O(\lg u) time.

To reduce the space requirement to O(n)O(n), we make the following changes to the data structure:

  • We cluster the nn elements into n/lgun / \lg u groups of size lgu\lg u. (Assume for now that lgu\lg u divides nn.) The first group consists of the lgu\lg u smallest elements in the set, the second group consists of the next lgu\lg u smallest elements, and so on.
  • We designate a "representative" value for each group. The representative of the iith group is at least as large as the largest element in the iith group, and it is smaller than every element of the (i+1)(i + 1)st group. (The representative of the last group can be the maximum possible element u1u - 1.) Note that a representative might be a value not currently in the set.
  • We store the lgu\lg u elements of each group in a balanced binary search tree, such as a red-black tree. Each representative points to the balanced binary search tree for its group, and each balanced binary search tree points to its group's representative.

The perfect hash table stores only the representatives, which are also stored in a doubly linked list in increasing order.

We call this structure a yy-fast trie.

c. Show that a yy-fast trie requires only O(n)O(n) space to store nn elements.

d. Show how to perform the MINIMUM\text{MINIMUM} and MAXIMUM\text{MAXIMUM} operations in O(lglgu)O(\lg\lg u) time with a yy-fast trie.

e. Show how to perform the MEMBER\text{MEMBER} operation in O(lglgu)O(\lg\lg u) time.

f. Show how to perform the PREDECESSOR\text{PREDECESSOR} and SUCCESSOR\text{SUCCESSOR} operations in O(lglgu)O(\lg\lg u) time.

g. Explain why the INSERT\text{INSERT} and DELETE\text{DELETE} operations take Ω(lglgu)\Omega(\lg\lg u) time.

h. Show how to relax the requirement that each group in a yy-fast trie has exactly lgu\lg u elements to allow INSERT\text{INSERT} and DELETE\text{DELETE} to run in O(lglgu)O(\lg\lg u) amortized time without affecting the asymptotic running times of the other operations.

a. By 11.5, the perfect hash table uses O(m)O(m) space to store m elements. In a universe of size uu, each element contributes lgu\lg u entries to the hash table, so the requirement is O(nlgu)O(n\lg u). Since the linked list requires O(n)O(n), the total space requirement is O(nlgu)O(n\lg u).

b. MINIMUM\text{MINIMUM} and MAXIMUM\text{MAXIMUM} are easy. We just examine the first and last elements of the associated doubly linked list. MEMBER\text{MEMBER} can actually be performed in O(1)O(1), since we are simply checking membership in a perfect hash table. PREDECESSOR\text{PREDECESSOR} and SUCCESSOR\text{SUCCESSOR} are a bit more complicated.

Assume that we have a binary tree in which we store all the elements and their prefixes. When we query the hash table for an element, we get a pointer to that element's location in the binary search tree, if the element is in the tree, and NIL\text{NIL} otherwise. Moreover, assume that every leaf node comes with a pointer to its position in the doubly linked list. Let xx be the number whose successor we seek. Begin by performing a binary search of the prefixes in the hash table to find the longest hashed prefix yy which matches a prefix of xx. This takes O(lglgu)O(\lg\lg u) since we can check if any prefix is in the hash table in O(1)O(1).

Observe that yy can have at most one child in the BST, because if it had both children then one of these would share a longer prefix with xx. If the left child is missing, have the left child pointer point to the largest labeled leaf node in the BST which is less than yy. If the right child is missing, use its pointer to point to the successor of yy. If yy is a leaf node then y=xy = x, so we simply follow the pointer to xx in the doubly linked list, in O(1)O(1), and its successor is the next element on the list. If yy is not a leaf node, we follow its predecessor or successor node, depending on which we need. This gives us O(1)O(1) access to the proper element, so the total runtime is O(lglgu)O(\lg\lg u). INSERT\text{INSERT} and DELETE\text{DELETE} must take O(lgu)O(\lg u) since we need to insert one entry into the hash table for each of their bits and update the pointers.

c. The doubly linked list has less than nn elements, while the binary search trees contains nn nodes, thus a yy-fast trie requires O(n)O(n) space.

d. MINIMUM\text{MINIMUM}: Find the minimum representative in the doubly linked list in Θ(1)\Theta(1), then find the minimum element in the binary search tree in O(lglgu)O(\lg\lg u).

e. Find the smallest representative greater than kk with binary searching in Θ(lglgu)\Theta(\lg\lg u), find the element in the binary search tree in O(lglgu)O(\lg\lg u).

f. If we can find the largest representative greater than or equal to xx, we can determine which binary tree contains the predecessor or successor of xx. To do this, just call PREDECESSOR\text{PREDECESSOR} or SUCCESSOR\text{SUCCESSOR} on xx to locate the appropriate tree in O(lglgu)O(\lg\lg u). Since the tree has height lgu\lg u, we can find the predecessor or successor in O(lglgu)O(\lg\lg u).

g. Same as e, we need to find the cluster in Θ(lglgu)\Theta(\lg\lg u), then the operations in the binary search tree takes O(lglgu)O(\lg\lg u).

h. We can relax the requirements and only impose the condition that each group has at least 12lgu\frac{1}{2}\lg u elements and at most 2lgu2\lg u elements.

  • If a red-black tree is too big, we split it in half at the median.
  • If a red-black tree is too small, we merge it with a neighboring tree.
  • If this causes the merged tree to become too large, we split it at the median.
  • If a tree splits, we create a new representative.
  • If two trees merge, we delete the lost representative.

Any split or merge takes O(lgu)O(\lg u) since we have to insert or delete an element in the data structure storing our representatives, which by part (b) takes O(lgu)O(\lg u).

However, we only split a tree after at least lgu\lg u insertions, since the size of one of the red-black trees needs to increase from lgu\lg u to 2lgu2\lg u and we only merge two trees after at least (1/2)lgu(1 / 2)\lg u deletions, because the size of the merging tree needs to have decreased from lgu\lg u to (1/2)lgu(1 / 2)\lg u. Thus, the amortized cost of the merges, splits, and updates to representatives is O(1)O(1) per insertion or deletion, so the amortized cost is O(lglgu)O(\lg\lg u) as desired.