Skip to content

20.3 The van Emde Boas tree

20.3-1

Modify vEB trees to support duplicate keys.

To support duplicate keys, for each u=2u = 2 vEB tree, instead of storing just a bit in each of the entries of its array, it should store an integer representing how many elements of that value the vEB contains.

20.3-2

Modify vEB trees to support keys that have associated satellite data.

For any key which is a minimum on some vEB, we'll need to store its satellite data with the min value since the key doesn't appear in the subtree. The rest of the satellite data will be stored alongside the keys of the vEB trees of size 22. Explicitly, for each non-summary vEB tree, store a pointer in addition to min. If min is NIL\text{NIL}, the pointer should also point to NIL\text{NIL}. Otherwise, the pointer should point to the satellite data associated with that minimum. In a size 22 vEB tree, we'll have two additional pointers, which will each point to the minimum's and maximum's satellite data, or NIL\text{NIL} if these don't exist. In the case where min=max\min = \max, the pointers will point to the same data.

20.3-3

Write pseudocode for a procedure that creates an empty van Emde Boas tree.

We define the procedure for any uu that is a power of 22. If u=2u = 2, then, just slap that fact together with an array of length 22 that contains 00 in both entries.

If u=2k>2u = 2k > 2, then, we create an empty vEB tree called Summary with u=2k/2u = 2^{\lceil k / 2 \rceil}. We also make an array called cluster of length 2k/22^{\lceil k / 2 \rceil} with each entry initialized to an empty vEB tree with u=2k/2u = 2^{\lfloor k / 2 \rfloor}. Lastly, we create a min and max element, both initialized to NIL\text{NIL}.

20.3-4

What happens if you call VEB-TREE-INSERT\text{VEB-TREE-INSERT} with an element that is already in the vEB tree? What happens if you call VEB-TREE-DELETE\text{VEB-TREE-DELETE} with an element that is not in the vEB tree? Explain why the procedures exhibit the behavior that they do. Show how to modify vEB trees and their operations so that we can check in constant time whether an element is present.

Suppose that xx is already in VV and we call INSERT\text{INSERT}. Then we can't satisfy lines 1, 3, 6, or 10, so we will enter the else case on line 9 every time until we reach the base case. If xx is already in the base-case tree, then we won't change anything. If xx is stored in a min attribute of a vEB tree that isn't base-case, however, we will insert a duplicate of it in some base-case tree. Now suppose we call DELETE\text{DELETE} when xx isn't in VV . If there is only a single element in VV, lines 1 through 3 will delete it, regardless of what element it is. To enter the elseif of line 4, xx can't be equal to 00 or 11 and the vEB tree must be of size 22. In this case, we delete the max element, regardless of what it is. Since the recursive call always puts us in this case, we always delete an element we shouldn't. To avoid these issue, keep and updated auxiliary array AA with uu elements. Set A[i]=0A[i] = 0 if ii is not in the tree, and 11 if it is. Since we can perform constant time updates to this array, it won't affect the runtime of any of our operations. When inserting xx, check first to be sure A[x]=0A[x] = 0. If it's not, simply return. If it is, set A[x]=1A[x] = 1 and proceed with insert as usual. When deleting xx, check if A[x]=1A[x] = 1. If it isn't, simply return. If it is, set A[x]=0A[x] = 0 and proceed with delete as usual.

20.3-5

Suppose that instead of u\sqrt[\uparrow]u clusters, each with universe size u\sqrt[\downarrow]u, we constructed vEB trees to have u1/ku^{1 / k} clusters, each with universe size u11/ku^{1 - 1 / k}, where k>1k > 1 is a constant. If we were to modify the operations appropriately, what would be their running times? For the purpose of analysis, assume that u1/ku^{1 / k} and u11/ku^{1 - 1 / k} are always integers.

Similar to the analysis of (20.4)\text{(20.4)}, we will analyze

T(u)T(u11/k)+T(u1/k)+O(1).T(u) \le T(u^{1 - 1 / k}) + T(u^{1 / k}) + O(1).

This is a good choice for analysis because for many operations we first check the summary vEB tree, which will have size u1/ku^{1 / k} (the second term). And then possible have to check a vEB tree somewhere in cluster, which will have size u11/ku^{1 - 1/k} (the first term). We let T(2m)=S(m)T(2^m) = S(m), so the equation becomes

S(m)S(m(11/k))+S(m/k)+O(1).S(m) \le S(m(1 - 1/k)) + S(m/k) + O(1).

If k>2k > 2 the first term dominates, so by master theorem, we'll have that S(m)S(m) is O(lgm)O(\lg m), this means that T will be O(lg(lgu))O(\lg(\lg u)) just as in the original case where we took squareroots.

20.3-6

Creating a vEB tree with universe size uu requires O(u)O(u) time. Suppose we wish to explicitly account for that time. What is the smallest number of operations nn for which the amortized time of each operation in a vEB tree is O(lglgu)O(\lg\lg u)?

Set n=u/lglgun = u / \lg\lg u. Then performing nn operations takes c(u+nlglgu)c(u + n\lg\lg u) time for some constant cc. Using the aggregate amortized analysis, we divide by nn to see that the amortized cost of each operations is c(lglgu+lglgu)=O(lglgu)c(\lg\lg u + \lg\lg u) = O(\lg\lg u) per operation. Thus we need nu/lglgun \ge u/ \lg \lg u.