Treap
| Invented by | Aragon, Seidel (1989) |
| --- | --- |
| Idea: | assign a random priority to each node; keep BST order by keys and heap order by priorities (see below) |
| Advantages: | once the rotations are implemented, the implementation is very easy; deletion from a treap is even simpler than deletion from a BST |
| Disadvantages: | a minor one: the $O(\log n)$ time holds "only" in expectation and with high probability, not in the worst case |
Randomizing trees
It is well known (see the section below) that if we insert elements into a BST in random order, its height will be logarithmic on average (and even with high probability). In other words, with high probability it will be fine and we don't even need to rebalance it. The problem, of course, is that we usually don't have any control over the elements or their insertion order.
Idea: Assign a random priority to each node. A treap (a tree + heap) is a binary search tree by keys (i.e., smaller keys in the left subtree, bigger keys in the right subtree) that is heap-ordered by priorities (i.e., a parent always has a higher priority than its children).
Here is an example of a treap (letters are keys, the small numbers are priorities):
The treap is ordered from left to right by key (A, B, C, F, J, L) and heap-ordered by priorities (e.g., 87 > 59 and 77; 77 > 42 and 10).
Uniqueness: It turns out that there is exactly one treap for any given set of (distinct) keys and their (distinct) priorities.
Proof: The root has to be the node with the highest priority (e.g., node C in the example above). Now split the remaining nodes by key: those smaller than the root go into the left subtree, those bigger into the right. Build the left and right subtrees (sub-treaps) recursively; a code sketch of this construction follows.
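Since the proof is constructive, it translates directly into code. Here is a minimal sketch in Python (the `Node` class and `build` function are illustrative names, not from the original paper):

```python
class Node:
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority
        self.left = None
        self.right = None

def build(pairs):
    """Build the unique treap from a list of (key, priority) pairs;
    all keys and all priorities are assumed distinct."""
    if not pairs:
        return None
    # The root has to be the node with the highest priority...
    key, priority = max(pairs, key=lambda kp: kp[1])
    root = Node(key, priority)
    # ...and the remaining nodes split by key: smaller left, bigger right.
    root.left = build([kp for kp in pairs if kp[0] < key])
    root.right = build([kp for kp in pairs if kp[0] > key])
    return root
```

On the example treap above, `build` would pick C as the root, since 87 is the highest priority.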
Insert: Same as in a BST (by key), then rotate the new node up ("bubble up") until the heap order of priorities is restored.
Delete: Bubble the node down, always rotating toward the child with the higher priority, until it becomes a leaf, and then just cut the parent link. Both operations are sketched below.
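A sketch of both operations, reusing the `Node` class from the build sketch above (the recursive formulation with rotations is one common way to write this, not necessarily the authors' reference implementation):

```python
import random

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def insert(root, key, priority=None):
    """Insert key with a fresh random priority; assumes distinct keys."""
    if priority is None:
        priority = random.random()  # the random priority drives the balancing
    if root is None:
        return Node(key, priority)
    if key < root.key:
        root.left = insert(root.left, key, priority)
        if root.left.priority > root.priority:  # heap order violated:
            root = rotate_right(root)           # bubble the new node up
    else:
        root.right = insert(root.right, key, priority)
        if root.right.priority > root.priority:
            root = rotate_left(root)
    return root

def delete(root, key):
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Bubble the node down toward the higher-priority child until it
        # has at most one child, then cut it out of the tree.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        if root.left.priority > root.right.priority:
            root = rotate_right(root)
            root.right = delete(root.right, key)
        else:
            root = rotate_left(root)
            root.left = delete(root.left, key)
    return root

# Usage (keys from the example, random priorities):
root = None
for k in "CFABJL":
    root = insert(root, k)
root = delete(root, "J")
```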
Analysing height of a random tree
Imagine a simple BST where we add n elements in random order. What will be the height of such a tree?
It turns out that with high probability, it will be logarithmic.
Proof: Identify each key with its rank, so the keys are $1,\dots,n$. Let $A_{ik}$ be a random variable such that $A_{ik}=1$ if $i$ is an ancestor of $k$ and $0$ otherwise. Let $i\ne k$. Then $i$ is an ancestor of $k$ if and only if $i$ is the first value inserted from the interval $[i,k]$. (If $j$ lies between $i$ and $k$ and we insert it first, then $i$ and $k$ end up in different subtrees of $j$.)
Since each element of $[i,k]$ has the same probability of being inserted first, $\Pr[A_{ik}=1]=\mathbb{E}[A_{ik}]=\frac{1}{|i-k|+1}$ (for $i\ne k$). Then $\mathbb{E}[\mathrm{depth}(k)]=\mathbb{E}\left[\sum_{i\ne k}A_{ik}\right]=\sum_{i\ne k}\mathbb{E}[A_{ik}]=H_k+H_{n-k+1}-2\le 2\ln n$, where $H_m$ is the $m$-th harmonic number.
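Explicitly, splitting the sum into predecessors and successors of $k$:

$$\mathbb{E}[\mathrm{depth}(k)]=\sum_{i<k}\frac{1}{k-i+1}+\sum_{i>k}\frac{1}{i-k+1}=(H_k-1)+(H_{n-k+1}-1)\le 2\ln n.$$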
Moreover, we can prove that, say, $\Pr[\mathrm{depth}(k)>8\ln n]<2/n^2$ (it is very improbable that a node ends up at depth more than $8\ln n$). Consequently, $\Pr[\text{tree height}>8\ln n]\le\Pr[\text{some node has depth}>8\ln n]\le n\cdot\Pr[\text{a fixed node has depth}>8\ln n]\le 2/n$. To prove the bound for a fixed node, split $\mathrm{depth}(k)=\sum_{i\ne k}A_{ik}$ into $d_<(k)=\sum_{i<k}A_{ik}$ and $d_>(k)=\sum_{i>k}A_{ik}$. Since the variables $A_{ik}$ for $i<k$ are independent, we can use a Chernoff bound to prove $\Pr[d_<(k)>4\ln n]<\left(e^3/4^4\right)^{\ln n}=n^{3-4\ln 4}<1/n^2$; the same holds for $d_>(k)$ by symmetry, and a union bound gives the $2/n^2$.
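As a quick empirical sanity check (a sketch with arbitrary parameters, reusing the `Node` class from above), one can insert a random permutation into a plain BST and compare the resulting height with $8\ln n$:

```python
import math
import random

def random_bst_height(n):
    """Insert a random permutation of 0..n-1 into a plain BST iteratively
    and return the height (nodes on the longest root-to-leaf path)."""
    keys = random.sample(range(n), n)
    root = Node(keys[0], 0)  # priorities are unused here
    height = 1
    for key in keys[1:]:
        node, depth = root, 1
        while True:
            side = 'left' if key < node.key else 'right'
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(key, 0))
                height = max(height, depth + 1)
                break
            node, depth = child, depth + 1
    return height

n = 100_000
# Typical runs stay well below the 8 ln n bound from the analysis.
print(random_bst_height(n), "vs the bound", round(8 * math.log(n)))
```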
(Aside: Using much more advanced mathematics, it has been proven that the expected height of a random BST is $\mathbb{E}[H_n]=\alpha\ln n-\beta\ln\ln n+O(1)$, where $H_n$ here denotes the height, $\alpha=4.311\cdots$ and $\beta=1.953\cdots$; see Reed (2003).)
References
- C. R. Aragon and R. G. Seidel. Randomized search trees. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science (FOCS), pages 540-545. IEEE, 1989.
- C. Martínez and S. Roura. Randomized binary search trees. Journal of the ACM, 45(2):288-323, 1998.
- B. Reed. The height of a random binary search tree. Journal of the ACM, 50(3):306-332, 2003.