Suffix tree
Idea: | store all suffixes of a string in a (compressed) trie |
---|---|
Advantages: | fast search for all occurrences of a pattern; many problems on strings turn into simple problems on trees |
Disadvantages: | very space-consuming; using a suffix array may be a better alternative |
Trie of all suffixes
The idea of a suffix tree is simple: Take all the suffixes of a string and store them in a trie.
For example, MISSISSIPPI$ has 12 (nonempty) suffixes: $, I$, PI$, PPI$, IPPI$, …, ISSISSIPPI$, MISSISSIPPI$.
If we build a trie consisting of all these strings, we get:
At the end of the string we append a special symbol $ which does not appear anywhere else in it. This ensures that every suffix ends in a leaf (rather than in an internal node).
Note that every root-to-leaf path corresponds to a single suffix, and every path starting at the root corresponds to some substring of the given string.
For example, we highlighted path "IS
" in the figure above. The subtree below it contains two leaves: 4 and 1, so there are two suffixes which start with "IS". These correspond
to the 2 occurrences of "IS
" in MISSISSIPPI
, starting at positions 1 and 4.
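To make this concrete, here is a small Python sketch of the conceptual trie described above (the function names are ours, chosen for illustration): the uncompressed trie is stored as nested dictionaries, and we list the occurrences of a substring by collecting the leaves below the node we walk to.

```python
def build_suffix_trie(s):
    """Uncompressed trie of all suffixes; s must end with a '$' that occurs nowhere else."""
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:-1]:
            node = node.setdefault(ch, {})
        node[s[-1]] = i          # the '$' edge leads to a leaf storing the suffix start
    return root

def leaves(node):
    """Starting positions of all suffixes in the subtree of `node`."""
    for child in node.values():
        if isinstance(child, int):
            yield child          # a leaf
        else:
            yield from leaves(child)

def occurrences(root, pattern):
    """All positions where `pattern` occurs: walk down from the root, then collect leaves."""
    node = root
    for ch in pattern:
        if not isinstance(node, dict) or ch not in node:
            return []
        node = node[ch]
    return sorted(leaves(node)) if isinstance(node, dict) else [node]

print(occurrences(build_suffix_trie("MISSISSIPPI$"), "IS"))   # [1, 4]
```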
Compact representation
What we described so far is a conceptual view of a suffix tree (not a real representation). The obvious problem is that such a trie would have \(\Theta(n^2)\) nodes – it would take \(\Theta(n^2)\) space and couldn’t be constructed in time better than \(\Omega(n^2)\). This is too much to be of any practical use.
We show, however, that it is possible to represent such a trie in only \(O(n)\) memory, and it can even be constructed in \(O(n)\) time (Weiner 1973; McCreight 1976; Ukkonen 1995; see References).
First, we contract all paths which are not branching. Whenever we see a path \(v_1-v_2-\cdots-v_k\) such that \(v_{i+1}\) is the only child of \(v_i\), we represent this path by a single edge. For our "MISSISSIPPI" example, we get:
This way, each internal node is at least binary and therefore there are only \(O(n)\) nodes. (One can prove by induction that the number of internal nodes is less than the number of leaves.) However, the edge labels still consist of \(\Theta(n^2)\) symbols in total.
What shall we do? Instead of making copies of different substrings as edge labels, we just use indices into the string \(S\). For example, instead of the label SSI, which is the substring \(S[2\ldots 5)\), we just store the indices \([2,5)\). Instead of the label SSIPPI$ \(=S[5,12)\), we store just \([5,12)\).
This way, each edge takes \(O(1)\) space and there are \(O(n)\) nodes and edges, so the trie of all suffixes can be represented in \(O(n)\) memory.
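Here is a minimal sketch of this compact representation (ours, not a production implementation): each edge stores a half-open interval \([start, end)\) into \(S\) instead of a copy of the substring. For simplicity the tree is built by the naive quadratic insertion of one suffix at a time; the linear-time constructions cited in the References are considerably more involved.

```python
class Node:
    def __init__(self, suffix_start=None):
        self.children = {}                # first character of edge label -> (start, end, child)
        self.suffix_start = suffix_start  # set on leaves: where the suffix begins

def build_suffix_tree(s):
    """Naive O(n^2) construction of the compacted tree; s must end with a unique '$'."""
    root = Node()
    for i in range(len(s)):               # insert suffix s[i:]
        node, j = root, i
        while True:
            if s[j] not in node.children:              # no edge starts with s[j]: new leaf
                node.children[s[j]] = (j, len(s), Node(suffix_start=i))
                break
            start, end, child = node.children[s[j]]
            k = 0                                       # match along the edge label
            while k < end - start and s[start + k] == s[j + k]:
                k += 1
            if k == end - start:                        # whole edge label consumed, descend
                node, j = child, j + k
                continue
            mid = Node()                                # mismatch inside the edge: split at offset k
            node.children[s[start]] = (start, start + k, mid)
            mid.children[s[start + k]] = (start + k, end, child)
            mid.children[s[j + k]] = (j + k, len(s), Node(suffix_start=i))
            break
    return root

S = "MISSISSIPPI$"
tree = build_suffix_tree(S)
for start, end, _ in sorted(tree.children.values()):
    print((start, end), S[start:end])     # the root's edges, e.g. (1, 5) for label 'ISSI'
```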
Generalized suffix tree
Similarly, if we have multiple texts (documents), we can append a unique symbol at the end of each one and create a single trie with all their suffixes. This is called a generalized suffix tree.
Here is a toy example – a suffix tree for the strings "ABAA", "BABA", and "BBAB". However, you can imagine a huge suffix tree consisting of all articles in Wikipedia (each article being a single string) or a suffix tree with the genomes of multiple species (each string being the DNA of one species).
Every leaf stores the position where its suffix starts and which string it comes from (represented here by color: red/green/blue).
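As a sketch (ours, again using the nested-dictionary trie rather than the compact representation), one way to build such a generalized structure is to give each string its own terminator token, so that every leaf remembers both the string it comes from and the starting position.

```python
def build_generalized_suffix_trie(strings):
    """Uncompressed trie of all suffixes of all strings, with one terminator per string."""
    root = {}
    for sid, s in enumerate(strings):
        end = ("$", sid)                     # unique end-of-string token for string sid
        for i in range(len(s) + 1):          # all suffixes, including the terminator alone
            node = root
            for ch in s[i:]:
                node = node.setdefault(ch, {})
            node[end] = (sid, i)             # leaf: (which string, where the suffix starts)
    return root

trie = build_generalized_suffix_trie(["ABAA", "BABA", "BBAB"])
print(sorted(k for k in trie if isinstance(k, str)))   # ['A', 'B'] -- character edges leaving the root
```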
Applications
The suffix tree is a very powerful tool in stringology. It encodes a lot of the structure of the input text(s), and problems on strings are translated into problems on trees.
Text search
When searching for a string, we simply start at the root and trace the path spelled out by the searched string. The leaves in the subtree below the resulting node correspond to all occurrences of the searched string.
For example, let's search string "BA
" in the generalized suffix tree above. Start at the root and follow the edges "B", "A". There are 4 leaves in that subtree: red 1, green 0 and 2, and blue 1,
which corresponds exactly with the 4 occurrences in the strings "ABAA", "BA BA", and "BBAB".
If we add a pointer from every internal node to the first leaf in its subtree and connect all leaves in a linked list, we can jump directly to the leaves and list all occurrences one by one. Thus, finding all occurrences of a pattern \(P\) can be done in \(O(|P| + \#occurrences)\) time.
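A sketch of this search on the nested-dictionary trie follows (the builder is repeated from the previous sketch so the example runs on its own). Note that this version simply walks the whole subtree, so it does not achieve the \(O(|P| + \#occurrences)\) bound of the compact tree with the extra pointers.

```python
def build_generalized_suffix_trie(strings):   # same builder as in the previous sketch
    root = {}
    for sid, s in enumerate(strings):
        end = ("$", sid)
        for i in range(len(s) + 1):
            node = root
            for ch in s[i:]:
                node = node.setdefault(ch, {})
            node[end] = (sid, i)
    return root

def collect_leaves(node):
    for key, child in node.items():
        if isinstance(key, tuple):            # a terminator edge leads to a leaf
            yield child
        else:
            yield from collect_leaves(child)

def find(root, pattern):
    node = root
    for ch in pattern:                        # trace the path spelled by the pattern
        if ch not in node:
            return []
        node = node[ch]
    return sorted(collect_leaves(node))       # all occurrences as (string id, position)

trie = build_generalized_suffix_trie(["ABAA", "BABA", "BBAB"])
print(find(trie, "BA"))                       # [(0, 1), (1, 0), (1, 2), (2, 1)]
```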
n-grams
By an n-gram, we mean a sequence of n letters. For example, there are \(2^3 = 8\) different tri-grams consisting only of the letters A and B. Which of them occur in the strings "ABAA", "BABA", "BBAB"?
One way to find out is by traversing the strings while maintaining a set of tri-grams we have seen.
Alternatively, all the tri-grams are visible in the generalized suffix tree of those strings – imagine we cut the tree at depth 3. We can read off the answer: ABA, BAA, BAB, and BBA.
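Both approaches are easy to sketch (ours, for illustration): the direct scan keeps a set of tri-grams seen so far, and the read-off walks a trie of all suffixes for exactly three character levels. For the read-off we can ignore terminators and leaf labels, since only the shape of the top three levels matters.

```python
strings = ["ABAA", "BABA", "BBAB"]

# Direct scan: maintain the set of tri-grams seen so far.
print(sorted({s[i:i + 3] for s in strings for i in range(len(s) - 2)}))
# ['ABA', 'BAA', 'BAB', 'BBA']

def trie_of_suffixes(strings):
    """Bare trie of all suffixes (no terminators or leaf labels needed here)."""
    root = {}
    for s in strings:
        for i in range(len(s)):
            node = root
            for ch in s[i:]:
                node = node.setdefault(ch, {})
    return root

def ngrams(node, n, prefix=""):
    """All strings of length n spelled by a path of n character edges from `node`."""
    if n == 0:
        yield prefix
        return
    for ch, child in node.items():
        yield from ngrams(child, n - 1, prefix + ch)

# "Cut the tree at depth 3" and read off the answer.
print(sorted(ngrams(trie_of_suffixes(strings), 3)))
# ['ABA', 'BAA', 'BAB', 'BBA']
```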
Longest substring with multiple occurrences
What is the longest substring in "MISSISSIPPI" which has at least two occurrences?
Answer: "ISSI".
How do we find out in general?
Note that before suffix trees, this used to be a long-standing open problem. There is a trivial \(O(n^3)\) algorithm and a simple \(O(n^2)\) algorithm using dynamic programming, but can we do better? Some researchers even conjectured that no linear-time algorithm could exist.
Yet, with suffix trees, the problem is dead simple: the longest substring with multiple occurrences corresponds to the deepest internal node (a node with at least two leaves in its subtree), where depth is measured in characters spelled from the root (string depth).
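As a sketch on the conceptual (uncompressed) trie, with the builder repeated from the first sketch: we look for the deepest node whose subtree contains at least two leaves. On the compact tree one would do the same using string depth.

```python
def build_suffix_trie(s):                    # same builder as in the first sketch
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:-1]:
            node = node.setdefault(ch, {})
        node[s[-1]] = i
    return root

def longest_repeated(s):
    """Deepest trie node with at least two leaves below it; its path is the answer."""
    best = [""]

    def dfs(node, path):
        count = 0                            # number of leaves in this subtree
        for ch, child in node.items():
            if isinstance(child, int):
                count += 1                   # a leaf
            else:
                count += dfs(child, path + ch)
        if count >= 2 and len(path) > len(best[0]):
            best[0] = path
        return count

    dfs(build_suffix_trie(s), "")
    return best[0]

print(longest_repeated("MISSISSIPPI$"))      # ISSI
```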
References
- E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249-260, 1995.
- P. Weiner. Linear pattern matching algorithms. In Proc. 14th Annual IEEE Symposium on Switching and Automata Theory, pages 1-11, 1973.
- E. M. McCreight. A space-economical suffix tree construction algorithm. Journal of the ACM, 23(2):262-272, 1976.
- R. Giegerich and S. Kurtz. From Ukkonen to McCreight and Weiner: A unifying view of linear-time suffix tree construction. Algorithmica, 19(3):331-353, 1997.