Invented by | Bayer, McCreight (1972) | |
Idea: |
- a generalization of 2-3 and 2-3-4 trees: internal nodes of a B-tree of order t have between \(t/2\) and \(t\) children (for \(t=3\) and 4, we get 2-3 and 2-3-4 trees)
- B-trees are often used as an external data structure, i.e., for data stored on disk
- since hard disks are 1000 – 100 000-times slower than RAM, it is crucial to minimize the number of disk accesses; this is achieved by choosing very large \(t\) (usually so that one node fills one disk page)
- when a node overflows, it is split in two and the middle element is inserted to parent (a cascade of splits may occur)
- when a node underflows, it may borrow some elements from neighbouring nodes or, if the nodes are small, they may be merged
|
Advantages: |
- great for storing large amounts of data which don’t fit into the main memory, while it minimizes the number of disk accesses
- also great for RAM, due to cache efficiency; for example, the standard library tree implementation in Rust is a B-tree with \(B=6\)
|
Variations: |
- B+ tree: items are only stored at leaves; the keys in internal vertices only serve as guides during the search; there are two advantages: 1. since internal nodes do not store any additional data, this allows for higher degree, smaller height, and less disk accesses; 2. data may be easily traversed sequentially from left to right (contrast this with in-order traversal of a B-tree)
- pre-emptive splitting and merging: split/merge nodes on the way down the tree; advantage: no need to return (and thus less disk accesses); disadvantage: sometimes nodes are split even when it was not necessary (which leads to slightly increased height and memory consumption)
- B* tree: non-root nodes have to be at least 2/3 full; advantage: better use of memory; disadvantage: more splitting and merging is needed
|