Next: BTree Cursors, Previous: Persistent Sets, Up: User Guide
A BTree is a data structure designed for on-disk databases. It's design goal is to minimize the number of disk seeks while traversing the tree structure. In contrast to a binary tree, the BTree exploits the properties of memory/disk data heirarchies. Disk seeks are expensive while loading large blocks of data is relatively inexpensive and in-memory scanning of a block of memory is much cheaper than a disk seek. This means a few, large nodes containing many keys is a more balanced data structure than
The BTree, or derivatives, are the basis of most record-oriented database including SQL servers, Berkeley DB and many others. Elephant directly exposes the BTree structure to the user so the user can decide how best to manage and traverse it. Many of Elephant's other facilities, such as the class indexing discussed above, are implemented on top of the BTree.
The basic interface to the BTree is via the get-value
method.
Both the key and the value are serialized and then the BTree is
traversed according to the sorted order of the key and the value
inserted in its sorted order. Insertion, access and deletion (via
remove-kv
) are all O(log N) complexity operations.
Sorting in BTrees requires some discussion. The sorting constraints on btrees are dictated by the original implementation on Berkeley DB. The Berkeley DB data store sorts keys based on their serialized representation. The CLSQL implementation has to sort based on the deserialized lisp value, so sorted traversals require reading all the objects into memory. This places some limitations on systems that exploit the CLSQL implementation (see CLSQL Data Store for more information).
Sorting is done first by primitive type (string, standard-class, array, etc) and then by value within that type. The type order and internal sorting constraint is:
String comparisons are case insensitive today, so "Adam" =
"adam" > "Steve"
. When unicode support is finalized, comparisons
will be case sensitive.
Like persistent sets, BTrees are not garbage collected so to recover
the storage of a BTree, just run the function drop-btree
to
delete all the key-value pairs and return their storage to the
database for reuse. The oid used by the btree, however, will not be
recovered.