2.4 Serialization

What can you put into the store besides strings? Almost all Lisp values and objects can be stored: numbers, symbols, strings, nil, characters, pathnames, conses, hash-tables, arrays, CLOS objects and structs. Nested and circular structures are allowed. You can store basically anything except compiled functions, closures, class objects, packages and streams. Functions can be stored as uncompiled lambda expressions. (Compiled functions and other kinds of objects may eventually be supported too.)
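
As a minimal sketch of storing a nested aggregate, the forms below put a hash table holding a list and a vector into the root and read one entry back. It assumes a store has already been opened as described earlier in the tutorial; the variable *settings* and the key "settings" are made up for illustration.

     ;; Hypothetical example data: a hash table with nested values.
     (defvar *settings* (make-hash-table :test 'equal))
     (setf (gethash "colors" *settings*) '(:red :green :blue))
     (setf (gethash "limits" *settings*) (vector 1 2.5 3/4))

     ;; The whole table, including its nested list and vector, is stored.
     (add-to-root "settings" *settings*)

     ;; Reading it back yields a freshly built copy of the table.
     (gethash "colors" (get-from-root "settings"))

The last form should return a list equal to (:RED :GREEN :BLUE), but not the same list object that was stored.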

Elephant needs a representation of data that is independent of any specific Lisp or data store. Therefore all Lisp values that are stored must be serialized into a canonical format. Because Berkeley DB supports variable-length binary buffers, Elephant uses a binary serialization system. This process has some consequences that are important to understand:

  1. Lisp identity can't be preserved. Since this is a store which persists across invocations of Lisp, preserving identity probably doesn't even make sense. However, this also holds within a single session: every time you get an object from the store it is deserialized afresh, so two retrievals of the same stored object will not be eq:
              (setq foo (cons nil nil))
              => (NIL)
              (add-to-root "my key" foo)
              => (NIL)
              (add-to-root "my other key" foo)
              => (NIL)
              (eq (get-from-root "my key")
                  (get-from-root "my other key"))
              => NIL
         
  2. Nested aggregates are stored in one buffer. If you put a set of objects in a hash table and then store the hash table, all of those objects will be stored in one large binary buffer along with the hash keys. This is true for all other aggregates that can hold values of type T (conses, arrays, standard objects, etc.).
  3. Mutated substructure does not persist:
              (setf (car foo) T)
              => T
              (get-from-root "my key")
              => (NIL)
         

    This affects all aggregate types: objects, conses, hash-tables, etc. (You can of course manually re-store the cons.) In this sense Elephant does not automatically provide persistent collections. If you want every access to persist, you have to use BTrees (see Persistent BTrees).

  4. Serialization and deserialization can be costly. While serialization is pretty fast, it is still expensive to store large objects wholesale. Also, since object identity is impossible to maintain, deserialization must re-cons or re-allocate the entire object every time, which increases the number of GCs the system does. This eager allocation is contrary to how most people want to use a database: one of the reasons to use a database is that your objects can't fit into main memory all at once.
  5. Merge conflicts can occur in heavily multi-process or multi-threaded situations. This is the common read-modify-write problem found in all databases. We will talk more about this in the Using Transactions section.
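
The manual workaround mentioned under point 3 can be sketched as follows, continuing with the foo cons from the examples above. After mutating the in-memory value, storing it again under the same key serializes the updated contents. The other key is never re-stored here, so it should keep returning the old copy.

     (setf (car foo) T)
     => T
     ;; Re-serialize the whole cons under the same key.
     (add-to-root "my key" foo)
     (get-from-root "my key")
     => (T)
     ;; Stored before the mutation and not re-stored since.
     (get-from-root "my other key")
     => (NIL)

Note that each re-store writes the entire aggregate again, so this approach carries the cost described in point 4 and the read-modify-write risk described in point 5.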

This may seem terribly restrictive, but don't despair: we'll solve most of these problems in the next section.