

7 Elephant Architecture

Elephant's early architecture was tightly coupled to the Berkeley DB API. Over time we have moved towards a more modular architecture to support easy upgrading, repository migration, shared functionality between data stores, and general hygiene.

The architecture has been carefully modularized:

[Architecture Diagram]

To get a feeling for what is happening inside Elephant, it is probably best to walk through the various major protocols to see how these components participate in implementing them.

7.1 Initializing a store controller

When the main Elephant open-store function is called with a specification, it calls get-controller, which first checks to see if a controller already exists for that spec.
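
For example, a user session typically opens a store with a specification like one of the following; the directory, host and credentials below are placeholders:

     ;; Berkeley DB store rooted at a directory
     (open-store '(:BDB "/home/me/db/my-db/"))

     ;; CLSQL-backed store; connection details follow CLSQL conventions
     (open-store '(:CLSQL (:POSTGRESQL "localhost" "mydb" "user" "")))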

If there is no controller, it calls build-controller to construct one. If the data store code base is not present, load-data-store is called to ensure that any ASDF dependencies are satisfied. The ASDF dependencies for each data store type supported by Elephant are statically configured in *elephant-data-stores*.

While being loaded, the data store is responsible for calling register-data-store-con-init to register a data store initialization function for its spec type (e.g. :BDB or :CLSQL). For example, from bdb-controller.lisp:

     (eval-when (:compile-toplevel :load-toplevel)
       (register-data-store-con-init :bdb 'bdb-test-and-construct))

This mapping between spec types and initialization functions is accessed by lookup-data-store-con-init from within build-controller. The function returned by lookup-data-store-con-init is passed the full specification and returns a store-controller subclass instance for the specified data store.
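
As a sketch, the initialization function for a hypothetical data store might look like the following; the :my-store spec type, the my-store-test-and-construct function and the my-store-controller class are made-up names standing in for what a real data store would define:

     ;; Hypothetical initialization function for a :MY-STORE spec type.
     ;; It validates the specification and returns a store-controller
     ;; subclass instance; BUILD-CONTROLLER then calls OPEN-CONTROLLER on it.
     (defun my-store-test-and-construct (spec)
       (if (and (consp spec)
                (eq (first spec) :my-store)
                (stringp (second spec)))
           (make-instance 'my-store-controller :spec spec) ; :spec initarg assumed
           (error "Invalid my-store specification: ~A" spec)))

     (eval-when (:compile-toplevel :load-toplevel)
       (register-data-store-con-init :my-store 'my-store-test-and-construct))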

The new controller is stored in the *dbconnection-spec* hash table, associating the object with its specification. Elephant then calls open-controller to actually establish a connection to, or create the files of, the data store.

Finally, if the default store controller *store-controller* is nil, it is initialized with the new store controller; otherwise the original value is left in *store-controller* until that store controller is closed with close-store.

The data store implementor has access to various utilities to aid initialization.

At this point, all operations referencing the store controller should be able to proceed.

At the end of a session, close-store shuts the store down by calling the data store's close-controller method, which releases any resources held by the controller.
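
For example, a session opened with open-store is typically ended like this:

     ;; Close the default store controller and release its resources.
     (close-store *store-controller*)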

7.2 Persistent Object Creation

The only thing that a data store has to do to support new object creation, other than implement the slot protocol, is implement the method next-oid to return the next unique object id for the persistent object being created.
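
A minimal sketch of a next-oid method, assuming a hypothetical my-store-controller with a simple oid-counter slot; a real data store allocates oids from durable storage so that they remain unique across sessions:

     ;; NEXT-OID must return a fresh, unique object id for every new
     ;; persistent object created against this controller.
     (defmethod next-oid ((sc my-store-controller))
       (incf (slot-value sc 'oid-counter)))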

Existing objects are created during deserialization of object references. The serializer subsystem is built into the core of Elephant and can be used by data stores. The serializer is abstracted so that multiple serializers can be co-resident and the data store can choose the appropriate one. The abstraction boundary between the serializer, the data store, and the core Elephant system is not perfect, so be aware of this and refer to existing data store implementations if in doubt.

A serializer takes as arguments the store-controller, a lisp object and a buffer-stream from the memory utility library, and returns the buffer-stream containing the binary serialized object. The deserializer reverses this process. For all lisp objects except persistent classes, this means reallocating the storage space for the object and recreating all its contents. Deserializing a standard object results in a new standard object of the same class with the same slot values.

Persistent classes are dealt with specially. When a persistent object is serialized, its oid and class are stored in the buffer-stream. On deserialization, the oid is used to check the store-controller's cache for an existing placeholder object. If the cache misses, a new placeholder object is created using the class and oid, as described in Persistent Classes and Objects. The store controller contains a cache instance that is automatically initialized by the core Elephant object protocol.
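
The difference shows up at the user level: a standard object comes back from the store as a fresh copy, while a persistent object reference comes back as the cached instance. A small sketch (defpclass, add-to-root and get-from-root are ordinary Elephant API; the class names are made up for the example):

     ;; Standard objects are serialized by value; persistent objects are
     ;; serialized as oid references and resolved through the cache.
     (defclass plain-point () ((x :initarg :x)))
     (defpclass persistent-point () ((x :initarg :x :accessor point-x)))

     (let ((plain (make-instance 'plain-point :x 1))
           (pers  (make-instance 'persistent-point :x 1)))
       (add-to-root 'plain plain)
       (add-to-root 'pers pers)
       (values (eq plain (get-from-root 'plain))   ; => NIL, a fresh copy
               (eq pers  (get-from-root 'pers))))  ; => T while the object is cached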

Currently the serializer is selected by the core Elephant code based on the store controller's database version. See the reference section for details on implementing the store-controller database version method. It is a relatively small change to have the data store choose its own serializer; however, we will have to tighten up and document the contracts between the Elephant core code, the serializer and the data store.

7.3 Persistent Slot Protocol

The core protocol that the data store needs to support is the slot access protocol. During object initialization, these functions are called to initialize the slots of the object. The four functions are persistent-slot-reader, persistent-slot-writer, persistent-slot-boundp and persistent-slot-makunbound.

More details can be found in the data store API reference section. In short, these functions specialize on the specific store-controller of the data store and take instances, values and slot names as appropriate.

Typically the oid will be extracted from the instance and be used to update a table or record where the oid and slotname identifies the value. A slot is typically unbound when no value exists (as opposed to nil).
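
As a sketch, a data store backed by nothing more than an in-memory equal hash table might implement three of these methods as follows; the my-store-controller class and its slot-table accessor are assumptions, and the exact generic function signatures should be checked against the data store API reference:

     ;; Values are keyed on (oid . slotname); a missing entry means the
     ;; slot is unbound, which is distinct from a slot containing NIL.
     (defmethod persistent-slot-writer ((sc my-store-controller) new-value instance name)
       (setf (gethash (cons (oid instance) name) (slot-table sc)) new-value))

     (defmethod persistent-slot-reader ((sc my-store-controller) instance name)
       (multiple-value-bind (value found?)
           (gethash (cons (oid instance) name) (slot-table sc))
         (if found?
             value
             (slot-unbound (class-of instance) instance name))))

     (defmethod persistent-slot-boundp ((sc my-store-controller) instance name)
       (nth-value 1 (gethash (cons (oid instance) name) (slot-table sc))))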

7.4 Persistent Collection Protocols

The BTree protocol is the most extensive interface that data stores must implement. Data store implementations are required to subclass the abstract classes btree, indexed-btree, and index and implement their complete APIs. Each class type is constructed by Elephant using a store-controller that builds them. These methods are build-btree, build-indexed-btree and build-index.
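
For example, a data store's constructor method typically just instantiates its own subclass; my-store-controller and my-btree here are hypothetical names, and the :sc initarg is an assumption about how the subclass records its owning controller:

     ;; BUILD-BTREE returns an instance of the data store's btree subclass.
     (defmethod build-btree ((sc my-store-controller))
       (make-instance 'my-btree :sc sc))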

The get-value interface is similar to the persistent slot reader and writer, but instead of using oid and slotname to set values, it uses the btree oid and a key value as a unique identifier for a value.
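
At the user level this looks like ordinary keyed access; make-btree and get-value below are part of the public Elephant API:

     ;; Store and retrieve a value under a key; the btree's oid plus the
     ;; key uniquely identify the entry in the data store.
     (let ((bt (make-btree)))
       (setf (get-value "alpha" bt) 1)
       (get-value "alpha" bt))             ; => 1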

The BTree protocol all but requires an actual BTree implementation to be efficient. Keys and values need to be accessible via the cursor API, which means they need to be walked linearly in the sort order of the keys (described in Persistent BTrees).
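
For example, a btree can be walked in key order with the with-btree-cursor macro, which is part of the Elephant API (bt is assumed to be an existing btree):

     ;; Walk all key/value pairs in ascending key order; CURSOR-NEXT
     ;; returns (values exists? key value), with EXISTS? NIL at the end.
     (with-btree-cursor (cur bt)
       (loop
          (multiple-value-bind (exists? key value) (cursor-next cur)
            (unless exists? (return))
            (format t "~A => ~A~%" key value))))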

An indexed BTree automatically maintains a hash table of the indices defined on it so that users can access them by mapping or lookup-by-name. The data store also has access to this interface.

A BTree index must also maintain a connection to its parent BTree so that an index value can be used as a primary tree key to retrieve the primary BTree value as part of the cursor-pnext and cursor-pprev family of methods.

The contract of remove-kv is that the storage in the data store is actually freed for reuse.

Persistent set implementation is optional; a BTree-based implementation is provided by default.

7.5 Implementing Transactions

One of the most important pieces of functionality remaining to discuss is implementing transactions. In existing data stores, transactions are merely extensions of the underlying start, commit and abort methods of the 3rd party library or server being used. The Elephant user interfaces to these functions in two ways: a call to execute-transaction or explicit calls to controller-start-transaction, controller-commit-transaction and controller-abort-transaction.

7.5.1 Implementing Execute Transaction

The macros with-transaction and ensure-transaction wrap access to the data store's execute-transaction. This function has a rich contract. It accepts as arguments the store controller, a closure that executes the transaction body and a set of keywords. Keywords required to be supported by the method (or ignored without loss of semantics) are :parent and :retries.

The semantics of with-transaction are that a new transaction will always be requested of the data store. ensure-transaction, on the other hand, merely calls the transaction closure if a transaction is already in progress; if not, it behaves like a call to with-transaction.
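
A short usage sketch of the difference; add-to-root is ordinary Elephant API and the body is arbitrary:

     ;; The outer WITH-TRANSACTION requests a new transaction from the
     ;; data store; the inner ENSURE-TRANSACTION sees it and simply runs
     ;; its body inside it rather than starting another one.
     (with-transaction (:store-controller *store-controller*)
       (ensure-transaction (:store-controller *store-controller*)
         (add-to-root 'last-run (get-universal-time))))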

The contract of execute-transaction is that it must ensure that the transaction closure is executed within a dynamic context that ensures the ACID properties of any database operations (pset, btree or persistent slot operations). If there is a non-local exit during this execution, the transaction should be aborted. If it returns normally, the transaction is committed. The integer passed in the :retries argument dictates how many times execute-transaction should retry the transaction before failing.

Elephant provides some bookkeeping to the data store to help with nested transactions by using the *current-transaction* dynamic variable. In the dynamic context of the transaction closure, another call to execute-transaction may occur with the transaction argument defaulting to the value of *current-transaction*. The data store has to decide how to handle these cases. To support this, the first call to execute-transaction can create a dynamic binding for *current-transaction* using the make-transaction-record call. This creates a transaction object that records the store controller that started the transaction and any data store-specific transaction data.

The current policy is that the body of a transaction is executed with the *store-controller* variable bound to the store-controller object creating the transaction. This is important for default arguments and generally helps more than it hurts, so is an implementation requirement placed on execute-transaction.
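
Putting these requirements together, a minimal execute-transaction for a hypothetical data store might look like the sketch below. my-store-controller and the my-begin/commit/abort-transaction functions are stand-ins for the 3rd party library, and the :retries and nested-transaction bookkeeping described above are omitted for brevity:

     ;; Commit on normal return, abort on any non-local exit, and run the
     ;; body with *store-controller* bound to the owning controller.
     (defmethod execute-transaction ((sc my-store-controller) txn-fn
                                     &key parent retries &allow-other-keys)
       (declare (ignore parent retries))    ; a real method honors these
       (my-begin-transaction sc)            ; hypothetical library call
       (let ((committed nil))
         (unwind-protect
              (multiple-value-prog1
                  (let ((*store-controller* sc))
                    (funcall txn-fn))
                (my-commit-transaction sc)  ; hypothetical library call
                (setf committed t))
           (unless committed
             (my-abort-transaction sc)))))  ; hypothetical library call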

If two nested calls to with-transaction are made successively in a dynamic context, the data store can create true nested transactions. The first transaction is passed to the :parent argument of the second. The second can choose to just continue the current transaction (the CLSQL data store policy) or to nest the transaction (the BDB data store policy).

7.5.2 Interleaving Multiple Store Transactions

Finally, some provision is made for the case where two store controllers have concurrently active transactions in the same thread. This feature was created to allow for migration, where a read from one database happens in one transaction and, while that transaction is active, writes to another data store happen within their own valid transaction.

The trick is that with-transaction checks whether the store controller that started the current transaction is the same as the store-controller object passed to the :store-controller argument. If not, a fresh transaction is started.
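
A sketch of the migration-style usage this supports; sc1 and sc2 are assumed to be two open store controllers, and get-from-root and add-to-root are ordinary Elephant API:

     ;; The inner WITH-TRANSACTION targets a different store controller,
     ;; so a fresh transaction is started on sc2 while the transaction
     ;; on sc1 remains active.
     (with-transaction (:store-controller sc1)
       (let ((value (get-from-root 'some-key :sc sc1)))
         (with-transaction (:store-controller sc2)
           (add-to-root 'some-key value :sc sc2))))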

Currently no provision is made for more than two levels of multi-store nesting as we do not implement a full transaction stack (to avoid walking the stack on each call to handle this rare case). If a third transaction is started by the store controller that started the first transaction, it will have no access to the parent transaction which may be a significant source of problems for the underlying database.