One of the most important features of a database is that operations enforce the ACID properties: Atomicity, Consistency, Isolation, and Durability. In plainspeak, this means that a set of changes is made all at once, that the database is never left partially updated, that each set of changes appears to happen sequentially, and that a change, once made, is not lost.
Elephant provides this protection for all primitive operations. For example, when you write a value to an indexed slot, the update to the persistent slot record as well as to the slot index is protected by a transaction that performs all the updates atomically, thus enforcing consistency.
Most real applications will need to use explicit transactions rather than relying on the primitives alone, because you will want multiple read-modify-update operations to act as an atomic unit. A good example is a banking system. If one thread is modifying a balance, we don't want another thread modifying it in the middle of the operation, or one of the modifications may be lost.
(defvar *accounts* (make-btree))

(defun add-account (account)
  (setf (get-value account *accounts*) 0))

(defun balance (account)
  (get-value account *accounts*))

(defun (setf balance) (amount account)
  (setf (get-value account *accounts*) amount))

(defun deposit (account amount)
  "A read and a write function call to get then set the balance"
  (let ((balance (balance account)))
    (setf (balance account) (+ balance amount))))

(defun withdraw (account amount)
  "A nice concise lisp version for withdraw"
  (decf (balance account) amount))

(add-account 'me)   => 0
(deposit 'me 100)   => 100
(balance 'me)       => 100
(withdraw 'me 25)   => 75
(balance 'me)       => 75
This simple bank example has a significant vulnerability. If two threads read the same balance and then each writes a new balance, the second write is computed without seeing the value written by the first, and so the first update is lost.
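The lost-update interleaving looks something like this sketch; the thread labels and timing are illustrative:

```lisp
;; The account starts at 100. Two threads each try to deposit 50.
;; Thread A: (balance 'me)            ; reads 100
;; Thread B: (balance 'me)            ; also reads 100
;; Thread A: (setf (balance 'me) 150) ; writes 100 + 50
;; Thread B: (setf (balance 'me) 150) ; writes 100 + 50, clobbering A's write
;; Final balance: 150, but two deposits of 50 should have left 200.
```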
The way to avoid this is to group a set of operations together, such
as the read and write in deposit, so that they execute as a unit. We
accomplish this by establishing a dynamic context called a transaction.
During a transaction, all changes are cached until the transaction is committed. The changes made by a committed transaction happen all at once. Transactions can also be aborted, either because of errors that occur while they are active or because of contention. Contention arises when another thread writes to a variable that the current transaction is reading. As in the bank example above, if one transaction writes the balance after the current one has read it, then the current one should start over so it has an accurate balance to work with. A transaction aborted due to contention is usually restarted until it has failed too many times.
The simplest and best way to use transactions in Elephant is to wrap
all the operations in the with-transaction macro. Any statements in
the body of the macro are executed within the same transaction. Thus
we would modify our example above as follows:
(defun deposit (account amount)
  (with-transaction ()
    (let ((balance (balance account)))
      (setf (balance account) (+ balance amount)))))

(defun withdraw (account amount)
  (with-transaction ()
    (decf (balance account) amount)))
And presto, we have an ACID compliant, thread-safe, persistent banking system!
What is with-transaction really doing for us? It first starts
a new transaction, attempts to execute the body, and commits the
transaction if successful. If at any time during the dynamic extent of
this process there is a conflict with another thread's transaction, an
error, or another non-local transfer of control, the transaction is
aborted. If it was aborted due to contention or deadlock, with-transaction
retries the transaction a fixed number of times by re-executing the body.
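Conceptually, the retry behavior resembles the following sketch. This is not Elephant's actual implementation; the function and condition names (begin-transaction, commit-transaction, abort-transaction, transaction-retry-condition) are hypothetical stand-ins:

```lisp
(defmacro my-with-transaction ((&key (retries 10)) &body body)
  "Illustrative only: start a transaction, run the body, commit;
on contention or deadlock, abort and retry up to RETRIES times."
  `(loop repeat ,retries
         do (let ((txn (begin-transaction)))              ; hypothetical API
              (handler-case
                  (return
                    (multiple-value-prog1 (progn ,@body)
                      (commit-transaction txn)))
                (transaction-retry-condition ()           ; contention/deadlock
                  (abort-transaction txn))))              ; fall through, retry
         finally (error "Transaction failed after ~D retries" ,retries)))
```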
And this brings us to two important constraints on transaction bodies: no dynamic nesting and idempotent side-effects.
In general, you want to avoid nested uses of with-transaction
across multiple functions. Nested transactions are valid for
some data stores (namely Berkeley DB), but typically only a single
transaction can be active at a time. The purpose of a nested
transaction in data stores that support them is to break a long
transaction into subsets. This way if there is contention on a given
subset of variables, only the inner transaction is restarted while the
larger transaction can continue. When the inner transaction commits
its results, those results become part of the outer transaction but
are not written to disk until the outer transaction commits.
If you have transaction-protected primitive operations (such as
deposit and withdraw) and you want to perform a group of
such operations as a single transaction, for example a transfer
between accounts, you can use the macro
ensure-transaction instead of with-transaction:
(defun deposit (account amount)
  "Wrap the balance read and the setf with the new balance"
  (ensure-transaction ()
    (let ((balance (balance account)))
      (setf (balance account) (+ balance amount)))))

(defun deposit (account amount)
  "A more concise version with incf doing both read and write"
  (ensure-transaction ()
    (incf (balance account) amount)))

(defun withdraw (account amount)
  (ensure-transaction ()
    (decf (balance account) amount)))

(defun transfer (src dst amount)
  "There are four primitive read/write operations grouped together in this transaction"
  (with-transaction ()
    (withdraw src amount)
    (deposit dst amount)))
ensure-transaction is exactly like with-transaction
except that it will reuse an existing transaction, if there is one, or
create a new one. There is no harm, in fact, in using this macro all
the time.

Notice the use of incf above. A primary
reason to use Lisp is that it is good at hiding complexity behind
shorthand constructs just like this. This also means it is going
to be good at hiding data dependencies that should be captured in a
transaction.
Within the body of a with-transaction, any non-database operations need to be idempotent: the side effects of the body must be the same no matter how many times the body is executed. This holds automatically for side effects on the database, but not for side effects such as pushing a value onto a Lisp list or creating a new standard object.
(defparameter *transient-objects* nil)

(defun load-transients (n)
  "This is the wrong way!"
  (with-transaction ()
    (loop for i from 0 below n
          do (push (get-from-root i) *transient-objects*))))
In this contrived example we are pulling a set of standard objects from the database using an integer key and pushing them onto a list for later use. However, if there is a conflict where some other process writes a key-value pair to a matching key, the whole transaction will abort and the loop will be run again. In a heavily contended system you might see results like the following.
(defun test-list (n)
  (setf *transient-objects* nil)
  (load-transients n)
  (length *transient-objects*))

(test-list 3) => 3
(test-list 3) => 5
(test-list 3) => 4
So the solution is to ensure that the update to the Lisp variable happens atomically, and only once the transaction completes.
(defun load-transients (n)
  "This is a better way"
  (setq *transient-objects*
        (with-transaction ()
          (loop for i from 0 below n
                collect (get-from-root i)))))
(Of course we would need to use
nreverse if we cared about preserving the
order of instances in *transient-objects*.)
The best rule-of-thumb is to ensure that transaction bodies are purely functional as above, except for side effects to persistent objects and btrees.
If you really do need to execute side effects on Lisp memory, such as writes to transient slots, make sure they are idempotent and that other threads cannot read the written values until the transaction completes.
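As an illustration, compare an idempotent transient side effect with a non-idempotent one; the object and accessors (user, cached-total, recent-totals, compute-total) here are hypothetical:

```lisp
;; Idempotent: setf writes the same value no matter how many
;; times the transaction body is retried.
(with-transaction ()
  (setf (cached-total user) (compute-total user)))

;; Not idempotent: every retry pushes another copy onto the list.
(with-transaction ()
  (push (compute-total user) (recent-totals user)))
```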
By now transactions almost look like more work than they are worth! Fortunately, there are also performance benefits to explicit use of transactions. Transactions gather together all the writes that are supposed to be made to the database, store them in memory, and only write them to disk when the transaction commits.
The most time-intensive component of a transaction is waiting while flushing newly written data to disk. Using the default auto-committing behavior requires a disk flush for every primitive write operation. This is very, very expensive! Because all the values read or written are cached in memory until the transaction completes, the number of flushes can be dramatically reduced.
But don't take my word for it, run the following statements and see for yourself the visceral impact transactions can have on system performance.
(defpclass test ()
  ((slot1 :accessor slot1 :initarg :slot1)))

(time (loop for i from 0 upto 100
            do (make-instance 'test :slot1 i)))
This can take a long time, well over a minute on the CLSQL data store. Each new object that is created has to independently write its value to disk and pay the cost of a disk flush.
(time (with-transaction ()
        (loop for i from 0 upto 100
              do (make-instance 'test :slot1 i))))
Wrapping this operation in a transaction dramatically reduces the time, from tens of seconds to a second or less.
(time (with-transaction ()
        (loop for i from 0 upto 1000
              do (make-instance 'test :slot1 i))))
When we increase the number of objects created within the transaction by a factor of ten, the time cost does not go up linearly. This is because the total time is still dominated by the single disk flush at commit rather than by the number of in-memory writes.
These are huge differences in performance! However, we cannot have arbitrarily large transactions, due to the finite size of the data store's memory cache. Large operations (such as loading data into a database) need to be split into a sequence of smaller transactions. When dealing with persistent objects, a good rule of thumb is to keep the number of objects touched in a transaction well under 1000.
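One way to split a bulk load into bounded transactions is to commit in fixed-size batches. A sketch, reusing the test class from the timing example above; the batch size and helper name are illustrative:

```lisp
(defun load-in-batches (items &key (batch-size 500))
  "Insert ITEMS so that no single transaction touches more
than BATCH-SIZE objects."
  (loop while items
        do (with-transaction ()
             (loop repeat batch-size
                   while items
                   do (make-instance 'test :slot1 (pop items))))))
```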
Designing and tuning a transactional architecture can become quite complex. Moreover, bugs in your system can be very difficult to find as they only show up when transactions are interleaved within a larger, multi-threaded application.
In many cases you can simply ignore transactions. For example, when no other concurrent processes are running, all operations are sequential and there is no chance of conflict; you would then use transactions only to improve performance on repeated sets of operations.
You can also ignore transactions if your application can guarantee that concurrency won't generate any conflicts. For example, a web app that guarantees only one thread will write to objects in a particular session can avoid transactions altogether. However, it is good to be careful about making these assumptions. In the above example, a reporting function that iterates over sessions, users or other objects may still see partial updates (i.e. a user's id was written prior to the query, but not the name). However, if you don't care about these infrequent glitches, this case would still hold.
If these cases don't apply to your application, or you aren't sure,
you will fare best by programming defensively. Break your system into
the smallest logical sets of primitive operations, protect each with
ensure-transaction, and then wrap the highest-level calls made
to your system in with-transaction when the operations absolutely have
to commit together or you need the extra performance. Try not to have
more than two levels of transactional access, with the top level using
with-transaction and the bottom using ensure-transaction.
See Transaction Details for more details and Design Patterns for examples of how systems can be designed and tuned using transactions.