I believe the code I attached was scrubbed. Here is a link instead: <a href="http://iodb.org/static/persistent-heap/">http://iodb.org/static/persistent-heap/</a><br><br>Red<br><br><div class="gmail_quote">On Tue, Oct 28, 2008 at 10:30 AM, Red Daly <span dir="ltr"><<a href="mailto:reddaly@gmail.com">reddaly@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">I am hoping to clarify some of the prior discussions[1] about the native Lisp backend for Elephant and propose a basic architecture.  Hopefully we can modularize development so if somebody wants to hack for a few days on the backend he can avoid being overwhelmed.<br>


<br>Multiprocess support: What features do the major lisps have for locking and concurrency?  What features are we missing from C/linux--what does BDB do in this regard that is hard to do in Lisp?  I do not know too much about implementing efficient multi-threaded lisp programs where there is a lot of contention.<br>


<br>Distribution: Are others interested in making the backend distributed?  I prefer the design to allow adding an efficient distributed architecture at a later date.<br><br>Custom indexing: A lot of discussion has gone into the best

implementation practice for BTrees.  But what about other indexing

structures?  Multidimensional indexing is relevant to my project now

because we have spacial information (longitude latitude pairs) we would

like to query.  Right now I am using a kd-tree with nodes made

persistent by elephant, but usually this sort of index would be

implemented by the database itself.  I imagine multi-dimensional indexing could improve queries, too.<br>

<br>

There are other indexing needs as well.  Document-level search is

another common case.  I imagine an efficient search library

is beyond our capacity, but how could we make the database suited to implement search on top of the thing?  BTrees are not everything, unless I am missing something.<br><br><br>I propose the following basic architecture for the backend:<br>


<br>Logging package with generic undo/redo logging support.  It would be customizable but it would not depend on other code from the Lisp backend.  A database could plug into the log by implementing undo, redo, and serialization functions.  By modularizing logging, it should make it easier to hack on it without being familiar with the rest of the backend.  It will also be reusable, for what it's worth.<br>


<br>

A persistent heap.  Beneath indices and data storage would be a heap-on-disk layer with transaction support.  The heap would have methods for reading, writing, and locking sequences of bytes.  It would also handle   The persistent heap alone would qualify as a database and might be useful as a basis for other DB projects or experiments.  Cachine would probably not be implemented at this level, though that would be easiest.  It may be possible to implement the transactional heavy lifting for the persistent heap and make it extensible enough to be used throughout.<br>


<br>B-Trees.  A MOP-based implementation of B-Trees may be customizable enough to plug into the the rest of the system with a few additional method declarations for the persistent, transactional version.  Specialized methods would access the persistent heap for reads and writes.  Locking of the sort done by BDB (level 2 etc.) could probably be accomplished in these methods specialized for our persistent B-Tree.<br>


<br>Bkd-Trees and other indexing structures could be implemented similarly to B-Trees.<br><br>Caching.  B-Tree nodes or other lispy objects should be cached as opposed to byte arrays.  Caching of B-Tree nodes requires additional multiprocess code, so not all the transactional magic happens in the persistent heap layer.  I'm a little fuzzy on exactly how this will work, but it sounds reasonable enough.<br>


<br>The DB user would interact mostly with the B-Tree  and other indexing structures, and with transactions.<br><br><br>How does this sound for a basic architecture?  The multiprocess details are not fleshed out so your comments are appreciated.  I have attached my basic implementation of a logger and persistent heap to make this clearer.<br>


<br><br>Best regards,<br><br>Red<br><br><br>[1]  The most lengthy discussion I can find is here:  <a href="http://common-lisp.net/pipermail/elephant-devel/2008-May/004108.html" target="_blank">http://common-lisp.net/pipermail/elephant-devel/2008-May/004108.html</a><br>


The trac has a page on the Lisp backend: <a href="http://trac.common-lisp.net/elephant/wiki/LispDataStore" target="_blank">http://trac.common-lisp.net/elephant/wiki/LispDataStore</a><br>

</blockquote></div><br>