Closure XML Parser

An XML parser written in Common Lisp.

Closure XML was written by Gilbert Baumann (unk6 at rz.uni-karlsruhe.de) as part of the Closure web browser.
Contributions to the parser by

Mailing list cxml-devel is hosted on common-lisp.net.

Download.

There is no CVS repository (yet).
You can check out David's tla archive at http://www.common-lisp.net/project/cxml/david@knowledgetools.de--cxml/.
There will also be tarballs.

Contents

CXML Modules

CXML provides three packages:

Installation

CXML is written in Common Lisp and should be portable to all Common Lisp implementations.  It is currently known to work on ACL, SBCL, CMUCL, and CLISP. (CLISP needs an -E option so that it accepts non-ASCII source files.)

ASDF is used for compilation. The following instructions assume that ASDF has already been loaded.

Configuration (optional). CXML has full Unicode support, even on Lisps without Unicode strings. On Lisps that are not Unicode-aware, DOMString is implemented as an array of character codes. If your Lisp supports 16 bit characters natively, you can enable the feature RUNE-IS-CHARACTER to select an alternative DOMString implementation, which uses real characters instead of character codes.

* (pushnew :rune-is-character *features*)
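The choice can also be made conditionally. The following is a minimal sketch, not part of CXML, which assumes that "16 bit characters" can be read as CHAR-CODE-LIMIT being at least 65536. It must be evaluated before CXML is compiled.

```lisp
;; Assumption: a Lisp with native 16 bit characters reports a
;; CHAR-CODE-LIMIT of at least 65536 (#x10000).
;; On other Lisps the feature is left off and CXML keeps using
;; arrays of character codes.
(when (>= char-code-limit #x10000)
  (pushnew :rune-is-character *features*))
```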

Compiling and loading CXML. Register the .asd file, e.g. by symlinking it:

$ ln -sf `pwd`/cxml.asd /path/to/your/registry

Then compile CXML using:

* (asdf:operate 'asdf:load-op :cxml)
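If symlinking is inconvenient, ASDF can instead be pointed at the source directory from within Lisp. This is a sketch only: the directory is a placeholder for wherever the CXML .asd file lives.

```lisp
;; Assumes ASDF is already loaded. The pathname is a placeholder;
;; the trailing slash is required so that it names a directory.
(pushnew #p"/path/to/cxml/" asdf:*central-registry* :test #'equal)

;; Same load call as above; ASDF now searches the directory just added.
(asdf:operate 'asdf:load-op :cxml)
```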

Tests

Check out the XML and DOM testsuites:

$ export CVSROOT=:pserver:anonymous@dev.w3.org:/sources/public
$ cvs login    # password is "anonymous"
$ cvs co 2001/XML-Test-Suite/xmlconf
$ cvs co 2001/DOM-Test-Suite

Usage and expected output:

* (xmlconf:run-all-tests "/path/to/2001/XML-Test-Suite/xmlconf/")
22/389 tests failed; 1773 tests were skipped
* (domtest:run-all-tests "/path/to/2001/DOM-Test-Suite/")
0/440 tests failed; 81 tests were skipped

Most XML testsuite failures are due to document type declarations which are read by CXML, but not written when the document is serialized again.  This needs work.

fixme: Add an explanation of xml/sax-tests here.

fixme: My parser does not understand the current testsuite anymore.  To fix this problem, revert the affected files manually after check-out:

$ cd 2001/XML-Test-Suite/xmlconf/
xmltest$ patch -p0 -R </path/to/cxml/test/xmlconf-base.diff

The log message for the changes reads "Removed unnecessary xml:base attribute".  If I understand correctly, only DOM 3 parsers provide the baseURI attribute necessary for understanding xmlconf.xml now.  We don't have that yet.

To do

Using the parser

Function XML:PARSE-FILE (pathname handler)
Function XML:PARSE-STREAM (stream handler)
Function XML:PARSE-OCTETS (octets handler)
Parse an XML document.  Arguments:

Return values from this function depend on the SAX handler used.

Function DOM:MAKE-DOM-BUILDER ()
Create a SAX handler which builds a DOM document.  Example:

(xml:parse-file "test.xml" (dom:make-dom-builder))

Function XML:UNPARSE-DOCUMENT (document stream)
Function XML:UNPARSE-DOCUMENT-TO-OCTETS (document) => vector
Serialize a document into canonical form.

unparse-document-to-octets returns an (unsigned-byte 8) array, whereas unparse-document writes characters.  unparse-document is useful together with with-output-to-string.  However, note that the resulting document in both cases is UTF-8 encoded, so the characters written by unparse-document are really UTF-8 bytes encoded as characters.
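Because each character written by unparse-document stands for one UTF-8 byte, a string collected that way can be mapped back to octets by taking character codes. The helper below is a hypothetical illustration, not part of CXML.

```lisp
;; Hypothetical helper, not part of CXML: turn a string in which every
;; character stands for one UTF-8 byte into an (unsigned-byte 8) vector.
(defun byte-string-to-octets (string)
  (map '(vector (unsigned-byte 8)) #'char-code string))

;; Usage sketch, assuming CXML is loaded and DOCUMENT is a DOM document:
;;
;;   (byte-string-to-octets
;;    (with-output-to-string (s)
;;      (xml:unparse-document document s)))
```

When only the octets are needed, calling unparse-document-to-octets directly avoids the detour through a string.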

SAX interface

A SAX handler is an arbitrary object that implements some of the generic functions in the SAX package.  Note that no default handler class is necessary, because all generic functions have default methods which do nothing.  SAX functions are:

Function SAX:START-DOCUMENT (handler)
Function SAX:START-ELEMENT (handler namespace-uri local-name qname attributes)
Function SAX:START-PREFIX-MAPPING (handler prefix uri)
Function SAX:CHARACTERS (handler data)
Function SAX:PROCESSING-INSTRUCTION (handler target data)
Function SAX:END-PREFIX-MAPPING (handler prefix)
Function SAX:END-ELEMENT (handler namespace-uri local-name qname)
Function SAX:END-DOCUMENT (handler)
Function SAX:COMMENT (handler data)
Function SAX:START-CDATA (handler)
Function SAX:END-CDATA (handler)
Function SAX:START-DTD (handler name public-id system-id)
Function SAX:END-DTD (handler)

fixme: For information on these functions refer to the docstrings.
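As an illustration of the handler protocol listed above, here is a minimal sketch of a handler that counts elements. The names ELEMENT-COUNTER, ELEMENT-COUNT and COUNT-ELEMENTS are hypothetical, and the code assumes CXML is already loaded.

```lisp
;; Assumes CXML is loaded, so the SAX and XML packages exist.
;; Any class works as a handler; no special superclass is needed.
(defclass element-counter ()
  ((n-elements :initform 0 :accessor element-count)))

;; Only START-ELEMENT is specialized. All other SAX events fall through
;; to the default methods, which do nothing.
(defmethod sax:start-element ((handler element-counter)
                              namespace-uri local-name qname attributes)
  (declare (ignore namespace-uri local-name qname attributes))
  (incf (element-count handler)))

(defun count-elements (pathname)
  (let ((handler (make-instance 'element-counter)))
    (xml:parse-file pathname handler)
    ;; Read the slot rather than relying on the parser's return value,
    ;; which depends on the handler.
    (element-count handler)))
```

Calling (count-elements "test.xml") would then return the number of start tags seen in the file.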

fixme: Entity and notation processing isn't quite right yet.

DOM Notes

CXML implements the DOM Level 1 Core interfaces.  Explaining DOM is better left to the specification, so please refer to the official W3C documents for DOM.

However, there is no "standard" DOM mapping for Lisp.  DOM is specified in CORBA IDL, but it refrains from using object-oriented IDL features, allowing for a much more natural Lisp implementation than the ordinary IDL/Lisp mapping would.

Differences between CXML's DOM and the direct IDL/Lisp mapping:

Example:

XML(97): (dom:node-type
          (dom:document-element
           (xml:parse-file "~/test.xml" (dom:make-dom-builder))))
:ELEMENT