Using the SAX parser

Parsing and Validating

CXML is implemented as a SAX parser. (Refer to make-dom-builder for information about DOM.)

Function CXML:PARSE-FILE (pathname handler &key ...)
Function CXML:PARSE-STREAM (stream handler &key ...)
Function CXML:PARSE-OCTETS (octets handler &key ...)
Function CXML:PARSE-ROD (rod handler &key ...)
Parse an XML document.  Return values from this function depend on the SAX handler used.
Arguments:

Common keyword arguments:

Note: parse-rod assumes that the input has already been decoded into Unicode runes and ignores the encoding specified in the XML declaration, if any.
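
As a rough illustration of the parse functions, the following sketch parses a string into a DOM document and reads the name of its root element. It assumes that CXML is already loaded and that the Lisp has Unicode support, so that an ordinary string can serve as a rod. The helper root-element-name is a hypothetical name introduced for this example only.

```lisp
;; Sketch only: assumes CXML is loaded and strings can be used as rods.
(defun root-element-name (xml-string)
  "Parse XML-STRING into a DOM document and return the root element's tag name."
  (let ((document (cxml:parse-rod xml-string (cxml-dom:make-dom-builder))))
    (dom:tag-name (dom:document-element document))))

;; Example call:
(root-element-name "<greeting lang=\"en\">Hello</greeting>")
```

The return value is a DOM document here only because the handler is a DOM builder; a different handler would yield a different result.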

Function CXML:PARSE-DTD-FILE (pathname)
Function CXML:PARSE-DTD-STREAM (stream)
Parse declarations from a stand-alone file and return an object representing the DTD, suitable as an argument to validate.

Function CXML:MAKE-EXTID (publicid systemid)
Create an object representing the External ID composed of the specified Public ID (a rod or nil) and System ID (a URI object).

Condition class CXML:XML-PARSE-ERROR ()
Superclass of all conditions signalled by the CXML parser.

Condition class CXML:WELL-FORMEDNESS-VIOLATION (cxml:xml-parse-error)
This condition is signalled for all well-formedness violations. (Note that, when parsing a document that is not well-formed in validating mode, the parser might encounter validity errors before detecting well-formedness problems, so also be prepared for validity-error in that situation.)

Condition class CXML:VALIDITY-ERROR (cxml:xml-parse-error)
Reports the violation of a validity constraint.
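
One possible way to handle these conditions is sketched below. The function try-parse is a hypothetical helper, not part of CXML; it assumes CXML is loaded on a Lisp where strings can be used as rods. The more specific condition classes are listed before their superclass so that each clause can be told apart.

```lisp
;; Sketch only: TRY-PARSE is a hypothetical helper for this example.
(defun try-parse (xml-string)
  "Return a DOM document for XML-STRING, or NIL and the condition on failure."
  (handler-case
      (cxml:parse-rod xml-string (cxml-dom:make-dom-builder))
    (cxml:well-formedness-violation (c)
      (values nil c))
    (cxml:validity-error (c)
      (values nil c))
    ;; Superclass clause last: catches any other parser condition.
    (cxml:xml-parse-error (c)
      (values nil c))))

;; Mismatched tags are a well-formedness violation:
(try-parse "<a><b></a>")
```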

Serialization

Serialization is performed using sink objects. A sink is an output stream for runes. There are different kinds of sinks for output to Lisp streams, vectors, etc.

Technically, sinks are SAX handlers that write XML output for SAX events sent to them. In practice, user code would normally not generate those SAX events manually, and instead use a function like dom:map-document or xmls-compat:map-node to serialize an in-memory document.

In addition to map-document, cxml has a set of convenience macros for serialization (see below for with-xml-output, with-element, etc).

Portable sinks:
Function CXML:MAKE-OCTET-VECTOR-SINK (&rest keys) => sink
Function CXML:MAKE-OCTET-STREAM-SINK (stream &rest keys) => sink
Function CXML:MAKE-ROD-SINK (&rest keys) => sink

Only on Lisps with Unicode support:
Function CXML:MAKE-STRING-SINK -- alias for cxml:make-rod-sink
Function CXML:MAKE-CHARACTER-STREAM-SINK (stream &rest keys) => sink

Only on Lisps without Unicode support:
Function CXML:MAKE-STRING-SINK/UTF8 (&rest keys) => sink
Function CXML:MAKE-CHARACTER-STREAM-SINK/UTF8 (stream &rest keys) => sink

Return a SAX serialization handler.

Keyword arguments:

The following canonical values are allowed:

An internal subset will be included in the result regardless of the canonical setting. It is the responsibility of the caller to not report an internal subset for canonical <= 1, or only notations as required for canonical = 2. For example, the include-doctype argument to dom:map-document should be set to nil for the former behaviour and :canonical-notations for the latter.

With an indentation level, pretty-print the XML by inserting additional whitespace.  Note that indentation changes the document model and should only be used if whitespace does not matter to the application.
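
To illustrate how a sink is combined with dom:map-document, here is a minimal sketch that parses a string and serializes the resulting document back into a string. It assumes a Lisp with Unicode support, where cxml:make-string-sink is available; roundtrip is a hypothetical helper name.

```lisp
;; Sketch only: assumes CXML is loaded and MAKE-STRING-SINK is available.
(defun roundtrip (xml-string)
  "Parse XML-STRING into a DOM document and serialize it to a new string."
  (let ((document (cxml:parse-rod xml-string (cxml-dom:make-dom-builder))))
    ;; The string sink collects the output; the serialized document is
    ;; the value returned by MAP-DOCUMENT.
    (dom:map-document (cxml:make-string-sink) document)))

(roundtrip "<a><b>x</b></a>")
```

The exact output (for example, whether an XML declaration is written) depends on the sink and its keyword arguments.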

Macro CXML:WITH-XML-OUTPUT (sink &body body) => sink-specific result
Macro CXML:WITH-ELEMENT (qname &body body) => result
Function CXML:ATTRIBUTE (name value) => value
Function CXML:TEXT (data) => data
Function CXML:CDATA (data) => data
Convenience syntax for event-based serialization.

Example:

(with-xml-output (make-octet-stream-sink stream :indentation 2 :canonical nil)
  (with-element "foo"
    (attribute "xyz" "abc")
    (with-element "bar"
      (attribute "blub" "bla"))
    (text "Hi there.")))

Prints this to stream:

<foo xyz="abc">
  <bar blub="bla"></bar>
  Hi there.
</foo>

Macro XHTML-GENERATOR:WITH-XHTML (sink &rest forms)
Macro XHTML-GENERATOR:WRITE-DOCTYPE (sink)
with-xhtml is a modified version of Franz' htmlgen that works as a SAX driver for XHTML. It aims to be a plug-in replacement for the html macro.

xhtmlgen is included as contrib/xhtmlgen.lisp in the cxml distribution. Example:

(let ((sink (cxml:make-character-stream-sink *standard-output*)))
  (sax:start-document sink)
  (xhtml-generator:write-doctype sink)
  (xhtml-generator:with-html sink
    (:html
     (:head
      (:title "Titel"))
     (:body
      ((:p "style" "font-weight: bold")
       "Inhalt")
      (:ul
       (:li "Eins")
       (:li "Zwei")
       (:li "Drei")))))
  (sax:end-document sink))

Miscellaneous SAX handlers

Function CXML:MAKE-VALIDATOR (dtd root)
Create a SAX handler which validates against a DTD instance.  The document's root element must be named root.  Used with dom:map-document, this validates a document object as if by re-reading it with a validating parser, except that declarations recorded in the document instance are completely ignored.
Example:

(let ((d (parse-file "~/test.xml" (cxml-dom:make-dom-builder)))
      (x (parse-dtd-file "~/test.dtd")))
  (dom:map-document (cxml:make-validator x #"foo") d))

Class CXML:SAX-PROXY ()
Accessor CXML:PROXY-CHAINED-HANDLER
sax-proxy is a SAX handler which passes all events it receives on to a user-defined second handler, the chained handler, which defaults to nil. To modify the events a SAX handler receives, define your own subclass of sax-proxy, set its chained handler to the target handler, and define methods on your subclass for the events to be modified. All other events pass through to the chained handler unmodified.
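
A minimal sketch of this pattern follows. The class upcasing-proxy and the function upcase-text are hypothetical names introduced for illustration; the example assumes a Lisp with Unicode support, so that character data arrives as ordinary strings and cxml:make-string-sink is available.

```lisp
;; Sketch only: a hypothetical proxy that upcases character data.
(defclass upcasing-proxy (cxml:sax-proxy) ()
  (:documentation "Forwards all events, upcasing character data on the way."))

(defmethod sax:characters ((handler upcasing-proxy) data)
  ;; Send a modified event to the chained handler instead of the original.
  (sax:characters (cxml:proxy-chained-handler handler)
                  (string-upcase data)))

(defun upcase-text (xml-string)
  "Parse XML-STRING and serialize it with all character data upcased."
  (let ((proxy (make-instance 'upcasing-proxy)))
    ;; The chained handler is the target that receives the events.
    (setf (cxml:proxy-chained-handler proxy) (cxml:make-string-sink))
    (cxml:parse-rod xml-string proxy)))

(upcase-text "<a>hello</a>")
```

Only sax:characters is specialized here; element and document events reach the sink unchanged, so the result should be the serialized document with upcased text.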

Function CXML:MAKE-NAMESPACE-NORMALIZER (next-handler)
Return a SAX handler that performs DOM 3-style namespace normalization on attribute lists in start-element events before passing them on to the next handler.

Function CXML:MAKE-WHITESPACE-NORMALIZER (chained-handler &optional dtd)
Return a SAX handler which removes whitespace from elements that have element content and have not been declared to preserve space using an xml:space attribute.

Example:

(cxml:parse-file "example.xml"
                 (cxml:make-whitespace-normalizer (cxml-dom:make-dom-builder))
                 :validate t)

Example input:

<!DOCTYPE test [
<!ELEMENT test (foo,bar*)>
<!ATTLIST test a CDATA #IMPLIED>
<!ELEMENT foo (#PCDATA)>
<!ELEMENT bar (foo?)>
<!ATTLIST bar xml:space (default|preserve) "default">
]>
<test a='b'>
  <foo>   </foo>
  <bar>   </bar>
  <bar xml:space="preserve">   </bar>
</test>

Example result:

<test a="b"><foo>   </foo><bar></bar><bar xml:space="preserve">   </bar></test>

Recoders

Recoders are a mechanism used by CXML internally on Lisp implementations without Unicode support to recode UTF-16 vectors (rods) of integers (runes) into UTF-8 strings.

User code does not usually need to deal with recoders in current versions of CXML.

Function CXML:MAKE-RECODER (chained-handler recoder-fn)
Return a SAX handler which passes all events on to chained-handler after converting all strings and rods using recoder-fn, a function of one argument.

Caching of DTD Objects

To avoid spending time parsing the same DTD over and over again, CXML can cache DTD objects. The parser consults cxml:*dtd-cache* whenever it is looking for an external subset in a document which does not have an internal subset and uses the cached DTD instance if one is present in the cache for the System ID in question.

Note that DTDs do not expire from the cache automatically. (Future versions of CXML might introduce automatic checks for outdated DTDs.)

Variable CXML:*DTD-CACHE*
The DTD cache object consulted by the parser when it needs a DTD.

Function CXML:MAKE-DTD-CACHE ()
Return a new, empty DTD cache object.

Variable CXML:*CACHE-ALL-DTDS*
If true, instructs the parser to enter all DTDs that could have been cached into *dtd-cache* if they were not cached already. Defaults to nil.

Reader CXML:GETDTD (uri dtd-cache)
Return a cached instance of the DTD at uri, if present in the cache, or nil.

Writer CXML:GETDTD (uri dtd-cache)
Enter a new value for uri into dtd-cache.

Function CXML:REMDTD (uri dtd-cache)
Ensure that no DTD is recorded for uri in the cache and return true if such a DTD was present.

Function CXML:CLEAR-DTD-CACHE (dtd-cache)
Remove all entries from dtd-cache.
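
As a sketch of how the cache variables might be used together, the following hypothetical helper binds a fresh cache, enables caching of all DTDs, and parses the same file twice, so that the second parse has a chance to reuse the DTD recorded by the first. Whether a DTD is actually cached depends on the document, as described above; the pathname argument stands for a file of your own.

```lisp
;; Sketch only: PARSE-TWICE-WITH-CACHE is a hypothetical helper.
(defun parse-twice-with-cache (pathname)
  "Parse PATHNAME twice using a fresh DTD cache and return the cache object."
  (let ((cxml:*dtd-cache* (cxml:make-dtd-cache))
        (cxml:*cache-all-dtds* t))
    (dotimes (i 2)
      (cxml:parse-file pathname (cxml-dom:make-dom-builder) :validate t))
    ;; Return the cache so the caller can inspect or clear it later.
    cxml:*dtd-cache*))
```

Binding the variables with let keeps the cache local to this call instead of changing the global default.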

fixme: thread-safety

XML Catalogs

External entities (for example, DTDs) are referred to using their Public and System IDs. Usually the System ID, a URI, is used to locate the entity. CXML itself handles only file://-URIs, but many System IDs in practical use are http://-URIs. There are two different mechanisms applications can use to allow CXML to locate entities using an arbitrary Public ID or System ID:

This section describes XML Catalogs, the second solution. CXML implements OASIS XML Catalogs.

Variable CXML:*CATALOG*
The XML Catalog object consulted by the parser before trying to open an entity. Initially nil.

Variable CXML:*PREFER*
The default "prefer" mode from the Catalog specification, one of :public or :system. Defaults to :public.

Function CXML:MAKE-CATALOG (&optional uris)
Return a catalog object for the catalog files specified.

Function CXML:RESOLVE-URI (uri catalog)
Look up uri in catalog and return the resulting URI, or nil if no match was found.

Function CXML:RESOLVE-EXTID (publicid systemid catalog)
Look up the External ID (publicid, systemid) in catalog and return the resulting URI, or nil if no match was found.

Example:

* (setf cxml:*catalog* nil)
* (cxml:parse-file "test.xhtml" nil)
=> Error: URI scheme :HTTP not supported

* (setf cxml:*catalog* (cxml:make-catalog))
* (cxml:parse-file "test.xhtml" nil)
;; no error!
NIL

Note that parsed catalog files are cached in the catalog object. Cached catalog files do not expire automatically. To ensure that all catalog files are parsed again, create a new catalog object.

SAX Interface

A SAX handler is an arbitrary object that implements some of the generic functions in the SAX package.  Note that no default handler class is necessary, because all generic functions have default methods which do nothing.  SAX functions are:

Function SAX:START-DOCUMENT (handler)
Function SAX:END-DOCUMENT (handler)

Function SAX:START-ELEMENT (handler namespace-uri local-name qname attributes)
Function SAX:END-ELEMENT (handler namespace-uri local-name qname)
Function SAX:START-PREFIX-MAPPING (handler prefix uri)
Function SAX:END-PREFIX-MAPPING (handler prefix)
Function SAX:PROCESSING-INSTRUCTION (handler target data)
Function SAX:COMMENT (handler data)
Function SAX:START-CDATA (handler)
Function SAX:END-CDATA (handler)
Function SAX:CHARACTERS (handler data)

Function SAX:START-DTD (handler name public-id system-id)
Function SAX:END-DTD (handler)
Function SAX:START-INTERNAL-SUBSET (handler)
Function SAX:END-INTERNAL-SUBSET (handler)
Function SAX:UNPARSED-ENTITY-DECLARATION (handler name public-id system-id notation-name)
Function SAX:EXTERNAL-ENTITY-DECLARATION (handler kind name public-id system-id)
Function SAX:INTERNAL-ENTITY-DECLARATION (handler kind name value)
Function SAX:NOTATION-DECLARATION (handler name public-id system-id)
Function SAX:ELEMENT-DECLARATION (handler name model)
Function SAX:ATTRIBUTE-DECLARATION (handler ename aname type default)

Accessor SAX:ATTRIBUTE-PREFIX (attribute)
Accessor SAX:ATTRIBUTE-NAMESPACE-URI (attribute)
Accessor SAX:ATTRIBUTE-LOCAL-NAME (attribute)
Accessor SAX:ATTRIBUTE-QNAME (attribute)
Accessor SAX:ATTRIBUTE-SPECIFIED-P (attribute)
Accessor SAX:ATTRIBUTE-VALUE (attribute)

Function SAX:FIND-ATTRIBUTE (qname attributes)
Function SAX:FIND-ATTRIBUTE-NS (uri lname attributes)

The entity declaration methods are similar to Java SAX definitions, but parameter entities are distinguished from general entities not by a % prefix to the name, but by the kind argument, either :parameter or :general.

The arguments to sax:element-declaration and sax:attribute-declaration differ significantly from their Java counterparts.

For more information on these functions, refer to the docstrings.
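
To make the interface more concrete, here is a minimal sketch of a custom handler that counts elements. The class element-counter and the function count-elements are hypothetical names; the example relies on the default methods described above for all events it does not handle, and assumes that the value returned from sax:end-document becomes the return value of the parse function. Other CXML versions may expect handlers to inherit from a base class, so treat this as an illustration rather than a template.

```lisp
;; Sketch only: a hypothetical handler that counts start-element events.
(defclass element-counter ()
  ((total :initform 0 :accessor element-count))
  (:documentation "Counts the elements seen during a parse."))

(defmethod sax:start-element ((handler element-counter)
                              namespace-uri local-name qname attributes)
  (declare (ignore namespace-uri local-name qname attributes))
  (incf (element-count handler)))

(defmethod sax:end-document ((handler element-counter))
  ;; Assumption: this value is what the parse function returns.
  (element-count handler))

(defun count-elements (xml-string)
  "Return the number of elements in the document in XML-STRING."
  (cxml:parse-rod xml-string (make-instance 'element-counter)))

(count-elements "<a><b/><b/></a>")
```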