Klacks parser

The Klacks parser provides an alternative parsing interface, similar in concept to Java's Streaming API for XML (StAX).

It implements a streaming, "pull-based" API. This is different from SAX, which is a "push-based" model.

Klacks is implemented using the same code base as the SAX parser and has the same parsing characteristics (validation, namespace support, entity resolution) while offering a more flexible interface than SAX.

See below for examples.

Parsing incrementally using sources

To parse using Klacks, create an XML source first.

Function CXML:MAKE-SOURCE (input &key validate dtd root entity-resolver disallow-external-subset pathname)
Create and return a source for input.

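Exact behaviour depends on input, which can be one of the following types:

pathname: a Common Lisp pathname. The file it names is opened, and a source is created for the resulting stream.
stream: a Common Lisp stream with element type (unsigned-byte 8).
octets: an (unsigned-byte 8) array, which is parsed directly.
string: a string (or rod) that has already undergone external-format decoding.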

Closing streams: Sources can refer to Lisp streams that need to be closed after parsing. This includes a stream passed explicitly as input, a stream created implicitly for the pathname case, as well as any streams created automatically for external parsed entities referred to by the document.

All these streams are closed automatically if end of file is reached normally. Use klacks:close-source or klacks:with-open-source to ensure that the streams are closed otherwise.

Buffering: By default, the Klacks parser performs buffering of octets being read from the stream as an optimization. This can result in unwanted blocking if the stream is a socket and the parser tries to read more data than required to parse the current event. Use :buffering nil to disable this optimization.

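The following keyword arguments have the same meaning as with the SAX parser; please refer to the documentation of parse-file for more information: validate, dtd, root, entity-resolver, and disallow-external-subset.

In addition, the following argument is for types of input other than pathname: pathname. If specified, it defines the base URI of the document.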

Events are read from the stream using the following functions:

Function KLACKS:PEEK (source)

=> :start-document
or => :start-document, version, encoding, standalonep
or => :dtd, name, public-id, system-id
or => :start-element, uri, lname, qname
or => :end-element, uri, lname, qname
or => :characters, data
or => :processing-instruction, target, data
or => :comment, data
or => :end-document, data
or => nil

peek returns the current event's key and main values.

Function KLACKS:PEEK-NEXT (source) => key, value*

Advance the source forward to the next event and return it like peek would.

Function KLACKS:PEEK-VALUE (source) => value*

Like peek, but return only the values, not the key.

Function KLACKS:CONSUME (source) => key, value*

Return the same values peek would, and in addition advance the source forward to the next event.
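As a rough illustration of the pull model, the following sketch drains a source with consume and tallies the event keys it sees. The function name count-events is hypothetical, and the code assumes the CXML system has already been loaded (for example through Quicklisp).

```lisp
;; Assumption: CXML is already loaded, e.g. (ql:quickload :cxml).
;; COUNT-EVENTS is a hypothetical helper, not part of Klacks.
(defun count-events (xml-string)
  "Return an alist of (event-key . count) for the events in XML-STRING."
  (let ((counts '()))
    (klacks:with-open-source (source (cxml:make-source xml-string))
      ;; CONSUME returns the current event and advances; it returns NIL
      ;; once the events are exhausted, which ends the loop.
      (loop for key = (klacks:consume source)
            while key
            do (let ((entry (assoc key counts)))
                 (if entry
                     (incf (cdr entry))
                     (push (cons key 1) counts)))))
    counts))
```

Calling (count-events "<a><b/>hi</a>") should yield an alist in which :start-element and :end-element each have a count of 2.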

Function KLACKS:CURRENT-URI (source) => uri
Function KLACKS:CURRENT-LNAME (source) => string
Function KLACKS:CURRENT-QNAME (source) => string

If the current event is :start-element or :end-element, return the corresponding value. Else, signal an error.

Function KLACKS:CURRENT-CHARACTERS (source) => string

If the current event is :characters, return the character data value. Else, signal an error.

Function KLACKS:CURRENT-CDATA-SECTION-P (source) => boolean

If the current event is :characters, determine whether the data was specified using a CDATA section in the source document. Else, signal an error.

Function KLACKS:MAP-CURRENT-NAMESPACE-DECLARATIONS (fn source) => nil

For use only on :start-element and :end-element events, this function reports every namespace declaration on the current element. On :start-element, these correspond to the xmlns attributes of the start tag. On :end-element, the declarations of the corresponding start tag are reported. No inherited namespaces are included. fn is called once for each declaration with two arguments, the prefix and uri.

Function KLACKS:MAP-ATTRIBUTES (fn source)

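Call fn for each attribute of the current start tag in turn, and pass the following values as arguments to the function: the namespace URI, the local name, the qualified name, the attribute value, and a boolean indicating whether the attribute was specified explicitly in the source document rather than defaulted from a DTD.

Only valid for :start-element.

Function KLACKS:LIST-ATTRIBUTES (source)

Return a list of SAX attribute structures for the current start tag. Only valid for :start-element.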
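A minimal sketch of attribute access, assuming CXML is loaded and that the callback receives five arguments (namespace URI, local name, qualified name, value, and an explicitness flag). The helper name root-attributes is hypothetical.

```lisp
;; Assumption: CXML is already loaded, e.g. (ql:quickload :cxml).
;; ROOT-ATTRIBUTES is a hypothetical helper, not part of Klacks.
(defun root-attributes (xml-string)
  "Return an alist of (qname . value) for the root element's attributes."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    ;; Without a name argument, FIND-ELEMENT stops at the first start tag.
    (klacks:find-element source)
    (let ((result '()))
      (klacks:map-attributes
       (lambda (uri lname qname value explicitp)
         (declare (ignore uri lname explicitp))
         (push (cons qname value) result))
       source)
      (nreverse result))))
```

For instance, (root-attributes "<child2 bar='baz'/>") would be expected to return (("bar" . "baz")).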

Function KLACKS:CLOSE-SOURCE (source)
Close all streams referred to by source.

Macro KLACKS:WITH-OPEN-SOURCE ((var source) &body body)
Evaluate source to create a source object, bind it to symbol var and evaluate body as an implicit progn. Call klacks:close-source to close the source after exiting body, whether normally or abnormally.

Convenience functions

Function KLACKS:FIND-EVENT (source key)
Read events from source and discard them until an event of type key is found. Return values like peek, or NIL if no such event was found.

Function KLACKS:FIND-ELEMENT (source &optional lname uri)
Read events from source and discard them until an event of type :start-element with matching local name and namespace uri is found. If lname is nil, any tag name matches. If uri is nil, any namespace matches. Return values like peek, or NIL if no such event was found.
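The sketch below shows one way find-element might be used in a loop to pick out the text of every element with a given local name. The helper item-texts and the element name "item" are made up for illustration, and the code assumes CXML is loaded and that each item holds at most one short run of text.

```lisp
;; Assumption: CXML is already loaded, e.g. (ql:quickload :cxml).
;; ITEM-TEXTS and the "item" element name are hypothetical.
(defun item-texts (xml-string)
  "Collect the leading character data of every <item> element."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    (loop while (klacks:find-element source "item")
          ;; FIND-ELEMENT leaves the start tag as the current event,
          ;; so step past it before looking at the content.
          do (klacks:consume source)
          when (eq (klacks:peek source) :characters)
            collect (nth-value 1 (klacks:peek source)))))
```

Because find-element returns NIL at the end of the document, the loop terminates without an explicit end-of-input check.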

Condition KLACKS:KLACKS-ERROR (xml-parse-error)
The condition class signalled by expect.

Function KLACKS:EXPECT (source key &optional value1 value2 value3)
Assert that the current event is equal to (key value1 value2 value3). (Ignore value arguments that are NIL.) If so, return it as multiple values. Otherwise signal a klacks-error.

Function KLACKS:SKIP (source key &optional value1 value2 value3)
Assert the specified event as expect does, then consume it.
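As a sketch of how expect and skip can validate a fixed document shape, the following hypothetical function read-greeting accepts only a document of the form <greeting>text</greeting> and lets klacks-error propagate otherwise. It assumes CXML is loaded and that the text arrives as a single :characters event.

```lisp
;; Assumption: CXML is already loaded, e.g. (ql:quickload :cxml).
;; READ-GREETING and the "greeting" element are hypothetical.
(defun read-greeting (xml-string)
  "Return the text of a <greeting> root element, or signal KLACKS-ERROR."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    (klacks:skip source :start-document)
    ;; A NIL value argument is ignored, so any namespace URI is accepted.
    (klacks:skip source :start-element nil "greeting")
    ;; SKIP returns the event it consumed; the second value is the data.
    (let ((text (nth-value 1 (klacks:skip source :characters))))
      (klacks:skip source :end-element nil "greeting")
      (klacks:expect source :end-document)
      text)))
```

A caller that wants to recover from malformed input could wrap the call in handler-case with a clause for klacks:klacks-error.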

Macro KLACKS:EXPECTING-ELEMENT ((source &optional lname uri) &body body)
Assert that the current event matches (:start-element uri lname), ignoring value arguments that are NIL; otherwise signal a klacks-error. Evaluate body as an implicit progn. Finally, assert that the remaining event matches (:end-element uri lname).

Bridging Klacks and SAX

Function KLACKS:SERIALIZE-EVENT (source handler)
Send the current klacks event from source as a SAX event to the SAX handler and consume it.

Function KLACKS:SERIALIZE-ELEMENT (source handler &key document-events)
Read all klacks events from the following :start-element to its :end-element and send them as SAX events to handler. When this function is called, the current event must be :start-element, else an error is signalled. With document-events (the default), sax:start-document and sax:end-document events are sent around the element.

Function KLACKS:SERIALIZE-SOURCE (source handler)
Read all klacks events from source and send them as SAX events to the SAX handler.
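One possible use of serialize-source is to round-trip a document through a SAX sink. The sketch below assumes CXML is loaded and uses cxml:make-character-stream-sink as the handler; the function name reserialize is hypothetical.

```lisp
;; Assumption: CXML is already loaded, e.g. (ql:quickload :cxml).
;; RESERIALIZE is a hypothetical helper, not part of Klacks.
(defun reserialize (xml-string)
  "Parse XML-STRING with Klacks and write it back out as a string."
  (with-output-to-string (out)
    (klacks:with-open-source (source (cxml:make-source xml-string))
      ;; Every klacks event is forwarded to the sink as a SAX event.
      (klacks:serialize-source
       source
       (cxml:make-character-stream-sink out)))))
```

Writing to a caller-supplied stream keeps the sketch independent of what serialize-source itself returns.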

Class KLACKS:TAPPING-SOURCE (source)
A klacks source that relays events from an upstream klacks source unchanged, while also emitting them as SAX events to a user-specified handler at the same time.

Function KLACKS:MAKE-TAPPING-SOURCE (upstream-source &optional sax-handler)
Create a tapping source relaying events for upstream-source, and sending SAX events to sax-handler.
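A tapping source might be used to log a document while other code reads it. The following sketch, under the assumption that CXML is loaded and that events reach the handler as they are consumed, copies everything to a string while simply draining the tap. The name tap-to-string is hypothetical.

```lisp
;; Assumption: CXML is already loaded, e.g. (ql:quickload :cxml).
;; TAP-TO-STRING is a hypothetical helper, not part of Klacks.
(defun tap-to-string (xml-string)
  "Drain a tapping source and return what its SAX handler wrote."
  (with-output-to-string (out)
    ;; The upstream source owns the streams, so it is the one closed here.
    (klacks:with-open-source (upstream (cxml:make-source xml-string))
      (let ((tap (klacks:make-tapping-source
                  upstream
                  (cxml:make-character-stream-sink out))))
        ;; Real code would inspect each event; this loop only reads
        ;; through :end-document so the sink sees the whole document.
        (loop until (eq (klacks:consume tap) :end-document))))))
```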

Location information

Function KLACKS:CURRENT-LINE-NUMBER (source)
Return an approximation of the current line number, or NIL.

Function KLACKS:CURRENT-COLUMN-NUMBER (source)
Return an approximation of the current column number, or NIL.

Function KLACKS:CURRENT-SYSTEM-ID (source)
Return the URI of the document being parsed. This is either the main document, or the entity's system ID while contents of a parsed general external entity are being processed.

Function KLACKS:CURRENT-XML-BASE (source)
Return the Base URI of the current element. This URI can differ from the value returned by current-system-id if xml:base attributes are present.
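Location information can be attached to events while reading, for instance for error messages. This sketch assumes CXML is loaded; element-positions is a hypothetical helper, and since the line and column values are only approximations (or NIL), no particular numbers are promised here.

```lisp
;; Assumption: CXML is already loaded, e.g. (ql:quickload :cxml).
;; ELEMENT-POSITIONS is a hypothetical helper, not part of Klacks.
(defun element-positions (xml-string)
  "Return a list of (lname line column) for every start tag."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    (loop while (klacks:find-event source :start-element)
          collect (list (klacks:current-lname source)
                        (klacks:current-line-number source)
                        (klacks:current-column-number source))
          ;; Step past the start tag so the next search makes progress.
          do (klacks:consume source))))
```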

Examples

The following example illustrates creation of a klacks source, use of the peek-next function to read individual events, and shows some of the most common event types.

* (defparameter *source* (cxml:make-source "<example>text</example>"))
*SOURCE*

* (klacks:peek-next *source*)
:START-DOCUMENT

* (klacks:peek-next *source*)
:START-ELEMENT
NIL                      ;namespace URI
"example"                ;local name
"example"                ;qualified name

* (klacks:peek-next *source*)
:CHARACTERS
"text"

* (klacks:peek-next *source*)
:END-ELEMENT
NIL
"example"
"example"

* (klacks:peek-next *source*)
:END-DOCUMENT

* (klacks:peek-next *source*)
NIL

In this example, find-element is used to skip over the uninteresting events until the opening child1 tag is found. Then serialize-element is used to generate SAX events for the following element, including its children, and an xmls-compatible list structure is built from those events. find-element skips over whitespace, and find-event is used to parse up to :end-document, ensuring that the source has been closed.

* (defparameter *source*
      (cxml:make-source "<example>
                           <child1><p>foo</p></child1>
                           <child2 bar='baz'/>
                         </example>"))
*SOURCE*

* (klacks:find-element *source* "child1")
:START-ELEMENT
NIL
"child1"
"child1"

* (klacks:serialize-element *source* (cxml-xmls:make-xmls-builder))
("child1" NIL ("p" NIL "foo"))

* (klacks:find-element *source*)
:START-ELEMENT
NIL
"child2"
"child2"

* (klacks:serialize-element *source* (cxml-xmls:make-xmls-builder))
("child2" (("bar" "baz")))

* (klacks:find-event *source* :end-document)
:END-DOCUMENT
NIL
NIL
NIL