More events (2/3)

We've already met sax:start/end-document, sax:start/end-element, and sax:characters.

Other important events are:

SAX:COMMENT (handler comment) generic function
SAX:PROCESSING-INSTRUCTION (handler target data) generic function
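
As a minimal sketch of how these two events might be consumed, the handler below records every comment and processing instruction it sees. The names note-collector and collect-notes are made up for this illustration, and the code assumes that cxml is already loaded and that sax:default-handler is available as a base class that ignores all other events.

```lisp
;; Sketch only; assumes cxml is already loaded, e.g. (ql:quickload :cxml).
;; NOTE-COLLECTOR and COLLECT-NOTES are hypothetical names, not part of cxml.
(defclass note-collector (sax:default-handler)
  ((notes :initform '() :accessor notes)))

;; Called once per comment; DATA is the comment text without the delimiters.
(defmethod sax:comment ((handler note-collector) data)
  (push (list :comment data) (notes handler)))

;; Called once per processing instruction, with its target and its data.
(defmethod sax:processing-instruction ((handler note-collector) target data)
  (push (list :pi target data) (notes handler)))

;; cxml:parse returns whatever sax:end-document returns,
;; so hand back the collected events in document order.
(defmethod sax:end-document ((handler note-collector))
  (reverse (notes handler)))

(defun collect-notes (xml-string)
  (cxml:parse xml-string (make-instance 'note-collector)))
```

Calling (collect-notes "<doc><!--hi--><?target data?></doc>") should then return one list entry per comment and processing instruction.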

When processing characters, the following are important in addition to sax:characters:

SAX:UNESCAPED (handler string) generic function
SAX:START-CDATA (handler) generic function
SAX:END-CDATA (handler) generic function

sax:unescaped is an odd one -- the parser will never emit it. It is used for strings that are to be put into a document literally, i.e. without any escaping. Be careful when using it. Unfortunately, some applications (e.g. XSLT) require this sort of feature, and both SAX and STP are built to handle it gracefully.

sax:unescaped is also useful to sneak in formatting that cxml would otherwise normalize away, for example entity references. These are not part of the content model and are not supported explicitly by STP either; the parser resolves each reference by inserting the entity's replacement text instead. By sending an unescaped event with the string "&auml;", though, you can insert the auml entity reference. Equivalently, you could use the character reference "&#228;". But notice that you might as well just have used sax:characters with the string "ä", since XML requires Unicode support anyway! My recommendation is to use sax:unescaped sparingly, if at all.
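
The difference between the two events can be sketched by sending them to a string sink by hand. This is only an illustration: emit-entity-reference is a hypothetical helper, and the code assumes that cxml is loaded and that the sink only looks at the qname argument of the element events.

```lisp
;; Sketch only; assumes cxml is already loaded.
;; EMIT-ENTITY-REFERENCE is a hypothetical helper, not part of cxml.
(defun emit-entity-reference ()
  (let ((sink (cxml:make-string-sink)))
    (sax:start-document sink)
    ;; Arguments: namespace-uri, local-name, qname, attributes.
    (sax:start-element sink nil "p" "p" '())
    (sax:characters sink "K")
    ;; Written to the output literally, without any escaping.
    (sax:unescaped sink "&auml;")
    ;; Escaped as usual: the ampersand is serialized as &amp;.
    (sax:characters sink "se & Brot")
    (sax:end-element sink nil "p" "p")
    ;; For a string sink, end-document returns the serialized document.
    (sax:end-document sink)))
```

The returned string should contain the text "&auml;" exactly as passed in, while the ampersand sent through sax:characters comes out as "&amp;".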

sax:start/end-cdata do not carry any content themselves; they wrap around sax:characters events, designating them as resulting from a CDATA section. You can usually ignore these safely, since CDATA sections do not affect the content model of an XML document.
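
If an application does care about the distinction, a handler can keep a flag that is toggled by the two events. The following is a rough sketch under the assumption that cxml is loaded; cdata-tracker and tag-text are names invented for this example.

```lisp
;; Sketch only; assumes cxml is already loaded.
;; CDATA-TRACKER and TAG-TEXT are hypothetical names, not part of cxml.
(defclass cdata-tracker (sax:default-handler)
  ((in-cdata-p :initform nil :accessor in-cdata-p)
   (chunks :initform '() :accessor chunks)))

;; The events carry no content; they only mark where a CDATA section
;; begins and ends.
(defmethod sax:start-cdata ((handler cdata-tracker))
  (setf (in-cdata-p handler) t))

(defmethod sax:end-cdata ((handler cdata-tracker))
  (setf (in-cdata-p handler) nil))

;; Tag each piece of text according to the current flag.
(defmethod sax:characters ((handler cdata-tracker) data)
  (push (cons (if (in-cdata-p handler) :cdata :text) data)
        (chunks handler)))

(defmethod sax:end-document ((handler cdata-tracker))
  (reverse (chunks handler)))

(defun tag-text (xml-string)
  (cxml:parse xml-string (make-instance 'cdata-tracker)))
```

For an input such as "<doc>a<![CDATA[<b>]]>c</doc>", the text from the CDATA section should come back tagged :cdata and the surrounding text tagged :text. Note that the parser is free to split text into several sax:characters events.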

Finally, the following events get emitted around elements:

SAX:START-PREFIX-MAPPING (handler prefix uri) generic function
SAX:END-PREFIX-MAPPING (handler prefix) generic function

These provide namespace information. Between a start-prefix-mapping event and its matching end-prefix-mapping event, the given prefix is bound to the specified uri for every element and attribute emitted.

cxml also leaves the xmlns attributes among the attributes passed to sax:start-element, but the mapping events provide that information in a structured form.
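
To see how the mapping events bracket an element, one could log them together with the element events. This is a sketch under the assumption that cxml is loaded; mapping-logger and log-mappings are hypothetical names chosen for the example.

```lisp
;; Sketch only; assumes cxml is already loaded.
;; MAPPING-LOGGER and LOG-MAPPINGS are hypothetical names, not part of cxml.
(defclass mapping-logger (sax:default-handler)
  ((events :initform '() :accessor events)))

(defmethod sax:start-prefix-mapping ((handler mapping-logger) prefix uri)
  (push (list :start-mapping prefix uri) (events handler)))

(defmethod sax:end-prefix-mapping ((handler mapping-logger) prefix)
  (push (list :end-mapping prefix) (events handler)))

;; Record only the qname, to show where elements fall relative
;; to the mapping events.
(defmethod sax:start-element
    ((handler mapping-logger) namespace-uri local-name qname attributes)
  (declare (ignore namespace-uri local-name attributes))
  (push (list :start-element qname) (events handler)))

(defmethod sax:end-element
    ((handler mapping-logger) namespace-uri local-name qname)
  (declare (ignore namespace-uri local-name))
  (push (list :end-element qname) (events handler)))

(defmethod sax:end-document ((handler mapping-logger))
  (reverse (events handler)))

(defun log-mappings (xml-string)
  (cxml:parse xml-string (make-instance 'mapping-logger)))
```

Parsing a document like "<a:doc xmlns:a=\"http://example.org/ns\"><a:child/></a:doc>" with log-mappings should show the mapping for prefix "a" starting before the a:doc element and ending after it.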