CL-WBXML - A WBXML parser for Common Lisp


 

Abstract

CL-WBXML is a library that can read and write WAP Binary XML (WBXML). It has been successfully used to send and receive WBXML-encoded SyncML to and from various cell phones.

The code comes with a BSD-style license so you can basically do with it whatever you want.

Download shortcut: http://weitz.de/files/cl-wbxml.tar.gz.


 

Contents

  1. Download and installation
  2. Parsing and handlers
  3. Pseudo-XMLS format
  4. The CL-WBXML dictionary
    1. parse-wbxml
    2. make-xmls-handler
    3. start-document
    4. end-document
    5. start-element
    6. end-element
    7. characters
    8. processing-instruction
    9. attribute
    10. attribute-local-name
    11. attribute-namespace-uri
    12. attribute-qname
    13. attribute-value
    14. unparse-wbxml
    15. xmls-name
    16. xmls-attributes
    17. xmls-children
    18. *extension-function*

 

Download and installation

CL-WBXML together with this documentation can be downloaded from http://weitz.de/files/cl-wbxml.tar.gz. The current version is 0.2.4.

Before you install CL-WBXML you first need to install the FLEXI-STREAMS library unless you already have it.

CL-WBXML comes with a system definition for ASDF so you can install the library with

(asdf:oos 'asdf:load-op :cl-wbxml)

if you've unpacked it in a place where ASDF can find it. Installation via asdf-install should also be possible.
 

Parsing and handlers

When parsing, CL-WBXML acts like a simple SAX parser similar to CXML. The parser is called with a handler, which can be any non-NIL object, and it in turn calls generic functions like START-ELEMENT or PROCESSING-INSTRUCTION (which you can specialize for your handlers) while traversing the document.

CL-WBXML comes with a predefined handler class that can be used to create XMLS-like S-expressions from WBXML documents - see MAKE-XMLS-HANDLER. Furthermore, there are default handlers defined for the system class T and they all do nothing.
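
As an illustration of specializing the handler generic functions, here is a minimal sketch of a custom handler that counts elements. The class ELEMENT-COUNTER, its accessor, and the function COUNT-ELEMENTS are made up for this example. The code also assumes that the exported symbols live in a package named CL-WBXML and that the library is already loaded.

```lisp
;; Assumption: CL-WBXML is loaded and its package is named CL-WBXML.
;; ELEMENT-COUNTER is a hypothetical class, not part of the library.
(defclass element-counter ()
  ((total :initform 0 :accessor element-total)))

;; Called once per element start; we only bump the counter.
(defmethod cl-wbxml:start-element ((handler element-counter)
                                   namespace-uri local-name qname attributes)
  (declare (ignore namespace-uri local-name qname attributes))
  (incf (element-total handler)))

;; The first return value here becomes the first value of PARSE-WBXML.
(defmethod cl-wbxml:end-document ((handler element-counter))
  (element-total handler))

;; SOURCE can be anything PARSE-WBXML accepts (stream, pathname, octets).
(defun count-elements (source)
  (cl-wbxml:parse-wbxml source (make-instance 'element-counter)))
```

All other events (character data, processing instructions, and so on) fall through to the default methods on T, which do nothing.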
 

Pseudo-XMLS format

CL-WBXML uses a format to represent XML as S-expressions that is very similar to the one used by XMLS: An XML element is represented as a list with two or more members - the first one is the name, the second one is a list of attributes, the following members represent the body of the XML element and can be strings, other XML elements, or lists of octets (as a result of the parser encountering opaque data segments). Attributes are themselves lists with two members - the first one is the attribute's name, the second one is the value. Names (of elements as well as of attributes) are either strings or (for namespace-qualified names) conses where the car is the local name and the cdr is the namespace URI.

Here's an example:

<ns:foo xmlns:ns="http://weitz.de/" attr1="val1">text<bar ns:attr2="val2"> </bar>more text</ns:foo>

will be converted to this S-expression:

(("foo" . "http://weitz.de/") (("attr1" "val1"))
   "text"
   ("bar" ((("attr2" . "http://weitz.de/") "val2"))
      " ")
   "more text")

Note that this format is only similar but not identical to the XMLS format because (currently) XMLS doesn't handle namespace-qualified attribute names.
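
Because the format is made of plain lists, it can be walked with ordinary list functions. The following sketch uses only standard Common Lisp. The helper names (NAME-LOCAL-PART, NAME-NAMESPACE, ELEMENT-P, ELEMENT-TEXT) are invented for this example and are not part of CL-WBXML.

```lisp
(defun name-local-part (name)
  "Return the local part of a pseudo-XMLS NAME (a string or a cons)."
  (if (consp name) (car name) name))

(defun name-namespace (name)
  "Return the namespace URI of NAME, or NIL for an unqualified name."
  (if (consp name) (cdr name) nil))

(defun element-p (node)
  "True if NODE looks like an element, as opposed to a string or a
list of octets (whose first member would be an integer)."
  (and (consp node)
       (or (stringp (first node)) (consp (first node)))))

(defun element-text (element)
  "Concatenate all strings in the body of ELEMENT, descending into
child elements and skipping opaque octet lists."
  (with-output-to-string (out)
    (labels ((walk (node)
               (cond ((stringp node) (write-string node out))
                     ;; Children start after the name and attribute list.
                     ((element-p node) (mapc #'walk (cddr node))))))
      (walk element))))

(defparameter *example*
  '(("foo" . "http://weitz.de/") (("attr1" "val1"))
    "text"
    ("bar" ((("attr2" . "http://weitz.de/") "val2"))
     " ")
    "more text"))

(name-local-part (first *example*))   ; => "foo"
(name-namespace (first *example*))    ; => "http://weitz.de/"
(element-text *example*)              ; => "text more text"
```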
 

The CL-WBXML dictionary

CL-WBXML exports the following symbols:


[Generic function]
parse-wbxml source handler &key default-charset tag-tokens attr-tokens => result, publicid, version, charset


Reads and parses the WBXML document in source and calls the handler generic functions described below on handler as parsing proceeds. tag-tokens and attr-tokens are lists of code pages. default-charset is the character set to use if the document doesn't specify one - it should be given in a form FLEXI-STREAMS understands. Returns four values: the first value of the final call to END-DOCUMENT, the public ID of the document (or NIL), the WBXML version of the document (as a string), and the character set of the document.

source can be a binary/bivalent input stream, a pathname denoting an existing file, or a sequence containing octets.

The code page lists are alists where the car is the number of the code page and the cdr is itself an alist of conses mapping tokens to pseudo-XMLS names (in the case of tag tokens), strings (in the case of attribute value tokens), or lists (in the case of attribute start tokens) where the first element is the pseudo-XMLS name of the attribute and the second element is the value prefix as a string. See the file tokens.lisp for examples.

If the document has a public ID for which CL-WBXML knows the defined code pages, these will be used instead of the supplied tag-tokens and attr-tokens arguments. Currently this is the case for the following public IDs:


[Function]
make-xmls-handler => handler


This function returns a handler which can be used in conjunction with PARSE-WBXML to create pseudo-XMLS documents. Here's an example (using the second example from the WBXML spec):

CL-USER 3 > (defun create-file (&optional (file "/tmp/foo.txt"))
              (with-open-file (out file :direction :output
                                        :element-type 'octet
                                        :if-exists :supersede)
                (setq out (make-flexi-stream out :external-format :utf-8))
                (write-sequence '(1 1 #x6a #x12 #\a #\b #\c 0 #\Space 
                                  #\E #\n #\t #\e #\r #\Space #\n
                                  #\a #\m #\e #\: #\Space 0 #x47
                                  #xc5 9 #x83 0 5 1 #x88 6 #x86
                                  8 3 #\x #\y #\z 0 #x85 3 #\/ #\s
                                  0 1 #x83 4 #x86 7 #xa 3 #\N 0 1 1 1)
                                out)))
CREATE-FILE

CL-USER 4 > (defun read-file (&optional (file "/tmp/foo.txt"))
              (with-open-file (in file :element-type 'octet)
                (parse-wbxml in (make-xmls-handler)
                             :tag-tokens
                             '((0 . ((5 . "CARD")
                                     (6 . "INPUT")
                                     (7 . "XYZ")
                                     (8 . "DO"))))
                             :attr-tokens
                             '((0 . ((5 . ("STYLE" . "LIST"))
                                     (6 . ("TYPE"))
                                     (7 . ("TYPE" . "TEXT"))
                                     (8 . ("URL" . "http://"))
                                     (9 . ("NAME"))
                                     (10 . ("KEY"))
                                     (#x85 . ".org")
                                     (#x86 . "ACCEPT")))))))
READ-FILE

CL-USER 5 > (progn (create-file) (read-file))
("XYZ" NIL ("CARD" (("NAME" "abc") ("STYLE" "LIST")) ("DO" (("TYPE" "ACCEPT") ("URL" "http://xyz.org/s"))) " Enter name: " ("INPUT" (("TYPE" "TEXT") ("KEY" "N")))))
NIL
"1.1"
:UTF-8

Note that you should not re-use pseudo-XMLS handlers - create a new one for each parse.


[Generic functions]
start-document handler => whatever
end-document handler => result


These functions are called exactly once (at the start and end respectively) for each WBXML document - they are supposed to be specialized by the user. The return values of START-DOCUMENT are ignored; the first return value of END-DOCUMENT becomes the first return value of PARSE-WBXML.


[Generic functions]
start-element handler namespace-uri local-name qname attributes => whatever
end-element handler namespace-uri local-name qname => whatever


These functions are called at the start and end of each XML element the parser encounters; their return values are ignored. local-name is the name of the element and namespace-uri the corresponding namespace URI (or NIL if there is no namespace). qname is the qualified name of the element but can also be NIL if the name came from a pre-defined tag token. attributes is a list of ATTRIBUTE objects representing the element's attributes.


[Generic function]
characters handler data => whatever


This function is called whenever the parser comes across character data within the body of an XML element. data will usually be a string but it can also be a list of octets (if the OPAQUE token was encountered) or whatever *EXTENSION-FUNCTION* returns (specifically NIL for the default function). The return value of this function is ignored by the parser.
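
Since data is not always a string, a CHARACTERS method usually has to dispatch on its type. Here is a small sketch of a handler that keeps only the string data. TEXT-COLLECTOR and its accessor are hypothetical names, and the CL-WBXML package name is assumed.

```lisp
;; Assumption: CL-WBXML is loaded and its package is named CL-WBXML.
;; TEXT-COLLECTOR is a hypothetical class, not part of the library.
(defclass text-collector ()
  ((chunks :initform '() :accessor collected-chunks)))

(defmethod cl-wbxml:characters ((handler text-collector) data)
  (typecase data
    ;; Ordinary character data.
    (string (push data (collected-chunks handler)))
    ;; Lists of octets (OPAQUE data) and NIL (what the default
    ;; extension function returns) are silently skipped.
    (t nil)))

;; Return the collected text, in document order, as one string.
(defmethod cl-wbxml:end-document ((handler text-collector))
  (format nil "~{~A~}" (reverse (collected-chunks handler))))
```

Passing an instance of this class to PARSE-WBXML would then yield the concatenated text as the first return value.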


[Generic function]
processing-instruction handler target data => whatever


This generic function is called once for each processing instruction. target is a string; data is a string or NIL. The return value of this function is ignored by the parser.


[Standard class]
attribute


This is the class of those (opaque) objects that represent XML attributes - see START-ELEMENT. Their properties can be queried with the readers described below.


[Readers]
attribute-local-name attribute => local-name
attribute-namespace-uri attribute => namespace-uri
attribute-qname attribute => qname
attribute-value attribute => value


These generic functions can be used to read the respective properties of ATTRIBUTE objects.
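
For instance, a START-ELEMENT method could use these readers to convert each ATTRIBUTE object into the pseudo-XMLS (name value) form described earlier. This is only a sketch: ATTRIBUTE->XMLS and ATTRIBUTE-LOGGER are invented names, and the CL-WBXML package name is an assumption.

```lisp
;; Assumption: CL-WBXML is loaded and its package is named CL-WBXML.
(defun attribute->xmls (attribute)
  "Turn an ATTRIBUTE object into a pseudo-XMLS (name value) list."
  (let ((local-name (cl-wbxml:attribute-local-name attribute))
        (namespace-uri (cl-wbxml:attribute-namespace-uri attribute)))
    (list (if namespace-uri
              ;; Namespace-qualified names are (local-name . uri) conses.
              (cons local-name namespace-uri)
              local-name)
          (cl-wbxml:attribute-value attribute))))

;; Hypothetical handler that prints each element with its attributes.
(defclass attribute-logger () ())

(defmethod cl-wbxml:start-element ((handler attribute-logger)
                                   namespace-uri local-name qname attributes)
  (declare (ignore namespace-uri qname))
  (format t "~&<~A>~{ ~S~}~%"
          local-name
          (mapcar #'attribute->xmls attributes)))
```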


[Generic function]
unparse-wbxml document target &key major-version minor-version version-string publicid force-literal-publicid prefer-inline charset tag-tokens attr-tokens if-exists => result


Encodes the XML document document (in pseudo-XMLS syntax) as WBXML and writes it to target which can be a binary/bivalent output stream, a pathname, or the symbol T in which case the output will be written to an in-memory output stream. The function usually returns NIL, but it returns a vector representing the encoded document, if target is T.

major-version and minor-version (integers) denote the WBXML version which should be used - the defaults are 1 and 3. version-string (a string) is another way to specify the version and if this value is not NIL the other version parameters are ignored.

publicid is the public ID (a string) of the document. If force-literal-publicid is true, the public ID is inserted as an index into the string table even if there's a well-known numeric value for it.

charset (default is :UTF8) is the character set that is to be used to encode the document. It should be a keyword that can be understood by FLEXI-STREAMS. tag-tokens and attr-tokens are lists of code pages (and they are ignored for public IDs known to CL-WBXML).

if-exists is the value used when opening a file specified by a pathname. For streams this value is ignored.

If prefer-inline is true, STR_I is used instead of STR_T whenever possible. (Some cell phones seem to have problems with string tables. Oh, well...)
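
The following untested sketch encodes a small pseudo-XMLS document to an in-memory vector and feeds that vector straight back into PARSE-WBXML. The code page *DEMO-TAG-TOKENS* and the function ROUND-TRIP are made up for this example. The sketch also assumes that the package is named CL-WBXML and that leaving publicid at its default is acceptable.

```lisp
;; Assumption: CL-WBXML is loaded and its package is named CL-WBXML.
;; A hypothetical code page 0 mapping two tag tokens to element names.
(defparameter *demo-tag-tokens*
  '((0 . ((5 . "CARD")
          (6 . "INPUT")))))

(defun round-trip (document)
  "Encode DOCUMENT as WBXML in memory, then parse the octets again."
  (let ((octets (cl-wbxml:unparse-wbxml document t
                                        :tag-tokens *demo-tag-tokens*
                                        ;; Avoid the string table where possible.
                                        :prefer-inline t)))
    ;; PARSE-WBXML accepts a sequence of octets as its source.
    (cl-wbxml:parse-wbxml octets (cl-wbxml:make-xmls-handler)
                          :tag-tokens *demo-tag-tokens*)))

(round-trip '("CARD" () ("INPUT" ()) "some text"))
```

With target set to T nothing is written to disk, which makes this form convenient for experimenting at the REPL.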


[Accessors]
xmls-name element => name
(setf (xmls-name element) name)
xmls-attributes element => attributes
(setf (xmls-attributes element) attributes)
xmls-children element => children
(setf (xmls-children element) children)


These are convenience methods to access the corresponding parts of an XML element in pseudo-XMLS format.
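
A short sketch of how the accessors might be used. The element is built with LIST rather than quoted so that it can safely be modified. The CL-WBXML package name is an assumption, as is the variable *ELEMENT*.

```lisp
;; Assumption: CL-WBXML is loaded and its package is named CL-WBXML.
;; A fresh (non-literal) element, so destructive updates are safe.
(defparameter *element*
  (list "bar" (list (list "attr2" "val2")) " "))

(cl-wbxml:xmls-name *element*)        ; the name part of the element
(cl-wbxml:xmls-attributes *element*)  ; the list of attributes
(cl-wbxml:xmls-children *element*)    ; the body of the element

;; The accessors are SETF-able, so an element can be edited in place.
(setf (cl-wbxml:xmls-name *element*) "baz")
(push "more" (cl-wbxml:xmls-children *element*))
```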


[Special variable]
*extension-function*


The value of this variable should be a function to handle document-type-specific tokens like EXT_I_1. The function will be called with two arguments - an ID (one of 0, 1, or 2) and a value (a string, an integer, or NIL). The return value of this function is used as an argument to CHARACTERS. The default function always returns NIL.
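
As a sketch, the variable can be rebound around a call to PARSE-WBXML so that extension tokens show up in the result instead of being turned into NIL. DESCRIBE-EXTENSION and PARSE-WITH-EXTENSIONS are invented names, and the CL-WBXML package name is an assumption.

```lisp
;; Assumption: CL-WBXML is loaded and its package is named CL-WBXML.
(defun describe-extension (id value)
  "Render an extension token as a readable placeholder string.
ID is 0, 1, or 2; VALUE is a string, an integer, or NIL."
  (format nil "[ext ~D: ~S]" id value))

(defun parse-with-extensions (source)
  "Parse SOURCE with extension tokens rendered by DESCRIBE-EXTENSION."
  ;; The binding is only in effect for the duration of this parse.
  (let ((cl-wbxml:*extension-function* #'describe-extension))
    (cl-wbxml:parse-wbxml source (cl-wbxml:make-xmls-handler))))

(describe-extension 1 "abc")   ; => "[ext 1: \"abc\"]"
```

The string returned by DESCRIBE-EXTENSION is what the handler's CHARACTERS method receives for each extension token.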

$Header: /usr/local/cvsrep/cl-wbxml/doc/index.html,v 1.12 2006/07/25 15:09:04 edi Exp $
