Next: The standard matchers Up: XML Psychiatrist Previous: Installation   Contents

Basic usage

At its simplest, xml-psychiatrist takes a nested, XML-like specification and checks that an actual XML document matches it.

I was writing a program which takes transcripts of interviews and processes them. A sample transcript, test.xml, looks like this:

<transcript firstname="Rodney" lastname="Sporklenburg">

<section id="childhood" name="My Childhood">
<topic id="infancy" name="Infants drool too much">
<para>That's the truth: infants regard drooling as almost as much fun as
pounding the remote control on the table until the batteries fall
out. It's icky.</para>
</topic>

<topic id="earlychildhood" name="Early Childhood">
<para>I thought that airplanes were birds. That creeps me out. And
cartoons creeped me out back then.</para>
<para>It occurs to me just how small tiny children are. It's always
a little scary to think that these kids could probably kill each other
with that little bouncy ball.</para>
<para>Or maybe I'm just paranoid.</para>
</topic>
</section>
</transcript>

Here is some sample code from the program, test.lisp:

(require 'asdf)
(asdf:operate 'asdf:load-op :xml-psychiatrist)
(use-package :xmls)
(use-package :xmls-utilities)    ; Comes with xml-psychiatrist
(use-package :xml-psychiatrist)

(defun sanity-check-transcript-xml (transcript)
  (multiple-value-bind (is-sane-p error-message)
      (toplevel-match (tag "transcript" ((attr "firstname")
                                         (attr "lastname"))
                           (tag+ "section" ((attr "id")
                                            (attr "name"))
                                 (tag+ "topic" ((attr "id")
                                                (attr "name"))
                                       (match-anything))))
                      transcript)
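    (unless is-sane-p
      ;; Pass the message through "~A" so that any tildes in it are not
      ;; interpreted as format directives.
      (error "~A" error-message))))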

The :xmls-utilities package comes with xml-psychiatrist, and is a small collection of useful functions for dealing with the xmls XML parser. For more information about xmls, see the xmls web site.

The function sanity-check-transcript-xml takes an XML parse tree from parse-xml-file and validates it. The XML specification here is this part:

(tag "transcript" ((attr "firstname")
                   (attr "lastname"))
     (tag+ "section" ((attr "id")
                      (attr "name"))
           (tag+ "topic" ((attr "id")
                          (attr "name"))
                 (match-anything))))

It matches a tag, "transcript", with attributes "firstname" and "lastname", which has one or more child tags "section". Each "section" has attributes "id" and "name" and one or more child tags "topic", which in turn have attributes "id" and "name". At the very base of the XML structure, we need to be able to match parsed character data (just ordinary strings) as well as arbitrary tags used for formatting, so we use the match-anything matcher.
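To make the meaning of this specification concrete, the following is a rough stand-alone sketch of the same checks written by hand. It is not how xml-psychiatrist is implemented. It assumes that a node is a list of the form (name ((attr-name attr-value) ...) child ...), and that "one or more" means the child list is non-empty and every child matches. The function names tag-with-attrs-p, one-or-more-p and transcript-shape-ok-p are hypothetical.

```lisp
;; Assumed node layout: (name ((attr-name attr-value) ...) child ...)

(defun tag-with-attrs-p (node name attr-names)
  "True if NODE is a tag called NAME that carries every attribute in ATTR-NAMES."
  (and (consp node)
       (stringp (first node))
       (string= (first node) name)
       (every (lambda (attr)
                (assoc attr (second node) :test #'string=))
              attr-names)))

(defun one-or-more-p (predicate children)
  "True if CHILDREN is non-empty and every child satisfies PREDICATE."
  (and children (every predicate children)))

(defun transcript-shape-ok-p (tree)
  "Hand-written equivalent of the transcript/section/topic specification."
  (and (tag-with-attrs-p tree "transcript" '("firstname" "lastname"))
       (one-or-more-p
        (lambda (section)
          (and (tag-with-attrs-p section "section" '("id" "name"))
               (one-or-more-p
                (lambda (topic)
                  ;; Anything at all is accepted below a topic.
                  (tag-with-attrs-p topic "topic" '("id" "name")))
                (cddr section))))
        (cddr tree))))
```

Writing the checks out this way shows what the nested tag and tag+ forms save: the specification mirrors the shape of the document, while the hand-written version buries that shape in predicates.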

Load the test.lisp file shown above and go to the REPL:

* (sanity-check-transcript-xml (parse-xml-file "test.xml"))

NIL

Not very exciting, is it? That's because the XML validation was successful and the document was shown to be sane. The validator did its checking quietly, without signalling an error.

The function parse-xml-file is in the :xmls-utilities package. It takes a filename and returns the parsed representation of an XML file.
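As an illustration of what such a parsed representation might look like, here is a hand-written fragment of test.xml as a nested list (only one topic and one paragraph are kept). The layout, a tag name followed by a list of (name value) attribute pairs and then the children, is an assumption based on the xmls format and on the attribute nodes used elsewhere in this section; the real parser's output may differ in details such as attribute order. The accessors node-name, node-attributes, node-children and attribute-value are hypothetical helpers, not part of xmls or :xmls-utilities.

```lisp
;; Hand-written stand-in for part of what (parse-xml-file "test.xml") returns.
;; Assumed layout: (name ((attr-name attr-value) ...) child ...)
(defparameter *sample-tree*
  '("transcript" (("firstname" "Rodney") ("lastname" "Sporklenburg"))
    ("section" (("id" "childhood") ("name" "My Childhood"))
     ("topic" (("id" "earlychildhood") ("name" "Early Childhood"))
      ("para" () "Or maybe I'm just paranoid.")))))

(defun node-name (node)
  "The tag name of NODE."
  (first node))

(defun node-attributes (node)
  "The list of (name value) attribute pairs of NODE."
  (second node))

(defun node-children (node)
  "The child nodes (tags or strings) of NODE."
  (cddr node))

(defun attribute-value (node name)
  "The value of the attribute called NAME on NODE, or NIL if it is absent."
  (second (assoc name (node-attributes node) :test #'string=)))
```

For example, (attribute-value *sample-tree* "lastname") picks the surname out of the top-level node.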

Let's take a closer look at the machinery:

* (tag "transcript" ((attr "firstname")
                     (attr "lastname"))
       (tag+ "section" ((attr "id")
                        (attr "name"))
             (tag+ "topic" ((attr "id")
                            (attr "name"))
                   (match-anything))))

#<XML-PSYCHIATRIST::MATCHER {9B96129}>

The tag macro and the attr function both return objects called matchers. A matcher looks at a node in the XML parse tree and tries to match it against certain criteria. For example, the matcher (attr "firstname") matches an attribute named "firstname". We can add further constraints: the matcher (attr "age" :type 'integer) matches an attribute named "age" whose value, when read by the Lisp reader, is of type integer. Let's try it out:

* (toplevel-match (attr "age" :type 'integer) '("age" "24"))

0
NIL
* (toplevel-match (attr "age" :type 'integer) '("age" "blue"))

NIL
"Matcher #<MATCHER {9B5C889}> did not match"

The toplevel-match function takes two arguments: a matcher and a node in xmls parse-tree format. It returns two values, matched-p and error-message. If matched-p is non-nil, the matcher matched. If it is nil, the matcher did not match, and error-message contains a string explaining why.
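Because any non-nil first value counts as a match, including the 0 printed in the session above, callers should treat matched-p as a generalized boolean rather than compare it against T. A minimal sketch of folding the two values into one report string follows; describe-match-result is a hypothetical helper, not part of xml-psychiatrist.

```lisp
(defun describe-match-result (matched-p error-message)
  "Summarize the two values returned by TOPLEVEL-MATCH as one string."
  (if matched-p
      "matched"
      (format nil "no match: ~A" error-message)))

;; With the library loaded, both return values can be passed straight through:
;;   (multiple-value-call #'describe-match-result
;;     (toplevel-match (attr "age" :type 'integer) '("age" "blue")))

;; 0 is not NIL, so it counts as a successful match.
(describe-match-result 0 nil)
(describe-match-result nil "Matcher did not match")
```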

In the REPL session above, the matcher matched the first node because it was an "age" attribute whose value was an integer. However, the matcher did not match the second node, because "blue" is not an integer. The Common Lisp type system allows for even more impressive feats:

* (toplevel-match (attr "age" :type '(integer 0 170)) '("age" "12"))

0
NIL
* (toplevel-match (attr "age" :type '(integer 0 170)) '("age" "500"))

NIL
"Matcher #<MATCHER {9B5C889}> did not match"

In this example, we constrain the type of the "age" attribute to be (integer 0 170), an integer between 0 and 170 inclusive. This makes sense, because nobody can have a negative age, and it is very unlikely that anybody will live to be older than 170 years. For more information on the specifics of the Common Lisp type system, see the HyperSpec.
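The idea behind the :type check can be approximated in plain Common Lisp with read-from-string and typep. This is only a sketch of the principle, not xml-psychiatrist's actual implementation, and the function name string-reads-as-type-p is hypothetical.

```lisp
(defun string-reads-as-type-p (text type)
  "True if TEXT, read by the Lisp reader, yields an object of type TYPE."
  (let ((*read-eval* nil))              ; refuse #. forms in untrusted input
    (handler-case
        (typep (read-from-string text) type)
      ;; Empty or malformed input makes the reader signal an error;
      ;; treat that as "does not have the type".
      (error () nil))))

(string-reads-as-type-p "24" 'integer)             ; true
(string-reads-as-type-p "blue" 'integer)           ; false: reads as a symbol
(string-reads-as-type-p "12" '(integer 0 170))     ; true
(string-reads-as-type-p "500" '(integer 0 170))    ; false: out of range
```

Any type specifier accepted by typep works here, which is why a compound specifier such as (integer 0 170) can express a range check without extra code.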


root 2004-10-26