Printing XHTMΛ

I finally bit the bullet! I swear I tried! I pestered people! I failed. I.e., I failed to like any of the "major" HTML-producing libraries "out there" and so I rolled my own. In the process, I (re)learned a few things and I believe I made good use of parts of the language that are usually overlooked.

My problems with the other libraries

This section must start with an apology.

All the libraries I tried are very fine and sophisticated pieces of software that do solve problems. Alas, being a rotten Lisper myself, I found that I "needed something different" (read: "something I wrote"). Therefore, the comments you'll read below are not intended as general statements about such libraries, but only as testimony of my whims.

The libraries I looked at are CL-HTTP, CL-WHO and variations of TFEB's htout and Franz htmlgen, especially in the XHTML-GENERATOR version that comes with CXML.

As I said, my idiosyncrasies with the whole business of CL programming found problems with each of these otherwise fine libraries. More specifically, I found CL-HTTP too heavy to use just to generate HTML. One gripe I had with CL-WHO is that it did not handle pretty printing of HTML well (indentation is off in "recursive" use); more or less the same can be said of htout and htmlgen. CXML XHTML-GENERATOR is essentially a "round-trip" utility and it makes your life quite unhappy if you are trying to use simple HTML entities like - surprise - λ and Λ.

CL-WHO, htout, htmlgen and XHTML-GENERATOR all take an approach that can be summarized as: "I will compile a SExp representing 'HTML' and will generate - in line - a set of specialized writing calls" (yes: mostly WRITE and WRITE-STRING). (Cf. the examples in the CL-WHO documentation.)

There is nothing wrong with this approach, but it makes the resulting library and overall implementation more monolithic and it does not leverage some of the bells and whistles that you have available in CL. Thus I rolled my own (and I called it XHTMΛ).

Yet another HTML generation library

My approach to HTML (or XML) generation is the following:

  1. HTML (or XML!) elements need not be "lists" or "conses"; they can be bona-fide objects, i.e., structures.
  2. print-object, and, above all, the pretty printer are my friends.
  3. *print-pretty*, *print-readably* etc., are more than useful.

There are a few consequences of these choices and they should be spelled out. Before doing that, let's see what happens in the basic case.

The basic definition in the implementation of XHTMΛ is the representation of an HTML (or XML!) "element". It is very simple and it does accommodate the HTML5 bits and pieces.

(defstruct (element (:constructor %element))
   (tag        nil :type symbol)
   (attributes ()  :type list)
   (content    ()  :type list))

tag is ... the tag, attributes is a p-list and content is a possibly empty list of other elements (or strings, as in the examples further down).
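To make the representation concrete, here is a minimal sketch that repeats the structure definition and builds a tiny tree by calling the low-level %element constructor directly. Calling %element by hand, and the names *example-item* and *example-list*, are assumptions made only for illustration; the library itself goes through its own constructor machinery.

```lisp
;; The structure exactly as defined above.
(defstruct (element (:constructor %element))
  (tag        nil :type symbol)
  (attributes ()  :type list)
  (content    ()  :type list))

;; Hypothetical example data: a list item holding one string ...
(defparameter *example-item*
  (%element :tag 'li :content (list "Line 1")))

;; ... and a list holding that item, with a one-entry attribute p-list.
(defparameter *example-list*
  (%element :tag 'ul
            :attributes (list :class "plain")
            :content (list *example-item*)))

;; Attributes are an ordinary p-list, so GETF works on them.
(getf (element-attributes *example-list*) :class)

;; Content is an ordinary list, so the usual list functions apply.
(element-tag (first (element-content *example-list*)))
```

Since everything is a plain structure holding plain lists, no special traversal API is needed to inspect or build a document.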

"Printing" an element

Let's forget a minute about the constructor and let's instead concentrate on an element "printing" process. The main entry point is a print-object method.

(defmethod print-object ((e element) (s stream))
  (let ((tag (element-tag e))
        (attributes (element-attributes e))
        (content (element-content e))
        )
    (cond (*print-pretty*
           (pprint-xhtml s e))

          (*print-readably*
           (format s "#S(~S :TAG ~S :ATTRIBUTES ~S :CONTENT ~S)"
                   (type-of e)
                   tag
                   attributes
                   content))

          (t
           ;; Format string showing-off!!!!
           (format s "<~A~{ ~A=\"~A\"~}~:[ />~;>~:*~{~S~^ ~}</~3:*~A>~]"
                   (string-downcase tag)
                   attributes
                   content
                   )
           ))
    ))

The method is rather straightforward (apart from the last format string, which does many things at once: (1) it writes the attributes, (2) it checks whether there is content and, if not, closes the tag, otherwise it backs up to print it, and (3) it finally backs up again to the tag to print the proper closing element). Note that, in order to print the element properly and nicely, if *print-pretty* is non-NIL, then the function pprint-xhtml is called.
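The three steps of that format string can be tried in isolation. The sketch below feeds the same control string with plain strings and lists instead of an element; the names *element-format* and format-element-parts are hypothetical helpers, and attribute names are passed as strings here so that the output does not depend on how symbols get printed.

```lisp
;; The control string from the PRINT-OBJECT method, unchanged.
(defparameter *element-format*
  "<~A~{ ~A=\"~A\"~}~:[ />~;>~:*~{~S~^ ~}</~3:*~A>~]")

(defun format-element-parts (tag attributes content)
  "Apply *ELEMENT-FORMAT* to a tag string, a flat attribute list and a content list."
  (let ((*print-pretty* nil))
    (format nil *element-format* tag attributes content)))

;; No content: ~:[ sees NIL and picks the first clause, " />".
(format-element-parts "img" '("src" "foo.jpg") '())
;; => "<img src=\"foo.jpg\" />"

;; With content: ~:[ picks the second clause, ~:* backs up one argument
;; to print the content, and ~3:* backs up three arguments to reach
;; the tag again for the closing element.
(format-element-parts "p" '() '("Hi" "there"))
;; => "<p>\"Hi\" \"there\"</p>"
```

The content strings come out quoted because the control string prints them with ~S; this matches the quoted strings visible in the return values of the transcripts further down.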

Using the pretty printer

It may be just me, but I believe that the pretty printer is an under-used part of the CL standard. Therefore, I set out to use it heavily in order to get "properly indented" (meaning, the way I like it) (X)HTML. The function pprint-xhtml does that.

(defun pprint-xhtml (s xhtml-element)
  (declare (type stream s)
           (type element xhtml-element))
  (let ((tag (string-downcase (element-tag xhtml-element)))
        (attrs (element-attributes xhtml-element))
        (content (element-content xhtml-element))
        )
    (pprint-logical-block (s content)  ; (1)
      (pprint-logical-block (s content)  ; (2)
        (format s "<~A~@<~{~^ ~A=\"~S\"~^~_~}~:>" tag attrs)  ; (3)
      
        (when content
          (write-char #\> s)
          (pprint-newline :mandatory s)
          (format s "~{~4,0:T ~:W~_~}" content)
          ))

      (if content
          (format s "~0I</~A>" tag)
          (format s " />"))
      )))

The function requires a few explanations (of course, if you are a "pretty printer black-belt" this may be a bit boring). First of all, a display of what I want to obtain.

<body style="color: red">
    <p>
        Some text here
        <ul>
            <li>
                Line 1
            </li>
        </ul>
    </p>
</body>

This indentation may not be the best possible and there are some pitfalls, but it is better than what you get with the other libraries. But how does the function pprint-xhtml achieve this result while interacting with the pretty printing machinery?

The function pprint-xhtml uses three logical blocks: two for the element and a third for the attributes. The logical block for the attributes is introduced in the format string using the ~@< ... ~:> directive. Note also the conditional newline ~_ in the list iteration construction ~{ ... ~}. The other two pprint-logical-block forms establish the fence for the whole element and for the "inside" of the same. The outer pprint-logical-block serves essentially to print the closing tag (if needed) correctly indented. The "inner" pprint-logical-block just serves to provide the correct indentation for the tag and the actual element content. The pprint-newline and the indentation directive in the format string do the rest.
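The outer/inner block arrangement can be seen at work in a stripped-down sketch. It is not a transcription of pprint-xhtml: it has no attributes, it represents a node as either a string or a list (name . children), and it uses pprint-indent where the original relies on format directives. The names print-node and node-to-string are hypothetical.

```lisp
(defun print-node (stream node)
  "Print NODE, a string or a list (name . children), with nested indentation."
  (if (stringp node)
      (write-string node stream)
      (destructuring-bind (name &rest children) node
        ;; Outer block: remembers the column where the element starts,
        ;; so that the closing tag lines up with the opening one.
        (pprint-logical-block (stream nil)
          ;; Inner block: opening tag plus content, indented by 4.
          (pprint-logical-block (stream nil)
            (format stream "<~A>" name)
            (pprint-indent :block 4 stream)
            (dolist (child children)
              (pprint-newline :mandatory stream)
              (print-node stream child)))
          ;; Back in the outer block the indentation is that of the
          ;; element start again.
          (pprint-newline :mandatory stream)
          (format stream "</~A>" name)))))

(defun node-to-string (node)
  "Run PRINT-NODE with the pretty printer switched on and collect the output."
  (with-output-to-string (s)
    (let ((*print-pretty* t))
      (print-node s node))))

(node-to-string '("ul" ("li" "Line 1")))
;; produces:
;; <ul>
;;     <li>
;;         Line 1
;;     </li>
;; </ul>
```

Each recursive call opens its blocks at the column where the parent left off, which is why the nesting composes without any explicit depth counter.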

Once you wrap your head around it (it did take me some time!) it is very straightforward, and very powerful.

Bells and Whistles

The pretty printing machinery offers you more control over what you can do with it. For the time being my code just uses one simple hook into the pretty printer dispatch table in order to write strings "unquoted", but, potentially, this is the machine to provide fancier element layout.

The actual "printing" of an element is controlled by a specialized macro (provisionally) called with-html-syntax which calls write with an appropriately set up :pprint-dispatch argument.

The variable *xhtml-pd* holds the modified pretty print dispatch table, which is initialized as follows (at a minimum):

(set-pprint-dispatch 'element
                     'pprint-xhtml
                     0
                     *xhtml-pd*)

(set-pprint-dispatch 'string
                     (lambda (s xhtml-string)
                        (write-string xhtml-string s))
                     0
                     *xhtml-pd*)

This is the result:

XHTMLAMBDA 29 > (with-html-syntax-output (*standard-output* :syntax :standard :print-pretty t)
                  (body (:style "color: red")
                        (p ()
                           "Some text here"
                           (ul ()
                               (li () "Line 1")))))
<body style="color: red">
    <p>
        Some text here
        <ul>
            <li>
                Line 1
            </li>
        </ul>
    </p>
</body>
<body style="color: red"><p>"Some text here"<ul>< ... </body>  ; This is the value returned!

XHTMLAMBDA 30 >
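The mechanism behind the unquoted strings can be reproduced on its own. The following is only a sketch of the idea, not the library's code: *sketch-pd* and with-sketch-syntax are hypothetical stand-ins for *xhtml-pd* and the printing macro, and the assumption is that the private table starts out as a copy of the current one.

```lisp
;; A private copy of the current dispatch table, so that the global
;; table is left alone.
(defvar *sketch-pd* (copy-pprint-dispatch))

;; Same hook as above: strings are written without quotes.
(set-pprint-dispatch 'string
                     (lambda (s xhtml-string)
                       (write-string xhtml-string s))
                     0
                     *sketch-pd*)

;; Hypothetical macro: WRITE the object with pretty printing on and
;; with the private dispatch table in effect for this call only.
(defmacro with-sketch-syntax ((stream) object)
  `(write ,object
          :stream ,stream
          :pretty t
          :pprint-dispatch *sketch-pd*))

;; Through the private table the string loses its quotes ...
(with-output-to-string (out)
  (with-sketch-syntax (out) "Some text here"))

;; ... while ordinary printing is unaffected.
(write-to-string "Some text here" :pretty t :escape t)
```

Keeping the hooks in a separate table is what lets the same string print quoted at the REPL and unquoted inside the generated HTML.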

XHTMΛ Syntax

As you may have noticed in the previous example, the syntax of XHTMΛ elements is

   (tag attributes . content)

where each tag is implemented as a macro. The macro is essentially in charge of delaying the evaluation of the content, plus some other massaging, mostly flattening of the content lists. This is achieved by having each macro call a first parsing step, which generates an "intermediate" form that eventually calls the element function (see below). The following example shows a pretty standard trick:

XHTMLAMBDA 33 > (with-html-syntax-output (*standard-output* :syntax :standard :print-pretty t)
                  (body (:style "color: red")
                        (p ()
                           "Some text here"
                           (ul () (loop for i below 5
                                        collect (li () (format nil "Line ~D" i)))))))
<body style="color: red">
    <p>
        Some text here
        <ul>
             <li>
                 Line 0
             </li>
             <li>
                 Line 1
             </li>
             <li>
                 Line 2
             </li>
             <li>
                 Line 3
             </li>
             <li>
                 Line 4
             </li>
        </ul>
    </p>
</body>
<body style="color: red"><p>"Some text here" ... "Line 4"</li></ul></p></body>

XHTMLAMBDA 34 >
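To give an idea of why the loop in the transcript works, here is a much simplified sketch of tag macros that flatten their content. It is an assumption about the general shape, not the actual implementation: it skips the parsing step and the element generic function and calls %element directly, and flatten-content is a hypothetical helper.

```lisp
(defstruct (element (:constructor %element))
  (tag        nil :type symbol)
  (attributes ()  :type list)
  (content    ()  :type list))

(defun flatten-content (items)
  "Splice every list found in ITEMS into the result, one level deep."
  (loop for item in items
        if (listp item) append item
        else collect item))

;; Two hand-written tag macros. Each expands into a call that builds
;; the element only when the expansion is evaluated, so content forms
;; such as LOOP are ordinary Lisp code.
(defmacro li (attributes &body content)
  `(%element :tag 'li
             :attributes (list ,@attributes)
             :content (flatten-content (list ,@content))))

(defmacro ul (attributes &body content)
  `(%element :tag 'ul
             :attributes (list ,@attributes)
             :content (flatten-content (list ,@content))))

;; The LOOP returns a list of three LI elements; flattening splices
;; them in as three separate children of the UL.
(defparameter *items*
  (ul (:class "plain")
      (loop for i below 3
            collect (li () (format nil "Line ~D" i)))))

(length (element-content *items*))
;; => 3
```

Without the flattening step the ul element would end up with a single child that is itself a list, which is not what the printer expects.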

Thus XHTMΛ is unlike most other libraries, which just discriminate on the first element of a SExp, usually a keyword. XHTMΛ wants more structure and it strives to be more easily extensible through "standard" and low-level machinery, cf. the pretty printing machinery and CLOS. As an aside, the %element constructor is there just to be called by a "factory" generic function called - you guessed it - element.

Other Syntaxes and the HTMLISE Macro

Yet there is value in the widely used alternative SExp syntax for HTML (and XML):

  (tag . content)
or

  ((tag . attributes) . content)

In order to accommodate such syntax (and also a "keyword-based" one), XHTMΛ provides a htmlise macro which does some more rewriting from the syntax just above (termed :compact) to the "operator-and-attributes" syntax (termed :standard).

XHTMLAMBDA 39 > (htmlise (:syntax :compact)
                  ((body :style "color: red")
                   (p "Some text here"
                      (ul (loop for i below 5
                                    collect (li () (format nil "Line ~D" i))))))
                  )
<body style="color: red"><p>"Some text here" ... <li>"Line 4"</li></ul></p></body>
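The rewriting from :compact to :standard can be pictured with a small function over quoted forms. This is a sketch under assumptions, not the htmlise implementation: *known-tags* is a hypothetical list of tag names, compact->standard is a hypothetical name, and the function deliberately leaves any form it does not recognize untouched.

```lisp
;; Hypothetical list of the tag names the rewriter knows about.
(defparameter *known-tags* '(body p ul li))

(defun tag-name-p (x)
  "True if X is a symbol naming a known tag."
  (and (symbolp x) (member x *known-tags*) t))

(defun compact->standard (form)
  "Rewrite (tag . content) and ((tag . attributes) . content)
into (tag attributes . content)."
  (cond ((atom form) form)
        ;; ((tag . attributes) . content)
        ((and (consp (first form)) (tag-name-p (first (first form))))
         (destructuring-bind ((tag &rest attributes) &rest content) form
           `(,tag ,attributes ,@(mapcar #'compact->standard content))))
        ;; (tag . content)
        ((tag-name-p (first form))
         `(,(first form) () ,@(mapcar #'compact->standard (rest form))))
        ;; Anything else, e.g. a LOOP form, is not walked.
        (t form)))

(compact->standard
 '((body :style "color: red")
   (p "Some text here"
      (ul (loop for i below 5
                collect (li () (format nil "Line ~D" i)))))))
;; => (BODY (:STYLE "color: red")
;;      (P () "Some text here"
;;         (UL () (LOOP FOR I BELOW 5
;;                      COLLECT (LI () (FORMAT NIL "Line ~D" I))))))
```

Note that the loop form comes back unchanged, so whatever it contains has to be in :standard syntax already.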

A few syntactic quirks of the current implementation

The syntax rules are somewhat more convoluted because of a few "conveniences" (which may disappear in the future). Essentially there are two such things to keep in mind.

Tags without "content" have a simplified syntax

In either the :standard or the :compact syntax, tags with no content are better left alone and written as:

   (tag . attributes)

An example is the img tag, which always gets written as in the following example, no matter what syntax is specified:

   (img :src "foo.jpg")

Other tags that have this behavior are:

with-html-syntax-output and htmlise do not employ a code walker

Consider the previous example:

XHTMLAMBDA 39 > (htmlise (:syntax :compact)
                 ((body :style "color: red")
                  (p "Some text here"
                     (ul (loop for i below 5
                                  collect (li () (format nil "Line ~D" i))))))
                 )
<body style="color: red"><p>"Some text here" ... <li>"Line 4"</li></ul></p></body>

As you can see, the li does not "inherit" the syntax specified by the outer htmlise. This is because the element is enclosed in an inner form (in this case a loop). I.e., the macros with-html-syntax-output and htmlise do not walk past regular function and macro calls. This may change in the future. Alas, for the time being, you either have to use the :standard syntax for your inner elements or wrap each of them in an explicit htmlise.

Availability

The XHTMΛ library will be available "very soon" on common-lisp.net.

A preliminary git repository is available for browsing. The preliminary git repository for "fetching" can be found at:

The preliminary git repository for "pushing" can be found at:
