This library is based on the code I used while writing Practical Common Lisp to generate both PDFs (which I used for my own proofreading purposes) and the HTML version of the book that is available on the web at http://www.gigamonkeys.com/book/.
I had two goals:
In the end I settled on a mix of TeX-style markup (such as \i{...} to
indicate italic) and various kinds of Wiki-inspired markup such as
indicating new paragraphs by separating them with blank lines and
inferring the style of certain paragraphs from their indentation. And
because part of the point was to be able to write in Emacs I decided
to make my markup language grok Emacs's outline-mode headers, lines
prefixed by a number of *'s. I found this to be a pleasing
format for writing.
The code here is a somewhat cleaned up version of the code I actually
used while working on the book. The markup language itself is
implemented in parser.lisp. The function PARSE-FILE takes a pathname
designator and returns a list of s-expressions representing the parse
tree. For instance, if the file sample.txt contained the following:

    * A Sample

    Here is a paragraph. Here's a sentence with some \i{formatting} in
    it. And another sentence.

    And this is another paragraph.

then (parse-file "sample.txt") would return this:

    ((:H1 "A Sample")
     (:P "Here is a paragraph. Here's a sentence with some "
         (:I "formatting") " in" #\Newline "it. And another sentence.")
     (:P "And this is another paragraph." #\Newline))

The :H1 tag is derived from the number of asterisks leading the line
"A Sample"; the :P tags are inferred from the lack of any other tag;
and the :I tag is taken from the name between the \ and the {.
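
Because the parse tree is ordinary list structure, it can be walked with plain Common Lisp. The following is a minimal sketch, not part of the library: PLAIN-TEXT and HEADINGS are hypothetical helper names, and the tree is the sample result shown above written out as a literal so the code runs without loading anything.

```lisp
;; The sample parse tree from above, as a quoted literal.
(defparameter *sample-tree*
  '((:h1 "A Sample")
    (:p "Here is a paragraph. Here's a sentence with some "
        (:i "formatting") " in" #\Newline "it. And another sentence.")
    (:p "And this is another paragraph." #\Newline)))

;; Hypothetical helper: NODE is a string, a character, or a list whose
;; first element is a tag keyword and whose rest is its content.
(defun plain-text (node)
  "Return the text contained in NODE with all tags dropped."
  (etypecase node
    (string node)
    (character (string node))
    (cons (apply #'concatenate 'string
                 (mapcar #'plain-text (rest node))))))

;; Hypothetical helper: collect the text of every top-level element
;; in TREE whose tag is TAG.
(defun headings (tree &optional (tag :h1))
  (loop for element in tree
        when (and (consp element) (eq (first element) tag))
          collect (plain-text element)))
```

Calling (headings *sample-tree*) collects the text of each top-level :H1 element, and PLAIN-TEXT flattens nested tags such as :I into the surrounding paragraph text.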
For the most part, the parser doesn't care about the names used as
tag names in the \name{...} syntax. However, it does need to know
whether a given tag name is the name of a paragraph tag (which can
appear where an untagged paragraph can) or a subdocument tag (which is
parsed like the document as a whole and thus can contain
paragraphs). It recognizes tags as being one of these two types using
the variables *PARAGRAPH-TAGS* and *SUBDOCUMENT-TAGS*, which are
defined in parameters.lisp and can be bound or set as you see fit.[1]
The files pdf.lisp and html.lisp each provide implementations of the
generic function RENDER-AS, which takes an argument indicating the
type of output (either :pdf or :html), an s-expression as returned by
PARSE-FILE, and the name of the output file. The public interface to
RENDER-AS is the function RENDER, which takes an output type
indicator, the name of an input file, and, optionally, the name of an
output file. If the output file is omitted, its name is derived from
the name of the input file with the output type as the file
extension. Thus this documentation, which is in the file docs.txt, is
rendered into PDF as:

    MARKUP> (render :pdf "docs.txt")
    #P"/Users/peter/projects/lisp-libraries/development/markup/docs.pdf"

New output formats can be defined simply by defining a new method on
RENDER-AS specialized on a different output type.
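
As a rough sketch of what such a method might look like, the code below adds a plain-text output type. Several things here are assumptions rather than facts about the library: the lambda list (type sexp file) is inferred from the description above, the :text output type and the EMIT-TEXT helper are hypothetical, and the DEFGENERIC is only a stand-in so the sketch loads on its own (with the library loaded you would leave it out and define the method in the library's package).

```lisp
;; Stand-in for the library's generic function; the lambda list is an
;; assumption based on the prose description of RENDER-AS.
(defgeneric render-as (type sexp file)
  (:documentation "Render the parse tree SEXP to FILE in format TYPE."))

;; Hypothetical helper: write the text of NODE to STREAM, ignoring tags.
(defun emit-text (node stream)
  (etypecase node
    (string (write-string node stream))
    (character (write-char node stream))
    (cons (dolist (child (rest node))
            (emit-text child stream)))))

;; A method specialized on a new (hypothetical) output type, :text.
;; Each top-level element is written followed by a newline.
(defmethod render-as ((type (eql :text)) sexp file)
  (with-open-file (out file :direction :output :if-exists :supersede)
    (dolist (element sexp)
      (emit-text element out)
      (terpri out)))
  (truename file))
```

The method dispatches on the output type through an EQL specializer, so adding a format this way needs no change to existing methods.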
Styles

The current PDF rendering hard-wires assumptions about how to render different document elements, and the HTML rendering simply provides a way to map from document elements to HTML tags (possibly with attributes), defaulting to the HTML tag of the same name. Ideally there'd be a way to express style information and define a mapping from the markup elements to the rendering style without changing the code. Even more ideally, it'd be possible to reuse that same style information with different back-ends. I have some ideas about this but have not done anything about them yet.

Footnotes

In both the PDF and HTML renderers, footnotes are generated as end-notes. This is probably the best you can do for HTML, but in the PDF renderer it'd be nice to be able to place notes as true footnotes. However, this is a non-trivial problem because making room for a footnote at the bottom of a page affects what body text fits on the page, which can affect what footnotes should appear on that page. In the worst case, putting a footnote on a page causes the text containing the footnote reference to no longer fit on the page, meaning the footnote shouldn't go on that page after all. So you take it off, but now the footnote reference is back on the page. And so on.

Other output formats

When working on the book I implemented an incredibly hacky RTF generator that allowed me to produce RTF documents with all the Apress styles appropriately applied, since I ultimately had to turn in Word files to Apress. I'd like to clean that up and add it back in some form because there's nothing better than not having to use Microsoft Word.

In theory all you need to do to use this library is grab the Monkeylib
distribution of which it is part, unpack it, and arrange for ASDF to
know about the various .asd files, either by adding appropriate
symlinks in a directory that is already in asdf:*central-registry* or
by adding the monkeylib directories containing .asd files to
asdf:*central-registry*. The easiest way to do this is to LOAD the
file load.lisp in the top-level Monkeylib directory. You'll also need
Marc Battyani's excellent CL-PDF and CL-TYPESETTING libraries
installed where ASDF can find them. Here are the URLs:

    Monkeylib       http://www.gigamonkeys.com/lisp/monkeylib-0.3.tar.gz
    CL-PDF          http://www.fractalconcept.com/asp/html/cl-pdf.html
    CL-TYPESETTING  http://www.fractalconcept.com/asp/html/cl-typesetting.html

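
The two setup routes described above might look roughly like this at the REPL. The paths are placeholders, the layout of the unpacked distribution is not documented here, and the system name :markup is an assumption, so check the name of the actual .asd file before relying on it.

```lisp
;; Option 1: load the helper file from the top-level Monkeylib directory.
;; The path is a placeholder for wherever you unpacked the tarball.
(load "/path/to/monkeylib/load.lisp")

;; Option 2: register a directory containing .asd files by hand.
;; The trailing slash matters: ASDF expects a directory pathname.
(require :asdf)
(push #p"/path/to/monkeylib/markup/" asdf:*central-registry*)

;; The system name :markup is an assumption; use the name of the
;; .asd file found in the directory you registered.
(asdf:load-system :markup)
```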
1. In general, subdocument tags are used for two kinds of things.
The first are tags used to delimit content that can be embedded in a
paragraph but which may itself need to consist of multiple
paragraphs. Footnotes such as this one are a prime example.
The other kind of subdocument elements are structural elements such as
lists and tables that consist entirely of other tagged elements. If
you want to use the tags \bullets{...} and \item{...} to indicate a
bulleted list, you should add :bullets to *SUBDOCUMENT-TAGS* and :item
to *PARAGRAPH-TAGS*.
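
A small sketch of binding the two variables for the bulleted-list case follows. It assumes the variables hold lists of keywords, which the text above suggests but does not state. The DEFVAR forms are stand-ins with invented initial contents so that the sketch loads without the library, and CALL-WITH-BULLET-TAGS is a hypothetical helper name.

```lisp
;; Stand-ins for the library's variables; the initial values here are
;; invented for illustration and are not the library's defaults.
(defvar *paragraph-tags* '(:example))
(defvar *subdocument-tags* '(:note))

;; Hypothetical helper: call THUNK with :bullets and :item registered.
;; The bindings are dynamic, so they are visible to everything THUNK
;; calls and vanish again when it returns.
(defun call-with-bullet-tags (thunk)
  (let ((*subdocument-tags* (cons :bullets *subdocument-tags*))
        (*paragraph-tags* (cons :item *paragraph-tags*)))
    (funcall thunk)))
```

With the library loaded, the thunk would be something like (lambda () (parse-file "sample.txt")). Binding rather than setting keeps the extra tags local to one call.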