README.quasiquote

fare-matcher friendly implementation of Quasiquote
Copyright (c) 2002-2010 Fahree Reedaw <fare@tunes.org>


PURPOSE

The main purpose of this n+2nd reimplementation of quasiquote,
is enable matching of quasiquoted patterns with fare-matcher.

Now, developing this implementation is also a challenge in understanding
the ins and outs of quasiquotation, and in exploring the way it interacts
with extended support for constant-building syntactic constructs.
And this, at least to me is as much fun as it is an intellectual challenge. :-)


INTERACTION WITH FARE-MATCHER

This implementation of quasiquote requires the FARE-MATCHER system.

It enables quasiquotation inside FARE-MATCHER patterns.
However, as a limitation to such use, note that as long as
the pattern-matcher isn't extended to match APPEND patterns,
which typically requires backtracking, then those quasiquote templates
that expand into something using APPEND can't be used as patterns to match.
This mostly restricts the use of ,@ or ,. to the end of a list.

This implementation also uses FARE-MATCHER internally,
which allows to tremendously simplify the simplifier
used to normalize quasiquote-using expressions.
This simplifier in turn is necessary to simplify APPEND patterns
into CONS LIST* and LIST patterns for use with FARE-MATCHER.


REFERENCES

Essential documents consulted while implementing this file:
Alan Bawden's paper: http://www.bawden.org/ftp/pepm99.ps.gz
CLtL2: http://www.supelec.fr/docs/cltl/clm/node367.html
CLHS: http://www.lisp.org/HyperSpec/Body/sec_2-4-6.html
Slate reference manual section 2.6.2 on quoting and unquoting:
    http://slate.tunes.org/doc/progman/node12.html#SECTION00046200000000000000
Common Lisp backquote implementation, written in Common Lisp. (public domain)
  Author: Guy L. Steele Jr.     Date: 27 December 1985
  To be used with patch by Alex Plotnick 2010 regarding the simplification pass.
SBCL backquote implementation (derived from CMUCL).


NOTES

* We provide for a read-time quasiquote expander as well as
 a macro-expansion-time quasiquote expander.
 The read-time expander handles #() syntax correctly
 but mightn't be as efficient as the built-in reader in the common case,
 whereas the macro-expansion-time sees #() too late (see BUGS).
 To use the macro-expansion-time one, uncomment the line that declares
 feature :quasiquote-at-macro-expansion-time.


TO DO

* The CLHS doesn't tell anything about multi-dimensional arrays;
 however, this is the opportunity to define a MUP: Meta Unquote Protocol.
 The MUP would allow to extend the quasiquote mechanism with support
 for new constant-building syntactic constructs as such constructs are defined.
 Maybe we will end up with a full-fledge declarative infrastructure
 for a Parser-Preprocessor-Pretty-Printer, like camlp4 only more declarative.

* Can we improve performance?
 Statically compile and optimize the patterns for the simplifier?

* Implement the MUP with an abstract API for new readers;
 as examples, provide readers for existing syntax that can be toggled
 as either unquote-aware or not: #(), #A, etc.

* Implement tagged (and multiple-valued?) quasiquotes, unquotes and quotes.


BUGS:

* It looks like there are nasty bugs in the simplifier.
 Maybe start again afresh with a MUP for CONS?
 We should start from the known-working algorithm described in CLtL2,
 that keeps the data in simplified form at all time incrementally,
 instead of trying a global simplification after-the-fact.

* This version works inside simple vectors, so as to support
 unquoting inside the #(...) syntax as the standard mandates.
 However, doing that at macro-expansion time means that we disturb
 any SIMPLE-VECTOR that appears in the source code, even if it comes
 from forms other than #(...), such as #1A(...) or #.(VECTOR ...).
 This phenomenon has been documented before in the following message:
	http://groups.google.com/groups?q=author:kaz%40ashi.footprints.net&hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&selm=cf333042.0303141409.bbf02e9%40posting.google.com&rnum=4

* Interestingly, I haven't seen the following problem stated, to know which is
 correct of `#2(1 ,@'(2 3)) or `#3(1 ,@'(2 3)). In other words, does the read
 argument to #() mean something at read-time or at vector-creation-time?
 Of course, in the original intended usage, outside of any quasiquoting,
 read-time and vector-creation-time are one and the same.
 But what when quasiquote breaks up that identity?

* Note that we do not handle unquoting inside structures
 or arrays more complex than simple vectors.

* Note that in the case of the macro-expansion-time expander,
 we do not check for lone unquote syntax before macro expansion,
 which may lead this implementation to accept hacks
 that are not valid according to the standard, such as
 macros that produce unbalanced unquote syntax markers.

* Certainly, the implementation of quasiquote should be "opened"
 so as to allow new syntactic features to take advantage of it.
 Then, maybe we can redefine our own proper reader behaviour for #(...)
 and each MUP-supporting piece of syntax, on top of the MUP?

 Note that copying and modifying read-tables is expensive, that dynamically
 modifying and restoring read-tables might interfere with #. syntax, and that
 caching modified read-tables will interfere with any subsequent modification
 of a cached read-table, comparison not being possible.
 This means that if we wanted the MUP to adapt to existing extensions
 without modifying existing code, we would have to intercept the definition
 of syntax reading functions before they are fed to either SET-MACRO-CHARACTER
 or SET-DISPATCH-MACRO-CHARACTER. Spooky.
 Now, this also requires that the current depth of quasiquoting be consulted
 any time any of the MUP-enabled constructors is read.

* The principle of the MUP is that
 = structure readers that don't want to support unquote MUST be wrapped into
  something that dynamically binds *quasiquote-level* to 0.
 = structure readers #C(ARGSYNTAX) that do want to support unquote
  MUST accumulate formal arguments to a structure constructor
  into a list ARGUMENTS, then, if *quasiquote-level* is 0, behave like
  #.(apply `#CONSTRUCTOR `#ARGUMENTS)
  otherwise, behave like
  ,(apply `#CONSTRUCTOR `#ARGUMENTS)
  where #CONSTRUCTOR is the name of the constructor for given structure,
  and #ARGUMENTS is whichever arguments have been deduced from the syntax,
  which may include as many levels of unquotations as *quasiquote-level* says.
  Note that in a strong sense, "#." is like "," assuming an infinite tower of
  read-time evaluators a la 3-LISP.

* Note that the above is obscured because we're trying to discuss
 the behaviour of quasiquote-containing programs without having
 a meta-level quasiquoting facility that could distinguish
 between what is constant or variable at the meta-level independently from
  what is constant or variable at the base level:
 #CONSTRUCTOR and #ARGUMENTS would be better specified
 through a special meta-level unquotation, the above expressions
 being in a corresponding special quasiquotation.
 A feature that would allow for clear separation of levels of meta-language
 would be a tagged quasiquote feature, as in Slate (http://slate-language.org/).

* This version is not safe with respect to quasiquoting expressions
 where appear some marker symbols interned in the FARE-QUASIQUOTE package --
 such expressions may lead to confusion between the body of expressions
 being quasiquoted and the internal quasiquote infrastructure.
 Any objects used as markers instead of these interned symbols
 would lead to the same "bug" if somehow used inside quasiquoted expressions;
 but at least, non-interned symbols or gensyms would allow to avoid this bug
 in expressions being READ. The "perfect" solution would be to use
 objects GENSYMed at the moment a given QUASIQUOTE is read,
 and kept in a lexical variable to prevent introspective cheat by #. syntax.
 Then, after simplifying expressions, we could pass the read expression
 through functions that would SUBSTitute proper symbols for the markers,
 from the quasiquote package if pretty-printing is to be supported,
 or otherwise from the CL package.
 Note that this would have to be interleaved with support for working
 inside vectors and other syntax. This is all very tricky
 and until further notice is left as an exercise to the intrepid reader.
 Thus, while the behaviour of our implementation isn't strictly correct,
 we don't go through the hassle of modifying it into something
 much less readable just for the sake of preventing code that would
 deliberately confuse the quasiquote engine.
 Now, if we imagine that some code were dynamically generated
 based on system introspection, that could contain some of our magic markers,
 then this code would have to be made safe for interaction with QUASIQUOTE;
 this might (or might not?) require making QUASIQUOTE 100% fool-proof.

* This implementation simplifies quasiquoting of literals
 and other self-evaluating objects into the object itself,
 instead of a `(QUOTE ,object) expression.
 This is also the behaviour of the simplifier in CMUCL and SBCL,
 but some people have expressed concerns that it might not be
 strictly allowed in letter or spirit by the Common Lisp standards.
 We believe that our behaviour is correct in both letter and spirit;
 but in any case, a more literally correct (pun intended) behaviour
 with respect to the CL standards can be achieved by defining
 the feature QUASIQUOTE-QUOTES-LITERALS (untested).

* The idea of making circular data-structures work within quasiquotation
 makes my head ache with overarching pain. I make no attempt to try that,
 and most conspicuously not in the simplifier. You're crazier than I am
 if you do it and do it right.

PS: If you're able to follow this discussion, you impress me.
Come join the TUNES project!