CL Extensions: UNIFICATION
 

Common Lisp Extensions: UNIFICATION

The notion of unification originated in the field of formal logic (e.g. [R65],) and has been used extensively in Computer Science and Programming Languages. Most notably, Prolog uses the full power of unification.

Unification is also at the core of type checking algorithms in the tradition of Milner's, and a limited form - pattern matching - is available to the user in languages of the ML and Haskell family.

The library presented in these pages provides a full blown unification framework for Common Lisp.

Writing a pattern matcher or a an unifier in Common Lisp is easy, as long as we limit ourselves to manipulate only ATOMs and CONSes.

Alas, it would be much nicer if we could manipulate arbitrary Common Lisp objects as the ML programmer can with arbitrary ML objects.

The library presented here is the first one (to the best of the author's knowledge) that is capable of manipulating arbitrary Common Lisp objects.

The hope is that this library could be incorporated directly in an implementation in order to provide better type checking.

This should not come as a surprise, as a compiler like CMUCL does include a type inference engine, which does very similar things.

Unification Basics

The unification process makes sure that two object descriptions containing some holes - i.e. variables - can be made equal (almost in the EQUALP sense) by assigning consistent values to the variables involved.

Suppose we had a function U performing unification and returning a set of variable assignments, often called a substitution. A very simple example involving the unification of two numbers could be

  U(42, 42) ==> {}
  
The two numbers are EQL, so no variable is involved and the empty substitution is returned.
  U(42, 123) ==> <unification failure>
  
The two numbers are not EQL, so the unification fails.
  U(42, x) ==> {x -> 42}
  
The only way to make the unification process to succeed is to bind the value 42 to the variable x.

The UNIFICATION library defines all the necessary functions and a unification sub-language to handle most of Common Lisp.

UNIFICATION Library

The UNIFICATION library has one main entry point, the generic function UNIFY, and a sub-language definition that allows us to talk about Common Lisp objects.

The UNIFY generic function has the following signature:

  unify x y &optional substitution
  
Where x and y are either arbitrary Common Lisp objects, variables, or object templates. These items constitute the so-called extended terms manipulated by the unification machinery.

Variables are symbols with a #\? as the first character of the name. This is a rather traditional choice, although a different one based on quoted symbols is possible. Therefore, the following are examples of variables.

  ?A ?s ?qwe ?42z ?a-variable-with-a-very-long-name ?_
  
There are two special variables, ?_ and _, which are used as anonymous place holders, they match anything, but never appear in a substitution.

Hence, the above examples result in the following

  cl-prompt> (unify 42 42)
  #<EMPTY ENVIRONMENT xxxxxx>
  
  cl-prompt> (unify 42 123)
  ==> error: unification failure
  
  cl-prompt> (unify 42 ?x)
  #<ENVIRONMENT xxxxxx>

  cl-prompt> (find-variable-value '?x *)
  42
  

Where FIND-VARIABLE-VALUE is the accessor used to find the value of a variable in a substitution.

As a more complicated example, consider the usual CONS based unification

  cl-prompt> (unify '(foo (bar frobboz) ?baz) '(foo ?gnao 42))
  #<ENVIRONMENT xxxxxx>

  cl-prompt> (find-variable-value '?gnao *)
  (BAR FROBBOZ)

  cl-prompt> (find-variable-value '?baz **)
  42
  

Of course note the following behavior

  cl-prompt> (unify '(foo ?x 42) '(foo 42 ?x))
  #<ENVIRONMENT xxxxxx>

  cl-prompt> (unify '(foo ?x 42) '(foo baz ?x))
  ==> error: unification failure
  

UNIFY works also on arrays and vectors. Strings are treated as atomic objects

  cl-prompt> (unify #(1 2 3) #(1 ?x ?y))
  #<ENVIRONMENT xxxxxx>

  cl-prompt> (find-variable-value '?x *)
  2
  
  cl-prompt> (unify #2A((1 2 3) (a s ?z)) #2A((1 ?x ?y) (a s d)))
  #<ENVIRONMENT xxxxxx>

  cl-prompt> (find-variable-value '?z *)
  D
  

So far so good, but how can you unify two structures? First of all there is no portable way (yet) to list all the slots of a given structure. Secondly, by allowing the unification of arbitrary CONSes, we have created a short-circuit in the unification machinery.

Dealing with object instances raises similar problems.

Suppose we have the following definition

  (defstruct foo a s d)
  
The straightforward
  (unify #S(FOO A 42 S NIL D NIL) (make-foo :a 42))
  
cannot be built portably, besides, we do not even have the equivalent of the #S(...) notation for regular CLOS objects. Moreover we want to do other things with other data types.

A simple solution is to define a template sub-language to express unifications between structure objects or standard objects (and other data types as well.) Nevertheless, the definition of such sub-language cannot be CONS based, because in such case, we would conflate arbitrary CONSes and the expressions of the sub-language.

To circumvent this problem we resort to the usual trick a ML programmer uses to placate the type-checker: we introduce an "intermediate" data type. Essentially the following

  (defclass template ()
     ((spec :accessor template-spec :type (or cons symbol number) ...)))
  

The TEMPLATE class is accompanied by a reader macro (#T for template, or type, with an abuse of language) and an appropriate PRINT-OBJECT method. The #T reader macro expands as

  #Tsomething ==> (make-instance 'template :spec something)
  
With this infrastructure we can express the unification of the FOO instance as
  (unify #S(FOO A 42 S NIL D NIL) #T(foo foo-a 42))
  
I.e. we use the actual structure accessor FOO-A to get to the the value of the slot A in a FOO. This is an example of the template language.

A more interesting example, which involves vectors is the following

  cl-prompt> (unify #(1 2 3 4 5) #T(vector 1 ?x &rest ?rest)) ; You get the idea...
  #<ENVIRONMENT xxxxxxx>

  cl-prompt> (find-variable-value '?rest *)
  #(3 4 5)
  
I.e. we have a DESTRUCTURING-BIND on steroids.

Note that separating the templates is necessary if we want to do something like

  cl-prompt> (unify '(1 2 3 4 5) #T(list 1 ?x &rest ?rest))
  #<ENVIRONMENT xxxxxxx>

  cl-prompt> (find-variable-value '?rest *)
  (3 4 5)
  
Without the template denoted by #T(list ...) the unifier would have been utterly confused.

In the following the full extent of the UNIFICATION facility is described in its main components.

Unifying Substitutions

The Template Sub-language

Control Flow

The UNIFICATION Dictionary

References

[R56] J. A. Robinson, A machine-oriented logic based on the resolution principle, Journal of the ACM, Vol. 12, No. 1, January 1965, Pages 23--49.


DISCLAIMER: The code associated to these documents is not completely tested and it is bound to contain errors and omissions. This documentation may contain errors and omissions as well.

The file COPYING contains a Berkeley-style license. You are advised to use the code at your own risk. No warranty whatsoever is provided, the author will not be held responsible for any effect generated by your use of the library, and you can put here the scarier extra disclaimer you can think of.


 

News

News in chronological order, most recent on top.

  • 2011-02-20
    CL-UNIFICATION is now in Quicklisp.