10. Symbols
Common LISP is known for its ability to process symbolic information. The data type which embodies this is called a symbol.
The following functions are used to manipulate symbols:
copy-symbol
get
getf
make-symbol
remf
remprop
symbol-name
symbol-plist
symbol-package
gensym
gentemp
keywordp
intern
This chapter contains the following sections:
10.1 Symbol Print Names
10.2 Property Lists
10.3 Symbol Package
10.4 Other Symbol Slots
10.5 Lexvars and functional symbols
A Common LISP symbol is a data object that roughly corresponds to the concept of a variable in other programming languages. That is, writing the name of a symbol serves as a way to access a fixed memory location which is assigned by the compiler; this location is used to store a value, which is what the symbol evaluates to in an expression.
In Common LISP, a symbol can do much more than simply name a memory location. LISP symbols are rich in functionality, possessing a value, a list of properties, a package, and other attributes. LISP symbols can have arbitrarily elaborate names. This befits a language known for symbolic processing.
It should be kept in mind that not all symbols will get translated into a symbol in the sense described here: if a symbol is a lexical variable it is translated by the compiler into an internal object which describes an offset in the stack; this is much closer to the sense of a variable in other programming languages. Other symbols, those read at the car of a list with functional significance, will likewise be converted into an internal object which describes an offset in a table of functional objects. For information, see lexvar and functional symbol.
In this document, symbols are described as if they were CLOS objects using the standard CLOS terminology. This is not to be taken as an indication that they can be directly used as objects in the strict sense of CLOS, rather as a translation into CLOS terms. Implementationally, a Star Sapphire LISP symbol object is a C structure which has a number of fields; each field will be called a slot in this document; in the Common LISP specification such slots were called 'cells'.
One distinct symbol - nil - appears by magic at the creation of the LISP world. Because of its semantics, the symbol nil must be created before much of the machinery associated with packages, etc. can come into existence, even though the act of creating a symbol requires this machinery. Although nil has many of the outward characteristics of a symbol (and is embodied in the same data structure as symbols), it has its own type which inherits from the list class so that it can stand for the empty list. This inheritance from both symbol and list types has been hardwired in the class and type hierarchy.
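The dual nature of nil can be checked with standard Common LISP predicates. The following is a minimal sketch; it assumes the standard reader, which converts symbol names to upper case.

```lisp
;; nil is at once a symbol and the empty list.
(symbolp nil)        ; true: nil is a symbol
(listp nil)          ; true: nil is also a list
(eq nil '())         ; true: the empty list reads as the symbol nil
(typep nil 'null)    ; true: null is the type whose only member is nil
(symbol-name nil)    ; "NIL" when the reader upcases symbol names
```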
10.1 Symbol Print Names
In most other languages, a variable name starts with an alphabetic character or possibly an underscore, followed by a sequence of alphanumerics and underscores.
In Common LISP, a symbol is essentially any such sequence which does not translate into a number; i.e. a symbol can start with a digit, as long as it does not otherwise scan as a valid number.
Various other characters which are not usually found in variable names can also be incorporated into symbols: in particular the asterisk.
The identifier for a symbol is called its print name.
The print name is a sequence of characters used to identify the symbol. Internally, the print name is stored as a string. It is possible but not advisable to alter a symbol's print name.
Star Sapphire LISP symbol names are limited only by the size of virtual memory: all characters are significant. There is a slight amount of overhead to look up longer print names in the reader: this is probably not of any concern unless an arbitrarily long symbol print name is specified.
It should be kept in mind that some other LISP implementations may only consider some number of leading characters as significant.
For more information, see legal symbol print names.
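As an illustration of these rules, the sketch below reads a few tokens and inspects their print names with symbol-name. The symbol names are hypothetical, and the upper-case result assumes the standard reader behavior of converting symbol names to upper case.

```lisp
;; Tokens that do not scan as numbers are read as symbols.
(symbolp '1+)             ; true: starts with a digit but is not a valid number
(symbolp '*my-counter*)   ; true: asterisks are legal in symbol names
(numberp '1e3)            ; true: this token scans as a number, so it is not a symbol

;; symbol-name returns the print name as a string.
(symbol-name 'my-symbol)  ; "MY-SYMBOL" when the reader upcases symbol names

;; Every character of a long name is part of the print name.
(length (symbol-name 'a-rather-long-but-perfectly-legal-symbol-name))
```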
10.2 Property Lists
Besides a print name, the symbol object also stores the address of a list. By convention, this list either has zero items (is the empty list) or is even in length.
Counting from zero, the even-numbered items, typically symbols, serve as the names of properties, known as 'keys' or 'indicators'. The odd-numbered items, which can be arbitrary LISP objects, are the corresponding values for each key, and are usually called 'properties'. Hence the name 'property list' (often abbreviated plist).
There are no duplications among the keys; a property list should only have one property at a time with a given name. In this way, given a symbol and an indicator (another symbol), an associated value can be retrieved from the first symbol's plist.
When a symbol is created, its property list is initially empty. Properties are created by using get within a setf form. Other operations can be performed on property lists using the property-list functions.
Property lists represent a rudimentary object oriented extension to LISP; however, with the addition of more sophisticated language features such as structures and classes to LISP, property lists are not used as much for this kind of programming.
Therefore Common LISP does not use a symbol's property list as extensively as prior LISP dialects. Assorted system implementation and environmental information, such as compiler, debugging, and documentation data, is stored on property lists in Common LISP. The caveat here is that it is dangerous to directly access a symbol's property list, particularly in implementations such as Star Sapphire LISP which use symbol property lists for this purpose.
However, if you use the property-list functions to access property lists, it is safe to use this feature of symbols; the property list can be used to build small quick and dirty databases which don't need a lot of speed. Because property-list functions are given the symbol and not the list itself, modifications to the property list can be recorded by storing back into the property-list cell of the symbol; you are assured of not damaging a given symbol's property list.
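A short sketch of the safe route through the property-list functions follows. The symbol fido and its indicators are hypothetical, chosen only for illustration.

```lisp
;; Create properties by using get within a setf form.
(setf (get 'fido 'species) 'dog)
(setf (get 'fido 'age) 3)

(get 'fido 'species)          ; returns DOG
(get 'fido 'color)            ; returns NIL: no such property
(get 'fido 'color 'unknown)   ; returns UNKNOWN, the supplied default

;; Storing under an existing indicator replaces the value;
;; it does not create a duplicate key.
(setf (get 'fido 'age) 4)

;; remprop removes the indicator together with its value.
(remprop 'fido 'age)
(get 'fido 'age)              ; returns NIL
```

Note that the code never touches the list returned by symbol-plist directly, so any properties the system itself keeps on the symbol are left alone.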
10.2.1 Property List Implementation
A property list is implemented as a memory location which contains a list with an even number (possibly zero) of elements. Usually this memory location is the property-list slot of a symbol, but any memory location acceptable to setf can be used if getf and remf are to be used.
Thus it is important to note that a property list is not necessarily associated with a symbol. A property list is an ordinary list with no unusual implementation details other than being maintained by convention by the property-list functions.
A property list not explicitly stored in a symbol's property list slot is sometimes called a 'disembodied property list': this grotesque term simply indicates an ordinary list in the format of a property list.
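For example, getf and remf can operate on a property list held in an ordinary variable. This is a minimal sketch; the variable *settings* and its indicators are hypothetical. The list is built with list rather than written as a quoted constant because the operations modify it.

```lisp
;; A disembodied property list stored in a variable, not in a symbol.
(defparameter *settings* (list 'width 80 'height 24))

(getf *settings* 'width)              ; returns 80
(setf (getf *settings* 'depth) 8)     ; adds a new indicator and value
(setf (getf *settings* 'width) 132)   ; replaces the existing value
(remf *settings* 'height)             ; returns true: the pair was removed
(getf *settings* 'height 'missing)    ; returns MISSING, the supplied default
```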
10.2.2 Property lists versus association lists
A property list is very similar to an association list. Both associate keys with values. The difference is that a property list can be stored in a symbol's property list slot. Hence it can represent an object in its own right.
The operations for adding and removing entries in property lists are destructive: they alter the property list itself. Association lists, on the other hand, are normally augmented without side effects (non-destructively) by adding new entries to the front, which makes a new list (see acons and pairlis).
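The contrast can be sketched as follows; all variable names here are hypothetical. acons returns a new list whose tail is the old association list, whereas setf of getf updates the place that holds the property list.

```lisp
;; Association list: acons conses a new entry onto the front.
(defparameter *alist* (list (cons 'a 1) (cons 'b 2)))
(defparameter *longer* (acons 'c 3 *alist*))

(cdr (assoc 'c *longer*))   ; returns 3
(assoc 'c *alist*)          ; returns NIL: the original list is untouched

;; Property list: the stored list itself is changed.
(defparameter *plist* (list 'a 1 'b 2))
(setf (getf *plist* 'c) 3)
(getf *plist* 'c)           ; returns 3
```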
10.3 Symbol Package
The package slot refers to a package object. Given the symbol's name, the package data structure can locate that symbol.
Nevertheless, a symbol is uniquely identified by its name only when considered relative to a package.
A symbol may appear in many packages, but it can be owned by one and only one package. The package slot points to the specific owner of the symbol, if any.
A symbol may be without a package, in which case it has an #: prefix when printed.
The relation of symbols to a given package (or lack thereof) brings up some important points on how symbols are used conventionally.
Symbols can be used in two somewhat different ways. An interned symbol is indexed by its print name in a data structure called a package containing a hash table.
Every time a symbol with a given print name is read, the same symbol is produced. This is in the strictest sense of the word: the result will always be eq.
This behavior of symbols makes them appropriate as names for 'things': as objects in the more real-world sense. In particular, symbols can be used as hooks on which to hang permanent data objects (using the property list, for example).
Interned symbols are created automatically as part of the normal behavior of the reader. The first time some entity (such as the function read) asks the package system for a symbol with a specific print name, that symbol is automatically created. Once read, an interned symbol is never collected as there is no way to tell whether it (and its associated properties and value) will be needed at some time in the future.
The intern function can be used to create an interned symbol if a new one is needed.
Interned symbols are the most commonly used. For more information, see packages.
The second use of symbols is simply as a data object, with no special cataloging -- it has membership in no particular package. As stated above, an uninterned symbol is printed as #: followed by its print name.
The make-symbol and gensym functions allow the creation of uninterned symbols.
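The difference between the two kinds of symbol can be seen in a short sketch. The names apple, *u1*, *u2* and *g* are hypothetical; the comparisons assume the current package does not change between reading and evaluating the forms.

```lisp
;; Interned: reading the same print name always yields the same symbol.
(eq 'apple 'apple)                          ; true
(eq (intern (symbol-name 'apple)) 'apple)   ; true: intern finds the existing symbol

;; Uninterned: each call creates a distinct symbol, even with equal names.
(defparameter *u1* (make-symbol "APPLE"))
(defparameter *u2* (make-symbol "APPLE"))
(eq *u1* *u2*)                                    ; false
(string= (symbol-name *u1*) (symbol-name *u2*))   ; true
(symbol-package *u1*)                             ; returns NIL: no owning package

;; gensym also returns a fresh uninterned symbol with a generated name.
(defparameter *g* (gensym))
(symbol-package *g*)                              ; returns NIL
```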
10.4 Other Symbol Slots
The Star Sapphire implementation of Common LISP has several other symbol object slots which are not defined by the language specification; these slots are used to optimize the speed of certain operations, or for backward compatibility with past versions of the product. There is a value slot which stores the value of a given symbol if it has a global value. The value slot for a symbol stores the value of a 'special' variable when no special binding is present. If a given symbol is unbound, this slot contains the value 'vnil' which is the highest virtual address plus one.
The functional object slot contains the virtual address of a function object; this is used to map symbol names to functional objects in the compiler.
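At the language level, these two slots are visible through the standard Common LISP accessors. The sketch below assumes those accessors (boundp, symbol-value, makunbound, fboundp, symbol-function) are available; the names *depth* and twice are hypothetical, and the internal representation of an unbound slot is not observable from this code.

```lisp
;; Value slot: the global value of a special variable.
(defvar *depth* 10)
(boundp '*depth*)                        ; true
(symbol-value '*depth*)                  ; returns 10
(makunbound '*depth*)                    ; the symbol is now unbound
(boundp '*depth*)                        ; false

;; Functional object slot: the function named by the symbol.
(defun twice (n) (* 2 n))
(fboundp 'twice)                         ; true
(funcall (symbol-function 'twice) 21)    ; returns 42
```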
10.5 Lexvars and functional symbols
With all this in mind, it is important to note that for the sake of optimization the Star Sapphire implementation of Common LISP has two classes of object which under normal circumstances are read, evaluated and printed just like symbols; but are not implemented in the same way as 'vanilla' symbols. This will only be of concern under certain obscure circumstances; much effort has been devoted to make these objects act just like symbols.
These are lexvars and functional symbols.
10.5.1 Lexvars
A symbol record as described above is created for every symbol which is read in the normal course of events. However, within structures created as the result of reading LISP objects, some symbols are 'fixed up': translated into immediate objects which do not require a (possibly expensive) fetch from virtual memory.
A lexvar is a two-part index into the stack relative to some context: the relative displacement in the display stack (each display stack entry marks the base of some lexical scope in the LISP stack proper), plus the offset within the indicated stack frame.
A lexvar thus can only have a value and nothing more; after translation it exists as an anonymous 16 bit number. An example of a lexvar is the parameter 'x' in the following function:
(defun my-product (x) (* x x))
After being fixed-up by the incremental compiler (canonicalizer), this function body will look like:
(* #<LEXVAR 1:0> #<LEXVAR 1:0>)
The first number following the LEXVAR identifier is the display stack offset, the second is the offset in the stack frame. The reason the display is effectively '1' is because there is an implicit block surrounding every defun's body, which creates an additional display stack frame.
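The two-part addressing can be pictured with a toy model. This is only an illustrative sketch and not the actual Star Sapphire implementation: it assumes the display is a list of frames with the innermost frame first, that each frame is a simple vector, and the names lexvar-ref and *display* are hypothetical.

```lisp
;; Toy model: fetch the value addressed by a display offset and a frame offset.
(defun lexvar-ref (display depth offset)
  (svref (nth depth display) offset))

;; Display during a call (my-product 7):
;; frame 0 is the implicit block frame (no variables),
;; frame 1 holds the parameter x.
(defparameter *display* (list (vector) (vector 7)))

;; The body (* #<LEXVAR 1:0> #<LEXVAR 1:0>) then amounts to:
(* (lexvar-ref *display* 1 0)
   (lexvar-ref *display* 1 0))          ; returns 49
```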
10.5.2 Functional Symbols
The second kind of 'fixed-up' symbol is the functional symbol. This approximately corresponds to the Common LISP functional variable concept; functional variables are often stored as a functional symbol, in a particular context. However, not all functional variables are implemented as functional symbols.
A symbol at the start of a list which is not quoted names a function, a special form or a macro; such symbols are translated by the compiler (canonicalizer) into an index into an internal table which uniquely identifies the functional object needed to evaluate this list. If this function is altered, the internal table is altered. This speeds up the evaluator greatly.
Functional symbols are printed occasionally as #<FUNCTION ...> objects. However, under normal circumstances they are simply printed as the name of the symbol they were converted from. The print function has more time to look up this information.