22. INPUT/OUTPUT
This chapter contains the following sections.
22.1 Printed Representation of LISP Objects
22.1.1 What the Read Function Accepts
Table of Standard Character Syntax Types
22.1.2 Parsing of Numbers and Symbols
Actual Syntax of Numbers
Standard Constituent Character Attributes
22.1.3 Macro Characters
22.1.4 Standard Dispatching Macro Character Syntax
22.1.5 The Readtable
22.1.6 What the Print Function Produces
22.2.1 Input from Character Streams
22.3.1 Output to Character Streams
22.3.2 Formatting Output to Character Streams
22.4 Querying the User
Common LISP provides a rich set of facilities for performing input/output. All input/output operations are performed on streams of various kinds. This chapter is devoted to stream data transfer operations. Streams are discussed in Chapter 21, and ways of manipulating files through streams are discussed in Chapter 23.
While there is provision for reading and writing binary data, most of the I/O operations in Common LISP read or write characters. There are simple primitives for reading and writing single characters or lines of data. The format function can perform complex formatting of output data, directed by a control string in a manner similar to a C printf function call. The most useful I/O operations, however, read and write printed representations of arbitrary LISP objects.
22.1 Printed Representation of LISP Objects
LISP objects in general are not text strings, but complex data structures. They have very different properties from text strings as a consequence of their internal representation. However, to make it possible to get at and talk about LISP objects, LISP provides a representation of most objects in the form of printed text; this is called the printed representation, which is used for input/output purposes and in the examples throughout this manual. Functions such as print take a LISP object and send the characters of its printed representation to a stream. The collection of routines that does this is known as the (LISP) printer. The read function takes characters from a stream, interprets them as a printed representation of a LISP object, builds that object, and returns it; the collection of routines that does this is called the (LISP) reader. Ideally, one could print a LISP object and then read the printed representation back in, and so obtain the same identical object. In practice this is difficult and for some purposes not even desirable. Instead, reading a printed representation produces an object that is (with obscure technical exceptions) equal to the originally printed object.
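The print-then-read round trip can be checked directly with the standard functions prin1-to-string and read-from-string. The following is a small sketch; round-trip is a hypothetical helper name, and the sample objects are arbitrary. It compares with equal, as the paragraph above suggests, and also shows that the object read back is not the identical (eq) object.

```lisp
;;; ROUND-TRIP is a hypothetical helper used only for illustration.
(defun round-trip (object)
  "Print OBJECT with PRIN1 to a string and read that string back."
  (values (read-from-string (prin1-to-string object))))

;; The copy is EQUAL to the original, but a list or string read back in
;; is a freshly built object, so it is not EQ to the one that was printed.
(let ((original (list 'a "b" (cons 'c 3.5) #\x)))
  (list (equal (round-trip original) original)
        (eq (round-trip original) original)))
;; => (T NIL)
```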
Most LISP objects have more than one possible printed representation. For example, the integer twenty-seven can be written in any of these ways:
27
27.
#o33
#x1B
#b11011
#.(* 3 3 3)
81/3
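Each of these spellings can be handed to the reader to confirm that it denotes the same integer. This sketch assumes the standard readtable; *read-base* is bound to 10 because "27" would mean 39 in base 16, and *read-eval* (a variable from ANSI Common Lisp, not described in this chapter) must be true for the #. syntax to be evaluated.

```lisp
;;; Each printed representation listed above reads as the integer 27.
(defparameter *spellings-of-27*
  '("27" "27." "#o33" "#x1B" "#b11011" "#.(* 3 3 3)" "81/3"))

(let ((*read-base* 10)
      (*read-eval* t))
  (mapcar (lambda (text) (values (read-from-string text)))
          *spellings-of-27*))
;; => (27 27 27 27 27 27 27)
```

Note that 81/3 is read as a ratio and then reduced, so the result is the integer 27, not a ratio object.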
A list of two symbols A and B can be printed in many ways:
(A B) (a b) ( a b ) (\A |B|)
(|\A|
B
)
The last example, which is spread over three lines, may be ugly, but it is legitimate. In general, wherever whitespace is permissible in a printed representation, any number of spaces and newlines may appear.
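The same check works for the list examples. In the sketch below, the texts are written as Lisp string literals, so each backslash in the printed representation is doubled; the three-line example is built with format so that its newlines are explicit.

```lisp
;;; Every text below reads as a list of the two symbols A and B.
;;; In a Lisp string literal, \\ stands for a single backslash character.
(defparameter *two-symbol-texts*
  (list "(A B)" "(a b)" "( a b )" "(\\A |B|)"
        (format nil "(|\\A|~%B~%)")))   ; the three-line example

(mapcar (lambda (text) (equal (read-from-string text) '(a b)))
        *two-symbol-texts*)
;; => (T T T T T)
```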
When print produces a printed representation, it must choose arbitrarily from among many possible printed representations. It attempts to choose one that is readable. There are a number of different printing functions. This section describes in detail the standard printed representation for any LISP object, and also describes how read operates.
22.1.1 What the Read Function Accepts
The purpose of the LISP reader is to accept characters, interpret them as the printed representation of a LISP object, and construct and return such an object. The reader cannot accept everything that the printer produces; for example, the printed representations of compiled code objects cannot be read in. However, the reader has many features that are not used by the output of the printer at all, such as comments, alternative representations, and convenient abbreviations for frequently used but unwieldy constructs. The reader is also parameterized in such a way that it can be used as a lexical analyzer for a more general user-written parser.
The reader is organized as a recursive-descent parser. Broadly speaking, the reader operates by reading a character from the input stream and treating it in one of three ways. Whitespace characters serve as separators but are otherwise ignored. Constituent and escape characters are accumulated to make a token, which is then interpreted as a number or symbol. Macro characters trigger the invocation of functions (possibly user-supplied) that can perform arbitrary parsing actions, including recursive invocation of the reader.
More precisely, when the reader is invoked, it reads a single character from the input stream and dispatches according to the syntactic type of that character. Every character that can appear in the input stream must be of exactly one of the following kinds: illegal, whitespace, constituent, single escape, multiple escape, or macro. Macro characters are further divided into the types terminating and non-terminating (of tokens). (Note that macro characters have nothing whatever to do with macros in their operation. There is a superficial similarity in that macros allow the user to extend the syntax of Common LISP at the level of forms, while macro characters allow the user to extend the syntax at the level of characters.) Constituents additionally have one or more attributes, the most important of which is alphabetic.
The parsing of Common LISP expressions is discussed in terms of these syntactic character types because the types of individual characters are not fixed but may be altered by the user (see set-syntax-from-char and set-macro-character). The characters of the standard character set initially have the syntactic types shown in the Table of Standard Character Syntax Types.
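As a sketch of altering a syntactic type, the following gives ! the syntax of \ (single escape) inside a private copy of the standard readtable, so the global syntax is left unchanged. read-with-bang-escape is a hypothetical helper; copy-readtable, set-syntax-from-char, and read-from-string are standard functions.

```lisp
;;; Work on a copy of the standard readtable so the change stays local.
(defun read-with-bang-escape (text)
  "Read TEXT with ! given the syntax of the single escape character."
  (let ((*readtable* (copy-readtable nil)))
    (set-syntax-from-char #\! #\\)      ; ! now behaves like \
    (values (read-from-string text))))

(symbol-name (read-with-bang-escape "a!(b"))  ; => "A(B"  the ( is escaped
```

Outside the binding, ! is still an ordinary constituent, so "a!b" continues to read as the symbol named "A!B".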
Note that the brackets, braces, question mark, and exclamation point (that is, [, ], {, }, ?, and !) are normally defined to be constituents, but they are not used for any purpose in standard Common LISP syntax and do not occur in the names of built-in Common LISP functions or variables. These characters are explicitly reserved to the user. The primary intent is that they be used as macro characters; but a user might choose, for example, to make ! be a single escape character (as it is in Portable Standard LISP). The algorithm performed by the Common LISP reader is roughly as follows:
1. If at end of file, perform end-of-file processing (as specified by the caller of the read function). Otherwise, read one character from the input stream, call it x, and dispatch according to the syntactic type of x to one of steps 2 to 7.
2. If x is an illegal character, signal an error.
3. If x is a whitespace character, then discard it and go back to step 1.
4. If x is a macro character (at this point the distinction between terminating and non-terminating macro characters does not matter), then execute the function associated with that character. The function may return zero values or one value (see values).
The macro-character function may of course read characters from the input stream; if it does, it will see those characters following the macro character. The function may even invoke the reader recursively. This is how the macro character ( constructs a list: by invoking the reader recursively to read the elements of the list.
If one value is returned, then return that value as the result of the read operation; the algorithm is done. If zero values are returned, then go back to step 1.
5. If x is a single escape character (normally \), then read the next character and call it y (but if at end of file, signal an error instead). Ignore the usual syntax of y and pretend it is a constituent whose only attribute is alphabetic. (If y is a lowercase character, leave it alone; do not replace it with the corresponding uppercase character.) Use y to begin a token, and go to step 8.
6. If x is a multiple escape character (normally |), then begin a token (initially containing no characters) and go to step 9.
7. If x is a constituent character, then it begins an extended token. After the entire token is read in, it will be interpreted either as representing a LISP object such as a symbol or number (in which case that object is returned as the result of the read operation), or as being of illegal syntax (in which case an error is signalled). If x is a lowercase character, replace it with the corresponding uppercase character. Use x to begin a token, and go on to step 8.
8. (At this point a token is being accumulated, and an even number of multiple escape characters have been encountered.) If at end of file, go to step 10. Otherwise, read a character (call it y), and perform one of the following actions according to its syntactic type:
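If y is a constituent or non-terminating macro, then do the following. If y is a lowercase character, replace it with the corresponding uppercase character. Append y to the token being built, and repeat step 8.
If y is a single escape character, then read the next character and call it z (but if at end of file, signal an error instead). Ignore the usual syntax of z and pretend it is a constituent whose only attribute is alphabetic. (If z is a lowercase character, leave it alone; do not replace it with the corresponding uppercase character.) Append z to the token being built, and repeat step 8.
If y is a multiple escape character, then go to step 9.
If y is an illegal character, signal an error.
If y is a terminating macro character, it terminates the token. First "unread" the character y (see unread-char), then go to step 10.
If y is a whitespace character, it terminates the token. First "unread" y if appropriate (see read-preserving-whitespace), then go to step 10.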
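9. (At this point a token is being accumulated, and an odd number of multiple escape characters have been encountered.) If at end of file, signal an error. Otherwise, read a character (call it y), and perform one of the following actions according to its syntactic type:
If y is a constituent, macro, or whitespace character, then ignore the usual syntax of that character and pretend it is a constituent whose only attribute is alphabetic. (If y is a lowercase character, leave it alone; do not replace it with the corresponding uppercase character.) Append y to the token being built, and repeat step 9.
If y is a single escape character, then read the next character and call it z (but if at end of file, signal an error instead). Ignore the usual syntax of z and pretend it is a constituent whose only attribute is alphabetic. (If z is a lowercase character, leave it alone; do not replace it with the corresponding uppercase character.) Append z to the token being built, and repeat step 9.
If y is a multiple escape character, then go to step 8.
If y is an illegal character, signal an error.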
10. An entire token has been accumulated. Interpret it as representing a LISP object and return that object as the result of the read operation, or signal an error if the token is not of legal syntax.
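The token-accumulation part of the algorithm (steps 5 through 10) can be sketched as a small function over a string. scan-token and token-terminator-p are hypothetical helpers, not part of Common LISP. The sketch assumes the standard syntax types (\ as single escape, | as multiple escape, and the standard whitespace and terminating macro characters), assumes the string begins with a token, does not check for illegal characters, and stops after step 10's accumulation without interpreting the token as a number or symbol. Because it works on a string, "unreading" a terminator simply means leaving the index on it.

```lisp
;;; SCAN-TOKEN is a hypothetical helper, not part of Common LISP.
(defun token-terminator-p (ch)
  "True if CH ends a token in step 8 under the standard syntax."
  (member ch '(#\Space #\Tab #\Newline #\Page #\Return #\Linefeed
               #\( #\) #\' #\" #\; #\` #\,)))

(defun scan-token (string &optional (start 0))
  "Accumulate one token from STRING starting at START (steps 5 to 10).
Return the token text and the index of the first character not consumed."
  (let ((token (make-array 0 :element-type 'character
                             :adjustable t :fill-pointer t))
        (i start)
        (n (length string))
        (odd-bars nil))                  ; NIL while in step 8, T in step 9
    (flet ((take (ch)
             (vector-push-extend ch token)
             (incf i)))
      (loop
        (when (>= i n)
          ;; End of file: go to step 10 from step 8, but an error in step 9.
          (when odd-bars (error "End of file inside a multiple escape."))
          (return))
        (let ((y (char string i)))
          (cond ((char= y #\\)           ; single escape: take next char as is
                 (incf i)
                 (when (>= i n) (error "End of file after a single escape."))
                 (take (char string i)))
                ((char= y #\|)           ; multiple escape: switch steps 8 and 9
                 (incf i)
                 (setf odd-bars (not odd-bars)))
                (odd-bars (take y))      ; step 9: every character is alphabetic
                ((token-terminator-p y)  ; step 8: leave y unread, go to step 10
                 (return))
                (t (take (char-upcase y)))))))  ; constituent or # in step 8
    (values (copy-seq token) i)))

(scan-token "abc|d e|\\fg rest")   ; => "ABCd efG", 11
```

For this input the real reader agrees: the symbol read from the same text has the name "ABCd efG".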
As a rule, a single escape character never stands for itself but always serves to cause the following character to be treated as a simple alphabetic character. A single escape character can be included in a token only if preceded by another single escape character.
A multiple escape character also never stands for itself. The characters between a pair of multiple escape characters are all treated as simple alphabetic characters, except that single escape and multiple escape characters must nevertheless be preceded by a single escape character to be included.
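The effect of these rules on symbol names can be observed with the standard reader. name-of is a hypothetical helper that reads with a copy of the standard readtable; the doubled backslashes are Lisp string syntax for a single \ in the text being read.

```lisp
;;; NAME-OF is a hypothetical helper: read TEXT with the standard syntax
;;; and return the name of the resulting symbol.
(defun name-of (text)
  (let ((*readtable* (copy-readtable nil)))
    (symbol-name (values (read-from-string text)))))

(name-of "abc")       ; => "ABC"   unescaped letters are upcased
(name-of "a\\bc")     ; => "AbC"   \ keeps the b lowercase
(name-of "|a b|c")    ; => "a bC"  bars protect case and whitespace
(name-of "a\\\\b")    ; => "A\\B"  an escaped \ stands for itself
(name-of "|a\\|b|")   ; => "a|b"   inside bars, | still needs a \
```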
Table of Standard Character Syntax Types
Character     Syntax type
<backspace>   constituent
<tab>         whitespace
<newline>     whitespace
<linefeed>    whitespace
<page>        whitespace
<return>      whitespace
<space>       whitespace
<rubout>      constituent
!             constituent*
"             terminating macro
#             non-terminating macro
$             constituent
%             constituent
&             constituent
'             terminating macro
(             terminating macro
)             terminating macro
*             constituent
+             constituent
,             terminating macro
-             constituent
.             constituent
/             constituent
0-9           constituent
:             constituent
;             terminating macro
<             constituent
=             constituent
>             constituent
?             constituent*
@             constituent
A-Z           constituent
[             constituent*
\             single escape
]             constituent*
^             constituent
_             constituent
`             terminating macro
a-z           constituent
{             constituent*
|             multiple escape
}             constituent*
~             constituent
The characters marked with an asterisk are initially constituents, but are reserved to the user for use as macro characters or for any other desired purpose.
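Part of this table can be probed from a program with the standard function get-macro-character, which returns a character's macro function (or nil) and, as a second value, whether the macro character is non-terminating. describe-syntax is a hypothetical helper; it consults a copy of the standard readtable so that user changes to *readtable* do not affect the answer.

```lisp
;;; Classify a character as a terminating macro, a non-terminating macro,
;;; or neither, according to the standard readtable.
(defun describe-syntax (char)
  (multiple-value-bind (function non-terminating-p)
      (get-macro-character char (copy-readtable nil))
    (cond ((null function) :not-a-macro)
          (non-terminating-p :non-terminating-macro)
          (t :terminating-macro))))

(mapcar #'describe-syntax '(#\( #\# #\a #\!))
;; => (:TERMINATING-MACRO :NON-TERMINATING-MACRO :NOT-A-MACRO :NOT-A-MACRO)
```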
22.1.2 Parsing of Numbers and Symbols
When an extended token is read, it is interpreted as a number or symbol. In general, the token is interpreted as a number if it satisfies the syntax for numbers specified in Actual Syntax of Numbers; this is discussed in more detail below.
The characters of the extended token may serve various syntactic functions as shown in Standard Constituent Character Attributes, but it must be remembered that any character included in a token under the control of an escape character is treated as alphabetic rather than according to the attributes shown in the table. One consequence of this rule is that a whitespace, macro, or escape character will always be treated as alphabetic within an extended token because such a character cannot be included in an extended token except under the control of an escape character.
To allow for extensions to the syntax of numbers, a syntax for potential numbers is defined in Common LISP that is more general than the actual syntax for numbers (see Actual Syntax of Numbers). Any token that is not a potential number and does not consist entirely of dots will always be taken to be a symbol, now and in the future; programs may rely on this fact. Any token that is a potential number but does not fit the actual number syntax defined below is a reserved token and has an implementation-dependent interpretation; an implementation may signal an error, quietly treat the token as a symbol, or take some other action. Programmers should avoid the use of such reserved tokens. (A symbol whose name looks like a reserved token can always be written using one or more escape characters.)
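A token is a potential number if it satisfies the following requirements:
The token consists entirely of digits, signs (+ or -), ratio markers (/), decimal points (.), extension characters (^ or _), and number markers. (A number marker is a letter. Whether a letter may be treated as a number marker depends on context, but no letter that is adjacent to another letter may ever be treated as a number marker. Floating-point exponent markers are instances of number markers.)
The token contains at least one digit. (Letters may be considered to be digits, depending on the value of *read-base*, but only in tokens containing no decimal points.)
The token begins with a digit, sign, decimal point, or extension character.
The token does not end with a sign.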
As examples, the following tokens are potential numbers, but they are not actually numbers as defined below, and so are reserved tokens. (They do indicate some interesting possibilities for future extensions.)
1b5000
777777q
1.7J
-3/4+6.7J
12/25/83
27^19
3^4/5
6//7
3.1.2.6
^-43^
3.141_592_653_589_793_238_4
-3.7+2.6i-6.17j+19.6k
The following tokens are not potential numbers but are always treated as symbols:
/
/5
+
1+
1-
foo+
ab.cd
_
^
^/-
The following tokens are potential numbers if the value of *read-base* is 16 (an abnormal situation), but they are always treated as symbols if the value of *read-base* is 10 (the usual value):
bad-face
25-dec-83
a/b
fad_cafe
f^
It is possible for there to be an ambiguity as to whether a letter should be treated as a digit or as a number marker. In such a case, the letter is always treated as a digit rather than as a number marker.
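Both the *read-base* list and the digit-versus-marker rule can be seen with the reader, assuming an implementation that follows the rule stated above. With *read-base* at 16, the E in "1e5" is taken as a digit rather than as an exponent marker, and "face" becomes a hexadecimal integer.

```lisp
;;; In base 16, E and the letters of FACE are digits.
(let ((*read-base* 16))
  (list (values (read-from-string "1e5"))      ; #x1E5
        (values (read-from-string "face"))))   ; #xFACE
;; => (485 64206)

;; With the usual base of 10 the same tokens are a float and a symbol.
(let ((*read-base* 10))
  (list (values (read-from-string "1e5"))
        (values (read-from-string "face"))))
;; => (100000.0 FACE)
```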
Note that the printed representation for a potential number may not contain any escape characters. An escape character robs the following character of all syntactic qualities, forcing it to be strictly alphabetic and therefore unsuitable for use in a potential number. For example, all of the following representations are interpreted as symbols, not numbers:
\256
25\64
1.0\E6
|100|
3\.14159
|3/4|
3\/4
5||
In each case, removing the escape character(s) would allow the token to be treated as a number.
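A few of these pairs can be checked with the reader. The escaped texts are written as Lisp string literals, so \\ stands for one backslash; *escaped-and-plain* is a hypothetical name for the sample data.

```lisp
;;; Pairs of (escaped . unescaped) texts taken from the list above.
(defparameter *escaped-and-plain*
  '(("25\\64" . "25/64") ("|100|" . "100") ("3\\.14159" . "3.14159")))

(mapcar (lambda (pair)
          (list (symbolp (read-from-string (car pair)))
                (numberp (read-from-string (cdr pair)))))
        *escaped-and-plain*)
;; => ((T T) (T T) (T T))
```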
If a potential number can in fact be interpreted as a number according to the BNF syntax in Actual Syntax of Numbers, then a number object of the appropriate type is constructed and returned. It should be noted that in a given implementation it may be that not all tokens conforming to the actual syntax for numbers can actually be converted into number objects. For example, specifying too large or too small an exponent for a floating-point number may make the number impossible to represent in the implementation. Similarly, a ratio with denominator zero (such as -35/000) cannot be represented in any implementation. In any such circumstance where a token with the syntax of a number cannot be converted to an internal number object, an error is signalled. (On the other hand, an error must not be signalled for specifying too many significant digits for a floating-point number; an appropriately truncated or rounded value should be produced.)
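Both halves of this rule can be observed with a small wrapper. try-read is a hypothetical helper, and it uses handler-case, which comes from ANSI Common Lisp rather than this chapter. The rounded value assumes the default *read-default-float-format* of single-float.

```lisp
;;; TRY-READ returns :ERROR instead of letting a reader error escape.
(defun try-read (text)
  (handler-case (values (read-from-string text))
    (error () :error)))

(try-read "-35/000")                  ; => :ERROR  zero denominator
;; Too many significant digits is not an error: the value is rounded.
(try-read "1.00000000000000000001")   ; => 1.0
```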
There is an omission in the syntax of numbers, as described in Actual Syntax of Numbers, in that the syntax does not account for the possible use of letters as digits. The radix used for reading integers and ratios is normally decimal, but it is actually determined by the value of *read-base*, which is initially 10; when *read-base* is greater than 10, letters serve as digits above 9. Non-decimal constants in Common LISP programs or portable Common LISP data files should be written using #O, #X, #B, or #nR syntax.
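The reason for preferring #X and its relatives in portable data is that an unadorned digit string depends on *read-base* at read time, while these constructs do not. A trailing decimal point also forces decimal interpretation. read-in-base is a hypothetical helper.

```lisp
;;; Read TEXT with *READ-BASE* bound to BASE.
(defun read-in-base (base text)
  (let ((*read-base* base))
    (values (read-from-string text))))

(read-in-base 10 "10")     ; => 10
(read-in-base 16 "10")     ; => 16
(read-in-base 10 "#x10")   ; => 16  independent of *READ-BASE*
(read-in-base 16 "10.")    ; => 10  trailing point forces decimal
```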
If a token consists solely of dots (with no escape characters), then an error is signalled, except in one circumstance: if the token is a single dot and occurs in a situation appropriate to "dotted list" syntax, then it is accepted as a part of such syntax. Signalling an error catches not only misplaced dots in dotted list syntax, but also lists that were truncated by *print-length* cutoff, because such lists end with a three-dot sequence (...). Examples:
(a . b) ;A dotted pair of a and b
(a.b) ;A list of one element, the symbol named a.b
(a. b) ;A list of two elements a. and b
(a .b) ;A list of two elements a and .b
(a \. b) ;A list of three elements a, ., and b
(a |.| b) ;A list of three elements a, ., and b
(a \... b) ;A list of three elements a, ..., and b
(a |...| b) ;A list of three elements a, ..., and b
(a b . c) ;A dotted list of a and b with c at the end
(. b) ;Illegal; an error is signalled.
(a .) ;Illegal; an error is signalled.
(a .. b) ;Illegal; an error is signalled.
(a . . b) ;Illegal; an error is signalled.
(a b c ...) ;Illegal; an error is signalled.
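Some of these cases can be tried with a wrapper that reports reader errors as a value. read-or-error is a hypothetical helper built on handler-case (from ANSI Common Lisp); the \\ in the third call is Lisp string syntax for a single backslash.

```lisp
(defun read-or-error (text)
  "Read TEXT, returning :ERROR instead of signalling a reader error."
  (handler-case (values (read-from-string text))
    (error () :error)))

(read-or-error "(a . b)")      ; => (A . B)
(read-or-error "(a.b)")        ; => (A.B)      one symbol named "A.B"
(read-or-error "(a \\. b)")    ; => (A |.| B)  three elements
(read-or-error "(a .. b)")     ; => :ERROR
(read-or-error "(a b c ...)")  ; => :ERROR
```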
In all other cases, the token is construed to be the name of a symbol. If there are any package markers (colons) in the token, they divide the token into pieces used to control the lookup and creation of the symbol. If there is a single package marker, and it occurs at the beginning of the token, then the token is interpreted as a keyword, that is, a symbol in the :keyword package. The part of the token after the package marker must not have the syntax of a number.
If there is a single package marker not at the beginning or end of the token, then it divides the token into two parts. The first part specifies a package; the second part is the name of an external symbol available in that package. Neither of the two parts may have the syntax of a number.
If there are two adjacent package markers not at the beginning or end of the token, then they divide the token into two parts. The first part specifies a package; the second part is the name of a symbol within that package (possibly an internal symbol). Neither of the two parts may have the syntax of a number. If a symbol token contains no package markers, then the entire token is the name of the symbol. The symbol is looked up in the default package; see *package*.
All other patterns of package markers, including the cases where there are more than two package markers or where a package marker appears at the end of the token, presently do not mean anything in Common LISP; see Chapter 11. It is therefore currently an error to use such patterns in a Common LISP program. The valid patterns for tokens may be summarized as follows:
nnnnn          a number
xxxxx          a symbol in the current package
:xxxxx         a symbol in the keyword package
ppppp:xxxxx    an external symbol in the ppppp package
ppppp::xxxxx   a (possibly internal) symbol in the ppppp package
where nnnnn has the syntax of a number, and xxxxx and ppppp do not have the syntax of a number.
Actual Syntax of Numbers
number ::= integer | ratio | floating-point-number
integer ::= [sign] {digit}+ [decimal-point]
ratio ::= [sign] {digit}+ / {digit}+
floating-point-number ::= [sign] {digit}* decimal-point {digit}+ [exponent]
                        | [sign] {digit}+ [decimal-point {digit}*] exponent
sign ::= + | -
decimal-point ::= .
digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
exponent ::= exponent-marker [sign] {digit}+
exponent-marker ::= e | s | f | d | l | E | S | F | D | L
The notation x | y means either x or y, the notation {x}* means zero or more occurrences of x, the notation {x}+ means one or more occurrences of x, and the notation [x] means zero or one occurrences of x.
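The grammar is small enough to recognize by hand. The following sketch is a hypothetical recognizer named number-syntax (not a Common LISP function) that follows the productions above for decimal tokens only: it classifies a token as an integer, ratio, or floating-point number without building the number, and it deliberately ignores letters used as digits under *read-base*.

```lisp
;;; NUMBER-SYNTAX is a hypothetical helper, not part of Common LISP.
(defun number-syntax (token)
  "Return :INTEGER, :RATIO or :FLOAT if TOKEN matches the grammar, else NIL."
  (let ((i 0)
        (n (length token)))
    (labels ((accept (chars)
               ;; Consume one character if it is one of CHARS.
               (when (and (< i n) (find (char token i) chars))
                 (incf i)
                 t))
             (digits ()
               ;; Consume {digit}* and return how many were consumed.
               (loop while (accept "0123456789") count t))
             (exponent-then-end-p ()
               ;; exponent ::= exponent-marker [sign] {digit}+, then end.
               (and (accept "esfdlESFDL")
                    (progn (accept "+-") (plusp (digits)))
                    (= i n))))
      (accept "+-")                                     ; [sign]
      (let ((int-digits (digits)))
        (cond ((accept "/")                             ; ratio
               (and (plusp int-digits) (plusp (digits)) (= i n) :ratio))
              ((accept ".")                             ; decimal point
               (let ((frac-digits (digits)))
                 (cond ((= i n)
                        (cond ((plusp frac-digits) :float)   ; 1.5 or .5
                              ((plusp int-digits) :integer)  ; 27.
                              (t nil)))                      ; lone dot
                       ((and (or (plusp int-digits) (plusp frac-digits))
                             (exponent-then-end-p))
                        :float)                              ; 1.5e3, 1.e3
                       (t nil))))
              ((= i n) (and (plusp int-digits) :integer))    ; 27
              ((and (plusp int-digits) (exponent-then-end-p))
               :float)                                       ; 1e3
              (t nil))))))

(mapcar #'number-syntax '("27" "27." "-81/3" ".5" "1.e5" "1+"))
;; => (:INTEGER :INTEGER :RATIO :FLOAT :FLOAT NIL)
```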
Standard Constituent Character Attributes
Character      Attribute
<backspace>    illegal
<tab>          illegal*
<newline>      illegal*
<linefeed>     illegal*
<page>         illegal*
<return>       illegal*
<space>        illegal*
<rubout>       illegal
!              alphabetic
"              alphabetic*
#              alphabetic*
$              alphabetic
%              alphabetic
&              alphabetic
'              alphabetic*
(              alphabetic*
)              alphabetic*
*              alphabetic
+              alphabetic, plus sign
,              alphabetic*
-              alphabetic, minus sign
.              alphabetic, dot, decimal point
/              alphabetic, ratio marker
0-9            alphadigit
:              package marker
;              alphabetic*
<              alphabetic
=              alphabetic
>              alphabetic
?              alphabetic
@              alphabetic
A-Z, a-z       alphadigit (the letters below are also exponent markers)
D, d           alphadigit, double-float exponent marker
E, e           alphadigit, float exponent marker
F, f           alphadigit, single-float exponent marker
L, l           alphadigit, long-float exponent marker
S, s           alphadigit, short-float exponent marker
[              alphabetic
\              alphabetic*
]              alphabetic
^              alphabetic
_              alphabetic
`              alphabetic*
{              alphabetic
|              alphabetic*
}              alphabetic
~              alphabetic
The interpretations in this table apply only to characters whose syntactic type is constituent. Entries marked with an asterisk are normally shadowed because the indicated characters are of syntactic type whitespace, macro, single escape, or multiple escape. Characters with the alphadigit attribute are interpreted as having the digit or alphabetic attribute according to whether or not the character is a valid digit in the radix specified by *read-base*. Characters with the illegal attribute cannot ever appear in a token except under the control of an escape character.
22.1.3 Macro Characters
If the reader encounters a macro character, then the function associated with that macro character is invoked and may produce an object to be returned. This function may read following characters in the stream in whatever syntax it likes (it may even call read recursively) and return the object represented by that syntax. Macro characters may or may not be recognized, of course, when read as part of other special syntaxes (such as for strings).
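A macro-character function that reads the following form by calling read recursively might look like the sketch below. It makes ! (one of the characters reserved to the user) read !form as (not form), inside a private copy of the standard readtable. read-bang and read-with-bang-macro are hypothetical names; set-macro-character and the recursive-p argument of read are standard.

```lisp
;;; A hypothetical macro character: !form reads as (NOT form).
(defun read-bang (stream char)
  (declare (ignore char))
  ;; Recursive call: the fourth argument says this read is nested.
  (list 'not (read stream t nil t)))

(defun read-with-bang-macro (text)
  (let ((*readtable* (copy-readtable nil)))
    (set-macro-character #\! #'read-bang)   ; terminating by default
    (values (read-from-string text))))

(read-with-bang-macro "(and !p q)")   ; => (AND (NOT P) Q)
```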
The reader is therefore organized into two parts: the basic dispatch loop, which also distinguishes symbols and numbers, and the collection of macro characters. Any character can be reprogrammed as a macro character; this is a means by which the reader can be extended. The macro characters normally defined are as follows:
(
)
'
;
"
`
,
#
(
The left-parenthesis character initiates reading of a pair or list. The function read is called recursively to read successive objects until a right parenthesis is found to be next in the input stream. A list of the objects read is returned. Thus
(a b c)
is read as a list of three objects (the symbols a, b, and c). The right parenthesis need not immediately follow the printed representation of the last object; whitespace characters and comments may precede it. This can be useful for putting one object on each line and making it easy to add new objects:
(defun traffic-light (color)
  (case color
    (green)
    (red (stop))
    (amber (accelerate))    ;Insert more colors after this line.
    ))
It may be that no objects precede the right parenthesis, as in () or ( ); this reads as a list of zero objects (the empty list). If a token that is just a dot, not preceded by an escape character, is read after some object, then exactly one more object must follow the dot, possibly followed by whitespace, followed by the right parenthesis:
(a b c . d)
This means that the cdr of the last pair in the list is not nil, but rather the object whose representation followed the dot. The above example might have been the result of evaluating
(cons 'a (cons 'b (cons 'c 'd))) => (a b c . d)
Similarly, we have
(cons 'foo 'bar) => (foo . bar)
It is permissible for the object following the dot to be a list:
(a b c d . (e f . (g))) is the same as (a b c d e f g)
but this is a non-standard form that print will never produce.
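These list examples can be confirmed with the reader; nothing beyond the standard readtable is assumed.

```lisp
(list (read-from-string "(a b c . d)")
      (equal (read-from-string "(a b c . d)")
             (cons 'a (cons 'b (cons 'c 'd))))
      (equal (read-from-string "(a b c d . (e f . (g)))") '(a b c d e f g))
      (read-from-string "( )"))
;; => ((A B C . D) T T NIL)
```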
)
The right-parenthesis character is part of various constructs (such as the syntax for lists) using the left-parenthesis character and is invalid except when used in such a construct.
'
The single-quote (accent acute) character provides an abbreviation to make it easier to put constants in programs. 'foo reads the same as (quote foo): a list of the symbols quote and foo.
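The expansion can be inspected by taking apart the object the reader returns (many printers abbreviate (quote foo) back to 'foo, so the parts are listed separately here).

```lisp
(let ((form (read-from-string "'foo")))
  (list (length form) (first form) (second form)))
;; => (2 QUOTE FOO)
```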
;
Semicolon is used to write comments. The semicolon and all characters up to and including the next newline are ignored. Thus a comment can be put at the end of any line without affecting the reader. (A comment will terminate a token, but a newline would terminate the token anyway.)
;;;; COMMENT-EXAMPLE function.
;;; This function is useless except to demonstrate comments.
;;; (Actually, this example is much too cluttered with them.)
;;; Notice that there are several kinds of comments.
(defun comment-example (x y)      ;X is anything; Y is an a-list.
  (cond ((listp x) x)             ;If X is a list, use that.
        ;; X is now not a list.  There are two other cases.
        ((symbolp x)
         ;; Look up a symbol in the a-list.
         (cdr (assoc x y)))       ;Remember, (cdr nil) is nil.
        ;; Do this when all else fails:
        (t (cons x                ;Add x to a default list.
                 '((lisp t)       ;LISP is okay.
                   (fortran nil)  ;FORTRAN is not.
                   (pl/i -500)    ;You can put comments in "data"
                   (ada .001)     ;as well as in "programs".
                   ;; COBOL??
                   (teco -1.0e9))))))
This example illustrates a few conventions for comments in common use. Comments may begin with one to four semicolons: a single semicolon for a comment at the end of a line of code, two for a comment indented to the level of the code it describes, three for a comment at the left margin, and four for a title heading a whole program or large block of code.
Compatibility note: These conventions arose among users of MACLISP and have been found to be very useful. The conventions are conveniently exploited by certain software tools, such as the EMACS editor and the ATSIGN listing program developed at MIT.
"
The double quote character begins the printed representation of a string. Characters are read from the input stream and accumulated until another double quote is encountered. An exception to this occurs if a single escape character is seen; the escape character is discarded, the next character is accumulated, and accumulation continues. When a matching double quote is seen, all the accumulated characters up to but not including the matching double quote are made into a simple string and returned.
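Both the semicolon comment and the string escape rule can be seen with the reader. The text is given as Lisp string literals, so \" and \\ inside them are the Lisp escapes for a double quote and a backslash in the text being read.

```lisp
;;; A semicolon comment ends at the newline and is skipped by the reader.
(read-from-string (format nil "(a ; this is ignored~%b)"))   ; => (A B)

;;; Inside a string, \ escapes the next character and is itself discarded.
(let ((s (read-from-string "\"say \\\"hi\\\"\"")))
  (list s (length s)))   ; => ("say \"hi\"" 8)
```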
`
The backquote (accent grave) character makes it easier to write programs to construct complex data structures by using a template. As an example, writing
`(cond ((numberp ,x) ,@y) (t (print ,x) ,@y))
is roughly equivalent to writing
(list 'cond
      (cons (list 'numberp x) y)
      (list* 't (list 'print x) y))
The general idea is that the backquote is followed by a template, a picture of a data structure to be built. This template is copied, except that within the template commas can appear. Where a comma occurs, the form following the comma is to be evaluated to produce an object to be inserted at that point. For example, if b has the value 3, then evaluating the form denoted by `(a b ,b ,(+ b 1) b) produces the result (a b 3 4 b).
If a comma is immediately followed by an at-sign (@), then the form following the at-sign is evaluated to produce a list of objects. These objects are then "spliced" into place in the template. For example, if x has the value (a b c), then
`(x ,x ,@x foo ,(cadr x) bar ,(cdr x) baz ,@(cdr x))
=> (x (a b c) a b c foo b bar (b c) baz b c)
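The examples above can be evaluated as written. The last form checks that the cond template and the hand-written list form build equal structure, using arbitrary sample values for x and y.

```lisp
(let ((b 3))
  `(a b ,b ,(+ b 1) b))
;; => (A B 3 4 B)

(let ((x '(a b c)))
  `(x ,x ,@x foo ,(cadr x) bar ,(cdr x) baz ,@(cdr x)))
;; => (X (A B C) A B C FOO B BAR (B C) BAZ B C)

;; The COND template and the LIST form produce EQUAL results.
(let ((x 'n) (y '((f) (g))))
  (equal `(cond ((numberp ,x) ,@y) (t (print ,x) ,@y))
         (list 'cond (cons (list 'numberp x) y) (list* 't (list 'print x) y))))
;; => T
```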
The backquote syntax can be summarized formally as follows. For each of several situations in which backquote can be used, a possible interpretation of that situation as an equivalent form is given. Note that the form is equivalent only in the sense that when it is evaluated it will calculate the correct result. An implementation is quite free to interpret backquote in any way such that a backquoted form, when evaluated, will produce a result equal to that produced by the interpretation shown here.
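`basic is the same as 'basic, that is, (quote basic), for any form basic that is not a list or a general vector.
`,form is the same as form, for any form, provided that the representation of form does not begin with an at-sign or a dot. (A similar caveat holds for all occurrences of a form after a comma.)
`,@form is an error.
`(x1 x2 x3 ... xn . atom) may be interpreted to mean
(append [x1] [x2] [x3] ... [xn] (quote atom))
where the brackets are used to indicate a transformation of an xj as follows:
  [form] is interpreted as (list `form), which contains a backquoted form that must then be further interpreted.
  [,form] is interpreted as (list form).
  [,@form] is interpreted simply as form.
`(x1 x2 x3 ... xn) may be interpreted to mean the same as the backquoted form `(x1 x2 x3 ... xn . nil), thereby reducing it to the previous case.
`(x1 x2 x3 ... xn . ,form) may be interpreted to mean
(append [x1] [x2] [x3] ... [xn] form)
where the brackets indicate a transformation of an xj as described above.
`(x1 x2 x3 ... xn . ,@form) is an error.
`#(x1 x2 x3 ... xn) may be interpreted to mean (apply #'vector `(x1 x2 x3 ... xn)).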
No other uses of comma are permitted; in particular, it may not appear within the #A or #S syntax.
Anywhere ,@ may be used, the syntax ,. may be used instead to indicate that it is permissible to destroy the list produced by the form following the ,.; this may permit more efficient code, using nconc instead of append, for example. If the backquote syntax is nested, the innermost backquoted form should be expanded first. This means that if several commas occur in a row, the leftmost one belongs to the innermost backquote.
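A small sketch of ,. in use: because the spliced list may be destroyed, it is applied here only to a list freshly made by list, never to a quoted constant.

```lisp
;;; ,. splices like ,@ but permits the spliced list to be modified.
(let ((fresh (list 1 2)))
  `(a ,.fresh b))
;; => (A 1 2 B)
```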
,
The comma character is part of the backquote syntax and is invalid if used other than inside the body of a backquote construction as described above.
#
This is a dispatching macro character. It reads an optional digit string and then one more character, and uses that character to select a function to run as a macro-character function. The # character also happens to be a non-terminating macro character. This is completely independent of the fact that it is a dispatching macro character; it is a coincidence that the only standard dispatching macro character in Common LISP is also the only standard non-terminating macro character.
See Standard Dispatching Macro Character Syntax for predefined # macro-character constructions.
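A dispatching construct receives the optional decimal argument along with the stream and the sub-character. The sketch below defines a hypothetical construct #n{form} that reads form and returns a list of n copies of it (one copy when n is omitted); #{ is not a standard construct, and read-copies and read-with-copies are illustrative names. set-dispatch-macro-character is the standard function for this.

```lisp
;;; A hypothetical #n{form} construct: N copies of FORM in a list.
(defun read-copies (stream subchar n)
  (declare (ignore subchar))
  (make-list (or n 1) :initial-element (read stream t nil t)))

(defun read-with-copies (text)
  (let ((*readtable* (copy-readtable nil)))
    (set-dispatch-macro-character #\# #\{ #'read-copies)
    (values (read-from-string text))))

(read-with-copies "#3{x")   ; => (X X X)
(read-with-copies "#{y")    ; => (Y)
```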
22.1.4 Standard Dispatching Macro Character Syntax
The standard syntax includes forms introduced by the # character.
Refer to the following articles for more detail:
#\
#'
#(
#)
#*
#:
#B
#O
#X
#nR
#nA
#S
#|
#<
These take the general form of a #, a second character that identifies the syntax, and following arguments in some form. If the second character is a letter, then case is not important; #O and #o are considered to be equivalent, for example. Certain # forms allow an unsigned decimal number to appear between the # and the second character; some other forms even require it. Those forms that do not explicitly permit such a number to appear forbid it.
The # constructs in this implementation are summarized in the following table:
#!              shell escape
#'              function abbreviation
#(              simple vector
#)              signals error
#*              bit-vector
#0 to #9        used for infix arguments
#:              uninterned symbol
#<              signals error
#?              describe macro character
#\              character object
#|              balanced comment
#<backspace>    signals error
#<tab>          signals error
#<newline>      signals error
#<linefeed>     signals error
#<page>         signals error
#<return>       signals error
#<space>        signals error
#A, #a          array
#B, #b          binary rational
#O, #o          octal rational
#R, #r          radix-n rational
#S, #s          struct instance
#X, #x          hexadecimal rational
#\
#\x
reads in a character object that represents the character x. Also, #\name reads in as the character object whose name is name. In the single-character case, the character x must be followed by a non-constituent character, lest a name appear to follow the #\. A good model of what happens is that after #\ is read, the reader backs up over the \ and then reads an extended token, treating the initial \ as an escape character (whether it really is or not in the current readtable).
Uppercase and lowercase letters are distinguished after #\; #\A and #\a denote different character objects. Any character works after #\, even those that are normally special to read, such as parentheses. Non-printing characters may be used after #\, although for them names are generally preferred.
#\name
reads in as a character object whose name is name (actually, whose name is (string-upcase name); therefore the syntax is case-insensitive). The name should have the syntax of a symbol. The following names are implemented:
newline     The character that represents the division between lines
space       The space or blank character
rubout      The rubout or delete character
page        The form-feed or page-separator character
tab         The tabulate character
backspace   The backspace character
return      The carriage return character
linefeed    The line-feed character
When the LISP printer types out the name of a special character, it uses the same table as the #\ reader; therefore any character name you see typed out is acceptable as input (in that implementation). Standard names are always preferred over non-standard names for printing.
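The following sketch, assuming a conforming Common Lisp with the standard readtable, reads several #\ forms from strings to show that case matters for single characters, that parentheses are accepted after #\, and that names are matched case-insensitively. The function name char-syntax-demo is hypothetical.

```lisp
;; Sketch: read #\ syntax from strings with READ-FROM-STRING.
;; Assumes a conforming Common Lisp and the standard readtable.
(defun char-syntax-demo ()
  "Return characters read using the #\\x and #\\name forms."
  (list (read-from-string "#\\a")        ; the character a
        (read-from-string "#\\A")        ; A is a different character
        (read-from-string "#\\(")        ; parentheses work after #\
        (read-from-string "#\\SPACE")    ; a name, matched case-insensitively
        (read-from-string "#\\space")))
```

The standard functions name-char and char-name consult the same name table in both directions.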
#'
#'foo
is an abbreviation for (function foo). foo may be the printed representation of any LISP object. This abbreviation may be remembered by analogy with the ' macro-character, since the function and quote special forms are similar in form.
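A small sketch, assuming a conforming Common Lisp, that shows the analogy between #' and ' by reading both forms as data and then evaluating the function form. The variable names are hypothetical.

```lisp
;; #'car reads as (FUNCTION CAR), just as 'car reads as (QUOTE CAR).
(defparameter *sharp-quote-form* (read-from-string "#'car"))
(defparameter *quote-form* (read-from-string "'car"))

;; Evaluating (FUNCTION CAR) yields the function object itself.
(defparameter *car-function* (eval *sharp-quote-form*))
```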
#(
A series of representations of objects enclosed by #( and ) is read as a simple vector of those objects. This is analogous to the notation for lists. If an unsigned decimal integer appears between the # and (, it specifies explicitly the length of the vector. In that case, it is an error if too many objects are specified before the closing ), and if too few are specified, the last object (it is an error if there are none in this case) is used to fill all remaining elements of the vector. For example,
#(a b c c c c)
#6(a b c c c c)
#6(a b c)
#6(a b c c)
all mean the same thing: a vector of length 6 with elements a, b, and four instances of c. The notation #() denotes an empty vector, as does #0() (which is legitimate because it is not the case that too few elements are specified).
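The four spellings above can be checked with a short sketch, assuming a conforming Common Lisp that implements the explicit-length form of #( as described. The variable name *vectors* is hypothetical.

```lisp
;; All four spellings should read as the same six-element simple vector.
(defparameter *vectors*
  (mapcar #'read-from-string
          '("#(a b c c c c)" "#6(a b c c c c)" "#6(a b c)" "#6(a b c c)")))
```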
#*
A series of binary digits (0 and 1) preceded by #* is read as a simple bit-vector containing those bits, the leftmost bit in the series being bit 0 of the bit-vector. If an unsigned decimal integer appears between the # and *, it specifies explicitly the length of the vector. In that case, it is an error if too many bits are specified, and if too few are specified the last one (it is an error if there are none in this case) is used to fill all remaining elements of the bit-vector. For example,
#*101111
#6*101111
#6*101
#6*1011
all mean the same thing: a vector of length 6 with elements 1, 0, 1, 1, 1, and 1. The notation #* denotes an empty bit-vector, as does #0* (which is legitimate because it is not the case that too few elements are specified).
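A matching sketch for bit-vectors, under the same assumption of a conforming Common Lisp; it also confirms that the leftmost digit becomes bit 0. The variable name is hypothetical.

```lisp
;; Bit 0 is the leftmost digit; a short explicit-length form repeats the last bit.
(defparameter *bit-vectors*
  (mapcar #'read-from-string '("#*101111" "#6*101111" "#6*101" "#6*1011")))
```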
#:
#:foo
(‘poundsign colon foo’) requires foo to have the syntax of an unqualified symbol name (no embedded colons). It denotes an uninterned symbol whose name is foo. Every time this syntax is encountered, a different uninterned symbol is created. If it is necessary to refer to the same uninterned symbol more than once in the same expression, the #= syntax may be useful.
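The sketch below, assuming a conforming Common Lisp (which supports the #= and ## label syntax), shows that two separate #:foo occurrences produce distinct symbols, while a label lets one expression share a single uninterned symbol. Variable names are hypothetical.

```lisp
(defparameter *sym1* (read-from-string "#:foo"))
(defparameter *sym2* (read-from-string "#:foo"))

;; Within one expression, #1= labels an object and #1# refers back to it,
;; so both list elements are the same uninterned symbol.
(defparameter *shared* (read-from-string "(#1=#:foo #1#)"))
```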
#B
#brational
reads rational in binary (radix 2). For example, #B1101 == 13, and #b101/11 == 5/3.
#O
#orational
reads rational in octal (radix 8). For example, #o37/15 == 31/13, and #o777 == 511.
#X
#xrational
reads rational in hexadecimal (radix 16). The digits above 9 are the letters A through F (the lowercase letters a through f are also acceptable). For example, #xF00 == 3840.
#nR
#radixRrational reads rational in radix radix. radix must consist of only digits, and it is read in decimal; its value must be between 2 and 36 (inclusive). For example, #3r102 is another way of writing 11, and #11R32 is another way of writing 35. For radices larger than 10, letters of the alphabet are used in order for the digits after 9.
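The radix examples given for #B, #O, #X and #nR can be verified together. This is a sketch assuming a conforming Common Lisp; *radix-examples* and check-radix-examples are hypothetical names.

```lisp
;; Each string is read with radix syntax; the value is the ordinary
;; rational it denotes (ratios are reduced to lowest terms).
(defparameter *radix-examples*
  '(("#B1101" . 13) ("#b101/11" . 5/3)
    ("#o37/15" . 31/13) ("#o777" . 511)
    ("#xF00" . 3840)
    ("#3r102" . 11) ("#11R32" . 35)))

(defun check-radix-examples ()
  "Return T if every string reads as its expected rational."
  (every (lambda (pair) (= (read-from-string (car pair)) (cdr pair)))
         *radix-examples*))
```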
#nA
The syntax #nAobject constructs an n-dimensional array, using object as the value of the :initial-contents argument to make-array.
The value of n makes a difference: #2A((0 1 5) (foo 2 (hot dog))), for example, represents a 2-by-3 matrix:
0     1     5
foo   2     (hot dog)
In contrast, #1A((0 1 5) (foo 2 (hot dog))) represents a length-2 array whose elements are lists:
(0 1 5)
(foo 2 (hot dog))
Furthermore, #0A((0 1 5) (foo 2 (hot dog))) represents a zero-dimensional array whose sole element is a list:
((0 1 5) (foo 2 (hot dog)))
Similarly, #0Afoo (or, more readably, #0A foo) represents a zero-dimensional array whose sole element is the symbol foo. The expression #1Afoo would not be legal because foo is not a sequence.
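To see how the rank changes the result, the sketch below reads the same contents with ranks 2, 1 and 0 and inspects the resulting dimensions. It assumes a conforming Common Lisp; *contents* and read-array are hypothetical names.

```lisp
(defparameter *contents* "((0 1 5) (foo 2 (hot dog)))")

(defun read-array (rank)
  "Read *CONTENTS* using #nA syntax with the given RANK."
  (read-from-string (format nil "#~DA~A" rank *contents*)))
```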
#S
The syntax #S(name slot1 value1 slot2 value2 ...) denotes a structure. This is legal only if name is the name of a structure already defined by defstruct and if the structure has a valid constructor. Each slot is identified by the symbol formed by interning the print name of the slot name in the keyword package. For instance,
#S(FRED :LAST-NAME MUGGS :MIDDLE-NAME J)
is an instance of the following defstruct:
(defstruct fred last-name middle-name)
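A sketch of reading that form, assuming a conforming Common Lisp. Note that the defstruct must be evaluated before the #S string is read, because the reader needs the structure's constructor. The variable *muggs* is hypothetical.

```lisp
;; The structure type must exist before the #S form is read.
(defstruct fred last-name middle-name)

(defparameter *muggs*
  (read-from-string "#S(FRED :LAST-NAME MUGGS :MIDDLE-NAME J)"))
```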
#|
#|...|#
is treated as a comment by the reader, just as everything from a ; (semicolon) to the next newline is treated as a comment. Anything may appear in the comment, except that it must be balanced with respect to other occurrences of #| and |#. Except for this nesting rule, the comment may contain any characters whatsoever. The main purpose of this construct is to allow "commenting out" of blocks of code or data. The balancing rule allows such blocks to contain pieces already so commented out. In this respect the #|...|# syntax of Common LISP differs from the /*...*/ comment syntax used in C, where comments do not nest.
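The nesting rule in practice: the sketch below comments out a definition that already contains a #|...|# comment, and then reads a string with nested comments to show that the reader skips all of it. It assumes a conforming Common Lisp; old-version and *after-comment* are hypothetical names.

```lisp
#|
(defun old-version ()
  #| an inner comment that was already here |#
  nil)
|#

;; The reader skips the nested comment and returns the object after it.
(defparameter *after-comment*
  (read-from-string "#| outer #| inner |# still outer |# 42"))
```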
#<
This is not legal reader syntax. It is used in the printed representation of objects that cannot be read back in. Attempting to read a #< will cause an error. (More precisely, it is legal syntax, but the macro-character function for it signals an error.)
#<space>, #<tab>, #<newline>, #<page>, #<return>
A # followed by a whitespace character is not legal reader syntax. This prevents abbreviated forms produced via *print-level* cutoff from reading in again, as a safeguard against losing information. (More precisely, this is legal syntax, but the macro-character function for it signals an error.)
#)
This is not legal reader syntax. This prevents abbreviated forms produced via *print-level* cutoff from reading in again, as a safeguard against losing information. (More precisely, this is legal syntax, but the macro-character function for it signals an error.)
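The error-signalling constructs can be observed with handler-case. This sketch assumes a conforming Common Lisp, where #<, #) and # followed by whitespace signal a condition of type reader-error; reader-error-p is a hypothetical helper.

```lisp
(defun reader-error-p (string)
  "Return T if reading STRING signals a READER-ERROR, NIL otherwise."
  (handler-case (progn (read-from-string string) nil)
    (reader-error () t)))
```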
22.1.5 The Readtable
Previous sections describe the standard syntax accepted by the read function. It is also possible to reprogram the read function completely by specifying an alternate readtable. This is an advanced facility used by programmers who want to use LISP as a basis for building their own language.
Messing up the readtable can produce amusing results that may make your current LISP session unusable (but otherwise do no lasting harm), so a fair amount of caution is required. This is because any alteration of the readtable has a global impact on the LISP environment, including the ability to type LISP expressions at the top-level prompt, such as (bye). If all else fails, Ctrl-Z will let you exit LISP, in spite of any changes to the reader.
There is a data structure called the readtable that is used to control the reader. It contains information about the syntax of each character, equivalent to that in Table 22-1. It is initially set up exactly as in Table 22-1 to give the standard Common LISP meanings to all the characters, but the user can change the meanings of characters to customize the syntax. It is also possible to have several readtables describing different syntaxes and to switch from one to another by binding the variable *readtable*.
The following manual entries should be consulted for information on editing the readtable:
read-delimited-list
read-preserving-whitespace
*readtable*
copy-readtable
readtablep
set-syntax-from-char
set-macro-character
get-macro-character
make-dispatch-macro-character
set-dispatch-macro-character
get-dispatch-macro-character
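One cautious way to experiment, sketched below under the assumption of a conforming Common Lisp, is to modify a private copy of the standard readtable and bind *readtable* only around the call to the reader, so the top-level readtable is never touched. Here ! is made a terminating macro character that reads the next object and wraps it in (not ...); make-bang-readtable and read-with-bang are hypothetical names.

```lisp
(defun make-bang-readtable ()
  "Return a copy of the standard readtable in which !x reads as (NOT x)."
  (let ((rt (copy-readtable nil)))        ; NIL means: copy the standard readtable
    (set-macro-character
     #\!
     (lambda (stream char)
       (declare (ignore char))
       (list 'not (read stream t nil t))) ; recursive read of the next object
     nil                                  ; terminating macro character
     rt)
    rt))

(defun read-with-bang (string)
  "Read STRING with the modified readtable bound only for this call."
  (let ((*readtable* (make-bang-readtable)))
    (read-from-string string)))
```

Because the binding is dynamic and local, the standard reader still treats !foo as an ordinary symbol outside read-with-bang.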
22.1.6 What the Print Function Produces
The Common LISP printer is controlled by a number of special variables. These are referred to in the following discussion and are fully documented at the end of this section. How an expression is printed depends on its data type, as described in the following articles:
Printing Integers
Printing Ratios
Printing Floating-point numbers
Printing Characters
Printing Symbols
Printing Strings
Printing Conses
Printing Bit-vectors
Printing Vectors
Printing Arrays
Printing Structures
Printing Other Types
The following variables control various aspects of the appearance of printed objects:
*print-array*
*print-base*
*print-case*
*print-escape*
*print-funobj*
*print-gensym*
*print-level*
*print-length*
*print-pretty*
*print-radix*
Printing Integers
If appropriate, a radix specifier may be printed; see the variable *print-radix*. If an integer is negative, a minus sign is printed and then the absolute value of the integer is printed. Integers are printed in the radix specified by the variable *print-base* in the usual positional notation, most significant digit first. The number zero is represented by the single digit 0 and never has a sign. A decimal point may then be printed, depending on the value of *print-radix*.
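The effect of *print-base* and *print-radix* on integers can be sketched as follows, assuming a conforming Common Lisp. The case of hexadecimal digit letters may vary by implementation, so the checks compare those results case-insensitively. integer-printing-demo is a hypothetical name.

```lisp
(defun integer-printing-demo ()
  "Print integers under several *PRINT-BASE* / *PRINT-RADIX* settings."
  (list (let ((*print-base* 10) (*print-radix* nil)) (prin1-to-string -255))
        (let ((*print-base* 16) (*print-radix* nil)) (prin1-to-string 255))
        (let ((*print-base* 16) (*print-radix* t))   (prin1-to-string 255))
        ;; In base ten the radix is shown by a trailing decimal point.
        (let ((*print-base* 10) (*print-radix* t))   (prin1-to-string 10))
        (let ((*print-base* 2)  (*print-radix* nil)) (prin1-to-string 0))))
```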
Printing Ratios
If appropriate, a radix specifier may be printed; see the variable *print-radix*. If the ratio is negative, a minus sign is printed. Then the absolute value of the numerator is printed, as for an integer; then a /; then the denominator. The numerator and denominator are both printed in the radix specified by the variable *print-base*; they are obtained as if by the numerator and denominator functions, and so ratios are always printed in reduced form (lowest terms).
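A brief sketch, assuming a conforming Common Lisp, of ratios printed in lowest terms and with both parts in the current *print-base*. ratio-printing-demo is a hypothetical name.

```lisp
(defun ratio-printing-demo ()
  (let ((*print-base* 10) (*print-radix* nil))
    (list (prin1-to-string 6/4)          ; 6/4 is already the ratio 3/2
          (prin1-to-string -1/3)         ; sign precedes the numerator
          (let ((*print-base* 2))
            (prin1-to-string 3/2))       ; numerator and denominator in base 2
          (list (numerator 6/4) (denominator 6/4)))))
```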
Printing Floating-point numbers
If the sign of the number (as determined by the function float-sign) is negative, then a minus sign is printed. Then the magnitude is printed in one of two ways. If the magnitude of the floating-point number is either zero or between 10^-3 (inclusive) and 10^7 (exclusive), it may be printed as the integer part of the number, then a decimal point, followed by the fractional part of the number; there is always at least one digit on each side of the decimal point. If the format of the number does not match that specified by the variable *read-default-float-format*, then the exponent marker for that format and the digit 0 are also printed. For example, the base of the natural logarithms as a short-format floating-point number might be printed as 2.71828S0. For non-zero magnitudes outside of the range 10^-3 to 10^7, a floating-point number will be printed in "computerized scientific notation." The representation of the number is scaled to be between 1 (inclusive) and 10 (exclusive) and then printed, with one digit before the decimal point and at least one digit after the decimal point. Next the exponent marker is printed, except that if the format of the number matches that specified by the variable *read-default-float-format*, then the exponent marker E is used. Finally, the power of ten by which the fraction must be multiplied to equal the original number is printed as a decimal integer. For example, Avogadro's number as a short-format floating-point number might be printed as 6.02S23.
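The sketch below, assuming a conforming Common Lisp with single-float as the default read format, prints floats inside and outside the fixed-notation range. Exact exponent spelling and letter case differ between implementations, so the large value is checked only for an exponent marker and a correct round trip. float-printing-demo is a hypothetical name.

```lisp
(defun float-printing-demo ()
  (let ((*read-default-float-format* 'single-float))
    (list (prin1-to-string 1.5)      ; matches the default format: no marker
          (prin1-to-string 1.5d0)    ; double-float: marker and 0 appended
          (prin1-to-string 1.0e10)   ; magnitude >= 10^7: scientific notation
          (prin1-to-string -0.5))))  ; minus sign, then the magnitude
```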
Printing Characters
When *print-escape* is nil, a character prints as itself; it is sent directly to the output stream. When *print-escape* is not nil, then #\ syntax is used. For example, the printed representation of the character #\A with control and meta bits on would be #\CONTROL-META-A, and that of #\a with control and meta bits on would be #\CONTROL-META-\a.
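A sketch contrasting princ (which binds *print-escape* to nil) with prin1 (which binds it to t), assuming a conforming Common Lisp. Control and meta bits are not part of ANSI Common Lisp characters, so they are not shown. character-printing-demo is a hypothetical name.

```lisp
(defun character-printing-demo ()
  (list (princ-to-string #\a)        ; *PRINT-ESCAPE* NIL: just the character
        (prin1-to-string #\a)        ; *PRINT-ESCAPE* T: #\ syntax
        (prin1-to-string #\Space)))  ; named characters print by name
```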
Printing Symbols
When *print-escape* is nil, only the characters of the print name of the symbol are output (but the case in which to print any uppercase characters in the print name is controlled by the variable *print-case*).
The remaining paragraphs describing the printing of symbols cover the situation when *print-escape* is not nil. Backslashes \ and vertical bars | are included as required. In particular, backslash or vertical-bar syntax is used when the name of the symbol would be otherwise treated by the reader as a potential number (see 22.1.2 Parsing of Numbers and Symbols).
The case in which to print any uppercase characters in the print name is controlled by the variable *print-case*. As a special case, nil may sometimes be printed as () instead, when *print-escape* and *print-pretty* are both not nil. Package prefixes may be printed (using colon syntax) if necessary. The rules for package qualifiers are as follows: When the symbol is printed, if it is in the keyword package, then it is printed with a preceding colon; otherwise, if it is accessible in the current package, it is printed without any qualification; otherwise, it is printed with qualification.
A symbol that is uninterned (has no home package) is printed preceded by #: if the variables *print-gensym* and *print-escape* are both non-nil; if either is nil, then the symbol is printed without a prefix, as if it were in the current package.
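The following sketch, assuming a conforming Common Lisp, exercises the package-prefix, case and #: rules. Whether a symbol needing escapes is printed with vertical bars or backslashes is implementation-dependent, so that case is checked by reading it back. symbol-printing-demo is a hypothetical name.

```lisp
(defun symbol-printing-demo ()
  (let ((*print-escape* t) (*print-gensym* t) (*print-case* :upcase))
    (list (prin1-to-string 'hello)                 ; accessible: no prefix
          (prin1-to-string :hello)                 ; keyword: leading colon
          (let ((*print-case* :downcase))
            (prin1-to-string 'hello))              ; case controlled by *PRINT-CASE*
          (prin1-to-string (make-symbol "TEMP"))   ; uninterned: #: prefix
          (let ((*print-gensym* nil))
            (prin1-to-string (make-symbol "TEMP"))); prefix suppressed
          (prin1-to-string '|hello world|))))      ; escaping required
```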
Printing Strings
The characters of the string are output in order. If *print-escape* is not nil, a double quote is output before and after, and all double quotes and single escape characters are preceded by backslash. The printing of strings is not affected by *print-array*. If the string has a fill pointer, then only those characters below the fill pointer are printed.
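A sketch of string printing with and without escapes, and of the fill-pointer rule, assuming a conforming Common Lisp. string-printing-demo is a hypothetical name.

```lisp
(defun string-printing-demo ()
  (let ((s (make-array 8 :element-type 'character
                         :initial-contents "say \"hi\""
                         :fill-pointer 8)))
    (list (prin1-to-string s)                 ; surrounding quotes, inner quotes escaped
          (princ-to-string s)                 ; characters only
          (progn (setf (fill-pointer s) 3)
                 (princ-to-string s)))))      ; only characters below the fill pointer
```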
Printing Conses
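Wherever possible, list notation is preferred over dot notation. Therefore the following algorithm is used to print a cons x:
1. Print an open parenthesis, (.
2. Print the car of x.
3. If the cdr of x is itself a cons, make it the current cons (that is, x becomes that cons), print a space, and go to step 2.
4. If the cdr of x is not null, print a space, a dot, a space, and the cdr of x.
5. Print a close parenthesis, ).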
This form of printing is clearer than showing each individual cons cell. Although the two expressions below are equivalent, and the reader will accept either one and produce the same data structure, the printer will always print such a data structure in the second form.
(a . (b . ((c . (d . nil)) . (e . nil))))
(a b (c d) e)
The printing of conses is affected by the variables *print-level* and *print-length*.
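The list-preferring printing of conses (print the car, keep going while the cdr is a cons, and fall back to dot notation only for a non-nil final cdr) can be sketched as below, assuming a conforming Common Lisp. This is an illustration, not the real printer: it ignores *print-level*, *print-length* and circularity, and delegates atoms to prin1. print-cons-sketch and cons-to-string are hypothetical names.

```lisp
(defun print-cons-sketch (object stream)
  "Print OBJECT using list notation wherever possible; atoms go to PRIN1."
  (if (atom object)
      (prin1 object stream)
      (progn
        (write-char #\( stream)
        (loop
          (print-cons-sketch (car object) stream)
          (let ((tail (cdr object)))
            (cond ((consp tail)            ; continue the list
                   (write-char #\Space stream)
                   (setf object tail))
                  (t                       ; end of the chain of conses
                   (when tail              ; non-nil tail: dot notation
                     (write-string " . " stream)
                     (prin1 tail stream))
                   (return)))))
        (write-char #\) stream))))

(defun cons-to-string (object)
  (with-output-to-string (s) (print-cons-sketch object s)))
```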
Printing Bit-vectors
A bit-vector is printed as #* followed by the bits of the bit-vector in order. If *print-array* is nil, however, then the bit-vector is printed in a format (using #<) that is concise but not readable. If the bit-vector has a fill pointer, then only those bits below the fill pointer are printed.
Printing Vectors
Any vector other than a string or bit-vector is printed using general-vector syntax; this means that information about specialized vector representations will be lost. The printed representation of a zero-length vector is #(). The printed representation of a non-zero-length vector begins with #(. Following that, the first element of the vector is printed. If there are any other elements, they are printed in turn, with a space printed before each additional element. A close parenthesis after the last element terminates the printed representation of the vector. The printing of vectors is affected by the variables *print-level* and *print-length*. If the vector has a fill pointer, then only those elements below the fill pointer are printed.
If *print-array* is nil, however, then the vector is not printed as described above, but in a format (using #<) that is concise but not readable.
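A sketch of bit-vector and vector printing, assuming a conforming Common Lisp: it shows the fill-pointer rule, *print-length* truncation, and the unreadable #< form when *print-array* is nil (whose exact text is implementation-dependent, so only its prefix is checked). vector-printing-demo is a hypothetical name.

```lisp
(defun vector-printing-demo ()
  (let ((v (make-array 5 :initial-contents '(1 2 3 4 5) :fill-pointer 2))
        (*print-pretty* nil))
    (list (let ((*print-array* t)) (prin1-to-string #*1011))
          (let ((*print-array* t)) (prin1-to-string v))    ; only elements below fill pointer
          (let ((*print-array* t) (*print-length* 2))
            (prin1-to-string #(a b c d)))                  ; truncated with ...
          (let ((*print-array* nil))
            (prin1-to-string #(a b c d))))))               ; concise, unreadable
```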
Printing Arrays
Normally any array other than a vector is printed using #nA format. Let n be the rank of the array. Then # is printed, then n as a decimal integer, then A, then n open parentheses. Next the elements are scanned in row-major order. Imagine the array indices being enumerated in odometer fashion, recalling that the dimensions are numbered from 0 to n-1. Every time the index for dimension j is incremented, the following actions are taken:
1. If j < n-1, then print a close parenthesis.
2. If incrementing the index for dimension j caused it to equal the size of dimension j, reset that index to zero and increment the index for dimension j-1 (thereby performing these three steps recursively), unless j=0, in which case simply terminate the entire algorithm. If incrementing the index for dimension j did not cause it to equal the size of dimension j, then print a space.
3. If j < n-1, then print an open parenthesis.
This causes the contents to be printed in a format suitable for the :initial-contents argument to make-array.
If *print-array* is nil, then the array is printed in a format (using #<) that is concise but not readable.
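The odometer procedure produces one level of parentheses per dimension with elements in row-major order. The sketch below reaches the same nesting recursively rather than by explicit index incrementing; it assumes a conforming Common Lisp, ignores *print-level* and *print-length*, and uses the hypothetical names print-array-sketch and array-to-string.

```lisp
(defun print-array-sketch (array stream)
  "Print ARRAY in #nA syntax, visiting elements in row-major order."
  (let ((index 0))                                 ; next row-major position
    (labels ((walk (dims)
               (if (null dims)
                   ;; Innermost level: print the next element.
                   (progn (prin1 (row-major-aref array index) stream)
                          (incf index))
                   ;; One pair of parentheses per remaining dimension.
                   (progn
                     (write-char #\( stream)
                     (dotimes (i (first dims))
                       (when (plusp i) (write-char #\Space stream))
                       (walk (rest dims)))
                     (write-char #\) stream)))))
      (format stream "#~DA" (array-rank array))
      (walk (array-dimensions array)))))

(defun array-to-string (array)
  (with-output-to-string (s) (print-array-sketch array s)))
```

The output is in a form suitable for :initial-contents, so reading it back should give an equalp array.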
Printing Structures
Structures defined by defstruct are printed in the format #S(structure-name field-keyword field-value...).
This format is also valid input syntax.
Nested and recursive structures are printed subject to the current *print-level* and *print-length* settings and the circular-structure printing controls.
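A sketch, assuming a conforming Common Lisp, showing the default #S printed form of a structure and that it reads back as an equivalent instance. The structure point and the function structure-printing-demo are hypothetical.

```lisp
(defstruct point x y)

(defun structure-printing-demo ()
  (let ((*print-pretty* nil) (*print-case* :upcase))
    (prin1-to-string (make-point :x 1 :y 2))))
```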
Printing Other Types
Any other types are printed in an implementation-dependent manner. It is recommended that printed representations of all such objects begin with the characters #< and end with > so that the reader will catch such objects and not permit them to be read under normal circumstances. This applies to streams, random states, compiled functions, jump buffers, lexical variables, and other objects which cannot be re-accepted by the reader.
When debugging or when frequently dealing with large or deep objects at top level, the user may wish to restrict the printer from printing large amounts of information. The variables *print-level* and *print-length* allow the user to control how deep the printer will print and how many elements at a given level the printer will print. Thus the user can see enough of the object to identify it without having to wade through the entire expression. This feature also applies to printing arrays, structs and class instances.
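A short sketch of both abbreviation controls, assuming a conforming Common Lisp: elements past *print-length* are replaced by ..., and lists nested deeper than *print-level* are replaced by #. abbreviation-demo is a hypothetical name.

```lisp
(defun abbreviation-demo ()
  (let ((*print-pretty* nil))
    (list (let ((*print-length* 3))
            (prin1-to-string '(1 2 3 4 5)))
          (let ((*print-level* 2))
            (prin1-to-string '(a (b (c (d))))))
          (let ((*print-level* 1) (*print-length* 2))
            (prin1-to-string '(a (b) c))))))
```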
22.2.1 Input from Character Streams
Many character input functions take optional arguments called input-stream, eof-error-p, and eof-value. The input-stream argument is the stream from which to obtain input; if unsupplied or nil it defaults to the value of the special variable *standard-input*. One may also specify t as a stream, meaning the value of the special variable *terminal-io* (which for most purposes is exactly the same as *standard-input*).
The eof-error-p argument controls what happens if input is from a file (or any other input source that has a definite end) and the end of the file is reached. If eof-error-p is true (the default), an error will be signalled at end of file. If it is false, then no error is signalled, and instead the function returns eof-value.
Functions such as read that read the representation of an object rather than a single character will always signal an error, regardless of eof-error-p, if the file ends in the middle of an object representation.
For example, if a file does not contain enough right parentheses to balance the left parentheses in it, read will complain. If a file ends in a symbol or a number immediately followed by end-of-file, read will read the symbol or number successfully and when called again will see the end-of-file and only then act according to eof-error-p.
Similarly, the function read-line will successfully read the last line of a file even if that line is terminated by end-of-file rather than the newline character. If a file contains ignorable text at the end, such as blank lines and comments, read will not consider it to end in the middle of an object. Thus an eof-error-p argument controls what happens when the file ends between objects.
The following functions read input from an arbitrary stream and process their arguments as described above.
read
read-line
read-char
unread-char
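The end-of-file rules can be sketched with string input streams, assuming a conforming Common Lisp. The truncated-object case is caught as a general error here because implementations may signal a subtype of end-of-file. eof-demo is a hypothetical name.

```lisp
(defun eof-demo ()
  (list
   ;; READ returns EOF-VALUE instead of signalling when EOF-ERROR-P is NIL.
   (with-input-from-string (in "foo")
     (list (read in) (read in nil :eof)))
   ;; READ-LINE returns the last line even without a trailing newline.
   (with-input-from-string (in (format nil "first~%second"))
     (list (read-line in) (read-line in) (read-line in nil :eof)))
   ;; An object cut off by end of file is an error regardless of EOF-ERROR-P.
   (with-input-from-string (in "(a b")
     (handler-case (read in nil :eof)
       (error () :truncated)))))
```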
22.3.1 Output to Character Streams
These functions all take an optional argument called output-stream, which is where to send the output. If unsupplied or nil, output-stream defaults to the value of the variable *standard-output*. If it is t, the value of the variable *terminal-io* is used.
write
prin1
pprint
princ
write-char
write-string
write-line
terpri
fresh-line
finish-output
force-output
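A sketch, assuming a conforming Common Lisp, that sends several of these functions' output to one string stream so the difference between escaped (prin1) and unescaped (princ) output, and between terpri and write-line, is visible. output-demo is a hypothetical name.

```lisp
(defun output-demo ()
  (with-output-to-string (out)
    (prin1 "hi" out)           ; escaped: includes the double quotes
    (write-char #\Space out)
    (princ "hi" out)           ; unescaped: just the characters
    (terpri out)               ; unconditional newline
    (write-line "line" out)    ; string followed by a newline
    (write-string "end" out)))
```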
22.3.2 Formatting Output to Character Streams
The function format is very useful for producing nicely formatted text, producing good-looking messages, and so on. format can generate a string or output to a stream. Formatted output is performed not only by the format function itself, but by certain other functions that accept a control string "the way format does." For example, error-signalling functions such as error accept format control strings.
format
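A few common control-string directives, sketched under the assumption of a conforming Common Lisp; a nil destination makes format return a string, while t would write to *standard-output*. The last element shows error accepting a format control string. format-demo is a hypothetical name.

```lisp
(defun format-demo ()
  (list (format nil "~A has ~D item~:P" "The list" 3)  ; ~:P pluralizes the previous arg
        (format nil "~A has ~D item~:P" "The list" 1)
        (format nil "~{~A~^, ~}" '(1 2 3))              ; iterate, separator between items
        (format nil "~8,3F" 3.14159)                    ; width 8, 3 decimal places
        (handler-case (error "Bad value: ~S" 42)
          (simple-error (c) (princ-to-string c)))))
```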
22.4 Querying the User
The following functions provide a convenient and consistent interface for asking questions of the user.
y-or-n-p
yes-or-no-p