| Chapter 5 |
Advanced Compiler
Use and Efficiency Hints |
|
by Robert MacLachlan
| 5.1 |
Advanced Compiler
Introduction |
|
In CMUCL, as with any language on any computer, the path to
efficient code starts with good algorithms and sensible programming
techniques, but to avoid inefficiency pitfalls, you need to know
some of this implementation's quirks and features. This chapter is
mostly a fairly long and detailed overview of what optimizations
Python does. Although there are the usual negative suggestions of
inefficient features to avoid, the main emphasis is on describing
the things that programmers can count on being efficient.
The optimizations described here can have the effect of speeding up
existing programs written in conventional styles, but the potential
for new programming styles that are clearer and less error-prone is
at least as significant. For this reason, several sections end with
a discussion of the implications of these optimizations for
programming style.
Python's support for types is unusual in three major ways:
- Precise type checking encourages the specific use of type
declarations as a form of run-time consistency checking. This
speeds development by localizing type errors and giving more
meaningful error messages. See section 4.5.2. Python produces
completely safe code; optimized type checking maintains reasonable
efficiency on conventional hardware (see section 5.3.6.)
- Comprehensive support for the Common Lisp type system makes
complex type specifiers useful. Using type specifiers such as
or and member has both
efficiency and robustness advantages. See section 5.2.
- Type inference eliminates the need for some declarations, and
also aids compile-time detection of type errors. Given detailed
type declarations, type inference can often eliminate type checks
and enable more efficient object representations and code
sequences. Checking all types results in fewer type checks. See
sections 5.3 and 5.11.2.
The main barrier to efficient Lisp programs is not that there is no
efficient way to code the program in Lisp, but that it is difficult
to arrive at that efficient coding. Common Lisp is a highly complex
language, and usually has many semantically equivalent
``reasonable'' ways to code a given problem. It is desirable to
make all of these equivalent solutions have comparable efficiency
so that programmers don't have to waste time discovering the most
efficient solution.
Source level optimization increases the number of efficient ways to
solve a problem. This effect is much larger than the increase in
the efficiency of the ``best'' solution. Source level optimization
transforms the original program into a more efficient (but
equivalent) program. Although the optimizer isn't doing anything
the programmer couldn't have done, this high-level optimization is
important because:
- The programmer can code simply and directly, rather than
obfuscating code to please the compiler.
- When presented with a choice of similar coding alternatives,
the programmer can chose whichever happens to be most convenient,
instead of worrying about which is most efficient.
Source level optimization eliminates the need for macros to
optimize their expansion, and also increases the effectiveness of
inline expansion. See sections 5.4 and 5.8.
Efficient support for a safer programming style is the biggest
advantage of source level optimization. Existing tuned programs
typically won't benefit much from source optimization, since their
source has already been optimized by hand. However, even tuned
programs tend to run faster under Python because:
- Low level optimization and register allocation provides modest
speedups in any program.
- Block compilation and inline expansion can reduce function call
overhead, but may require some program restructuring. See sections
5.8, 5.6
and 5.7.
- Efficiency notes will point out important type declarations
that are often missed even in highly tuned programs. See
section 5.13.
- Existing programs can be compiled safely without prohibitive
speed penalty, although they would be faster and safer with added
declarations. See section 5.3.6.
- The context declaration mechanism allows both space and runtime
of large systems to be reduced without sacrificing robustness by
semi-automatically varying compilation policy without addition any
optimize declarations to the source. See
section 5.7.5.
- Byte compilation can be used to dramatically reduce the size of
code that is not speed-critical. See section 5.9
The sort of symbolic programs generally written in Common Lisp
often favor recursion over iteration, or have inner loops so
complex that they involve multiple function calls. Such programs
spend a larger fraction of their time doing function calls than is
the norm in other languages; for this reason Common Lisp
implementations strive to make the general (or full) function call
as inexpensive as possible. Python goes beyond this by providing
two good alternatives to full call:
- Local call resolves function references at compile time,
allowing better calling sequences and optimization across function
calls. See section 5.6.
- Inline expansion totally eliminates call overhead and allows
many context dependent optimizations. This provides a safe and
efficient implementation of operations with function semantics,
eliminating the need for error-prone macro definitions or manual
case analysis. Although most Common Lisp implementations support
inline expansion, it becomes a more powerful tool with Python's
source level optimization. See sections 5.4 and 5.8.
Generally, Python provides simple implementations for simple uses
of function call, rather than having only a single calling
convention. These features allow a more natural programming style:
- Proper tail recursion. See section 5.5
- Relatively efficient closures.
- A funcall that is as efficient as normal
named call.
- Calls to local functions such as from labels are optimized:
- Control transfer is a direct jump.
- The closure environment is passed in registers rather than heap
allocated.
- Keyword arguments and multiple values are implemented more
efficiently.
See section 5.6.
| 5.1.4 |
Representation of
Objects |
|
Sometimes traditional Common Lisp implementation techniques compare
so poorly to the techniques used in other languages that Common
Lisp can become an impractical language choice. Terrible
inefficiencies appear in number-crunching programs, since Common
Lisp numeric operations often involve number-consing and generic
arithmetic. Python supports efficient natural representations for
numbers (and some other types), and allows these efficient
representations to be used in more contexts. Python also provides
good efficiency notes that warn when a crucial declaration is
missing.
See section 5.11.2 for more about
object representations and numeric types. Also see
section 5.13 about efficiency
notes.
| 5.1.5 |
Writing Efficient
Code |
|
Writing efficient code that works is a complex and prolonged
process. It is important not to get so involved in the pursuit of
efficiency that you lose sight of what the original problem
demands. Remember that:
- The program should be correct---it doesn't matter how quickly
you get the wrong answer.
- Both the programmer and the user will make errors, so the
program must be robust---it must detect errors in a way that allows
easy correction.
- A small portion of the program will consume most of the
resources, with the bulk of the code being virtually irrelevant to
efficiency considerations. Even experienced programmers familiar
with the problem area cannot reliably predict where these ``hot
spots'' will be.
The best way to get efficient code that is still worth using, is to
separate coding from tuning. During coding, you should:
- Use a coding style that aids correctness and robustness without
being incompatible with efficiency.
- Choose appropriate data structures that allow efficient
algorithms and object representations (see section 5.10). Try to make interfaces abstract
enough so that you can change to a different representation if
profiling reveals a need.
- Whenever you make an assumption about a function argument or
global data structure, add consistency assertions, either with type
declarations or explicit uses of assert,
ecase, etc.
During tuning, you should:
- Identify the hot spots in the program through profiling
(section 5.14.)
- Identify inefficient constructs in the hot spot with efficiency
notes, more profiling, or manual inspection of the source. See
sections 5.12 and 5.13.
- Add declarations and consider the application of optimizations.
See sections 5.6, 5.8 and 5.11.2.
- If all else fails, consider algorithm or data structure
changes. If you did a good job coding, changes will be easy to
introduce.
| 5.2 |
More About Types
in Python |
|
This section goes into more detail describing what types and
declarations are recognized by Python. The area where Python
differs most radically from previous Common Lisp compilers is in
its support for types:
- Precise type checking helps to find bugs at run time.
- Compile-time type checking helps to find bugs at compile
time.
- Type inference minimizes the need for generic operations, and
also increases the efficiency of run time type checking and the
effectiveness of compile time type checking.
- Support for detailed types provides a wealth of opportunity for
operation-specific type inference and optimization.
| 5.2.1 |
More Types
Meaningful |
|
Common Lisp has a very powerful type system, but conventional
Common Lisp implementations typically only recognize the small set
of types special in that implementation. In these systems, there is
an unfortunate paradox: a declaration for a relatively general type
like fixnum will be recognized by the
compiler, but a highly specific declaration such as (integer 3 17) is totally ignored.
This is obviously a problem, since the user has to know how to
specify the type of an object in the way the compiler wants it. A
very minimal (but rarely satisfied) criterion for type system
support is that it be no worse to make a specific declaration than
to make a general one. Python goes beyond this by exploiting a
number of advantages obtained from detailed type information.
Using more restrictive types in declarations allows the compiler to
do better type inference and more compile-time type checking. Also,
when type declarations are considered to be consistency assertions
that should be verified (conditional on policy), then complex types
are useful for making more detailed assertions.
Python ``understands'' the list-style or,
member, function, array
and number type specifiers. Understanding means that:
- If the type contains more information than is used in a
particular context, then the extra information is simply ignored,
rather than derailing type inference.
- In many contexts, the extra information from these type
specifier is used to good effect. In particular, type checking in
Python is precise, so these complex types
can be used in declarations to make interesting assertions about
functions and data structures (see section 4.5.2.) More specific
declarations also aid type inference and reduce the cost for type
checking.
For related information, see section 5.11 for numeric types, and section 5.10.3 for array types.
When given a type specifier, Python will often rewrite it into a
different (but equivalent) type. This is the mechanism that Python
uses for detecting type equivalence. For example, in Python's
canonical representation, these types are equivalent:
(or list (member :end)) <==> (or cons (member nil :end))
This has two implications for the user:
- The standard symbol type specifiers for atom, null, fixnum, etc., are in no way magical. The null type is actually defined
to be (member nil), list is (or
cons null), and fixnum is (signed-byte 30).
- When the compiler prints out a type, it may not look like the
type specifier that originally appeared in the program. This is
generally not a problem, but it must be taken into consideration
when reading compiler error messages.
The member type
specifier can be used to represent ``symbolic'' values, analogous
to the enumerated types of Pascal. For example, the second value of
find-symbol has this type:
(member :internal :external :inherited nil)
Member types are very useful for expressing consistency constraints
on data structures, for example:
(defstruct ice-cream
(flavor :vanilla :type (member :vanilla :chocolate :strawberry)))
Member types are also useful in type inference, as the number of
members can sometimes be pared down to one, in which case the value
is a known constant.
The or (union) type
specifier is understood, and is meaningfully applied in many
contexts. The use of or allows assertions to
be made about types in dynamically typed programs. For example:
(defstruct box
(next nil :type (or box null))
(top :removed :type (or box-top (member :removed))))
The type assertion on the top slot ensures
that an error will be signaled when there is an attempt to store an
illegal value (such as :rmoved.) Although
somewhat weak, these union type assertions provide a useful input
into type inference, allowing the cost of type checking to be
reduced. For example, this loop is safely compiled with no type
checks:
(defun find-box-with-top (box)
(declare (type (or box null) box))
(do ((current box (box-next current)))
((null current))
(unless (eq (box-top current) :removed)
(return current))))
Union types are also useful in type inference for representing
types that are partially constrained. For example, the result of
this expression:
(if foo
(logior x y)
(list x y))
can be expressed as (or integer cons).
The type nil is also called the empty type,
since no object is of type nil. The union of
no types, (or), is also empty. Python's
interpretation of an expression whose type is nil is that the expression never yields any value, but
rather fails to terminate, or is thrown out of. For example, the
type of a call to error or a use of
return is nil. When the
type of an expression is empty, compile-time type warnings about
its value are suppressed; presumably somebody else is signaling an
error. If a function is declared to have return type nil, but does in fact return, then (in safe compilation
policies) a ``NIL Function returned'' error
will be signaled. See also the function required-argument.
function types are
understood in the restrictive sense, specifying:
- The argument syntax that the function must be called with. This
is information about what argument counts are acceptable, and which
keyword arguments are recognized. In Python, warnings about
argument syntax are a consequence of function type checking.
- The types of the argument values that the caller must pass. If
the compiler can prove that some argument to a call is of a type
disallowed by the called function's type, then it will give a
compile-time type warning. In addition to being used for
compile-time type checking, these type assertions are also used as
output type assertions in code generation. For example, if
foo is declared to have a fixnum argument, then the 1+ in
(foo (1+ x)) is compiled with knowledge that
the result must be a fixnum.
- The types the values that will be bound to argument variables
in the function's definition. Declaring a function's type with
ftype implicitly declares the types of the
arguments in the definition. Python checks for consistency between
the definition and the ftype declaration.
Because of precise type checking, an error will be signaled when a
function is called with an argument of the wrong type.
- The type of return value(s) that the caller can expect. This
information is a useful input to type inference. For example, if a
function is declared to return a fixnum, then
when a call to that function appears in an expression, the
expression will be compiled with knowledge that the call will
return a fixnum.
- The type of return value(s) that the definition must return.
The result type in an ftype declaration is
treated like an implicit the wrapped around
the body of the definition. If the definition returns a value of
the wrong type, an error will be signaled. If the compiler can
prove that the function returns the wrong type, then it will give a
compile-time warning.
This is consistent with the new interpretation of function types
and the ftype declaration in the proposed
X3J13 ``function-type-argument-type-semantics'' cleanup. Note also,
that if you don't explicitly declare the type of a function using a
global ftype declaration, then Python will
compute a function type from the definition, providing a degree of
inter-routine type inference, see section 5.3.3.
| 5.2.7 |
The Values
Declaration |
|
CMUCL supports the values declaration as an
extension to Common Lisp. The syntax of the declaration is
(values type1 type2...typen). This
declaration is semantically equivalent to a the form wrapped around the body of the special form in
which the values declaration appears. The
advantage of values over the is purely syntactic---it
doesn't introduce more indentation. For example:
(defun foo (x)
(declare (values single-float))
(ecase x
(:this ...)
(:that ...)
(:the-other ...)))
is equivalent to:
(defun foo (x)
(the single-float
(ecase x
(:this ...)
(:that ...)
(:the-other ...))))
and
(defun floor (number &optional (divisor 1))
(declare (values integer real))
...)
is equivalent to:
(defun floor (number &optional (divisor 1))
(the (values integer real)
...))
In addition to being recognized by lambda
(and hence by defun), the values declaration is recognized by all the other
special forms with bodies and declarations: let, let*, labels and flet. Macros with
declarations usually splice the declarations into one of the above
forms, so they will accept this declaration too, but the exact
effect of a values declaration will depend on
the macro.
If you declare the types of all arguments to a function, and also
declare the return value types with values,
you have described the type of the function. Python will use this
argument and result type information to derive a function type that
will then be applied to calls of the function (see
section 5.2.6.) This provides a
way to declare the types of functions that is much less
syntactically awkward than using the ftype
declaration with a function type
specifier.
Although the values declaration is
non-standard, it is relatively harmless to use it in otherwise
portable code, since any warning in non-CMU implementations can be
suppressed with the standard declaration
proclamation.
Because of precise type checking, structure types are much better
supported by Python than by conventional compilers:
- The structure argument to structure accessors is precisely
checked---if you call foo-a on a bar, an error will be signaled.
- The types of slot values are precisely checked---if you pass
the wrong type argument to a constructor or a slot setter, then an
error will be signaled.
This error checking is tremendously useful for detecting bugs in
programs that manipulate complex data structures.
An additional advantage of checking structure types and enforcing
slot types is that the compiler can safely believe slot type
declarations. Python effectively moves the type checking from the
slot access to the slot setter or constructor call. This is more
efficient since caller of the setter or constructor often knows the
type of the value, entirely eliminating the need to check the
value's type. Consider this example:
(defstruct coordinate
(x nil :type single-float)
(y nil :type single-float))
(defun make-it ()
(make-coordinate :x 1.0 :y 1.0))
(defun use-it (it)
(declare (type coordinate it))
(sqrt (expt (coordinate-x it) 2) (expt (coordinate-y it) 2)))
make-it and use-it are
compiled with no checking on the types of the float slots, yet
use-it can use single-float arithmetic with perfect safety. Note that
make-coordinate must still check the values
of x and y unless the
call is block compiled or inline expanded (see
section 5.6.) But even without this
advantage, it is almost always more efficient to check slot values
on structure initialization, since slots are usually written once
and read many times.
| 5.2.9 |
The Freeze-Type
Declaration |
|
The extensions:freeze-type declaration is a
CMUCL extension that enables more efficient compilation of
user-defined types by asserting that the definition is not going to
change. This declaration may only be used globally (with declaim or proclaim). Currently
freeze-type only affects structure type
testing done by typep, typecase, etc. Here is an example:
(declaim (freeze-type foo bar))
This asserts that the types foo and
bar and their subtypes are not going to
change. This allows more efficient type testing, since the compiler
can open-code a test for all possible subtypes, rather than having
to examine the type hierarchy at run-time.
Avoid use of the and, not and satisfies types in
declarations, since type inference has problems with them. When
these types do appear in a declaration, they are still checked
precisely, but the type information is of limited use to the
compiler. and types are effective as long as
the intersection can be canonicalized to a type that doesn't use
and. For example:
(and fixnum unsigned-byte)
is fine, since it is the same as:
(integer 0 most-positive-fixnum)
but this type:
(and symbol (not (member :end)))
will not be fully understood by type interference since the
and can't be removed by canonicalization.
Using any of these type specifiers in a type test with typep or typecase is fine, since
as tests, these types can be translated into the and macro, the not function or a
call to the satisfies predicate.
| 5.2.11 |
Type Style
Recommendations |
|
Python provides good support for some currently unconventional ways
of using the Common Lisp type system. With Python, it is desirable
to make declarations as precise as possible, but type inference
also makes some declarations unnecessary. Here are some general
guidelines for maximum robustness and efficiency:
- Declare the types of all function arguments and structure slots
as precisely as possible (while avoiding not,
and and satisfies). Put
these declarations in during initial coding so that type assertions
can find bugs for you during debugging.
- Use the member
type specifier where there are a small number of possible symbol
values, for example: (member :red :blue
:green).
- Use the or type
specifier in situations where the type is not certain, but there
are only a few possibilities, for example: (or
list vector).
- Declare integer types with the tightest bounds that you can,
such as (integer 3 7).
- Define deftype or
defstruct types before
they are used. Definition after use is legal (producing no
``undefined type'' warnings), but type tests and structure
operations will be compiled much less efficiently.
- Use the extensions:freeze-type
declaration to speed up type testing for structure types which
won't have new subtypes added later. See section 5.2.9
- In addition to declaring the array element type and simpleness,
also declare the dimensions if they are fixed, for example:
(simple-array single-float (1024 1024))
This bounds information allows array indexing for multi-dimensional
arrays to be compiled much more efficiently, and may also allow
array bounds checking to be done at compile time. See
section 5.10.3.
- Avoid use of the the declaration within expressions. Not only does it
clutter the code, but it is also almost worthless under safe
policies. If the need for an output type assertion is revealed by
efficiency notes during tuning, then you can consider the, but it is preferable to constrain the argument
types more, allowing the compiler to prove the desired result
type.
- Don't bother declaring the type of let or other non-argument
variables unless the type is non-obvious. If you declare function
return types and structure slot types, then the type of a variable
is often obvious both to the programmer and to the compiler. An
important case where the type isn't obvious, and a declaration is
appropriate, is when the value for a variable is pulled out of
untyped structure (e.g., the result of car),
or comes from some weakly typed function, such as read.
- Declarations are sometimes necessary for integer loop
variables, since the compiler can't always prove that the value is
of a good integer type. These declarations are best added during
tuning, when an efficiency note indicates the need.
Type inference is the process by which the compiler tries to figure
out the types of expressions and variables, given an inevitable
lack of complete type information. Although Python does much more
type inference than most Common Lisp compilers, remember that the
more precise and comprehensive type declarations are, the more type
inference will be able to do.
| 5.3.1 |
Variable Type
Inference |
|
The type of a variable is the union of the types of all the
definitions. In the degenerate case of a let, the type of the
variable is the type of the initial value. This inferred type is
intersected with any declared type, and is then propagated to all
the variable's references. The types of multiple-value-bind variables
are similarly inferred from the types of the individual values of
the values form.
If multiple type declarations apply to a single variable, then all
the declarations must be correct; it is as though all the types
were intersected producing a single and type specifier. In this
example:
(defmacro my-dotimes ((var count) &body body)
`(do ((,var 0 (1+ ,var)))
((>= ,var ,count))
(declare (type (integer 0 *) ,var))
,@body))
(my-dotimes (i ...)
(declare (fixnum i))
...)
the two declarations for i are intersected,
so i is known to be a non-negative
fixnum.
In practice, this type inference is limited to lets and local
functions, since the compiler can't analyze all the calls to a
global function. But type inference works well enough on local
variables so that it is often unnecessary to declare the type of
local variables. This is especially likely when function result
types and structure slot types are declared. The main areas where
type inference breaks down are:
- When the initial value of a variable is a untyped expression,
such as (car x), and
- When the type of one of the variable's definitions is a
function of the variable's current value, as in: (setq x (1+ x))
| 5.3.2 |
Local Function
Type Inference |
|
The types of arguments to local functions are inferred in the same
was as any other local variable; the type is the union of the
argument types across all the calls to the function, intersected
with the declared type. If there are any assignments to the
argument variables, the type of the assigned value is unioned in as
well.
The result type of a local function is computed in a special way
that takes tail recursion (see section 5.5) into consideration. The result type is
the union of all possible return values that aren't tail-recursive
calls. For example, Python will infer that the result type of this
function is integer:
(defun ! (n res)
(declare (integer n res))
(if (zerop n)
res
(! (1- n) (* n res))))
Although this is a rather obvious result, it becomes somewhat less
trivial in the presence of mutual tail recursion of multiple
functions. Local function result type inference interacts with the
mechanisms for ensuring proper tail recursion mentioned in section
5.6.5.
| 5.3.3 |
Global Function
Type Inference |
|
As described in section 5.2.6, a
global function type (ftype) declaration places implicit type assertions on
the call arguments, and also guarantees the type of the return
value. So wherever a call to a declared function appears, there is
no doubt as to the types of the arguments and return value.
Furthermore, Python will infer a function type from the function's
definition if there is no ftype declaration.
Any type declarations on the argument variables are used as the
argument types in the derived function type, and the compiler's
best guess for the result type of the function is used as the
result type in the derived function type.
This method of deriving function types from the definition
implicitly assumes that functions won't be redefined at run-time.
Consider this example:
(defun foo-p (x)
(let ((res (and (consp x) (eq (car x) 'foo))))
(format t "It is ~:[not ~;~]foo." res)))
(defun frob (it)
(if (foo-p it)
(setf (cadr it) 'yow!)
(1+ it)))
Presumably, the programmer really meant to return res from foo-p, but he seems to
have forgotten. When he tries to call do (frob
(list 'foo nil)), frob will flame out
when it tries to add to a cons. Realizing his
error, he fixes foo-p and recompiles it. But
when he retries his test case, he is baffled because the error is
still there. What happened in this example is that Python proved
that the result of foo-p is null, and then proceeded to optimize away the
setf in frob.
Fortunately, in this example, the error is detected at compile time
due to notes about unreachable code (see section 5.4.5.) Still, some users may not want to
worry about this sort of problem during incremental development, so
there is a variable to control deriving function types.
[Variable]
extensions:*derive-function-types*
If true (the default), argument and result type information derived
from compilation of defuns is used when
compiling calls to that function. If false, only information from
ftype proclamations will be
used.
| 5.3.4 |
Operation
Specific Type Inference |
|
Many of the standard Common Lisp functions have special type
inference procedures that determine the result type as a function
of the argument types. For example, the result type of aref is the array element type. Here are some other
examples of type inferences:
(logand x #xFF) ==> (unsigned-byte 8)
(+ (the (integer 0 12) x) (the (integer 0 1) y)) ==> (integer 0 13)
(ash (the (unsigned-byte 16) x) -8) ==> (unsigned-byte 8)
| 5.3.5 |
Dynamic Type
Inference |
|
Python uses flow analysis to infer types in dynamically typed
programs. For example:
(ecase x
(list (length x))
...)
Here, the compiler knows the argument to length is a list, because the call to length is only done when x is a
list. The most significant efficiency effect of inference from
assertions is usually in type check optimization.
Dynamic type inference has two inputs: explicit conditionals and
implicit or explicit type assertions. Flow analysis propagates
these constraints on variable type to any code that can be executed
only after passing though the constraint. Explicit type constraints
come from ifs where
the test is either a lexical variable or a function of lexical
variables and constants, where the function is either a type
predicate, a numeric comparison or eq.
If there is an eq (or eql) test, then the compiler will actually substitute
one argument for the other in the true branch. For example:
(when (eq x :yow!) (return x))
becomes:
(when (eq x :yow!) (return :yow!))
This substitution is done when one argument is a constant, or one
argument has better type information than the other. This
transformation reveals opportunities for constant folding or
type-specific optimizations. If the test is against a constant,
then the compiler can prove that the variable is not that constant
value in the false branch, or (not (member
:yow!)) in the example above. This can eliminate redundant
tests, for example:
(if (eq x nil)
...
(if x a b))
is transformed to this:
(if (eq x nil)
...
a)
Variables appearing as if tests are
interpreted as (not (eq var nil)) tests. The compiler also converts
= into eql where
possible. It is difficult to do inference directly on = since it does implicit coercions.
When there is an explicit or test on numeric variables, the
compiler makes inferences about the ranges the variables can assume
in the true and false branches. This is mainly useful when it
proves that the values are small enough in magnitude to allow
open-coding of arithmetic operations. For example, in many uses of
dotimes with a fixnum
repeat count, the compiler proves that fixnum arithmetic can be
used.
Implicit type assertions are quite common, especially if you
declare function argument types. Dynamic inference from implicit
type assertions sometimes helps to disambiguate programs to a
useful degree, but is most noticeable when it detects a dynamic
type error. For example:
(defun foo (x)
(+ (car x) x))
results in this warning:
In: DEFUN FOO
(+ (CAR X) X)
==>
X
Warning: Result is a LIST, not a NUMBER.
Note that Common Lisp's dynamic type checking semantics make
dynamic type inference useful even in programs that aren't really
dynamically typed, for example:
(+ (car x) (length x))
Here, x presumably always holds a list, but
in the absence of a declaration the compiler cannot assume
x is a list simply because list-specific
operations are sometimes done on it. The compiler must consider the
program to be dynamically typed until it proves otherwise. Dynamic
type inference proves that the argument to length is always a list because the call to length is only done after the list-specific car operation.
| 5.3.6 |
Type Check
Optimization |
|
Python backs up its support for precise type checking by minimizing
the cost of run-time type checking. This is done both through type
inference and though optimizations of type checking itself.
Type inference often allows the compiler to prove that a value is
of the correct type, and thus no type check is necessary. For
example:
(defstruct foo a b c)
(defstruct link
(foo (required-argument) :type foo)
(next nil :type (or link null)))
(foo-a (link-foo x))
Here, there is no need to check that the result of link-foo is a foo, since it
always is. Even when some type checks are necessary, type inference
can often reduce the number:
(defun test (x)
(let ((a (foo-a x))
(b (foo-b x))
(c (foo-c x)))
...))
In this example, only one (foo-p x) check is
needed. This applies to a lesser degree in list operations, such
as:
(if (eql (car x) 3) (cdr x) y)
Here, we only have to check that x is a list
once.
Since Python recognizes explicit type tests, code that explicitly
protects itself against type errors has little introduced overhead
due to implicit type checking. For example, this loop compiles with
no implicit checks checks for car and
cdr:
(defun memq (e l)
(do ((current l (cdr current)))
((atom current) nil)
(when (eq (car current) e) (return current))))
Python reduces the cost of checks that
must be done through an optimization called complementing. A complemented check for type is simply a check that the value is not of the
type (not type).
This is only interesting when something is known about the actual
type, in which case we can test for the complement of (and known-type (not type)), or the difference between the known
type and the assertion. An example:
(link-foo (link-next x))
Here, we change the type check for link-foo
from a test for foo to a test for:
(not (and (or foo null) (not foo)))
or more simply (not null). This is probably
the most important use of complementing, since the situation is
fairly common, and a null test is much
cheaper than a structure type test.
Here is a more complicated example that illustrates the combination
of complementing with dynamic type inference:
(defun find-a (a x)
(declare (type (or link null) x))
(do ((current x (link-next current)))
((null current) nil)
(let ((foo (link-foo current)))
(when (eq (foo-a foo) a) (return foo)))))
This loop can be compiled with no type checks. The link test for link-foo and
link-next is complemented to (not null), and then deleted because of the explicit
null test. As before, no check is necessary
for foo-a, since the link-foo is always a foo. This
sort of situation shows how precise type checking combined with
precise declarations can actually result in reduced type
checking.
This section describes source-level transformations that Python
does on programs in an attempt to make them more efficient.
Although source-level optimizations can make existing programs more
efficient, the biggest advantage of this sort of optimization is
that it makes it easier to write efficient programs. If a clean,
straightforward implementation is can be transformed into an
efficient one, then there is no need for tricky and dangerous hand
optimization.
The primary optimization of let variables is to delete them when
they are unnecessary. Whenever the value of a let variable is a
constant, a constant variable or a constant (local or
non-notinline) function, the variable is deleted, and references to
the variable are replaced with references to the constant
expression. This is useful primarily in the expansion of macros or
inline functions, where argument values are often constant in any
given call, but are in general non-constant expressions that must
be bound to preserve order of evaluation. Let variable optimization
eliminates the need for macros to carefully avoid spurious
bindings, and also makes inline functions just as efficient as
macros.
A particularly interesting class of constant is a local function.
Substituting for lexical variables that are bound to a function can
substantially improve the efficiency of functional programming
styles, for example:
(let ((a #'(lambda (x) (zow x))))
(funcall a 3))
effectively transforms to:
(zow 3)
This transformation is done even when the function is a closure, as
in:
(let ((a (let ((y (zug)))
#'(lambda (x) (zow x y)))))
(funcall a 3))
becoming:
(zow 3 (zug))
A constant variable is a lexical variable that is never assigned
to, always keeping its initial value. Whenever possible, avoid
setting lexical variables---instead bind a new variable to the new
value. Except for loop variables, it is almost always possible to
avoid setting lexical variables. This form:
(let ((x (f x)))
...)
is more efficient than this form:
(setq x (f x))
...
Setting variables makes the program more difficult to understand,
both to the compiler and to the programmer. Python compiles
assignments at least as efficiently as any other Common Lisp
compiler, but most let optimizations are only done on constant
variables.
Constant variables with only a single use are also optimized away,
even when the initial value is not constant.1 For
example, this expansion of incf:
(let ((#:g3 (+ x 1)))
(setq x #:G3))
becomes:
(setq x (+ x 1))
The type semantics of this transformation are more important than
the elimination of the variable itself. Consider what happens when
x is declared to be a fixnum; after the transformation, the compiler can
compile the addition knowing that the result is a fixnum, whereas before the transformation the addition
would have to allow for fixnum overflow.
Another variable optimization deletes any variable that is never
read. This causes the initial value and any assigned values to be
unused, allowing those expressions to be deleted if they have no
side-effects.
Note that a let is actually a degenerate case of local call (see
section 5.6.2), and that let
optimization can be done on calls that weren't created by a let.
Also, local call allows an applicative style of iteration that is
totally assignment free.
Constant folding is an optimization that replaces a call of
constant arguments with the constant result of that call. Constant
folding is done on all standard functions for which it is legal.
Inline expansion allows folding of any constant parts of the
definition, and can be done even on functions that have
side-effects.
It is convenient to rely on constant folding when programming, as
in this example:
(defconstant limit 42)
(defun foo ()
(... (1- limit) ...))
Constant folding is also helpful when writing macros or inline
functions, since it usually eliminates the need to write a macro
that special-cases constant arguments.
Constant folding of a user defined
function is enabled by the extensions:constant-function proclamation. In this
example:
(declaim (ext:constant-function myfun))
(defun myexp (x y)
(declare (single-float x y))
(exp (* (log x) y)))
... (myexp 3.0 1.3) ...
The call to myexp is constant-folded to
4.1711674.
| 5.4.3 |
Unused Expression
Elimination |
|
If the value of any expression is not used, and the expression has
no side-effects, then it is deleted. As with constant folding, this
optimization applies most often when cleaning up after inline
expansion and other optimizations. Any function declared an
extensions:constant-function is also subject
to unused expression elimination.
Note that Python will eliminate parts of unused expressions known
to be side-effect free, even if there are other unknown parts. For
example:
(let ((a (list (foo) (bar))))
(if t
(zow)
(raz a)))
becomes:
(progn (foo) (bar))
(zow)
| 5.4.4 |
Control
Optimization |
|
The most important optimization of control is recognizing when an
if test is known at
compile time, then deleting the if, the test
expression, and the unreachable branch of the if. This can be considered a special case of constant
folding, although the test doesn't have to be truly constant as
long as it is definitely not nil. Note also,
that type inference propagates the result of an if test to the true and false branches, see
section 5.3.5.
A related if optimization is this
transformation:2
(if (if a b c) x y)
into:
(if a
(if b x y)
(if c x y))
The opportunity for this sort of optimization usually results from
a conditional macro. For example:
(if (not a) x y)
is actually implemented as this:
(if (if a nil t) x y)
which is transformed to this:
(if a
(if nil x y)
(if t x y))
which is then optimized to this:
(if a y x)
Note that due to Python's internal representations, the if---if situation will be
recognized even if other forms are wrapped around the inner
if, like:
(if (let ((g ...))
(loop
...
(return (not g))
...))
x y)
In Python, all the Common Lisp macros really are macros, written in
terms of if, block and
tagbody, so user-defined control macros can
be just as efficient as the standard ones. Python emits basic
blocks using a heuristic that minimizes the number of unconditional
branches. The code in a tagbody will not be
emitted in the order it appeared in the source, so there is no
point in arranging the code to make control drop through to the
target.
| 5.4.5 |
Unreachable Code
Deletion |
|
Python will delete code whenever it can prove that the code can
never be executed. Code becomes unreachable when:
- An if is optimized away, or
- There is an explicit unconditional control transfer such as
go or return-from,
or
- The last reference to a local function is deleted (or there
never was any reference.)
When code that appeared in the original source is deleted, the
compiler prints a note to indicate a possible problem (or at least
unnecessary code.) For example:
(defun foo ()
(if t
(write-line "True.")
(write-line "False.")))
will result in this note:
In: DEFUN FOO
(WRITE-LINE "False.")
Note: Deleting unreachable code.
It is important to pay attention to unreachable code notes, since
they often indicate a subtle type error. For example:
(defstruct foo a b)
(defun lose (x)
(let ((a (foo-a x))
(b (if x (foo-b x) :none)))
...))
results in this note:
In: DEFUN LOSE
(IF X (FOO-B X) :NONE)
==>
:NONE
Note: Deleting unreachable code.
The :none is unreachable, because type
inference knows that the argument to foo-a
must be a foo, and thus can't be nil. Presumably the programmer forgot that x could be nil when he wrote the
binding for a.
Here is an example with an incorrect declaration:
(defun count-a (string)
(do ((pos 0 (position #\a string :start (1+ pos)))
(count 0 (1+ count)))
((null pos) count)
(declare (fixnum pos))))
This time our note is:
In: DEFUN COUNT-A
(DO ((POS 0 #) (COUNT 0 #))
((NULL POS) COUNT)
(DECLARE (FIXNUM POS)))
--> BLOCK LET TAGBODY RETURN-FROM PROGN
==>
COUNT
Note: Deleting unreachable code.
The problem here is that pos can never be
null since it is declared a fixnum.
It takes some experience with unreachable code notes to be able to
tell what they are trying to say. In non-obvious cases, the best
thing to do is to call the function in a way that should cause the
unreachable code to be executed. Either you will get a type error,
or you will find that there truly is no way for the code to be
executed.
Not all unreachable code results in a note:
- A note is only given when the unreachable code textually
appears in the original source. This prevents spurious notes due to
the optimization of macros and inline functions, but sometimes also
foregoes a note that would have been useful.
- Since accurate source information is not available for non-list
forms, there is an element of heuristic in determining whether or
not to give a note about an atom. Spurious notes may be given when
a macro or inline function defines a variable that is also present
in the calling function. Notes about nil and
t are never given, since it is too easy to
confuse these constants in expanded code with ones in the original
source.
- Notes are only given about code unreachable due to control
flow. There is no note when an expression is deleted because its
value is unused, since this is a common consequence of other
optimizations.
Somewhat spurious unreachable code notes can also result when a
macro inserts multiple copies of its arguments in different
contexts, for example:
(defmacro t-and-f (var form)
`(if ,var ,form ,form))
(defun foo (x)
(t-and-f x (if x "True." "False.")))
results in these notes:
In: DEFUN FOO
(IF X "True." "False.")
==>
"False."
Note: Deleting unreachable code.
==>
"True."
Note: Deleting unreachable code.
It seems like it has deleted both branches of the if, but it has really deleted one branch in one copy,
and the other branch in the other copy. Note that these messages
are only spurious in not satisfying the intent of the rule that
notes are only given when the deleted code appears in the original
source; there is always some code being
deleted when a unreachable code note is printed.
| 5.4.6 |
Multiple Values
Optimization |
|
Within a function, Python implements uses of multiple values
particularly efficiently. Multiple values can be kept in arbitrary
registers, so using multiple values doesn't imply stack
manipulation and representation conversion. For example, this code:
(let ((a (if x (foo x) u))
(b (if x (bar x) v)))
...)
is actually more efficient written this way:
(multiple-value-bind
(a b)
(if x
(values (foo x) (bar x))
(values u v))
...)
Also, see section 5.6.5 for
information on how local call provides efficient support for
multiple function return values.
| 5.4.7 |
Source to Source
Transformation |
|
The compiler implements a number of operation-specific
optimizations as source-to-source transformations. You will often
see unfamiliar code in error messages, for example:
(defun my-zerop () (zerop x))
gives this warning:
In: DEFUN MY-ZEROP
(ZEROP X)
==>
(= X 0)
Warning: Undefined variable: X
The original zerop has been transformed into
a call to =. This transformation is indicated
with the same ==> used to mark macro and
function inline expansion. Although it can be confusing, display of
the transformed source is important, since warnings are given with
respect to the transformed source. This a more obscure example:
(defun foo (x) (logand 1 x))
gives this efficiency note:
In: DEFUN FOO
(LOGAND 1 X)
==>
(LOGAND C::Y C::X)
Note: Forced to do static-function Two-arg-and (cost 53).
Unable to do inline fixnum arithmetic (cost 1) because:
The first argument is a INTEGER, not a FIXNUM.
etc.
Here, the compiler commuted the call to logand, introducing temporaries. The note complains
that the first argument is not a
fixnum, when in the original call, it was the
second argument. To make things more confusing, the compiler
introduced temporaries called c::x and
c::y that are bound to y and 1, respectively.
You will also notice source-to-source optimizations when efficiency
notes are enabled (see section 5.13.) When the compiler is unable to do a
transformation that might be possible if there was more
information, then an efficiency note is printed. For example,
my-zerop above will also give this efficiency
note:
In: DEFUN FOO
(ZEROP X)
==>
(= X 0)
Note: Unable to optimize because:
Operands might not be the same type, so can't open code.
| 5.4.8 |
Style
Recommendations |
|
Source level optimization makes possible a clearer and more relaxed
programming style:
- Don't use macros purely to avoid function call. If you want an
inline function, write it as a function and declare it inline. It's
clearer, less error-prone, and works just as well.
- Don't write macros that try to ``optimize'' their expansion in
trivial ways such as avoiding binding variables for simple
expressions. The compiler does these optimizations too, and is less
likely to make a mistake.
- Make use of local functions (i.e., labels
or flet) and tail-recursion in places where
it is clearer. Local function call is faster than full call.
- Avoid setting local variables when possible. Binding a new
let variable is at least as efficient as
setting an existing variable, and is easier to understand, both for
the compiler and the programmer.
- Instead of writing similar code over and over again so that it
can be hand customized for each use, define a macro or inline
function, and let the compiler do the work.
A call is tail-recursive if nothing has to be done after the the
call returns, i.e. when the call returns, the returned value is
immediately returned from the calling function. In this example,
the recursive call to myfun is
tail-recursive:
(defun myfun (x)
(if (oddp (random x))
(isqrt x)
(myfun (1- x))))
Tail recursion is interesting because it is form of recursion that
can be implemented much more efficiently than general recursion. In
general, a recursive call requires the compiler to allocate storage
on the stack at run-time for every call that has not yet returned.
This memory consumption makes recursion unacceptably inefficient
for representing repetitive algorithms having large or unbounded
size. Tail recursion is the special case of recursion that is
semantically equivalent to the iteration constructs normally used
to represent repetition in programs. Because tail recursion is
equivalent to iteration, tail-recursive programs can be compiled as
efficiently as iterative programs.
So why would you want to write a program recursively when you can
write it using a loop? Well, the main answer is that recursion is a
more general mechanism, so it can express some solutions simply
that are awkward to write as a loop. Some programmers also feel
that recursion is a stylistically preferable way to write loops
because it avoids assigning variables. For example, instead of
writing:
(defun fun1 (x)
something-that-uses-x)
(defun fun2 (y)
something-that-uses-y)
(do ((x something (fun2 (fun1 x))))
(nil))
You can write:
(defun fun1 (x)
(fun2 something-that-uses-x))
(defun fun2 (y)
(fun1 something-that-uses-y))
(fun1 something)
The tail-recursive definition is actually more efficient, in
addition to being (arguably) clearer. As the number of functions
and the complexity of their call graph increases, the simplicity of
using recursion becomes compelling. Consider the advantages of
writing a large finite-state machine with separate tail-recursive
functions instead of using a single huge prog.
It helps to understand how to use tail recursion if you think of a
tail-recursive call as a psetq that assigns
the argument values to the called function's variables, followed by
a go to the start of the called function.
This makes clear an inherent efficiency advantage of tail-recursive
call: in addition to not having to allocate a stack frame, there is
no need to prepare for the call to return (e.g., by computing a
return PC.)
Is there any disadvantage to tail recursion? Other than an increase
in efficiency, the only way you can tell that a call has been
compiled tail-recursively is if you use the debugger. Since a
tail-recursive call has no stack frame, there is no way the
debugger can print out the stack frame representing the call. The
effect is that backtrace will not show some calls that would have
been displayed in a non-tail-recursive implementation. In practice,
this is not as bad as it sounds---in fact it isn't really clearly
worse, just different. See section 3.3.5 for information
about the debugger implications of tail recursion, and how to turn
it off for the sake of more conservative backtrace information.
In order to ensure that tail-recursion is preserved in arbitrarily
complex calling patterns across separately compiled functions, the
compiler must compile any call in a tail-recursive position as a
tail-recursive call. This is done regardless of whether the program
actually exhibits any sort of recursive calling pattern. In this
example, the call to fun2 will always be
compiled as a tail-recursive call:
(defun fun1 (x)
(fun2 x))
So tail recursion doesn't necessarily have anything to do with
recursion as it is normally thought of. See section 5.6.4 for more discussion of using tail
recursion to implement loops.
| 5.5.1 |
Tail Recursion
Exceptions |
|
Although Python is claimed to be ``properly'' tail-recursive, some
might dispute this, since there are situations where tail recursion
is inhibited:
- When the call is enclosed by a special binding, or
- When the call is enclosed by a catch or
unwind-protect, or
- When the call is enclosed by a block or
tagbody and the block name or go tag has been closed over.
These dynamic extent binding forms inhibit tail recursion because
they allocate stack space to represent the binding. Shallow-binding
implementations of dynamic scoping also require cleanup code to be
evaluated when the scope is exited.
In addition, optimization of tail-recursive calls is inhibited when
the debug optimization quality is greater
than 2 (see section 3.6.)
Python supports two kinds of function call: full call and local
call. Full call is the standard calling convention; its late
binding and generality make Common Lisp what it is, but create
unavoidable overheads. When the compiler can compile the calling
function and the called function simultaneously, it can use local
call to avoid some of the overhead of full call. Local call is
really a collection of compilation strategies. If some aspect of
call overhead is not needed in a particular local call, then it can
be omitted. In some cases, local call can be totally free. Local
call provides two main advantages to the user:
- Local call makes the use of the lexical function binding forms
flet and labels much more efficient. A
local call is always faster than a full call, and in many cases is
much faster.
- Local call is a natural approach to block compilation, a
compilation technique that resolves function references at compile
time. Block compilation speeds function call, but increases
compilation times and prevents function redefinition.
| 5.6.1 |
Self-Recursive
Calls |
|
Local call is used when a function defined by defun calls itself. For example:
(defun fact (n)
(if (zerop n)
1
(* n (fact (1- n)))))
This use of local call speeds recursion, but can also complicate
debugging, since trace
will only show the first call to fact, and
not the recursive calls. This is because the recursive calls
directly jump to the start of the function, and don't indirect
through the symbol-function. Self-recursive
local call is inhibited when the :block-compile argument to compile-file is nil (see
section 5.7.3.)
Because local call avoids
unnecessary call overheads, the compiler internally uses local call
to implement some macros and special forms that are not normally
thought of as involving a function call. For example, this
let:
(let ((a (foo))
(b (bar)))
...)
is internally represented as though it was macroexpanded into:
(funcall #'(lambda (a b)
...)
(foo)
(bar))
This implementation is acceptable because the simple cases of local
call (equivalent to a let) result in good
code. This doesn't make let any more
efficient, but does make local calls that are semantically the same
as let much more efficient than full calls.
For example, these definitions are all the same as far as the
compiler is concerned:
(defun foo ()
...some other stuff...
(let ((a something))
...some stuff...))
(defun foo ()
(flet ((localfun (a)
...some stuff...))
...some other stuff...
(localfun something)))
(defun foo ()
(let ((funvar #'(lambda (a)
...some stuff...)))
...some other stuff...
(funcall funvar something)))
Although local call is most efficient when the function is called
only once, a call doesn't have to be equivalent to a let to be more efficient than full call. All local
calls avoid the overhead of argument count checking and keyword
argument parsing, and there are a number of other advantages that
apply in many common situations. See section 5.4.1 for a discussion of the optimizations
done on let calls.
Local call allows for much more efficient use of closures, since
the closure environment doesn't need to be allocated on the heap,
or even stored in memory at all. In this example, there is no
penalty for localfun referencing a and b:
(defun foo (a b)
(flet ((localfun (x)
(1+ (* a b x))))
(if (= a b)
(localfun (- x))
(localfun x))))
In local call, the compiler effectively passes closed-over values
as extra arguments, so there is no need for you to ``optimize''
local function use by explicitly passing in lexically visible
values. Closures may also be subject to let optimization (see
section 5.4.1.)
Note: indirect value cells are currently always allocated on the
heap when a variable is both assigned to (with setq or setf) and closed over,
regardless of whether the closure is a local function or not. This
is another reason to avoid setting variables when you don't have
to.
| 5.6.4 |
Local Tail
Recursion |
|
Tail-recursive local calls are particularly efficient, since they
are in effect an assignment plus a control transfer. Scheme
programmers write loops with tail-recursive local calls, instead of
using the imperative go and setq. This has not caught on in the Common Lisp
community, since conventional Common Lisp compilers don't implement
local call. In Python, users can choose to write loops such as:
(defun ! (n)
(labels ((loop (n total)
(if (zerop n)
total
(loop (1- n) (* n total)))))
(loop n 1)))
[Macro]
extensions:iterate name
({(var initial-value)}*) {declaration}*
{form}*
This macro provides syntactic sugar for using labels to do iteration. It
creates a local function name with the
specified vars as its arguments and the
declarations and forms as its body. This function is then called
with the initial-values, and the result
of the call is return from the macro.
Here is our factorial example rewritten using iterate:
(defun ! (n)
(iterate loop
((n n)
(total 1))
(if (zerop n)
total
(loop (1- n) (* n total)))))
The main advantage of using iterate over
do is that iterate
naturally allows stepping to be done differently depending on
conditionals in the body of the loop. iterate
can also be used to implement algorithms that aren't really
iterative by simply doing a non-tail call. For example, the
standard recursive definition of factorial can be written like
this:
(iterate fact
((n n))
(if (zerop n)
1
(* n (fact (1- n)))))
One of the more subtle costs of full call comes from allowing
arbitrary numbers of return values. This overhead can be avoided in
local calls to functions that always return the same number of
values. For efficiency reasons (as well as stylistic ones), you
should write functions so that they always return the same number
of values. This may require passing extra nil
arguments to values in some cases, but the
result is more efficient, not less so.
When efficiency notes are enabled (see section 5.13), and the compiler wants to use known
values return, but can't prove that the function always returns the
same number of values, then it will print a note like this:
In: DEFUN GRUE
(DEFUN GRUE (X) (DECLARE (FIXNUM X)) (COND (# #) (# NIL) (T #)))
Note: Return type not fixed values, so can't use known return convention:
(VALUES (OR (INTEGER -536870912 -1) NULL) &REST T)
In order to implement proper tail recursion in the presence of
known values return (see section 5.5), the compiler sometimes must prove that
multiple functions all return the same number of values. When this
can't be proven, the compiler will print a note like this:
In: DEFUN BLUE
(DEFUN BLUE (X) (DECLARE (FIXNUM X)) (COND (# #) (# #) (# #) (T #)))
Note: Return value count mismatch prevents known return from
these functions:
BLUE
SNOO
See section 5.11.10 for the
interaction between local call and the representation of numeric
types.
Block compilation allows calls to global functions defined by
defun to be compiled
as local calls. The function call can be in a different top-level
form than the defun, or even in a different
file.
In addition, block compilation allows the declaration of the
entry points to the block compiled portion. An entry point
is any function that may be called from outside of the block
compilation. If a function is not an entry point, then it can be
compiled more efficiently, since all calls are known at compile
time. In particular, if a function is only called in one place,
then it will be let converted. This effectively inline expands the
function, but without the code duplication that results from
defining the function normally and then declaring it inline.
The main advantage of block compilation is that it it preserves
efficiency in programs even when (for readability and syntactic
convenience) they are broken up into many small functions. There is
absolutely no overhead for calling a non-entry point function that
is defined purely for modularity (i.e. called only in one
place.)
Block compilation also allows the use of non-descriptor arguments
and return values in non-trivial programs (see
section 5.11.10).
| 5.7.1 |
Block Compilation
Semantics |
|
The effect of block compilation can be envisioned as the compiler
turning all the defuns in the block
compilation into a single labels form:
(declaim (start-block fun1 fun3))
(defun fun1 ()
...)
(defun fun2 ()
...
(fun1)
...)
(defun fun3 (x)
(if x
(fun1)
(fun2)))
(declaim (end-block))
becomes:
(labels ((fun1 ()
...)
(fun2 ()
...
(fun1)
...)
(fun3 (x)
(if x
(fun1)
(fun2))))
(setf (fdefinition 'fun1) #'fun1)
(setf (fdefinition 'fun3) #'fun3))
Calls between the block compiled functions are local calls, so
changing the global definition of fun1 will
have no effect on what fun2 does; fun2 will keep calling the old fun1.
The entry points fun1 and fun3 are still installed in the symbol-function as the global definitions of the
functions, so a full call to an entry point works just as before.
However, fun2 is not an entry point, so it is
not globally defined. In addition, fun2 is
only called in one place, so it will be let converted.
| 5.7.2 |
Block Compilation
Declarations |
|
The extensions:start-block and extensions:end-block declarations allow fine-grained
control of block compilation. These declarations are only legal as
a global declarations (declaim or proclaim).
The start-block declaration has this syntax:
(start-block {entry-point-name}*)
When processed by the compiler, this declaration marks the start of
block compilation, and specifies the entry points to that block. If
no entry points are specified, then all
functions are made into entry points. If already block compiling,
then the compiler ends the current block and starts a new one.
The end-block declaration has no arguments:
(end-block)
The end-block declaration ends a block
compilation unit without starting a new one. This is useful mainly
when only a portion of a file is worth block compiling.
The :block-compile and :entry-points arguments to extensions:compile-from-stream and compile-file provide overall
control of block compilation, and allow block compilation without
requiring modification of the program source.
There are three possible values of the :block-compile argument:
- nil
- Do no compile-time resolution of global function names, not
even for self-recursive calls. This inhibits any start-block declarations appearing in the file,
allowing all functions to be incrementally redefined.
- t
- Start compiling in block compilation mode. This is mainly
useful for block compiling small files that contain no start-block declarations. See also the :entry-points argument.
- :specified
- Start compiling in form-at-a-time mode, but exploit any
start-block declarations and compile
self-recursive calls as local calls. Normally :specified is the default for this argument (see
*block-compile-default*.)
The :entry-points argument can be used in
conjunction with :block-compile t to specify the entry-points to a block-compiled file.
If not specified or nil, all global functions
will be compiled as entry points. When :block-compile is not t, this
argument is ignored.
[Variable]
*block-compile-default*
This variable determines the default value for the :block-compile argument to compile-file and compile-from-stream. The initial value of this variable
is :specified, but nil
is sometimes useful for totally inhibiting block
compilation.
| 5.7.4 |
Practical
Difficulties |
|
The main problem with block compilation is that the compiler uses
large amounts of memory when it is block compiling. This places an
upper limit on the amount of code that can be block compiled as a
unit. To make best use of block compilation, it is necessary to
locate the parts of the program containing many internal calls, and
then add the appropriate start-block
declarations. When writing new code, it is a good idea to put in
block compilation declarations from the very beginning, since
writing block declarations correctly requires accurate knowledge of
the program's function call structure. If you want to initially
develop code with full incremental redefinition, you can compile
with *block-compile-default* set to nil.
Note if a defun appears in a non-null lexical
environment, then calls to it cannot be block compiled.
Unless files are very small, it is probably impractical to block
compile multiple files as a unit by specifying a list of files to
compile-file. Semi-inline expansion (see
section 5.8.2) provides another way
to extend block compilation across file boundaries.
| 5.7.5 |
Context
Declarations |
|
CMUCL has a context-sensitive declaration mechanism which is useful
because it allows flexible control of the compilation policy in
large systems without requiring changes to the source files. The
primary use of this feature is to allow the exported interfaces of
a system to be compiled more safely than the system internals. The
context used is the name being defined and the kind of definition
(function, macro, etc.)
The :context-declarations option to with-compilation-unit has
dynamic scope, affecting all compilation done during the evaluation
of the body. The argument to this option should evaluate to a list
of lists of the form:
(context-spec {declare-form}+)
In the indicated context, the specified declare forms are inserted
at the head of each definition. The declare forms for all contexts
that match are appended together, with earlier declarations getting
precedence over later ones. A simple example:
:context-declarations
'((:external (declare (optimize (safety 2)))))
This will cause all functions that are named by external symbols to
be compiled with safety 2.
The full syntax of context specs is:
- :internal, :external
- True if the symbol is internal (external) in its home
package.
- :uninterned
- True if the symbol has no home package.
- (:package {package-name}*)
- True if the symbol's home package is in any of the named
packages (false if uninterned.)
- :anonymous
- True if the function doesn't have any interesting name (not
defmacro, defun,
labels or flet).
- :macro, :function
- :macro is a global (defmacro) macro. :function is
anything else.
- :local, :global
- :local is a labels
or flet. :global is
anything else.
- (:or {context-spec}*)
- True when any supplied context-spec
is true.
- (:and {context-spec}*)
- True only when all supplied context-specs are true.
- (:not {context-spec}*)
- True when context-spec is false.
- (:member {name}*)
- True when the defined name is one of these names (equal test.)
- (:match {pattern}*)
- True when any of the patterns is a substring of the name. The
name is wrapped with $'s, so ``$FOO'' matches names beginning with ``FOO'', etc.
| 5.7.6 |
Context
Declaration Example |
|
Here is a more complex example of with-compilation-unit options:
:optimize '(optimize (speed 2) (space 2) (inhibit-warnings 2)
(debug 1) (safety 0))
:optimize-interface '(optimize-interface (safety 1) (debug 1))
:context-declarations
'(((:or :external (:and (:match "%") (:match "SET")))
(declare (optimize-interface (safety 2))))
((:or (:and :external :macro)
(:match "$PARSE-"))
(declare (optimize (safety 2)))))
The optimize and extensions:optimize-interface declarations (see
section 4.7.1) set up the global
compilation policy. The bodies of functions are to be compiled
completely unsafe (safety 0), but argument
count and weakened argument type checking is to be done when a
function is called (speed 2 safety 1).
The first declaration specifies that all functions that are
external or whose names contain both ``%''
and ``SET'' are to be compiled compiled with
completely safe interfaces (safety 2). The
reason for this particular :match rule is
that setf inverse functions in this system
tend to have both strings in their name somewhere. We want
setf inverses to be safe because they are
implicitly called by users even though their name is not
exported.
The second declaration makes external macros or functions whose
names start with ``PARSE-'' have safe bodies
(as well as interfaces). This is desirable because a syntax error
in a macro may cause a type error inside the body. The :match rule is used because macros often have auxiliary
functions whose names begin with this string.
This particular example is used to build part of the standard CMUCL
system. Note however, that context declarations must be set up
according to the needs and coding conventions of a particular
system; different parts of CMUCL are compiled with different
context declarations, and your system will probably need its own
declarations. In particular, any use of the :match option depends on naming conventions used in
coding.
Python can expand almost any function inline, including functions
with keyword arguments. The only restrictions are that keyword
argument keywords in the call must be constant, and that global
function definitions (defun) must be done in
a null lexical environment (not nested in a let or other binding form.) Local functions (flet) can be inline expanded in any environment.
Combined with Python's source-level optimization, inline expansion
can be used for things that formerly required macros for efficient
implementation. In Python, macros don't have any efficiency
advantage, so they need only be used where a macro's syntactic
flexibility is required.
Inline expansion is a compiler optimization technique that reduces
the overhead of a function call by simply not doing the call:
instead, the compiler effectively rewrites the program to appear as
though the definition of the called function was inserted at each
call site. In Common Lisp, this is straightforwardly expressed by
inserting the lambda corresponding to the
original definition:
(proclaim '(inline my-1+))
(defun my-1+ (x) (+ x 1))
(my-1+ someval) ==> ((lambda (x) (+ x 1)) someval)
When the function expanded inline is large, the program after
inline expansion may be substantially larger than the original
program. If the program becomes too large, inline expansion hurts
speed rather than helping it, since hardware resources such as
physical memory and cache will be exhausted. Inline expansion is
called for:
- When profiling has shown that a relatively simple function is
called so often that a large amount of time is being wasted in the
calling of that function (as opposed to running in that function.)
If a function is complex, it will take a long time to run relative
the time spent in call, so the speed advantage of inline expansion
is diminished at the same time the space cost of inline expansion
is increased. Of course, if a function is rarely called, then the
overhead of calling it is also insignificant.
- With functions so simple that they take less space to inline
expand than would be taken to call the function (such as my-1+ above.) It would require intimate knowledge of
the compiler to be certain when inline expansion would reduce
space, but it is generally safe to inline expand functions whose
definition is a single function call, or a few calls to simple
Common Lisp functions.
In addition to this speed/space tradeoff from inline expansion's
avoidance of the call, inline expansion can also reveal
opportunities for optimization. Python's extensive source-level
optimization can make use of context information from the caller to
tremendously simplify the code resulting from the inline expansion
of a function.
The main form of caller context is local information about the
actual argument values: what the argument types are and whether the
arguments are constant. Knowledge about argument types can
eliminate run-time type tests (e.g., for generic arithmetic.)
Constant arguments in a call provide opportunities for constant
folding optimization after inline expansion.
A hidden way that constant arguments are often supplied to
functions is through the defaulting of unsupplied optional or
keyword arguments. There can be a huge efficiency advantage to
inline expanding functions that have complex keyword-based
interfaces, such as this definition of the member function:
(proclaim '(inline member))
(defun member (item list &key
(key #'identity)
(test #'eql testp)
(test-not nil notp))
(do ((list list (cdr list)))
((null list) nil)
(let ((car (car list)))
(if (cond (testp
(funcall test item (funcall key car)))
(notp
(not (funcall test-not item (funcall key car))))
(t
(funcall test item (funcall key car))))
(return list)))))
After inline expansion, this call is simplified to the obvious
code:
(member a l :key #'foo-a :test #'char=) ==>
(do ((list list (cdr list)))
((null list) nil)
(let ((car (car list)))
(if (char= item (foo-a car))
(return list))))
In this example, there could easily be more than an order of
magnitude improvement in speed. In addition to eliminating the
original call to member, inline expansion
also allows the calls to char= and foo-a to be open-coded. We go from a loop with three
tests and two calls to a loop with one test and no calls.
See section 5.4 for more
discussion of source level optimization.
| 5.8.1 |
Inline Expansion
Recording |
|
Inline expansion requires that the source for the inline expanded
function to be available when calls to the function are compiled.
The compiler doesn't remember the inline expansion for every
function, since that would take an excessive about of space.
Instead, the programmer must tell the compiler to record the inline
expansion before the definition of the inline expanded function is
compiled. This is done by globally declaring the function inline
before the function is defined, by using the inline and extensions:maybe-inline (see section 5.8.3) declarations.
In addition to recording the inline expansion of inline functions
at the time the function is compiled, compile-file also puts the inline expansion in the
output file. When the output file is loaded, the inline expansion
is made available for subsequent compilations; there is no need to
compile the definition again to record the inline expansion.
If a function is declared inline, but no expansion is recorded,
then the compiler will give an efficiency note like:
Note: MYFUN is declared inline, but has no expansion.
When you get this note, check that the inline
declaration and the definition appear before the calls that are to
be inline expanded. This note will also be given if the inline
expansion for a defun could not be recorded
because the defun was in a non-null lexical
environment.
| 5.8.2 |
Semi-Inline
Expansion |
|
Python supports semi-inline functions.
Semi-inline expansion shares a single copy of a function across all
the calls in a component by converting the inline expansion into a
local function (see section 5.6.)
This takes up less space when there are multiple calls, but also
provides less opportunity for context dependent optimization. When
there is only one call, the result is identical to normal inline
expansion. Semi-inline expansion is done when the space optimization quality is 0,
and the function has been declared extensions:maybe-inline.
This mechanism of inline expansion combined with local call also
allows recursive functions to be inline expanded. If a recursive
function is declared inline, calls will
actually be compiled semi-inline. Although recursive functions are
often so complex that there is little advantage to semi-inline
expansion, it can still be useful in the same sort of cases where
normal inline expansion is especially advantageous, i.e. functions
where the calling context can help a lot.
| 5.8.3 |
The Maybe-Inline
Declaration |
|
The extensions:maybe-inline declaration is a
CMUCL extension. It is similar to inline, but
indicates that inline expansion may sometimes be desirable, rather
than saying that inline expansion should almost always be done.
When used in a global declaration, extensions:maybe-inline causes the expansion for the
named functions to be recorded, but the functions aren't actually
inline expanded unless space is 0 or the function is eventually (perhaps locally)
declared inline.
Use of the extensions:maybe-inline
declaration followed by the defun is
preferable to the standard idiom of:
(proclaim '(inline myfun))
(defun myfun () ...)
(proclaim '(notinline myfun))
;;; Any calls to myfun here are not inline expanded.
(defun somefun ()
(declare (inline myfun))
;;
;; Calls to myfun here are inline expanded.
...)
The problem with using notinline in this way
is that in Common Lisp it does more