Previous Up Next
Chapter 5 Advanced Compiler Use and Efficiency Hints

by Robert MacLachlan

5.1 Advanced Compiler Introduction

In CMUCL, as with any language on any computer, the path to efficient code starts with good algorithms and sensible programming techniques, but to avoid inefficiency pitfalls, you need to know some of this implementation's quirks and features. This chapter is mostly a fairly long and detailed overview of what optimizations Python does. Although there are the usual negative suggestions of inefficient features to avoid, the main emphasis is on describing the things that programmers can count on being efficient.

The optimizations described here can have the effect of speeding up existing programs written in conventional styles, but the potential for new programming styles that are clearer and less error-prone is at least as significant. For this reason, several sections end with a discussion of the implications of these optimizations for programming style.

5.1.1 Types

Python's support for types is unusual in three major ways:
5.1.2 Optimization

The main barrier to efficient Lisp programs is not that there is no efficient way to code the program in Lisp, but that it is difficult to arrive at that efficient coding. Common Lisp is a highly complex language, and usually has many semantically equivalent ``reasonable'' ways to code a given problem. It is desirable to make all of these equivalent solutions have comparable efficiency so that programmers don't have to waste time discovering the most efficient solution.

Source level optimization increases the number of efficient ways to solve a problem. This effect is much larger than the increase in the efficiency of the ``best'' solution. Source level optimization transforms the original program into a more efficient (but equivalent) program. Although the optimizer isn't doing anything the programmer couldn't have done, this high-level optimization is important because: Source level optimization eliminates the need for macros to optimize their expansion, and also increases the effectiveness of inline expansion. See sections 5.4 and 5.8.

Efficient support for a safer programming style is the biggest advantage of source level optimization. Existing tuned programs typically won't benefit much from source optimization, since their source has already been optimized by hand. However, even tuned programs tend to run faster under Python because:
5.1.3 Function Call

The sort of symbolic programs generally written in Common Lisp often favor recursion over iteration, or have inner loops so complex that they involve multiple function calls. Such programs spend a larger fraction of their time doing function calls than is the norm in other languages; for this reason Common Lisp implementations strive to make the general (or full) function call as inexpensive as possible. Python goes beyond this by providing two good alternatives to full call: Generally, Python provides simple implementations for simple uses of function call, rather than having only a single calling convention. These features allow a more natural programming style:
5.1.4 Representation of Objects

Sometimes traditional Common Lisp implementation techniques compare so poorly to the techniques used in other languages that Common Lisp can become an impractical language choice. Terrible inefficiencies appear in number-crunching programs, since Common Lisp numeric operations often involve number-consing and generic arithmetic. Python supports efficient natural representations for numbers (and some other types), and allows these efficient representations to be used in more contexts. Python also provides good efficiency notes that warn when a crucial declaration is missing.

See section 5.11.2 for more about object representations and numeric types. Also see section 5.13 about efficiency notes.

5.1.5 Writing Efficient Code

Writing efficient code that works is a complex and prolonged process. It is important not to get so involved in the pursuit of efficiency that you lose sight of what the original problem demands. Remember that: The best way to get efficient code that is still worth using, is to separate coding from tuning. During coding, you should: During tuning, you should:
5.2 More About Types in Python

This section goes into more detail describing what types and declarations are recognized by Python. The area where Python differs most radically from previous Common Lisp compilers is in its support for types:
5.2.1 More Types Meaningful

Common Lisp has a very powerful type system, but conventional Common Lisp implementations typically only recognize the small set of types special in that implementation. In these systems, there is an unfortunate paradox: a declaration for a relatively general type like fixnum will be recognized by the compiler, but a highly specific declaration such as (integer 3 17) is totally ignored.

This is obviously a problem, since the user has to know how to specify the type of an object in the way the compiler wants it. A very minimal (but rarely satisfied) criterion for type system support is that it be no worse to make a specific declaration than to make a general one. Python goes beyond this by exploiting a number of advantages obtained from detailed type information.

Using more restrictive types in declarations allows the compiler to do better type inference and more compile-time type checking. Also, when type declarations are considered to be consistency assertions that should be verified (conditional on policy), then complex types are useful for making more detailed assertions.

Python ``understands'' the list-style or, member, function, array and number type specifiers. Understanding means that: For related information, see section 5.11 for numeric types, and section 5.10.3 for array types.

5.2.2 Canonicalization

When given a type specifier, Python will often rewrite it into a different (but equivalent) type. This is the mechanism that Python uses for detecting type equivalence. For example, in Python's canonical representation, these types are equivalent:
(or list (member :end)) <==> (or cons (member nil :end))
This has two implications for the user:
5.2.3 Member Types

The member type specifier can be used to represent ``symbolic'' values, analogous to the enumerated types of Pascal. For example, the second value of find-symbol has this type:
(member :internal :external :inherited nil)
Member types are very useful for expressing consistency constraints on data structures, for example:
(defstruct ice-cream
  (flavor :vanilla :type (member :vanilla :chocolate :strawberry)))
Member types are also useful in type inference, as the number of members can sometimes be pared down to one, in which case the value is a known constant.

5.2.4 Union Types

The or (union) type specifier is understood, and is meaningfully applied in many contexts. The use of or allows assertions to be made about types in dynamically typed programs. For example:
(defstruct box
  (next nil :type (or box null))
  (top :removed :type (or box-top (member :removed))))
The type assertion on the top slot ensures that an error will be signaled when there is an attempt to store an illegal value (such as :rmoved.) Although somewhat weak, these union type assertions provide a useful input into type inference, allowing the cost of type checking to be reduced. For example, this loop is safely compiled with no type checks:
(defun find-box-with-top (box)
  (declare (type (or box null) box))
  (do ((current box (box-next current)))
      ((null current))
    (unless (eq (box-top current) :removed)
      (return current))))
Union types are also useful in type inference for representing types that are partially constrained. For example, the result of this expression:
(if foo
    (logior x y)
    (list x y))
can be expressed as (or integer cons).

5.2.5 The Empty Type

The type nil is also called the empty type, since no object is of type nil. The union of no types, (or), is also empty. Python's interpretation of an expression whose type is nil is that the expression never yields any value, but rather fails to terminate, or is thrown out of. For example, the type of a call to error or a use of return is nil. When the type of an expression is empty, compile-time type warnings about its value are suppressed; presumably somebody else is signaling an error. If a function is declared to have return type nil, but does in fact return, then (in safe compilation policies) a ``NIL Function returned'' error will be signaled. See also the function required-argument.

5.2.6 Function Types

function types are understood in the restrictive sense, specifying: This is consistent with the new interpretation of function types and the ftype declaration in the proposed X3J13 ``function-type-argument-type-semantics'' cleanup. Note also, that if you don't explicitly declare the type of a function using a global ftype declaration, then Python will compute a function type from the definition, providing a degree of inter-routine type inference, see section 5.3.3.

5.2.7 The Values Declaration

CMUCL supports the values declaration as an extension to Common Lisp. The syntax of the declaration is (values type1 type2...typen). This declaration is semantically equivalent to a the form wrapped around the body of the special form in which the values declaration appears. The advantage of values over the is purely syntactic---it doesn't introduce more indentation. For example:
(defun foo (x)
  (declare (values single-float))
  (ecase x
    (:this ...)
    (:that ...)
    (:the-other ...)))
is equivalent to:
(defun foo (x)
  (the single-float
       (ecase x
         (:this ...)
         (:that ...)
         (:the-other ...))))
and
(defun floor (number &optional (divisor 1))
  (declare (values integer real))
  ...)
is equivalent to:
(defun floor (number &optional (divisor 1))
  (the (values integer real)
       ...))
In addition to being recognized by lambda (and hence by defun), the values declaration is recognized by all the other special forms with bodies and declarations: let, let*, labels and flet. Macros with declarations usually splice the declarations into one of the above forms, so they will accept this declaration too, but the exact effect of a values declaration will depend on the macro.

If you declare the types of all arguments to a function, and also declare the return value types with values, you have described the type of the function. Python will use this argument and result type information to derive a function type that will then be applied to calls of the function (see section 5.2.6.) This provides a way to declare the types of functions that is much less syntactically awkward than using the ftype declaration with a function type specifier.

Although the values declaration is non-standard, it is relatively harmless to use it in otherwise portable code, since any warning in non-CMU implementations can be suppressed with the standard declaration proclamation.

5.2.8 Structure Types

Because of precise type checking, structure types are much better supported by Python than by conventional compilers: This error checking is tremendously useful for detecting bugs in programs that manipulate complex data structures.

An additional advantage of checking structure types and enforcing slot types is that the compiler can safely believe slot type declarations. Python effectively moves the type checking from the slot access to the slot setter or constructor call. This is more efficient since caller of the setter or constructor often knows the type of the value, entirely eliminating the need to check the value's type. Consider this example:
(defstruct coordinate
  (x nil :type single-float)
  (y nil :type single-float))

(defun make-it ()
  (make-coordinate :x 1.0 :y 1.0))

(defun use-it (it)
  (declare (type coordinate it))
  (sqrt (expt (coordinate-x it) 2) (expt (coordinate-y it) 2)))


make-it and use-it are compiled with no checking on the types of the float slots, yet use-it can use single-float arithmetic with perfect safety. Note that make-coordinate must still check the values of x and y unless the call is block compiled or inline expanded (see section 5.6.) But even without this advantage, it is almost always more efficient to check slot values on structure initialization, since slots are usually written once and read many times.

5.2.9 The Freeze-Type Declaration

The extensions:freeze-type declaration is a CMUCL extension that enables more efficient compilation of user-defined types by asserting that the definition is not going to change. This declaration may only be used globally (with declaim or proclaim). Currently freeze-type only affects structure type testing done by typep, typecase, etc. Here is an example:
(declaim (freeze-type foo bar))
This asserts that the types foo and bar and their subtypes are not going to change. This allows more efficient type testing, since the compiler can open-code a test for all possible subtypes, rather than having to examine the type hierarchy at run-time.

5.2.10 Type Restrictions

Avoid use of the and, not and satisfies types in declarations, since type inference has problems with them. When these types do appear in a declaration, they are still checked precisely, but the type information is of limited use to the compiler. and types are effective as long as the intersection can be canonicalized to a type that doesn't use and. For example:
(and fixnum unsigned-byte)
is fine, since it is the same as:
(integer 0 most-positive-fixnum)
but this type:
(and symbol (not (member :end)))
will not be fully understood by type interference since the and can't be removed by canonicalization.

Using any of these type specifiers in a type test with typep or typecase is fine, since as tests, these types can be translated into the and macro, the not function or a call to the satisfies predicate.

5.2.11 Type Style Recommendations

Python provides good support for some currently unconventional ways of using the Common Lisp type system. With Python, it is desirable to make declarations as precise as possible, but type inference also makes some declarations unnecessary. Here are some general guidelines for maximum robustness and efficiency:
5.3 Type Inference

Type inference is the process by which the compiler tries to figure out the types of expressions and variables, given an inevitable lack of complete type information. Although Python does much more type inference than most Common Lisp compilers, remember that the more precise and comprehensive type declarations are, the more type inference will be able to do.

5.3.1 Variable Type Inference

The type of a variable is the union of the types of all the definitions. In the degenerate case of a let, the type of the variable is the type of the initial value. This inferred type is intersected with any declared type, and is then propagated to all the variable's references. The types of multiple-value-bind variables are similarly inferred from the types of the individual values of the values form.

If multiple type declarations apply to a single variable, then all the declarations must be correct; it is as though all the types were intersected producing a single and type specifier. In this example:
(defmacro my-dotimes ((var count) &body body)
  `(do ((,var 0 (1+ ,var)))
       ((>= ,var ,count))
     (declare (type (integer 0 *) ,var))
     ,@body))

(my-dotimes (i ...)
  (declare (fixnum i))
  ...)
the two declarations for i are intersected, so i is known to be a non-negative fixnum.

In practice, this type inference is limited to lets and local functions, since the compiler can't analyze all the calls to a global function. But type inference works well enough on local variables so that it is often unnecessary to declare the type of local variables. This is especially likely when function result types and structure slot types are declared. The main areas where type inference breaks down are:
5.3.2 Local Function Type Inference

The types of arguments to local functions are inferred in the same was as any other local variable; the type is the union of the argument types across all the calls to the function, intersected with the declared type. If there are any assignments to the argument variables, the type of the assigned value is unioned in as well.

The result type of a local function is computed in a special way that takes tail recursion (see section 5.5) into consideration. The result type is the union of all possible return values that aren't tail-recursive calls. For example, Python will infer that the result type of this function is integer:
(defun ! (n res)
  (declare (integer n res))
  (if (zerop n)
      res
      (! (1- n) (* n res))))
Although this is a rather obvious result, it becomes somewhat less trivial in the presence of mutual tail recursion of multiple functions. Local function result type inference interacts with the mechanisms for ensuring proper tail recursion mentioned in section 5.6.5.

5.3.3 Global Function Type Inference

As described in section 5.2.6, a global function type (ftype) declaration places implicit type assertions on the call arguments, and also guarantees the type of the return value. So wherever a call to a declared function appears, there is no doubt as to the types of the arguments and return value. Furthermore, Python will infer a function type from the function's definition if there is no ftype declaration. Any type declarations on the argument variables are used as the argument types in the derived function type, and the compiler's best guess for the result type of the function is used as the result type in the derived function type.

This method of deriving function types from the definition implicitly assumes that functions won't be redefined at run-time. Consider this example:
(defun foo-p (x)
  (let ((res (and (consp x) (eq (car x) 'foo))))
    (format t "It is ~:[not ~;~]foo." res)))

(defun frob (it)
  (if (foo-p it)
      (setf (cadr it) 'yow!)
      (1+ it)))
Presumably, the programmer really meant to return res from foo-p, but he seems to have forgotten. When he tries to call do (frob (list 'foo nil)), frob will flame out when it tries to add to a cons. Realizing his error, he fixes foo-p and recompiles it. But when he retries his test case, he is baffled because the error is still there. What happened in this example is that Python proved that the result of foo-p is null, and then proceeded to optimize away the setf in frob.

Fortunately, in this example, the error is detected at compile time due to notes about unreachable code (see section 5.4.5.) Still, some users may not want to worry about this sort of problem during incremental development, so there is a variable to control deriving function types.


[Variable]
extensions:*derive-function-types*    

If true (the default), argument and result type information derived from compilation of defuns is used when compiling calls to that function. If false, only information from ftype proclamations will be used.
5.3.4 Operation Specific Type Inference

Many of the standard Common Lisp functions have special type inference procedures that determine the result type as a function of the argument types. For example, the result type of aref is the array element type. Here are some other examples of type inferences:
(logand x #xFF) ==> (unsigned-byte 8)

(+ (the (integer 0 12) x) (the (integer 0 1) y)) ==> (integer 0 13)

(ash (the (unsigned-byte 16) x) -8) ==> (unsigned-byte 8)
5.3.5 Dynamic Type Inference

Python uses flow analysis to infer types in dynamically typed programs. For example:
(ecase x
  (list (length x))
  ...)
Here, the compiler knows the argument to length is a list, because the call to length is only done when x is a list. The most significant efficiency effect of inference from assertions is usually in type check optimization.

Dynamic type inference has two inputs: explicit conditionals and implicit or explicit type assertions. Flow analysis propagates these constraints on variable type to any code that can be executed only after passing though the constraint. Explicit type constraints come from ifs where the test is either a lexical variable or a function of lexical variables and constants, where the function is either a type predicate, a numeric comparison or eq.

If there is an eq (or eql) test, then the compiler will actually substitute one argument for the other in the true branch. For example:
(when (eq x :yow!) (return x))
becomes:
(when (eq x :yow!) (return :yow!))
This substitution is done when one argument is a constant, or one argument has better type information than the other. This transformation reveals opportunities for constant folding or type-specific optimizations. If the test is against a constant, then the compiler can prove that the variable is not that constant value in the false branch, or (not (member :yow!)) in the example above. This can eliminate redundant tests, for example:
(if (eq x nil)
    ...
    (if x a b))
is transformed to this:
(if (eq x nil)
    ...
    a)
Variables appearing as if tests are interpreted as (not (eq var nil)) tests. The compiler also converts = into eql where possible. It is difficult to do inference directly on = since it does implicit coercions.

When there is an explicit or test on numeric variables, the compiler makes inferences about the ranges the variables can assume in the true and false branches. This is mainly useful when it proves that the values are small enough in magnitude to allow open-coding of arithmetic operations. For example, in many uses of dotimes with a fixnum repeat count, the compiler proves that fixnum arithmetic can be used.

Implicit type assertions are quite common, especially if you declare function argument types. Dynamic inference from implicit type assertions sometimes helps to disambiguate programs to a useful degree, but is most noticeable when it detects a dynamic type error. For example:
(defun foo (x)
  (+ (car x) x))
results in this warning:
In: DEFUN FOO
  (+ (CAR X) X)
==>
  X
Warning: Result is a LIST, not a NUMBER.
Note that Common Lisp's dynamic type checking semantics make dynamic type inference useful even in programs that aren't really dynamically typed, for example:
(+ (car x) (length x))
Here, x presumably always holds a list, but in the absence of a declaration the compiler cannot assume x is a list simply because list-specific operations are sometimes done on it. The compiler must consider the program to be dynamically typed until it proves otherwise. Dynamic type inference proves that the argument to length is always a list because the call to length is only done after the list-specific car operation.

5.3.6 Type Check Optimization

Python backs up its support for precise type checking by minimizing the cost of run-time type checking. This is done both through type inference and though optimizations of type checking itself.

Type inference often allows the compiler to prove that a value is of the correct type, and thus no type check is necessary. For example:
(defstruct foo a b c)
(defstruct link
  (foo (required-argument) :type foo)
  (next nil :type (or link null)))

(foo-a (link-foo x))
Here, there is no need to check that the result of link-foo is a foo, since it always is. Even when some type checks are necessary, type inference can often reduce the number:
(defun test (x)
  (let ((a (foo-a x))
        (b (foo-b x))
        (c (foo-c x)))
    ...))
In this example, only one (foo-p x) check is needed. This applies to a lesser degree in list operations, such as:
(if (eql (car x) 3) (cdr x) y)
Here, we only have to check that x is a list once.

Since Python recognizes explicit type tests, code that explicitly protects itself against type errors has little introduced overhead due to implicit type checking. For example, this loop compiles with no implicit checks checks for car and cdr:
(defun memq (e l)
  (do ((current l (cdr current)))
      ((atom current) nil)
    (when (eq (car current) e) (return current))))
Python reduces the cost of checks that must be done through an optimization called complementing. A complemented check for type is simply a check that the value is not of the type (not type). This is only interesting when something is known about the actual type, in which case we can test for the complement of (and known-type (not type)), or the difference between the known type and the assertion. An example:
(link-foo (link-next x))
Here, we change the type check for link-foo from a test for foo to a test for:
(not (and (or foo null) (not foo)))
or more simply (not null). This is probably the most important use of complementing, since the situation is fairly common, and a null test is much cheaper than a structure type test.

Here is a more complicated example that illustrates the combination of complementing with dynamic type inference:
(defun find-a (a x)
  (declare (type (or link null) x))
  (do ((current x (link-next current)))
      ((null current) nil)
    (let ((foo (link-foo current)))
      (when (eq (foo-a foo) a) (return foo)))))
This loop can be compiled with no type checks. The link test for link-foo and link-next is complemented to (not null), and then deleted because of the explicit null test. As before, no check is necessary for foo-a, since the link-foo is always a foo. This sort of situation shows how precise type checking combined with precise declarations can actually result in reduced type checking.

5.4 Source Optimization

This section describes source-level transformations that Python does on programs in an attempt to make them more efficient. Although source-level optimizations can make existing programs more efficient, the biggest advantage of this sort of optimization is that it makes it easier to write efficient programs. If a clean, straightforward implementation is can be transformed into an efficient one, then there is no need for tricky and dangerous hand optimization.

5.4.1 Let Optimization


The primary optimization of let variables is to delete them when they are unnecessary. Whenever the value of a let variable is a constant, a constant variable or a constant (local or non-notinline) function, the variable is deleted, and references to the variable are replaced with references to the constant expression. This is useful primarily in the expansion of macros or inline functions, where argument values are often constant in any given call, but are in general non-constant expressions that must be bound to preserve order of evaluation. Let variable optimization eliminates the need for macros to carefully avoid spurious bindings, and also makes inline functions just as efficient as macros.

A particularly interesting class of constant is a local function. Substituting for lexical variables that are bound to a function can substantially improve the efficiency of functional programming styles, for example:
(let ((a #'(lambda (x) (zow x))))
  (funcall a 3))
effectively transforms to:
(zow 3)
This transformation is done even when the function is a closure, as in:
(let ((a (let ((y (zug)))
           #'(lambda (x) (zow x y)))))
  (funcall a 3))
becoming:
(zow 3 (zug))
A constant variable is a lexical variable that is never assigned to, always keeping its initial value. Whenever possible, avoid setting lexical variables---instead bind a new variable to the new value. Except for loop variables, it is almost always possible to avoid setting lexical variables. This form:
(let ((x (f x)))
  ...)
is more efficient than this form:
(setq x (f x))
...
Setting variables makes the program more difficult to understand, both to the compiler and to the programmer. Python compiles assignments at least as efficiently as any other Common Lisp compiler, but most let optimizations are only done on constant variables.

Constant variables with only a single use are also optimized away, even when the initial value is not constant.1 For example, this expansion of incf:
(let ((#:g3 (+ x 1)))
  (setq x #:G3))
becomes:
(setq x (+ x 1))
The type semantics of this transformation are more important than the elimination of the variable itself. Consider what happens when x is declared to be a fixnum; after the transformation, the compiler can compile the addition knowing that the result is a fixnum, whereas before the transformation the addition would have to allow for fixnum overflow.

Another variable optimization deletes any variable that is never read. This causes the initial value and any assigned values to be unused, allowing those expressions to be deleted if they have no side-effects.

Note that a let is actually a degenerate case of local call (see section 5.6.2), and that let optimization can be done on calls that weren't created by a let. Also, local call allows an applicative style of iteration that is totally assignment free.

5.4.2 Constant Folding

Constant folding is an optimization that replaces a call of constant arguments with the constant result of that call. Constant folding is done on all standard functions for which it is legal. Inline expansion allows folding of any constant parts of the definition, and can be done even on functions that have side-effects.

It is convenient to rely on constant folding when programming, as in this example:
(defconstant limit 42)

(defun foo ()
  (... (1- limit) ...))
Constant folding is also helpful when writing macros or inline functions, since it usually eliminates the need to write a macro that special-cases constant arguments.

Constant folding of a user defined function is enabled by the extensions:constant-function proclamation. In this example:
(declaim (ext:constant-function myfun))
(defun myexp (x y)
  (declare (single-float x y))
  (exp (* (log x) y)))

 ... (myexp 3.0 1.3) ...
The call to myexp is constant-folded to 4.1711674.

5.4.3 Unused Expression Elimination

If the value of any expression is not used, and the expression has no side-effects, then it is deleted. As with constant folding, this optimization applies most often when cleaning up after inline expansion and other optimizations. Any function declared an extensions:constant-function is also subject to unused expression elimination.

Note that Python will eliminate parts of unused expressions known to be side-effect free, even if there are other unknown parts. For example:
(let ((a (list (foo) (bar))))
  (if t
      (zow)
      (raz a)))
becomes:
(progn (foo) (bar))
(zow)
5.4.4 Control Optimization

The most important optimization of control is recognizing when an if test is known at compile time, then deleting the if, the test expression, and the unreachable branch of the if. This can be considered a special case of constant folding, although the test doesn't have to be truly constant as long as it is definitely not nil. Note also, that type inference propagates the result of an if test to the true and false branches, see section 5.3.5.

A related if optimization is this transformation:2
(if (if a b c) x y)
into:
(if a
    (if b x y)
    (if c x y))
The opportunity for this sort of optimization usually results from a conditional macro. For example:
(if (not a) x y)
is actually implemented as this:
(if (if a nil t) x y)
which is transformed to this:
(if a
    (if nil x y)
    (if t x y))
which is then optimized to this:
(if a y x)
Note that due to Python's internal representations, the if---if situation will be recognized even if other forms are wrapped around the inner if, like:
(if (let ((g ...))
      (loop
        ...
        (return (not g))
        ...))
    x y)
In Python, all the Common Lisp macros really are macros, written in terms of if, block and tagbody, so user-defined control macros can be just as efficient as the standard ones. Python emits basic blocks using a heuristic that minimizes the number of unconditional branches. The code in a tagbody will not be emitted in the order it appeared in the source, so there is no point in arranging the code to make control drop through to the target.

5.4.5 Unreachable Code Deletion

Python will delete code whenever it can prove that the code can never be executed. Code becomes unreachable when: When code that appeared in the original source is deleted, the compiler prints a note to indicate a possible problem (or at least unnecessary code.) For example:
(defun foo ()
  (if t
      (write-line "True.")
      (write-line "False.")))
will result in this note:
In: DEFUN FOO
  (WRITE-LINE "False.")
Note: Deleting unreachable code.
It is important to pay attention to unreachable code notes, since they often indicate a subtle type error. For example:
(defstruct foo a b)

(defun lose (x)
  (let ((a (foo-a x))
        (b (if x (foo-b x) :none)))
    ...))
results in this note:
In: DEFUN LOSE
  (IF X (FOO-B X) :NONE)
==>
  :NONE
Note: Deleting unreachable code.
The :none is unreachable, because type inference knows that the argument to foo-a must be a foo, and thus can't be nil. Presumably the programmer forgot that x could be nil when he wrote the binding for a.

Here is an example with an incorrect declaration:
(defun count-a (string)
  (do ((pos 0 (position #\a string :start (1+ pos)))
       (count 0 (1+ count)))
      ((null pos) count)
    (declare (fixnum pos))))
This time our note is:
In: DEFUN COUNT-A
  (DO ((POS 0 #) (COUNT 0 #))
      ((NULL POS) COUNT)
    (DECLARE (FIXNUM POS)))
--> BLOCK LET TAGBODY RETURN-FROM PROGN 
==>
  COUNT
Note: Deleting unreachable code.
The problem here is that pos can never be null since it is declared a fixnum.

It takes some experience with unreachable code notes to be able to tell what they are trying to say. In non-obvious cases, the best thing to do is to call the function in a way that should cause the unreachable code to be executed. Either you will get a type error, or you will find that there truly is no way for the code to be executed.

Not all unreachable code results in a note: Somewhat spurious unreachable code notes can also result when a macro inserts multiple copies of its arguments in different contexts, for example:
(defmacro t-and-f (var form)
  `(if ,var ,form ,form))

(defun foo (x)
  (t-and-f x (if x "True." "False.")))
results in these notes:
In: DEFUN FOO
  (IF X "True." "False.")
==>
  "False."
Note: Deleting unreachable code.

==>
  "True."
Note: Deleting unreachable code.
It seems like it has deleted both branches of the if, but it has really deleted one branch in one copy, and the other branch in the other copy. Note that these messages are only spurious in not satisfying the intent of the rule that notes are only given when the deleted code appears in the original source; there is always some code being deleted when a unreachable code note is printed.

5.4.6 Multiple Values Optimization

Within a function, Python implements uses of multiple values particularly efficiently. Multiple values can be kept in arbitrary registers, so using multiple values doesn't imply stack manipulation and representation conversion. For example, this code:
(let ((a (if x (foo x) u))
      (b (if x (bar x) v)))
  ...)
is actually more efficient written this way:
(multiple-value-bind
    (a b)
    (if x
        (values (foo x) (bar x))
        (values u v))
  ...)
Also, see section 5.6.5 for information on how local call provides efficient support for multiple function return values.

5.4.7 Source to Source Transformation

The compiler implements a number of operation-specific optimizations as source-to-source transformations. You will often see unfamiliar code in error messages, for example:
(defun my-zerop () (zerop x))
gives this warning:
In: DEFUN MY-ZEROP
  (ZEROP X)
==>
  (= X 0)
Warning: Undefined variable: X
The original zerop has been transformed into a call to =. This transformation is indicated with the same ==> used to mark macro and function inline expansion. Although it can be confusing, display of the transformed source is important, since warnings are given with respect to the transformed source. This a more obscure example:
(defun foo (x) (logand 1 x))
gives this efficiency note:
In: DEFUN FOO
  (LOGAND 1 X)
==>
  (LOGAND C::Y C::X)
Note: Forced to do static-function Two-arg-and (cost 53).
      Unable to do inline fixnum arithmetic (cost 1) because:
      The first argument is a INTEGER, not a FIXNUM.
      etc.
Here, the compiler commuted the call to logand, introducing temporaries. The note complains that the first argument is not a fixnum, when in the original call, it was the second argument. To make things more confusing, the compiler introduced temporaries called c::x and c::y that are bound to y and 1, respectively.

You will also notice source-to-source optimizations when efficiency notes are enabled (see section 5.13.) When the compiler is unable to do a transformation that might be possible if there was more information, then an efficiency note is printed. For example, my-zerop above will also give this efficiency note:
In: DEFUN FOO
  (ZEROP X)
==>
  (= X 0)
Note: Unable to optimize because:
      Operands might not be the same type, so can't open code.
5.4.8 Style Recommendations

Source level optimization makes possible a clearer and more relaxed programming style:
5.5 Tail Recursion

A call is tail-recursive if nothing has to be done after the the call returns, i.e. when the call returns, the returned value is immediately returned from the calling function. In this example, the recursive call to myfun is tail-recursive:
(defun myfun (x)
  (if (oddp (random x))
      (isqrt x)
      (myfun (1- x))))
Tail recursion is interesting because it is form of recursion that can be implemented much more efficiently than general recursion. In general, a recursive call requires the compiler to allocate storage on the stack at run-time for every call that has not yet returned. This memory consumption makes recursion unacceptably inefficient for representing repetitive algorithms having large or unbounded size. Tail recursion is the special case of recursion that is semantically equivalent to the iteration constructs normally used to represent repetition in programs. Because tail recursion is equivalent to iteration, tail-recursive programs can be compiled as efficiently as iterative programs.

So why would you want to write a program recursively when you can write it using a loop? Well, the main answer is that recursion is a more general mechanism, so it can express some solutions simply that are awkward to write as a loop. Some programmers also feel that recursion is a stylistically preferable way to write loops because it avoids assigning variables. For example, instead of writing:
(defun fun1 (x)
  something-that-uses-x)

(defun fun2 (y)
  something-that-uses-y)

(do ((x something (fun2 (fun1 x))))
    (nil))
You can write:
(defun fun1 (x)
  (fun2 something-that-uses-x))

(defun fun2 (y)
  (fun1 something-that-uses-y))

(fun1 something)
The tail-recursive definition is actually more efficient, in addition to being (arguably) clearer. As the number of functions and the complexity of their call graph increases, the simplicity of using recursion becomes compelling. Consider the advantages of writing a large finite-state machine with separate tail-recursive functions instead of using a single huge prog.

It helps to understand how to use tail recursion if you think of a tail-recursive call as a psetq that assigns the argument values to the called function's variables, followed by a go to the start of the called function. This makes clear an inherent efficiency advantage of tail-recursive call: in addition to not having to allocate a stack frame, there is no need to prepare for the call to return (e.g., by computing a return PC.)

Is there any disadvantage to tail recursion? Other than an increase in efficiency, the only way you can tell that a call has been compiled tail-recursively is if you use the debugger. Since a tail-recursive call has no stack frame, there is no way the debugger can print out the stack frame representing the call. The effect is that backtrace will not show some calls that would have been displayed in a non-tail-recursive implementation. In practice, this is not as bad as it sounds---in fact it isn't really clearly worse, just different. See section 3.3.5 for information about the debugger implications of tail recursion, and how to turn it off for the sake of more conservative backtrace information.

In order to ensure that tail-recursion is preserved in arbitrarily complex calling patterns across separately compiled functions, the compiler must compile any call in a tail-recursive position as a tail-recursive call. This is done regardless of whether the program actually exhibits any sort of recursive calling pattern. In this example, the call to fun2 will always be compiled as a tail-recursive call:
(defun fun1 (x)
  (fun2 x))
So tail recursion doesn't necessarily have anything to do with recursion as it is normally thought of. See section 5.6.4 for more discussion of using tail recursion to implement loops.

5.5.1 Tail Recursion Exceptions

Although Python is claimed to be ``properly'' tail-recursive, some might dispute this, since there are situations where tail recursion is inhibited: These dynamic extent binding forms inhibit tail recursion because they allocate stack space to represent the binding. Shallow-binding implementations of dynamic scoping also require cleanup code to be evaluated when the scope is exited.

In addition, optimization of tail-recursive calls is inhibited when the debug optimization quality is greater than 2 (see section 3.6.)

5.6 Local Call

Python supports two kinds of function call: full call and local call. Full call is the standard calling convention; its late binding and generality make Common Lisp what it is, but create unavoidable overheads. When the compiler can compile the calling function and the called function simultaneously, it can use local call to avoid some of the overhead of full call. Local call is really a collection of compilation strategies. If some aspect of call overhead is not needed in a particular local call, then it can be omitted. In some cases, local call can be totally free. Local call provides two main advantages to the user:
5.6.1 Self-Recursive Calls

Local call is used when a function defined by defun calls itself. For example:
(defun fact (n)
  (if (zerop n)
      1
      (* n (fact (1- n)))))
This use of local call speeds recursion, but can also complicate debugging, since trace will only show the first call to fact, and not the recursive calls. This is because the recursive calls directly jump to the start of the function, and don't indirect through the symbol-function. Self-recursive local call is inhibited when the :block-compile argument to compile-file is nil (see section 5.7.3.)

5.6.2 Let Calls
Because local call avoids unnecessary call overheads, the compiler internally uses local call to implement some macros and special forms that are not normally thought of as involving a function call. For example, this let:
(let ((a (foo))
      (b (bar)))
  ...)
is internally represented as though it was macroexpanded into:
(funcall #'(lambda (a b)
             ...)
         (foo)
         (bar))
This implementation is acceptable because the simple cases of local call (equivalent to a let) result in good code. This doesn't make let any more efficient, but does make local calls that are semantically the same as let much more efficient than full calls. For example, these definitions are all the same as far as the compiler is concerned:
(defun foo ()
  ...some other stuff...
  (let ((a something))
    ...some stuff...))

(defun foo ()
  (flet ((localfun (a)
           ...some stuff...))
    ...some other stuff...
    (localfun something)))

(defun foo ()
  (let ((funvar #'(lambda (a)
                    ...some stuff...)))
    ...some other stuff...
    (funcall funvar something)))
Although local call is most efficient when the function is called only once, a call doesn't have to be equivalent to a let to be more efficient than full call. All local calls avoid the overhead of argument count checking and keyword argument parsing, and there are a number of other advantages that apply in many common situations. See section 5.4.1 for a discussion of the optimizations done on let calls.

5.6.3 Closures

Local call allows for much more efficient use of closures, since the closure environment doesn't need to be allocated on the heap, or even stored in memory at all. In this example, there is no penalty for localfun referencing a and b:
(defun foo (a b)
  (flet ((localfun (x)
           (1+ (* a b x))))
    (if (= a b)
        (localfun (- x))
        (localfun x))))
In local call, the compiler effectively passes closed-over values as extra arguments, so there is no need for you to ``optimize'' local function use by explicitly passing in lexically visible values. Closures may also be subject to let optimization (see section 5.4.1.)

Note: indirect value cells are currently always allocated on the heap when a variable is both assigned to (with setq or setf) and closed over, regardless of whether the closure is a local function or not. This is another reason to avoid setting variables when you don't have to.

5.6.4 Local Tail Recursion

Tail-recursive local calls are particularly efficient, since they are in effect an assignment plus a control transfer. Scheme programmers write loops with tail-recursive local calls, instead of using the imperative go and setq. This has not caught on in the Common Lisp community, since conventional Common Lisp compilers don't implement local call. In Python, users can choose to write loops such as:
(defun ! (n)
  (labels ((loop (n total)
             (if (zerop n)
                 total
                 (loop (1- n) (* n total)))))
    (loop n 1)))

[Macro]
extensions:iterate name ({(var initial-value)}*) {declaration}* {form}*    

This macro provides syntactic sugar for using labels to do iteration. It creates a local function name with the specified vars as its arguments and the declarations and forms as its body. This function is then called with the initial-values, and the result of the call is return from the macro.

Here is our factorial example rewritten using iterate:
    (defun ! (n)
      (iterate loop
               ((n n)
               (total 1))
        (if (zerop n)
          total
          (loop (1- n) (* n total)))))
  
The main advantage of using iterate over do is that iterate naturally allows stepping to be done differently depending on conditionals in the body of the loop. iterate can also be used to implement algorithms that aren't really iterative by simply doing a non-tail call. For example, the standard recursive definition of factorial can be written like this:
(iterate fact
         ((n n))
  (if (zerop n)
      1
      (* n (fact (1- n)))))
5.6.5 Return Values

One of the more subtle costs of full call comes from allowing arbitrary numbers of return values. This overhead can be avoided in local calls to functions that always return the same number of values. For efficiency reasons (as well as stylistic ones), you should write functions so that they always return the same number of values. This may require passing extra nil arguments to values in some cases, but the result is more efficient, not less so.

When efficiency notes are enabled (see section 5.13), and the compiler wants to use known values return, but can't prove that the function always returns the same number of values, then it will print a note like this:
In: DEFUN GRUE
  (DEFUN GRUE (X) (DECLARE (FIXNUM X)) (COND (# #) (# NIL) (T #)))
Note: Return type not fixed values, so can't use known return convention:
  (VALUES (OR (INTEGER -536870912 -1) NULL) &REST T)
In order to implement proper tail recursion in the presence of known values return (see section 5.5), the compiler sometimes must prove that multiple functions all return the same number of values. When this can't be proven, the compiler will print a note like this:
In: DEFUN BLUE
  (DEFUN BLUE (X) (DECLARE (FIXNUM X)) (COND (# #) (# #) (# #) (T #)))
Note: Return value count mismatch prevents known return from
      these functions:
  BLUE
  SNOO
See section 5.11.10 for the interaction between local call and the representation of numeric types.

5.7 Block Compilation

Block compilation allows calls to global functions defined by defun to be compiled as local calls. The function call can be in a different top-level form than the defun, or even in a different file.

In addition, block compilation allows the declaration of the entry points to the block compiled portion. An entry point is any function that may be called from outside of the block compilation. If a function is not an entry point, then it can be compiled more efficiently, since all calls are known at compile time. In particular, if a function is only called in one place, then it will be let converted. This effectively inline expands the function, but without the code duplication that results from defining the function normally and then declaring it inline.

The main advantage of block compilation is that it it preserves efficiency in programs even when (for readability and syntactic convenience) they are broken up into many small functions. There is absolutely no overhead for calling a non-entry point function that is defined purely for modularity (i.e. called only in one place.)

Block compilation also allows the use of non-descriptor arguments and return values in non-trivial programs (see section 5.11.10).

5.7.1 Block Compilation Semantics

The effect of block compilation can be envisioned as the compiler turning all the defuns in the block compilation into a single labels form:
(declaim (start-block fun1 fun3))

(defun fun1 ()
  ...)

(defun fun2 ()
  ...
  (fun1)
  ...)

(defun fun3 (x)
  (if x
      (fun1)
      (fun2)))

(declaim (end-block))
becomes:
(labels ((fun1 ()
           ...)
         (fun2 ()
           ...
           (fun1)
           ...)
         (fun3 (x)
           (if x
               (fun1)
               (fun2))))
  (setf (fdefinition 'fun1) #'fun1)
  (setf (fdefinition 'fun3) #'fun3))
Calls between the block compiled functions are local calls, so changing the global definition of fun1 will have no effect on what fun2 does; fun2 will keep calling the old fun1.

The entry points fun1 and fun3 are still installed in the symbol-function as the global definitions of the functions, so a full call to an entry point works just as before. However, fun2 is not an entry point, so it is not globally defined. In addition, fun2 is only called in one place, so it will be let converted.

5.7.2 Block Compilation Declarations

The extensions:start-block and extensions:end-block declarations allow fine-grained control of block compilation. These declarations are only legal as a global declarations (declaim or proclaim).


The start-block declaration has this syntax:
(start-block {entry-point-name}*)
When processed by the compiler, this declaration marks the start of block compilation, and specifies the entry points to that block. If no entry points are specified, then all functions are made into entry points. If already block compiling, then the compiler ends the current block and starts a new one.


The end-block declaration has no arguments:
(end-block)
The end-block declaration ends a block compilation unit without starting a new one. This is useful mainly when only a portion of a file is worth block compiling.

5.7.3 Compiler Arguments

The :block-compile and :entry-points arguments to extensions:compile-from-stream and compile-file provide overall control of block compilation, and allow block compilation without requiring modification of the program source.

There are three possible values of the :block-compile argument:
nil
Do no compile-time resolution of global function names, not even for self-recursive calls. This inhibits any start-block declarations appearing in the file, allowing all functions to be incrementally redefined.

t
Start compiling in block compilation mode. This is mainly useful for block compiling small files that contain no start-block declarations. See also the :entry-points argument.

:specified
Start compiling in form-at-a-time mode, but exploit any start-block declarations and compile self-recursive calls as local calls. Normally :specified is the default for this argument (see *block-compile-default*.)
The :entry-points argument can be used in conjunction with :block-compile t to specify the entry-points to a block-compiled file. If not specified or nil, all global functions will be compiled as entry points. When :block-compile is not t, this argument is ignored.


[Variable]
*block-compile-default*    

This variable determines the default value for the :block-compile argument to compile-file and compile-from-stream. The initial value of this variable is :specified, but nil is sometimes useful for totally inhibiting block compilation.
5.7.4 Practical Difficulties

The main problem with block compilation is that the compiler uses large amounts of memory when it is block compiling. This places an upper limit on the amount of code that can be block compiled as a unit. To make best use of block compilation, it is necessary to locate the parts of the program containing many internal calls, and then add the appropriate start-block declarations. When writing new code, it is a good idea to put in block compilation declarations from the very beginning, since writing block declarations correctly requires accurate knowledge of the program's function call structure. If you want to initially develop code with full incremental redefinition, you can compile with *block-compile-default* set to nil.

Note if a defun appears in a non-null lexical environment, then calls to it cannot be block compiled.

Unless files are very small, it is probably impractical to block compile multiple files as a unit by specifying a list of files to compile-file. Semi-inline expansion (see section 5.8.2) provides another way to extend block compilation across file boundaries.

5.7.5 Context Declarations

CMUCL has a context-sensitive declaration mechanism which is useful because it allows flexible control of the compilation policy in large systems without requiring changes to the source files. The primary use of this feature is to allow the exported interfaces of a system to be compiled more safely than the system internals. The context used is the name being defined and the kind of definition (function, macro, etc.)

The :context-declarations option to with-compilation-unit has dynamic scope, affecting all compilation done during the evaluation of the body. The argument to this option should evaluate to a list of lists of the form:
(context-spec {declare-form}+)
In the indicated context, the specified declare forms are inserted at the head of each definition. The declare forms for all contexts that match are appended together, with earlier declarations getting precedence over later ones. A simple example:
    :context-declarations
    '((:external (declare (optimize (safety 2)))))
This will cause all functions that are named by external symbols to be compiled with safety 2.

The full syntax of context specs is:
:internal, :external
True if the symbol is internal (external) in its home package.

:uninterned
True if the symbol has no home package.

(:package {package-name}*)
True if the symbol's home package is in any of the named packages (false if uninterned.)

:anonymous
True if the function doesn't have any interesting name (not defmacro, defun, labels or flet).

:macro, :function
:macro is a global (defmacro) macro. :function is anything else.

:local, :global
:local is a labels or flet. :global is anything else.

(:or {context-spec}*)
True when any supplied context-spec is true.

(:and {context-spec}*)
True only when all supplied context-specs are true.

(:not {context-spec}*)
True when context-spec is false.

(:member {name}*)
True when the defined name is one of these names (equal test.)

(:match {pattern}*)
True when any of the patterns is a substring of the name. The name is wrapped with $'s, so ``$FOO'' matches names beginning with ``FOO'', etc.
5.7.6 Context Declaration Example

Here is a more complex example of with-compilation-unit options:
:optimize '(optimize (speed 2) (space 2) (inhibit-warnings 2)
                     (debug 1) (safety 0))
:optimize-interface '(optimize-interface (safety 1) (debug 1))
:context-declarations
'(((:or :external (:and (:match "%") (:match "SET")))
   (declare (optimize-interface (safety 2))))
  ((:or (:and :external :macro)
        (:match "$PARSE-"))
   (declare (optimize (safety 2)))))
The optimize and extensions:optimize-interface declarations (see section 4.7.1) set up the global compilation policy. The bodies of functions are to be compiled completely unsafe (safety 0), but argument count and weakened argument type checking is to be done when a function is called (speed 2 safety 1).

The first declaration specifies that all functions that are external or whose names contain both ``%'' and ``SET'' are to be compiled compiled with completely safe interfaces (safety 2). The reason for this particular :match rule is that setf inverse functions in this system tend to have both strings in their name somewhere. We want setf inverses to be safe because they are implicitly called by users even though their name is not exported.

The second declaration makes external macros or functions whose names start with ``PARSE-'' have safe bodies (as well as interfaces). This is desirable because a syntax error in a macro may cause a type error inside the body. The :match rule is used because macros often have auxiliary functions whose names begin with this string.

This particular example is used to build part of the standard CMUCL system. Note however, that context declarations must be set up according to the needs and coding conventions of a particular system; different parts of CMUCL are compiled with different context declarations, and your system will probably need its own declarations. In particular, any use of the :match option depends on naming conventions used in coding.

5.8 Inline Expansion

Python can expand almost any function inline, including functions with keyword arguments. The only restrictions are that keyword argument keywords in the call must be constant, and that global function definitions (defun) must be done in a null lexical environment (not nested in a let or other binding form.) Local functions (flet) can be inline expanded in any environment. Combined with Python's source-level optimization, inline expansion can be used for things that formerly required macros for efficient implementation. In Python, macros don't have any efficiency advantage, so they need only be used where a macro's syntactic flexibility is required.

Inline expansion is a compiler optimization technique that reduces the overhead of a function call by simply not doing the call: instead, the compiler effectively rewrites the program to appear as though the definition of the called function was inserted at each call site. In Common Lisp, this is straightforwardly expressed by inserting the lambda corresponding to the original definition:
(proclaim '(inline my-1+))
(defun my-1+ (x) (+ x 1))

(my-1+ someval) ==> ((lambda (x) (+ x 1)) someval)
When the function expanded inline is large, the program after inline expansion may be substantially larger than the original program. If the program becomes too large, inline expansion hurts speed rather than helping it, since hardware resources such as physical memory and cache will be exhausted. Inline expansion is called for: In addition to this speed/space tradeoff from inline expansion's avoidance of the call, inline expansion can also reveal opportunities for optimization. Python's extensive source-level optimization can make use of context information from the caller to tremendously simplify the code resulting from the inline expansion of a function.

The main form of caller context is local information about the actual argument values: what the argument types are and whether the arguments are constant. Knowledge about argument types can eliminate run-time type tests (e.g., for generic arithmetic.) Constant arguments in a call provide opportunities for constant folding optimization after inline expansion.

A hidden way that constant arguments are often supplied to functions is through the defaulting of unsupplied optional or keyword arguments. There can be a huge efficiency advantage to inline expanding functions that have complex keyword-based interfaces, such as this definition of the member function:
(proclaim '(inline member))
(defun member (item list &key
                    (key #'identity)
                    (test #'eql testp)
                    (test-not nil notp))
  (do ((list list (cdr list)))
      ((null list) nil)
    (let ((car (car list)))
      (if (cond (testp
                 (funcall test item (funcall key car)))
                (notp
                 (not (funcall test-not item (funcall key car))))
                (t
                 (funcall test item (funcall key car))))
          (return list)))))

After inline expansion, this call is simplified to the obvious code:
(member a l :key #'foo-a :test #'char=) ==>

(do ((list list (cdr list)))
    ((null list) nil)
  (let ((car (car list)))
    (if (char= item (foo-a car))
        (return list))))
In this example, there could easily be more than an order of magnitude improvement in speed. In addition to eliminating the original call to member, inline expansion also allows the calls to char= and foo-a to be open-coded. We go from a loop with three tests and two calls to a loop with one test and no calls.

See section 5.4 for more discussion of source level optimization.

5.8.1 Inline Expansion Recording

Inline expansion requires that the source for the inline expanded function to be available when calls to the function are compiled. The compiler doesn't remember the inline expansion for every function, since that would take an excessive about of space. Instead, the programmer must tell the compiler to record the inline expansion before the definition of the inline expanded function is compiled. This is done by globally declaring the function inline before the function is defined, by using the inline and extensions:maybe-inline (see section 5.8.3) declarations.

In addition to recording the inline expansion of inline functions at the time the function is compiled, compile-file also puts the inline expansion in the output file. When the output file is loaded, the inline expansion is made available for subsequent compilations; there is no need to compile the definition again to record the inline expansion.

If a function is declared inline, but no expansion is recorded, then the compiler will give an efficiency note like:
Note: MYFUN is declared inline, but has no expansion.
When you get this note, check that the inline declaration and the definition appear before the calls that are to be inline expanded. This note will also be given if the inline expansion for a defun could not be recorded because the defun was in a non-null lexical environment.

5.8.2 Semi-Inline Expansion

Python supports semi-inline functions. Semi-inline expansion shares a single copy of a function across all the calls in a component by converting the inline expansion into a local function (see section 5.6.) This takes up less space when there are multiple calls, but also provides less opportunity for context dependent optimization. When there is only one call, the result is identical to normal inline expansion. Semi-inline expansion is done when the space optimization quality is 0, and the function has been declared extensions:maybe-inline.

This mechanism of inline expansion combined with local call also allows recursive functions to be inline expanded. If a recursive function is declared inline, calls will actually be compiled semi-inline. Although recursive functions are often so complex that there is little advantage to semi-inline expansion, it can still be useful in the same sort of cases where normal inline expansion is especially advantageous, i.e. functions where the calling context can help a lot.

5.8.3 The Maybe-Inline Declaration

The extensions:maybe-inline declaration is a CMUCL extension. It is similar to inline, but indicates that inline expansion may sometimes be desirable, rather than saying that inline expansion should almost always be done. When used in a global declaration, extensions:maybe-inline causes the expansion for the named functions to be recorded, but the functions aren't actually inline expanded unless space is 0 or the function is eventually (perhaps locally) declared inline.

Use of the extensions:maybe-inline declaration followed by the defun is preferable to the standard idiom of:
(proclaim '(inline myfun))
(defun myfun () ...)
(proclaim '(notinline myfun))

;;; Any calls to myfun here are not inline expanded.

(defun somefun ()
  (declare (inline myfun))
  ;;
  ;; Calls to myfun here are inline expanded.
  ...)
The problem with using notinline in this way is that in Common Lisp it does more