00 | Previous | Up | Next

Building software building software

Building software building software Episode 1: Macros - source to source transformation.

Next Thursday, March 6th, in the 4'th floor large conference room of building 6, from 12:00 to 13:30, i plan to give the first episode of a series of talks on building software that builds other software out of common household ingredients. This episode is on macros and will cover such topics as

  • If macro's are so useful, why doesn't my language have them?

  • Proper macro hygiene.

  • Guy Steel's wholesale club.

  • Transforming an interpreter into a compiler.

  • A version of Lex in one page of code.

For slides see: http://openmap.bbn.com/~kanderso/building/macro

01 | Previous | Up | Next

Software building software

  • Can be big like:

    • a compiler

    • a preprocessor (Orbix, Objectivity)

    • Yacc, Lex, ...

  • But they can also be small

  • and made form common programming ingredients.

Learn the ancient ways,

Learn the secrets of the experts,

Use them in our programs.

Head toward the future, open implementation, open compilers...

02 | Previous | Up | Next

Lisp is a building material

  • Lisp is not a language, it is a building material. (Alan Kay)

  • Lisp is a programmable programming language. (John Federaro)

  • Extend the language up from the bottom, until it meets the application being written down from the top. (Paul Graham)

Building materials

  • Macros - source to source transformation

  • Closures - crystalized software

  • Partial Evaluation - freeing the essence of computation

  • Metaobjects - Self aware software
03 | Previous | Up | Next

A little Lisp in Java


class Cons 
{
 private Object head;
 private Object tail;

 private Cons(Object a, Object b)
 {
  head = a;
  tail = b;
 }
 public static cons(Object a, Object b)
 {
   return new Cons(a, b);
 }
 public static list1(Object a)
 {
  return cons(a, null);
 public static list2(Object a, Object b)
 {
  return cons(a, list1(b));
}

class Symbol
{
 private String name;
 private Symbol(String theName)
 {
   name = theName;
 }
 public static symbol(theName)
 {
   ... construct an interned symbol.
 }
}
 private
04 | Previous | Up | Next

Software = data

  • Lisp has a simple syntax, similar to parse-trees and thus requires little parsing.

  • Lisp has predefined data structures for its parse-trees called "symbolic expressions" or "s-expressions".
Thus the software fragment:
(setf place (cons what place))

can be represented by a data structure constructed by the Lisp code:

? (list 'setf 'place (list 'cons 'what 'place))
(SETF PLACE (CONS WHAT PLACE))          

Java: 
place = cons(what, place);

So one piece of code generates another piece of code.

05 | Previous | Up | Next

Defining a macro extends the syntax of the language


(defmacro my-push (what place)
  (list 'setf place (list 'cons what place)))

Now my-push can be used when one wants to push an item onto a list:


(defun ok-items (items)
  (let ((result '()))
    (dolist (item items)
      (if (ok? item) 
        (my-push item result)))
    result))
The my-push expression is replaced by its macro expansion:

(defun ok-items (items)
  (let ((result '()))
    (dolist (item items)
      (if (ok? item)
        (setf result (cons item result))))
    result))

06 | Previous | Up | Next

ok-items in Java


Java:
public static Cons okItems(Cons items)
{
  Cons result = null;
  while (items != null)
  {
    Object item = car(items);
    if (isOk(item))
      result = cons(item, result);
    items = cdr(items);
   }
  return result;
}
07 | Previous | Up | Next

Macroexpansion, the details

Here defmacro defines a macro-function that might look something like:

(defun expand-my-push (form env)
  (declare (ignore env))        ; Compile time environment.
  (let ((what (second form))
        (place (third form)))
    (list 'setf place (list 'cons what place))))

(setf (macro-function 'my-push) #'expand-my-push)

Java:
public static Object expand-my-push(Object form, Object env)
{
 Object what = second(form);
 Object place = third(form);
 return list3(symbol("setf"), place, list2(symbol("cons"), what, place);
}

When the Lisp evaluator[1] sees a form beginnng with my-push, it runs its macro-function to get a replacement form, which is then reevaluated. Thus the part of the Lisp evaluator that deals with macroexpansion might look like:

(defun eval (form env)
  (cond ...
        ((and (consp form) 
              (symbolp (car form))
              (macro-function (car form)))
         (eval (funcall (macro-function (car form))
                        form
                        env)))
        ...))
The variable env is the runtime environment that form is to be evaluted in.

[1] Lisp is a compiled language, eval is only used to help describe the semantics of the language. We will see how to turn an interpreter into a compiler later on.

08 | Previous | Up | Next

Backquote simplifies construction of expessions

To construct an expression, we must use list construction primitives: like cons, list, append and quote to represent constant expressions:


(defmacro my-push (what place)
  (list 'setf place (list 'cons what place)))

Alternatively, we can use the backquote macro character "`":


(defmacro my-push (what place)
  `(setf ,place (cons ,what ,place)))
The backquoted expression provides a template for the expression you want to generate.
09 | Previous | Up | Next

Basic idea of backquote


? (let ((y '(a b))) `(y ,@y ,y))
(Y A B (A B))
is equivalent to:

? (let ((y '(a b))) (append (list 'y) y (list y)))
(Y A B (A B))
Specifically:

? (let ((what 'item)
        (place 'result))
    `(setf ,place (cons ,what ,place)))
(SETF RESULT (CONS ITEM RESULT))
10 | Previous | Up | Next

Use hygienic macros

Common Lisp macros are sensitive to their surrounding code:

(flet ((cons (a b) (+ a b)))
  ...
  (my-push item result)
  ...)
Here, the expansion of my-push uses the local definition of cons because cons (and setf) occurs free in the macro expansion.

To avoid this problem, you are not allowed to redefine any symbol in the common-lisp package.

Scheme has developed several hygienic macro approaches [scheme refrences].

  • Dybvig. Writing Hygienic Macros in Scheme with Syntax-Case. Computer Science Department, Indiana University, June 1992, 29pgs. (iucstr356.ps.gz)
  • Hieb, Dybvig and Bruggeman. Syntactic Abstraction in Scheme. Computer Science Department, Indiana University, June 1992 [revised 7/3/92]. (iucstr355.ps.gz)
  • Eugene E. Kohlbecker, Daniel P. Friedman, Matthias Felleisen, and Bruce Duba. Hygienic macro expansion. In Proceedings of the 1986 ACM Conference on Lisp and Functional Programming, pages 151--161.
  • ftp://ftp.ccs.neu.edu/pub/people/wand/papers/popl-87.dvi Eugene~M. Kohlbecker and Mitchell Wand. Macro-by-Example: Deriving Syntactic Transformations from their Specifications. In {\popl{14th}}, pages 77--84, 1987.

Extra credit: What if the flet redefined macroexpand?

11 | Previous | Up | Next

Avoid unintentional variable capture


(defmacro foo (y)
  ;; Compute y + y^2 +y^3
  `(let ((x (* ,y ,y)))
     (+ ,y x (* x ,y))))

? (foo 3)
39
? (let ((a 3)) (foo a))
39
? (let ((x 3)) (foo x))
99

In the third example, the outer binding of x is shadowed the binding provided by the macroexpansion.

The inner binding is not apparent to a reader.

12 | Previous | Up | Next

Solution: Use unique variable names


(defmacro foo (y)
  ;; Compute y + y^2 +y^3
  (let ((x (make-symbol "X")))
    `(let ((,x (* ,y ,y)))
       (+ ,y ,x (* ,x ,y)))))

make-symbol generates a new symbol each time the macro is expanded.

Each binding is unique.


(let ((x 3))
  (foo x))

Expands to:


(let ((x 3))
  (LET ((#:X (* X X)))
    (+ X #:X (* #:X X))))
13 | Previous | Up | Next

However, variable capture can be useful

Example: Anaphoric if:

(defmacro aif (cond then else)
  `(let ((-it- ,cond))
     (if -it- ,then ,else)))

Sample use:


(defun bind (name value bindings)
  ;; Add (name . value) to list of bindings.
  (aif (assoc name bindings)
    (progn (setf (cdr -it-) value)
           bindings)
    (cons (cons name value) bindings)))

Use variable names that are distinctive, like -it-.

14 | Previous | Up | Next

Avoid redundant computation


(defmacro foo (y)                       ; Version 2.
  ;; Compute y + y^2 +y^3
  (let ((x (make-symbol "X")))
    `(let ((,x (* ,y ,y)))
       (+ ,y ,x (* ,x ,y)))))

? pi
3.141592653589793

? (rationalize pi)
245850922/78256779

                                        
? (time (dotimes (i 1000)               ; 10.7 sec.
          (foo (rationalize pi))))      ;  4.6 MBytes

? (time (dotimes (i 1000)               ;  3.7 sec.
          (let ((x (rationalize pi)))   ;  1.3 MBytes
            (foo x))))

The first version calls rationalize four times per call to foo. The second only does one.


(defmacro foo (y)                       ; Version 3.
  ;; Compute y + y^2 +y^3
  (let ((x (make-symbol "X"))
        (z (make-symbol "Z")))
    `(let* ((,z ,y)
           (,x (* ,z ,z)))
       (+ ,z ,x (* ,x ,z)))))
The macro once-only can be helpful here.
15 | Previous | Up | Next

Don't use a macro when you can use a function...

The problems with foo can be avoided simply by using a function:


(defun foo (y)                          ; Version 4.
  ;; Compute y + y^2 +y^3
  (let ((x (* y y)))
    (+ y x (* x y))))
16 | Previous | Up | Next

Compute what you can at compile time: Garnet

Garnet is an object oriented graphics system [REF?]. In Garnet, an object has a 8 element hash table of slots. A slot's value might be computed as:

(defun slot-ref (object name)
  (cdr (or (assoc name (aref object (slot-hash name)))
           (slot-not-found-error object name))))

(defun slot-hash (name) 
  (logand (sxhash name) 7))

The slot-hash computation can be done at compile time if name is a constant:

(defmacro slot-ref (object name)
  (if (constant? name)
    (let ((hash (slot-hash (constant-value name))))
      `(slot-ref-runtime-1 ,object ,name ,hash))
    `(slot-ref-runtime-2 ,object ,name)))

(defun slot-ref-runtime-1 (object name offset)
  (cdr (or (assoc name (aref object offset))
           (slot-not-found-error object name))))

(defun slot-ref-runtime-2 (object name)
  (slot-ref-runtime-2 object name (slot-hash name)))

Unfortunately, while Garnet uses an approach like this, in user code the macro is always exanded in a context where constant? fails, so the optimization has no effect.

17 | Previous | Up | Next

Compute what you can at compile time: keywords

The overhead of keyword argument processing can be removed by using a macro with keywords that calls a normal function.


(defmacro draw-line (x1 y1 x2 y2 &key 
                        (stream *standard-output*) 
                        (ink +foreground+)
                        thickness ...)

  `(draw-line-runtime ,x1 ,y1 ,x2 ,y2 
                      ,stream ,ink ,thickness ...))

(defun draw-line-runtime (x1 y1 x2 y2 
                             stream inc thickness)
  ...)

The disadvantage is that draw-line is not longer a function so it can't be funcall'ed.

Compiler macros can be used to overcome this problem. CLIM uses this approach.

18 | Previous | Up | Next

Specialize functions on constant (static) arguments.

When a function is controlled by a constant argument, a specialized version of the function can be written. Example: mem?


(defun mem?-rt (item items)
  (if (consp items) 
    (if (eql item (car items)) T
        (mem?-rt item (cdr items)))
    NIL))

(defmacro mem? (item items)
  (if (constant? items)
    (c-mem? item (constant-value items))
    `(mem?-rt ,item ,items)))

(defun constant? (x)
  (if (consp x) (eq (car x) 'quote)
      t))

(defun constant-value (x)
  (if (consp x) (second x) x))
  
(defun c-mem? (item items)
  (if (consp items)
    `(if (eql ,item ',(car items)) T
         ,(c-mem? item (cdr items)))
    NIL))

? (macroexpand '(mem? x '(#\space #\tab #\newline)))
(IF (EQL X '#\Space) T (IF (EQL X '#\Tab) T (IF (EQL X '#\Newline) T NIL)))
T
? (macroexpand '(mem? x (cdr y)))
(MEM?-RT X (CDR Y))
T
19 | Previous | Up | Next

Leave only small footprints

As with inlined functions, each macroexpansion takes up space in your application.

Just as inlined functions only make sense if they are small, keep a macro's code footprint small.

Expand a macro into a function call that does most of the work at load time.

See slot-ref for a good example.


20 | Previous | Up | Next

Writing macro writing macros - Comma comma comma quote comma comma

Macro's that write macros can be tricky because they can require nesting of backquotes.

Paul Graham has a nice way to construct such macros.

Example: Abbreviations - some names in Lisp are long.

Step 1: Write out an example of what the macro should do:


(abbrev dbind destructuring-bind) -> 

(defmacro dbind (&rest args)
  "Abbreviation for destructuring bind."
  `(destructuring-bind ,@args))

Step 2: Replace constants with variables:


(defmacro dbind (&rest args)
  (let ((name 'destructuring-bind))
    `(,name @,args)))

Step 3: Make a template out of the defmacro [single comma form].


(defmacro abbrev (short long)
  `(defmacro ,short (&rest args)
     (let ((name ',long))
       `(,name @,args))))

Step 4: Simplify by removing the let binding [optional].


(defmacro abbrev (short long)
  `(defmacro ,short (&rest args)
     `(,',long @,args)))
21 | Previous | Up | Next

Guy Steele's wholesale club

Double backquotes lead to comma expressions that can be recognized wholesale [CLTL2 p. 530].

,x
The value of x during the second evaluation.
,',x
The value of x during the first evaluation (most common).
,@',x
The value of x during the first evaluation spiced in (most common).
,,x
The value of the value of x.
22 | Previous | Up | Next

Common Lisp is 10% macros (MCL 3.1)

Usage of symbols
special-form 26
macro 90
function 754
variable 117
total 977

AMP defines 906 macros.

23 | Previous | Up | Next

Usage of macros (MCL 3.1)

x
Type
Size       .
Contents
control
14
and cond multiple-value-bind multiple-value-list multiple-value-setq nth-value or prog prog* prog1 prog2 return unless when
case
6
case ccase ctypecase ecase etypecase typecase
loop
8
do do* do-all-symbols do-external-symbols do-symbols dolist dotimes loop
definition
19
declaim defclass defconstant defgeneric define-compiler-macro define-condition define-declaration define-method-combination define-modify-macro define-setf-method defmacro defmethod defpackage defparameter defsetf defstruct deftype defun defvar
local definition
3
generic-flet generic-function generic-labels
side-effect
11
decf incf pop psetf psetq push pushnew remf rotatef setf shiftf
convenience
7
handler-bind handler-case lambda destructuring-bind print-unreadable-object restart-bind restart-case
with
13
with-accessors with-compilation-unit with-condition-restarts with-hash-table-iterator with-input-from-string with-open-file with-open-stream with-output-to-string with-package-iterator with-simple-restart with-slots with-standard-io-syntax
misc
5
formatter ignore-errors in-package pprint-logical-block
debugging
6
assert check-type step time trace untrace
24 | Previous | Up | Next

Build your own language: Collecting

Collecting is a common idiom:

(defun collect-macro-symbols ()
  (let ((result '()))
    (do-symbols (s 'common-lisp)
      (when (macro-function s) (push s result)))
    result))

A collecting macro lets one separate collecting from iteration:

(defun collect-macro-symbols ()
  (collecting
   (do-symbols (s 'common-lisp)
     (when (macro-function s) (collect s)))))
25 | Previous | Up | Next

Collecting implementation

Thus we want the macro to look something like:

`(let ((.result. '()))
  (macrolet 
    ((collect (x) 
       `(setq .result. (cons ,x .result.))
       ,@body)
     .result.))

collecting would be more useful if we could use other collection strategies besides cons, such as, append, or adjoin.

Issue: Should a user have access to .result. during iteration? Common Lisp suggests, "no".

(defmacro collecting (how &body body)
  (let ((how (if (consp how) how `(,how))))
    `(let ((.result. '()))
       (macrolet 
         ((collect (x) 
            (let ((name ',(car how))
                  (args ',(cdr how)))
            `(setq .result. (,name ,x .result. ,@args)))))
         ,@body)
       .result.)))

Removing the innermost let produces:

(defmacro collecting (how &body body)
  (let ((how (if (consp how) how `(,how))))
    `(let ((.result. '()))
       (macrolet 
         ((collect (x) 
            `(setq .result. 
                   (,',(car how) ,x .result. ,@',(cdr how)))))
         ,@body)
       .result.)))
26 | Previous | Up | Next

Build your own language: Define

Scheme has a defining form, define that defines variables and functions, and thus subsumes defparameter and defun.


(define x 3) => (defparameter x 3)

(define (f x) (+ x 3)) => (defun f (x) (+ x 3))

You can also define "curried" functions [See next episode].

So if the uncurried version looks like:


(define (f x y) (+ x y))

The curried version would look like:


(define ((f x) y) (+ x y))
  => (define (f x) (lambda (y) (+ x y)))
Here's the macro:

(defmacro define (form &body body)
  (cond ((consp form)
         (cond ((consp (car form))
                `(define ,(car form) 
                         (lambda ,(cdr form) ,@body)))
               (t `(defmethod ,(car form) ,(cdr form)
                              ,@body))))
        (t `(defparameter ,form ,(car body)))))

This version also subsumes defmethod.

Using such a defining macro lets us control such things as:

  • Documentation.
  • Optimization.
  • Portability.
  • Implementation details.
  • Changes to the underlying language.
27 | Previous | Up | Next

Building a data abstraction

A simple record facity like that in Elements of Programming Languages.

(define-variant name &rest slots)

SLOTS is a list of slot name symbols. Accessor macros <name>-<slot> are created. A predicate <name>? and constructor make-<name> with arguments SLOTS are created.

The predicate(variant? x) identifies a variant instance of any kind.

Each variant has a unique, unforgable variant-ID. Each variant instance is a vector with the variant-id as the initial element.


(let ((variant-id (cons 'v nil))
      (variants '()))                   ;  Alist of id's
  (defun variant? (x) (eq (cdr (aref x 0)) variant-id))

  (defun find-variant-id (name)
    (assoc name variants :test #'eq))

  (defun make-variant (name slots)
    (or (find-variant-id name)
        (let ((v (cons name variant-id)))
        (push v variants)
        v))))

(eval-when (:compile-toplevel
            :load-toplevel
            :execute)
  (defun make-name (&rest args)
    (intern (apply #'concatenate 'string (mapcar #'string args))))
  )
The function make-name is needed at compile time because it is used by define-variant. It could be flet'd inside the macro to avoid this.
28 | Previous | Up | Next

Example variant application

Elements of a Lisp program might be represented by the following variants:


(define-variant lambda n-args body)
(define-variant variable name path)
(define-variant global name value)
(define-variant constant value)
(define-variant primitive function)
(define-variant application function args)

Thus, the expression


(let ((m 3)
      (b -2)
      (x 4))
  (+ (* m x) b))

could be represented as:


#((APPLICATION V)
  #((LAMBDA V) 3
    #((APPLICATION V) #((PRIMITIVE V) 
                        #<Compiled-function + #xC854D6>)
      (#((APPLICATION V) #((PRIMITIVE V) 
                           #<Compiled-function * #xC617E6>)
         (#((VARIABLE V) M 0) #((VARIABLE V) X 2)))
       #((VARIABLE V) B 1))))
  (#((CONSTANT V) 3) #((CONSTANT V) -2) #((CONSTANT V) 4)))
29 | Previous | Up | Next

Running this representation

Something like this could execute (run) this representation:


(defun run (exp exp)
  (variant-case exp
    ((constant value) value)
    ((variable name path) (lookup-variable env name path))
    ((global value) value)
    ((primitive) exp)
    ((lambda) exp)
    ((application function args)
     (let* ((f (run function env))
            (env (extend-env env args)))
       (variant-case f
         ((primitive) (apply-primitive f env))
         ((lambda n-args body) 
          (check-n-args n-args env)
          (run body env)))))))

The variant-case macro provides

  • Type dispatch.

  • Destructuring.

  • Implementation hiding (nothing revealed).
30 | Previous | Up | Next

The variant-case macro


(defmacro variant-case (variable &body arms)
  (flet
    ((variant-arm (arm)
       (destructuring-bind ((name . args) . body) arm
         (let ((test (make-name name "?"))
               (var-vals (mapcar 
                          #'(lambda (a)
                              `(,a (,(make-name name "-" a) .v.)))
                          args)))
           `((,test .v.) 
             (let ,var-vals
               ,@body))))))
    `(let ((.v. ,variable))
       (cond ,@(mapcar #'variant-arm arms)
             (t (variant-error .t.))))))
Macroexpansion looks like:

(LET ((.V. EXP))
  (COND ((CONSTANT? .V.)
         (LET ((VALUE (CONSTANT-VALUE .V.)))
           VALUE))
        ((VARIABLE? .V.)
         (LET ((NAME (VARIABLE-NAME .V.))
               (PATH (VARIABLE-PATH .V.)))
           (LOOKUP-VARIABLE ENV NAME PATH)))
        ((GLOBAL? .V.) 
         (LET ((VALUE (GLOBAL-VALUE .V.)))
           VALUE))
        ...
        (T (VARIANT-ERROR .V.))))
31 | Previous | Up | Next

With-* macro example: HTML generation

A with-* macro provides a safe environment that make sure things finish or are undone.

This code printed these slides[cl-http]:


(defmethod make-slides ((g 1-file-grinder) title)
  "Generate the slides."
  (let* ((slides (slides g))
         (file (grinder-file g))
         (N (length slides))
         (i 0))
    (with-open-file (stream file :direction :output
                            :if-exists :supersede
                            :if-does-not-exist :create)
      (let ((*standard-output* stream)
            (*output-stream* stream))
        (with-html-document ()          ; <HTML></HTML>
          (with-document-preamble ()    ; <HEAD></HEAD>
            (declare-title title)       ; <TITLE></TITLE>
            (with-document-body ()      ; <BODY></BODY>
              (dolist (s slides)
                (make-slide g s i N)
                (setq i (+ i 1))))))))))

(defmethod make-slide ((g grinder) slide i N)
  "Make a slide numbered I of N."
  (with-table                           ; <TABLE width=1.0
    (:width 1.0 :border t)              ; border=1></TABLE>
    (with-table-row ()                  ; <TR></TR>
      (with-table-cell ()               ; <TD></TD>
        (make-slide-buttons g i N)
        (with-centering ()              ; <CENTER></CENTER>
          (with-section-heading         ; <H2></H2>
            ((slide-title slide) :level 2)))
        (mapc #'write-line (slide-contents slide))))))
32 | Previous | Up | Next

Transforming an interpreter into a compiler

When you want a compiler, start with an interpreter.

  • Easier to write

  • Get the semantics right.

  • Evolve into a compiler through meaning preserving transformations.

  • Sometimes such transformations can be automatic.
33 | Previous | Up | Next

Example: Regular expression matching

The function apropos is useful, but it does not allow for regular expression matching.

Let's add UNIX "glob" style matching where

? matches any character

* matches 0 or more characters.

\ removes any special meaning from the following character.

Here's some examples:


? (my-apropos "MOST-*-SINGLE-FLOAT")
(MOST-NEGATIVE-SINGLE-FLOAT MOST-POSITIVE-SINGLE-FLOAT)

? (my-apropos "\\*PRINT-*Y\\*")
(*PRINT-PRETTY* *PRINT-ARRAY* *PRINT-READABLY*)

? (my-apropos "DEF*MACRO*")
(DEFMACRO DEFINE-COMPILER-MACRO DEFINE-MODIFY-MACRO)
34 | Previous | Up | Next

Converting a string into a pattern


(defun my-apropos (pattern &optional (package 'common-lisp))
  (collecting cons 
    (let ((p (string->pattern pattern)))
      (do-symbols (s package)
        (when (glob-1 p (symbol-name s))
          (collect s))))))

The function string->pattern converts the string from of a pattern into a list form which is easy to operate on:


? (string->pattern "MOST-*-SINGLE-FLOAT")
(#\M #\O #\S #\T #\- * #\- 
 #\S #\I #\N #\G #\L #\E #\- #\F #\L #\O #\A #\T)
? 
(string->pattern "\\*PRINT-*Y\\*")
(#\* #\P #\R #\I #\N #\T #\- * #\Y #\*)
? 

Basically, the string is converted into a list of characters except that the special characters #\* and #\? are converted into the symbols '* and '?.


(defun string->pattern (string &optional (start 0) 
                               (end (length string))
                               (so-far '()))
  (if (>= start end) (reverse so-far)
      (let ((c (aref string start)))
        (if (eql c #\\) 
          (let ((start (+ start 1)))
            (if (>= start end)
              (reverse (cons c so-far))
              (let ((c (aref string start)))
                (string->pattern string (+ start 1) end 
                                 (cons c so-far)))))
          (string->pattern string (+ start 1) end 
                           (cons (case c
                                   (#\? '?)
                                   (#\* '*)
                                   (t c))
                                 so-far))))))
35 | Previous | Up | Next

Version 1: a tail recursive interpreter

glob-1 does the acutal matching:


(defun glob-1 (pattern string &optional (start 0) (end (length string)))
  (if (null pattern)                    ; Done?
    (not (< start end))                 ; Done & at end?
    (let ((head (car pattern)))
      (cond ((eq head '?)
             (and (< start end)
                  (glob-1 (cdr pattern) string (+ start 1) end)))
            ((eq head '*)
             (if (null (cdr pattern)) T         ; Skip to end.
                 (or 
                  (glob-1 (cdr pattern) string start end)       ; [1]
                  (and (< start end)
                       (glob-1 pattern string (+ start 1) end))))) ; [2]
            (t (and (< start end) 
                    (eql head (aref string start))
                    (glob-1 (cdr pattern) string (+ start 1) end)))))))

This structure is typical of an interpreter, a case dispatch controlled by a static parameter.

[1]This is the only nontail recursive call to glob-1

[2]This is a self call that can be replaced by a loop.

36 | Previous | Up | Next

Version 2: Replace self call by loop


(defun glob-1 (pattern string &optional (start 0) (end (length string)))
  (if (null pattern)                    ; Done?
    (not (< start end))                 ; Done & at end?
    (let ((head (car pattern)))
      (cond ((eq head '?)
             (and (< start end)
                   (glob-1 (cdr pattern) string (+ start 1) end)))
            ((eq head '*)
             (if (null (cdr pattern)) T         ; Skip to end.
                 (loop 
                   (if (glob-1 (cdr pattern) string start end)
                     (return t)
                     (if (< start end)
                       (setq start (+ start 1)) 
                       (return NIL))))))
            (t (and (< start end) 
                    (eql head (aref string start))
                    (glob-1 (cdr pattern) string (+ start 1) end)))))))

Replace each call to glob-1 with a call to a specialized, inlined function.

The pattern gets shorter each time so there is no infinite recursion.

37 | Previous | Up | Next

Version 3: A pattern compiler


(defun glob (pattern)
  (if (null pattern)                    ; Done?
    `(not (< start end))                 ; Done & at end?
    (let ((head (car pattern)))
      (cond ((eq head '?)
             `(and (< start end)
                   (let ((start (+ start 1)))
                     (progn start)      ; Mention to compiler.
                     ,(glob (cdr pattern)))))
            ((eq head '*)
             (if (null (cdr pattern)) T         ; Skip to end.
                 `(loop 
                    (if ,(glob (cdr pattern))
                      (return t)
                      (if (< start end)
                        (setq start (+ start 1)) 
                        (return NIL))))))
            (t `(and (< start end) 
                     (eql ,head (aref string start))
                     (let ((start (+ start 1)))
                       (progn start)      ; Mention to compiler.
                       ,(glob (cdr pattern)))))))))

This is not a macro, but a code generating function.

It can be used in a macro or to generate a compiled function.

We have transformed an interpeter into a compiler.

38 | Previous | Up | Next

My-apropos Version 2


(defun glob-compile (pattern)
  (compile nil 
           `(lambda (string &optional (start 0) (end (length string)))
              ,(glob pattern))))

(defun my-apropos-2 (pattern &optional (package 'common-lisp))
  (let ((m (glob-compile (string->pattern pattern))))
    (collecting cons 
      (do-symbols (s package)
          (when (funcall m (symbol-name s))
            (collect s))))))

This version is about 50% faster.

Extra Credit: The loop case can give rise to a redundant (< start end) check. Modify glob to optimize this.

Extra Credit:Extend glob to generate C or Java code.

39 | Previous | Up | Next

A language for lexical analysis

Lex is a well known lexical analyzer generator. It generates a tokenizer from a set of rules. Each rule has two parts, a pattern to be matched, and an action to be executed when the pattern is matched.

My problems with Lex:

  • Need a manual to use it.

  • Can't find the manual.

  • Manual just a bit too large.

  • Actions tend to redo some of the work done in the matching phase.

A solution: Write a tiny Lex, that even i can remember how to use.

Advantages:

  • Lex fits on one page of code.

  • Actions and matching can be intermixed.

  • Tokenizers are at least as compact as those written using Lex.

Disadvantages:

  • There is no backtracking. Patterns must be written so they work with one character lookahead.

  • This is not a big limitation in practice.
40 | Previous | Up | Next

Henry Baker's Meta parser

Henry G. Baker, Pragmatic Parsing in Common Lisp; or, putting defmacro on Steroids, Lisp Pointers, IV, 2, 1991, p. IV-2.3 - IV-2.15.

(match-it pattern) returns T if characters read using meta-read-char and meta-peek-char match the pattern, . The pattern can contain expressions that perform side effects to get some real work done, such as parsing.

Pattern language:

char
matches that character, input is advanced.
variable
matches that variable, input is advanced.
(class test)
satisfies the function named test input is advanced. The variable it is bound to the matching character.
lisp expression
the lisp expression is executed, and matches if it returns nonnil
(seq pattern ...)
matches if all patterns match sequentially.
(either pattern ...)
Greedily persues the first pattern matching the current input character. Matches if that pattern matches. There is no backtracking.
(opt pattern)
same as (either pattern (seq))
(many pattern)
matches 0 or more occurances of pattern
41 | Previous | Up | Next

Example: Read numbers and identifiers separated by spaces

23 Lines macroexpands into 70.


(let ((b (make-array 300 :element-type 'character :fill-pointer t)))
  (defun read-token-meta (s)
    (let ((*standard-input* s)
          (neg? nil)
          (n 0)
          (eneg? nil)
          (e 0)
          (scale 1.0))
      (match-it 
       (seq
        (many (class space-char?))
        (either
         (seq 
          (opt (either #\+ (seq #\- (setq neg? t))))
          (class digit-char-p)
          (setq n (dd+ it n))
          (many (seq (class digit-char-p)
                     (setq n (dd+ it n))))
          (opt (seq #\. 
                    (many (seq (class digit-char-p)
                               (setq scale (* scale 0.1))
                               (setq N (+ N (* (digit-char-p it) 
                                               scale)))))))
          (opt (seq (either #\e #\E)
                    (opt (either #\+ (seq #\- (setq eneg? t))))
                    (many (seq (class digit-char-p)
                               (setq e (dd+ it e))))))
          (let ((e (if eneg? (- e) e)))
            (if (zerop e) (if neg? (- n) n)
                (let ((n (* n (expt 10.0 e))))
                  (if neg? (- n) n)))))
         (seq (class alpha-char-p)
              (bpush it (bclear b))
              (many (seq (class alphanumericp) (bpush it b)))
              (intern b))))))))

?  (read-token-meta *standard-input*) -12.345e-2
-0.12345
?   (read-token-meta *standard-input*)   frobnify
FROBNIFY
42 | Previous | Up | Next

Auxilliary functions


(defun dd+ (d N) (+ (* N 10) (digit-char-p d)))

(defun space-char? (c) (or (eql c #\space) 
                           (eql c #\tab)
                           (eql c #\newline)))

(defun bclear (b) 
  (setf (fill-pointer b) 0)
  b)

(defun bpush (c b)
  (vector-push (char-upcase c) b)
  b)
43 | Previous | Up | Next

The Meta Parser


(defmacro match-it (form)
  (labels
    ((compile-it (form)
       (typecase form
         (cons form
               (case (car form)
                 (seq `(and ,@(mapcar #'compile-it (cdr form))))
                 (either `(or  ,@(mapcar #'compile-it (cdr form))))
                 (opt (compile-it `(either ,(cadr form) (seq))))
                 (many `(loop (if (not ,(compile-it (cadr form)))
                                (return T))))
                 (class `(when (,(cadr form) (meta-peek-char))
                           (setq it (meta-read-char))
                           t))
                 (t form)))
         (t `(when (eql (meta-peek-char) ,form)
               (meta-read-char))))))
    `(let ((current-char (peek-char nil))
	   (it NIL))		; The character just matched.
       (macrolet
         ((meta-peek-char () `current-char)
          (meta-read-char ()
            `(prog1 (read-char)
	       (setq current-char (peek-char nil)))))
         it
	 ,(compile-it form)))))
44 | Previous | Up | Next

Tiny lexical analyzers

The macros required took about as much space as the documentation.

Here are the number of lines for 3 similar lexers on the same problem, and a Lex parser:

Lines of code
Support Lex Tokenizer Total Description
Meta 8 26 36 70 Non backtracking matching languge, parsing done by side effect of global variables.
rcase 13 28 38 78 Uses tail recursive functions as states (goto's with arguments).
MLEX 21 41 36 98 Like Meta, but use constituent stack so they need not be named.
JavaLex 3400 A version of Lex in Java. Comments and lines matching "^[\b\t{}();]*$" were removed to account for variation in coding style.

The rcase macros let one write a tokenizer for CORBA IDL that is the same size as Sun's written with Lex.

Extra Credit: Write a macro that takes a list of tokens like '("<" "<=" ">" ">=" "<<" ">>" "+" "+=" ...) and produces a recognizer for them in this language. Hint: Since the language does not backtrack, generate the software so no backtracking is necessary.

45 | Previous | Up | Next

Why don't other languages have macros?

  • C and C++ have #define macros that provide only textural argument substitution. Extra parens must be used to keep substitutions syntactically correct. Substantial syntactic extensions require a preprocessor like those provided by Orbix or Objectivity.

  • M4, a more powerful macro package, provides limited computational abilities, but must avoid syntax clashes with a language that uses it.

  • Java does not have macros because they are too complicated.

  • C++ templates can be thought of as object oriented macros. Can generate specialized functions but not new syntax.

  • Very clever use of templates can produce code at compile time [Blitz++].

  • Scheme uses hygenic macros that provide safety and power. However, there is no standard macro facility. [scheme refrences]

  • Dylan has syntactic macros which are currently proprietary. CLIM macros have been used to test it.

  • A C syntactic macro facility brings C up to 1970's Lisp macro technology [C macro paper] [Scheme implemntation].
46 | Previous | Up | Next

Aspects of macroexpansion

AspectLisp C with macros
Source programming language Lisp C
How to recognize a macro car of form names a macro-function Syntactic element names a macro
How to recognize (match) arguments. (destructuring-bind) Regular expression of (meta)typed arguments
Expansion language. Lisp + backquote Interpreted C subset + backquote
Result of expansion Cons, symbols, numbers, strings Abstract syntax trees (AST)
Produce code from expansion Lisp compiler AST -> C generator
  • Syntax complicates the issue of macro argument recognition and result construction.
  • AST types include:
    • decl - declaration
    • stmt - statement
    • id - identifier
    • num - number
    • type_spec - type specifier
47 | Previous | Up | Next

C macro examples: Exception handling

An exception handling system needs:

  1. A method for establishing a handler (catch).

  2. A method for invoking a handdler (throw).

  3. A method for carefully unwinding the stack (unwind_protect).
syntax stmt throw {| $$exp::value ; |}
{if (simple_expression(value))
   return(
    `{if (exception_pointer == NULL)
       error("No handler for %d", $value);
       else longjump(exception_ptr, $value);});
   else
     return(
      `{int the_value = $value;
        if (exception_ptr == NULL)
          error("No handler for %d", the_value);
        else longjmp(exception_ptr, the_value);});}

syntax stmt catch
  {| $$exp::tab { $$stmt::handler } { $$stmt::body } |}
  {return(
     `{int *old_exception_ptr = exception_ptr;
       int jmp_buf[2], result;
       result = setjump(jmp_buf);
       if (result == 0)
         {exception_ptr = jmp_buf; $body}
       else {exception_ptr = old_exception_ptr;
             if (result == $tag)
             $handler else throw result;}});}

syntax stmt unwind_protect
  {| { $$stmt::body } { $$stmt::cleanup } |}
  {return(
    `{int *old_exception_ptr = exception_ptr;
      int jmp_buf[2];
      int result = setjump(jmp_buf);
      if (result = 0)
        {exception_ptr = jmp_buf; $body}
      exception_ptr = old_exception_ptr;
      $cleanup;
      if (result != 0) throw result;});}

48 | Previous | Up | Next

Example usage



/* myenum is a macro too. */
myenum error_types {division_by_zero, file_closed, using_unix};

int a, b;
int *c;

{int z, *y;
 z = a + b;
 catch division_by_zero
   {printf("%s", "You lose, division by zero.");}
   {*c = frob(z, a);}
 unwind_protect 
    {start_faucet_running();}
    {stop_running_faucet();}
 return(z);}   
                   

49 | Previous | Up | Next

Additional topics and resources

Topics

  • Compiler macros - Macros that run after everything else, just before compiling. Can override a function definition.

  • Macrolet - Lexically scoped macros.

  • Symbol-macrolet - A symbol can be a no argument macro. Used in with-slots, for example.

  • &enviornment - argument to macros providing Lisp's environment.

  • Code walking - Fancy macro's may need to analyze forms.

  • Macro characters - Lisp's reader is programmable too. Can be used for lexing tasks.

Resources

  • Paul Graham's Book "On Lisp". 13 chapters on macros!

  • Scheme repository has many macro variations, and useful tools such as pattern matching.

  • Scheme is also a source for hygienic macros.

  • R. Kent Dybvig, The Scheme Programming Language, second edition, Prentice Hall, 1996.

  • Stephen Carl's thesis: Syntactic Exposures - A Lexically-Scoped Macro Facility for Extensible Compilers. This approach works with languages with syntax, like Java.
50 | Previous | Up | Next

To sum up

  • Macros are the second oldest language, invented just after assembly language.

  • Extend syntax of the langauge

  • Lets you control how language is compiled.

  • Use proper hygiene.

  • Lisp's lack of syntax simplifies macros.

  • Good news: hygienic macros in a language with syntax are possible.
51 | Previous | Up | Next

Next time: Closures - crystalized software

  • Programming with less

    • No assignment.

    • No return.

    • Functions of only 1 argument.

    • No variables

  • But getting more:

    • Recipe for curried functions.

    • Conses that don't cons their arguments.

    • Interpreting without an interpreter.

    • Compiling without a compiler.

    • Replacing failure with a stream of successes.

    • Building parsers out of other parsers.