Compiling in SILK

This document describes how s-expressions are compiled an executed in SILK. The explaination is a bit tricky because a fair number of classes are involved. Also, i'll switch between examples in SILK and examples in Java without notice.

Basically, Scheme's read-eval-print loop must read and s-expression and evaluate it (determine its value). To evaluate an s-expression, SILK first compiles it into an instance of the abstract class Code. That code object is then run to determine its value.

An example

To start simple, here we compute the code that will compute the number 5:

> (define eng (Engine))
eng
> (define env (interaction-environment))
env
> env
silk.GlobalEnvironment@ab7e589d
> (define code (Code.toCode 5 env eng))
> code
{silk.Code$14: 5}
> 

The code for 5 is an instance of silk.Code$14 an anonymous class. To compile it, Code.toCode() takes 3 arguments,

Here's how to run the code:

> (define frame (get-static Frame.class 'EMPTY))
frame
> frame
silk.Frame@7b5e589e
> (run eng code frame)
5
> 

A Frame is the runtime analog of an environment. It contains values of lambda bound variables. Frames are nested as lambda's are.

When recursion is iteration

Scheme implementations are required to be properly tail-recursive. For example, the following function defines an iteration written tail recursively. For example (f x) iterates until x is zero:

> (define (f x) (if (= x 0) #f (f (- x 1))))
f
> (time (f 100000))   ; Doesn't blow stack!
(#f (3435 msec) (309752 bytes))
> 

The following shows tail versus no tail call sites are in scheme expressions. (call) represents a site where the call frame stack must be extended, while (tail-call) represents where a stack frame can be reused. At least, this is what SILK does.

(begin
  (call)
  ...
  (tail-call))

(if (call)
  (tail-call)
  (tail-call))

((call) (call) ...)

((lambda (...)
  (tail-call ...))
 ...)

Engine and Code work together to handle tail recursion optimization

Unfortunately, Java does not not support this tail call optimization. This is done by having the Engine and Code classes work together. Here is what the relavant parts of the Engine class look like:

public class Engine extends SchemeUtils {

  Code code;
  Frame frame;

  public Object tailCall(Code code, Frame frame) {
    this.code = code;
    this.frame = frame; 
    return this;
  }

  public Object run(Code c, Frame f) { 
    this.tailCall(c, f);
    return this.run(); }

  public Object run() {
    // Loop over the tail calls until you get a result.
    Object result = this;
    while (result == this)
      result = this.code.run(this.frame, this);
    return result; }
  ...
}

To determine the value of a Code fragment, with respect to a Frame, do:

eng.run(code, frame)

This gets eng running in a while loop, but it passes the buck to code passing both frame and itself, this.

The code.run(frame, eng) call can either return a value, or ask eng to run more code withotu growing the stack via:

  return eng.tailCall(newCode, newFrame);
Returning a call to tailCall() handles tail recursion properly, without consing. But it does require an extra eng argument to many functions wether it is used or not.

Here's an example from the code fragment for (if).

    new Syntax("if", v, 2, 3) { 
      public Code code(final Pair x, Environment e, Engine eng) {
	Object args = x.rest;
	final Object Thencode = second(args);
	final Object Elsecode = third(args);
	final Code a = toCode(first(args), e, eng);
	final Code b = toCode(Thencode, e, eng);
	final Code c = toCode(Elsecode, e, eng);

	return new Code(x) {
	  Object run(Frame f, Engine eng) { 
	    Object test = eng.run(a,f);
	    if (test==FALSE) return eng.tailCall(c, f);
	    else return eng.tailCall(b, f);
          }}; // tail recursive
      }};

Given an expression like (if a b c) The Syntax.code() method first calls Code.toCode() on a, b, and c. It then constructs a new Code that when run first runs a to determine the value of the test. Then based on the test, it does a tail call on either b or c.

Procedure application

The class Procedure represents procedures. You apply a procedure using the method:

public Object apply(Pair args, Engine eng)

The Engine is required for applying an instance of class Closure, which winds up tail calling its code. Here is the relavant portion of the class:

package silk;

public class Closure extends Procedure {

  Code body;
  Frame frame;
    
  public Closure (Code body, Frame frame) { 
    this.body = body; this.frame = frame; }

  /** Apply a closure to a list of arguments. Do this by creating
   * a new Frame containing the arguments, and then evaluating the body 
   * of the closure in that new frame. Note: applying a closure, results
   * in a Code object, which must then be evaluated. This is needed
   * to meet the Last Call Optimization requirement of R5RS.
  **/
   public Object apply(Pair args, Engine eng) {
     return eng.tailCall(body, frame.extend(args)); }
}

If a Java method is applying a procedure in tail position, it can invoke the procedure's apply method. This can cause the engine to be returned as the value of the apply. The enclosing engine will then resume the computation of the actual value.

If a Java method needs the value of an application in non tail position it should use:

eng.apply(proc, args);
Which will run the application in a loop if necessary. See examples in Primitive.java. Only one apply() is not in tail position.

Syntax

Scheme has only a few syntactic forms and the Code.toCode() method must deal with each of them.

  public static Code toCode(Object x, Environment e, Engine eng) {
    boolean isPair = (x instanceof Pair);
    Syntax s;
    Object first = null;

    if (isPair && ((first = first(x)) instanceof Symbol)
        && (e != null)
	&& ((s = e.getSyntax((Symbol)first)) != null)) {// Syntax
      return s.code((Pair)x, e, eng);
    } else if (isPair) {                             // Call
      if (debugging) return codeCallDebug((Pair)x, e, eng);
      else  return codeCall((Pair)x, e, eng);
    } else if (x instanceof Symbol) {                // Variable
      return e.codeRef((Symbol)x, 0);
    } else {                                         // Constant
      return codeQuote(x, x);
    } 
  }
If an s-expression begins with a symbol naming a syntactic form the corresponding Syntax object has its code() method invoked on it to produce the appropriate code. Since users can add new syntax using (define-syntax), this method must take an Engine argument.

Calling back to Scheme

To invoke a Scheme procedure from Java, do:

  Procedure proc = ...;
  Pair args = ...;
  Object result = (new Engine()).apply(proc, args);

You may reuse an existing Engine if it will not be run by more than one thread at a time.

Results

The original SILK 1.0 used a simpler interpreter. In SILK 2.0, it was replaced by a threaded code compiler on which the current one is based. It's main difference is that each tail call required consing. The current approach avoids that but at the expense of requiring an extra argument to Code.run() and Procedure.apply(). Both approaches also have the overhead of an extra method invocation when running a code fragment.

The following, admittedly simple, benchmark was used:

(define (f x) (if (= x 0) #f (f (- x 1))))

(time (f 1000000))
Here are the rsults for a 266MHZ Pentium pro, ale.bbn.com:
What      JDK 1.2       Relative  JDK 1.1.6  Relative   
          time  consing Slowness  time       Slowness       
Silk 2.0  75.318 309     1.00     38.666      1.00
codeBegin                         35.511      0.92
final     30.454 161     0.40     24.246      0.63

The times are in seconds. Consing is the number of bytes consed per iteration. Consing has been reduced by 50% and execution time reduced by 40 to 60%. JDK 1.1.6 is about 25% faster than JDK 1.2 on this benchmark.

During development, a bug was discovered in Code.codeBegin() which accounted for about a 10% improvement.