00 | Previous | Up | Next

Building software building software, Episode 2:
New behavior from old

In a functional programming language, a function can take functions as arguments and return a new function as its value. Thus, it is a simple compiler. It can compose new behavior from old, without waiting for a programmer. Unfortunately, this powerful feature is not directly available in recent object-oriented languages such as C++ and Java. However, by combining object-oriented and functional styles we can construct new optimized behavior at runtime.

In this talk, we will develope 3 compilers using this approach:

  1. A threaded byte-code compiler for a lazy functional lanaguage requiring only two byte-codes beyond a few primitive operations.
  2. A relatively high performance closure compiler for mathematical expressions that could be use in Genetic Programming.
  3. A query optimizer for a relational language like SQL or LogWeb.
Functional programmming has been described as "programming with less to get more". Here is a quiz to help you get prepared. Please bring your results with you.

In a language of your choice (the base language), consider the task of writing a function, sum_odd, that sums the odd integers in a list [Java Example] [Lisp Example].

Your task it to write sum_odd several different ways:

  1. without using goto's.
  2. assigning to variables only once.
  3. using no local variables, only constants, global variables or function (method) names.
  4. without returning a value.
  5. using only functions (or methods) with 1 argument.
  6. without using if. Hint: Redefine true and false appropriately.
  7. all of the above.

Ground rules: You may extend the base language, using any feature of the language, to provide new primitives in which sum_odd can be written. You can even define your own language, as long as you provide a working interpeter. However, you must write sum_odd according to the rules above.

01 | Previous | Up | Next

Problem: Let users program the application

Often, we need to give the user some control over the application.

TPEDIT, LAD
User builds SQL queries.
AMP
User designs a graph by specifying filter and accumulators.
User defined critics.
LogWeb
User designs application by graphically drawing dataflow.

How to do this?

  • External component - SQL

  • Compiler or interpreter - too much work?

  • Closure compiler - build behavior out of executable objects.
02 | Previous | Up | Next

FOOP
Functional Object Oriented Programming

We use a stripped down Java like syntax for a functional language:

  • No ";" - No statements,only expressions. if become an expression.

  • No "return" - A method or block returns the value of it's last expression.

  • No "," - Just us a space.

  • No type declarations - Types are inferred by the compiler - statically typed.

  • A Function is defined with method syntax: ( ... ) { ... }:
      f = (x) { x + 3 }  // Anonymous function bound to variable "f".
      f(x) { x + 3 }     // a function called "f".
    

  • Multimethods - versions of a function that only works for particular argument types (checked at runtime).
      g((String) x (String) y) { x + y }  // Version for strings.
      g(   (int) x    (int) y) { x + y }  // Version for int's.
    

  • A function becomes a class with a singleton instance as a static field.

  • The function/class has one operation, "_" that implements the function's behavior.

  • A function's type signature becomes an interface with a mangled name.
03 | Previous | Up | Next

A functional object captures behavior

The variable g has a value that is a function of two arguments, r and x, that computes r*x*(1-x):

g = (r x) { r*x*(1.0f - x) } // Chaos!

A function is an object with one operation, application of the function to some arguments:

g(3.5 0.5) 
-> 0.875

A function can return a function as its value

h = (r) { (x) { g(r x) } }

h(3.5)(0.5)
-> 0.875
04 | Previous | Up | Next

Java implementation
function == object == class == transparent

interface f_f$    { public float _(float a); }
interface f_ff$   { public float _(float a, float b); }
interface f_f$_f$ { public f_f$  _(float x); }


public class G implements f_ff$ {
  public final static f_ff$ g = new G();
  public float _(float r, float x) { return r*x*(1.0f - x); }
}

public class H implements f_f$_f$ {
	public float r;
	public final static f_f$_f$ h = new H();
	public f_f$ _(float r) { return (f_f$) new H0(r); }
}

public class H0 implements f_f$ {
	public float r;
	public H0(float r) { this.r = r; }
	public float _(float x) { return G.g._(this.r, x); }
	}

// h(3.5)(0.5) looks like:
H.h._(3.5)._(0.5)

05 | Previous | Up | Next

Terminology

In

(x) { g(r x) }

(x) { g(r x) } is an anonymous function.

x is an argument or bound variable.

g and r are free variables.

If we look at the surrounding lexical context, for example:

h = (r) { (x) { g(r x) } }
we can conclude that:

g is a global variable.

r is a lexical variable. It's value comes from the surrounding lexical context, the definition of h(r).

g(r x) is the application of g to its arguments r and x.

h(3.5) returns a closure, a function that captures (or closes over) the value of r.

A closure can be used as a callback.

06 | Previous | Up | Next

Spot Quiz

In a language of your choice, consider the task of writing a function, sum_odd, that sums the odd integers in a list [Java Example] [Lisp Example].

Your task it to write sum_odd several different ways:

  1. without using goto's [Solution].
  2. assigning to variables only once [Solution].
  3. using no local variables, only constants, global variables or method names [Solution].
  4. without using return [Solution].
  5. using only functions (or methods) with 1 argument [Solution].
  6. without using if. Hint: redefine true and false appropriately [Solution].
  7. all of the above [Solution].

When i started programming, in 1965, i did not believe i could live without any of these.

Only last year did i learn i could live without such things.

Ray Tomlinson's Java solution.

Dave Mankin's Perl solution.

07 | Previous | Up | Next

Obscene?

English: "sum the odd elements of a list".

// Java
class Puzzler_0 extends Object {
        public static int sum_odd(Cons es) {
                int sum = 0;
                while(!Cons.isNull(es)) {
                        int e = es.car();
                        if (e%2 == 1) sum = sum + e;
                        es = es.cdr();
                        }
                return sum;
  }

The Java version is correct and optimum in some sense.

However,

  • it requires 10 lines, rather than one.
  • No reuse - a slightly different problem requires completely new code.
08 | Previous | Up | Next

Sum odd elements

  • We'd like to program this a simply as: "Sum odd elements".
  • We'd like to have reusable components.
car   = (x)   { x.car() }      // List primitives
cdr   = (x)   { x.cdr() }
cons  = (x y) { new Cons(x, y) }

isOdd = (i)   { i%2 == 0 }  // Predicate
// Filter a list.
filter = (p es) { 
  if (es == null) null
  else { e = car(es)
         if (p(e)) cons(e filter(p cdr(es)))
        }
}
// Sum the elements.
sum = (es) { 
  if (es == null) 0
  else car(es) + sum(cdr(es))
}
sum_odd = (es) { sum(filter(isOdd es)) }

Reusable components.

Unnecessary work (temporary list) we'll remove later.

09 | Previous | Up | Next

Currying captures state and behavior

filt = (p) { (es) { filter(p es) } }

filt is a Curryed version of filter.

filt(p)(es) == filter(p es)

filt(p) is a function waiting to be applied to a list.

After Haskell B. Curry (1958), though perhaps we should use the term Shonfinkled after M. Shonfinkle (1924).

We can write a function that will curry a function for us:

curry  = (f) { (a) { (b) { f(a b) } }

rcurry = (f) { (b) { (a) { f(a b) } }  // Curry right.

filt = curry(filter)

odd = filt(isOdd)

// odd is equivalent to:
odd = (es) { 
  if (es == null) null
  else { e = car(e)
         if (isOdd(e)) cons(e odd(cdr(es)))
        }
}
10 | Previous | Up | Next

The power of currying

// Opening a unix file:
int mode;
String file;
int fd = open(file, mode);

// Reading:
byte[] buffer;
int size;
int result;
result =  read(fd, buffer, size);  // ~ 500 lines of code.

// Curried file reading - used in Synthesis Kernel.

Reader r = open(file, mode)       // Optimized reader class.

result =  r.read(buffer, size);   // ~ 30 instructions.


11 | Previous | Up | Next

Where do predicates come from?

Predicates come from functional composition and currying!

compose = (f g) { (x) { f(g(x) } }

isOdd = (i) { 0 == i%2  }

isOdd = compose(curry(==)(0) rcurry(%)(2))

sum_odd = compose(sum (curry(filter)(isOdd)))

This satifies Puzzler 3 - no variables, only global constants.

Behaviors are reusable.

12 | Previous | Up | Next

Summary

A function is an object with one behavior, application of itself to its arguments.

A function that returns another function is a simple compiler or factory.

A function can construct new behavior from old by:

  • Currying - captures variables (state).

  • Currying - captures partial computation - late binding.

  • Composition - combines behavior.

? How many composition operations do we need? - 2

13 | Previous | Up | Next

Compiler 1 theory - Even simpler notation

[f x] // means applying function f to argument x.

2 argument functions are automatically curryed:

[+ 3 5] -> 8

equlvalent to:

[[+ 3] 5] -> 8  // [] is left associative.

[g r x ] = [* r [* x [- 1.0f x]]]

h = [g 3.5]

[h 0.5] -> 0.875

[fac n] = [if [<= n 0] 1 [* n [fac [- n 1]]]]
14 | Previous | Up | Next

Abstraction is the opposite of application

Say we have a function:

[f x] = [+ 3 x]

Because of automatic currying, f is the same as:

f = [+ 3]  // f can be defined without arguments!

[g x] = [* x x]  // Can we define g without arguments?

[W f x] = [f x x] // Warbler - duplicate argument.

g = [W *]

[g 3] 
[[W *] 3]
[W * 3]
[* 3 3]
9

By using W we removed, or abstracted the argument x from g. Thus abstraction lets us "compile" a function into a form that contains only application of global constants (threaded byte codes).

15 | Previous | Up | Next

SK combinators

It turns out there are only two byte code we need, S and K.

Traditionally, the byte codes are called combinators.

A conbinator is a function with no free variables.

S and K are defined as.


[S f g x] = [f x [g x]]  // Starling - 2 sided composition.
[K x y] = x              // Kestrel - Constant

[I x] = x                // Identity == Fondness
I = [S K K]

[I x]
[[S K K] x]
[S K K x]
[K x [K x]]
x

[W f x] = [f x x] = [S f I] // Warbler
16 | Previous | Up | Next

Other useful combinators

It is useful to define other combinators, each of which can be defined by S and K.

I x        -> x, I == S K K   Identity
K x y      -> x               Kestrel
S f g x    -> f x (g x)       Starling
B f g x    -> f (g x)         Bluebird = curry
C f g x    -> f x g           Cardinal = curry second arg first.
T x f      -> f x == C I x f  Thrush
W f x      -> f x x == S f I x Warbler
M x        -> x x == S I I x   Mocking Bird
S1 c f g x -> c (f x) (g x)
B1 c f g x -> c (f (g x))
C1 c f x   -> c (f x) g
17 | Previous | Up | Next

Abstraction algorithm

One can abstract the variable, x, from the function, f using the following agorithm.


abstract(f, x) {
  match f {
    [f1 f2] -> Opt([S abstract(f1 x) abstract(f2 x)])
          x -> I
          ? -> [K x]
  }
}

Opt is an optimization phase, which we'll ignore for now. Here's an example:

f = [+ x 3]

abstract(f, x) = abstract([+ x 3], x) =
abstract([[+ x] 3], x) =
[S abstract([+ x], x) [K 3]] =
[S [S [K +] I] [K 3]]

[f 5]
[S [S [K +] I] [K 3] 5]
[S [K +] I 5 [K 3 5]]
[S [K +] I 5 3]
[K + 5 [I 5] 3]
[K + 5 5 3]
[+ 5 3]
8
18 | Previous | Up | Next

Optimization rules

Compile time optimizations let us do much less work:

[S [K p] [K q]]   == [K [p q]]
[S [K p] I]       == p
[S [K p] [B q r]] == [B1 p q r]
[S [K p] q]       == [B p q]
[S p     I]       == [W p]
[S [B p q] [K r]] == [C1 p q r]
[S p [K q]]       == [C p q]
[S [B p q] r]     == [S1 p q r]
[C I q]           == [T q]

Now we get:

[f x] = [+ x 3]
f = [C + 3]

[f 5] =
[[C + 3] 5]
[C + 3 5]    // C f g x    -> f x g
[+ 5 3]
8
19 | Previous | Up | Next

Abstracting multiple arguments

For multiple argument functions, abstract each argument in turn:

[g r x ] = [* r [* x [- 1.0f x]]]

abstract(abstract([g r x], r), x) =
[C [B B *] [S * [- 1.0]]] 

Abstracting the other way around essentally reverse the arguments, producing a different function:

abstract(abstract([g r x], x), r) =
[B [C *] [S * [- 1.0]]] 
20 | Previous | Up | Next

Compiler 1 - SASL

David Turner used this approach to make a simple but useful functional language. It was fully lazy, never computing something before it was necessary. Because it was functional, results of partial applications could be cached, so constants were automatically pulled out of loops.

Here's a 200 line Lisp version. It exposes a Scheme like language which is converted to combinator form by currying and abstraction using the following rules:

'x                           == x
(lambda (x) e)               == abstract(curry(e) x)
(lambda (x ...) e)           == curry((lambda (x) (lambda (...) e)))
(if cond them else)          == [if curry(cond) curry(then) curry(else)]
(let ((arg val) ...) e)      == curry(((lambda (arg ...) e) val ...))
(flet (((f args body) ...) e) == curry((let ((f (lambda args body)) ...) e))
(f x)                        == [curry(f) (curry(x)]
(f x ...)                    == curry(([f x] ...))
([f x] y ...)                == curry(([f x y] ...))
c                            == c; if c is a constant

Example:

(def (g r x) (* r (* x (- 1.0 x))))

[C [B B *] [S * [- 1.0]]] 
21 | Previous | Up | Next

Example: Prime Seive

(def (from by i j) 
  (if (< i j) (cons i (from by (by i) j))
      nil))
[C [B C [B [B C] [B [S [B S [B [B IF] <]]] [B [S [B B CONS]] 
[W [B B FROM]]]]]] 

(def (null x) (eql nil x))
[EQL NIL]

(def (filter p xs) 
  (if (null xs) nil
      (let ((x (car xs)))
        (if (= (mod x p) 0) (filter p (cdr xs))
            (cons x (filter p (cdr xs)))))))
[B [S [C [B IF NULL] NIL]] [C [B S [S [B S [B [B S] 
[S [B B [B C [B [B IF] [C [B C [B [B =] [C MOD]]] 0]]]]
 [C [B B FILTER] CDR]]]] [B [B [C CONS]] [C [B B FILTER] CDR]]]] CAR]] 

(def (seive xs)
  (if (null xs) nil
      (let ((it (car xs)))
        (cons it (seive (filter it (cdr xs)))))))
[S [C [B IF NULL] NIL] [S [B [S CONS] [B [B SEIVE] [B [C FILTER]
 CDR]]] CAR]] 
(seive 2 1 1000)
22 | Previous | Up | Next

Compiling to byte code

Graph reduction               | data stack                Emitted code
[g r x] = 
[C [B B *] [S * [- 1.0]] r x] | r x
[B B * r [S * [- 1.0]]  x]    | r r x                     [dup 1]
[B [* r] [S * [- 1.0]]  x]    | r r x
[B [* r] [S * [- 1.0]]  x]    | r r x
[* r [S * [- 1.0]  x]]]       | r r x
[* r [* x [- 1.0 x]]          | x x r r x                 [dup 3] [dup 1]
[* r [* x (- 1.0 x)]          | (- 1.0 x) x r r x         [const 1.0] [-]
[* r (* x (- 1.0 x))]         | (* x (- 1.0 x)) r r x     [rot 1] [*]
(* r (* x (- 1.0 x)))         | (* r (* x (- 1.0 x))) r x [rot 1] [*]
                              | (* r (* x (- 1.0 x)))     [slide 2]

? (def (h r x) (+ (+ x (* r x)) r))

[[W [B C [B [B +] [B [S +] *]]] r x] | r x
[B C [B [B +] [B [S +] *]] r r x]    | r r x  [dup 1]
[C [[B [B +] [B [S +] *]] r] r x]    | r x r  [rot 3 2]
[B [B +] [B [S +] *] r x r]
[B + [B [S +] * r] x r]
[+ [B [S +] * r x] r]
[+ [S + [* r] x] r]                  | x r x r [dup 2]
[+ [+ x (* r x)] r]                    
[+ (+ x (* r x)) r]                
(+ (+ x (* r x)) r)

23 | Previous | Up | Next

Puzzler 6: without using if

Generally, booleans are used in conjunction with control statements, for example

if boolean then else

Thus we can define true and false as two argument functions:

[true then else] = then
true = K

[false then else] = else
false = [K I]

Comparison primitives like "==" or ">" now return true or false which are applied to the branches of if.

Power:

  • Boolean is the dual of control.
24 | Previous | Up | Next

Fully CPS version

Here is a version of sum_odd written in continuation passing style without if.

knull, kcar, koddp and k+ are continuation passing versions of their respective primitives.

Each primitive operation computes it's value and jumps to a return address with it's value in a register.

(def (sum-odd-2 elements sum k)
  (knull elements                       
         (lambda (is-null)              ; goto + argument
           (is-null (k sum)
                    (kcdr elements
                          (lambda (rest)
                            (let ((j (lambda (new-sum)
                                       (sum-odd-2 rest new-sum k))))
                              (kcar elements
                                    (lambda (e)
                                      (koddp e
                                             (lambda (is-odd)
                                               (is-odd
                                                (k+ sum e j)
                                                (j sum)))))))))))))

25 | Previous | Up | Next

Puzzler 7: All of the above

Here is what it looks like after the arguments have been abstracted out. Not pretty, but it satisfies all the requiremnts of the puzzle.

(S (B B (B B KNULL))
   (B (S (B S (B (B C) (B (B THRUSH) THRUSH)))) 
      (S (B B (B B KCDR))
         (C (B C 
               (B (B B)
                  (B (B B) 
                     (C (B B (B B KCAR)) 
                        (B (B (S KODDP))
                           (S (B S 
                                 (B (B C)
                                    (B (B (B C))
                                       (B (B (B THRUSH)) 
                                          (B C K+)))))
                              THRUSH)))))) 
            (C (B C SUM-ODD-2))))))

(B B f x g y) == (B (f x) g y) == (f x (g y))  ; Apply (g y) first.
(B C g x z y) = (C (g x) z y) = (g x y z)      ; Reverse z and y.
26 | Previous | Up | Next

Compiler 2 - Feeley's closure compiler

Marc Feeley shows how to use closures to compile a functional subset of Scheme.

Closures can be simulated by objects when they are not directly available in the language.

The performance of a closure compiler lies between that of an interpreter and compiler depdending on the effort devoted to specializing the generated closures.

Optimization and code generation are done in one phase rather than several as would be done in a real compiler.

Compiling can be quick, but writing a highly optimized compiler can be straight forward but tedious.

27 | Previous | Up | Next

Example Performance

Here is a simple example of a compiler for simple numerical expressions involving numeric constants and variables.
WhatTimeLines Relative slowness Classes
Lisp compiled 2.107 2 1.00 1
Lisp eval 32.05 2 15.21 ?
Special eval 7.057 7 3.35 3
compiler-1 7.462 17 3.54 5
compiler-2 5.006 49 2.38 11

It is possible to get within a factor of 2.4 of compiled code.

Faster takes more code (trading code space for time).

Example: Used in Spyglass and AMP to build predicates and accessors where the total code for the compiler is 2 pages of Lisp.

28 | Previous | Up | Next

Compiled example

4 argument function for evaluating a quadratic.

Compiled by Common Lisp.

(compute-it 4 3 2 1)

(defun compute-it (x a b c)
  ;; 2.1071048
  (+ (* x (+ (* a x) b)) c))
29 | Previous | Up | Next

Standard evaluator

The standard Lisp evaluator must be ready to handle the entire language.

If a Lisp implemntation provides a good compiler, there is less reason to expect eval to be fast.

The s-expression must be constructed before it can be evaluated.

(defun eval-it (x a b c)
  ;; 32.053104
  (eval `(+ (* ,x (+ (* ,a ,x) ,b)) ,c)))
30 | Previous | Up | Next

Specialized eval

Specialize eval for your mini language.

Twice as fast because it has less to worry about.

Replace

(eval expression)
by
(eval-it expression env)

env is a list that associates a variable's name to its value.

(defun eval-it (exp env)
  7.057116
  (cond ((consp exp)
	 (case (car exp)
	   ((+) (+ (eval-it (second exp) env) (eval-it (third exp) env)))
	   ((*) (* (eval-it (second exp) env) (eval-it (third exp) env)))))
	((symbolp exp) (cdr (assoc exp env)))
	(t exp)))
	      
(eval-it 
 '(+ (* x (+ (* a x) b)) c)
 '((x . 4) (a . 3) (b . 2) (c . 1)))

Boost performance by unrolling the recursion several times.

31 | Previous | Up | Next

First closure compiler

So far, the environment has been represented by a data structure that has static (variable names) and dynamic (variable values) components mixed together. We can separate them so that

(eval-it expression env)
is replaced by
(funcall (compile-it expression (compile-time-env env))
         (runtime-env env))
So the resulting call looks like:
(funcall (compile-it  '(+ (* x (+ (* a x) b)) c)
		      '(x a b c))
	 #(4 3 2 1))

Each clause of the interpreter is replaced by one returning a closure.

Each recursive call to eval-it is replaced by a call to compile-it.

(defun compile-it (exp env)
  ;; 7.462443
  (cond ((consp exp)
	 (case (car exp)
	   ((+) (let ((a (compile-it (second exp) env))
		      (b (compile-it (third exp) env)))
		  (lambda (env)
		    (+ (funcall a env) (funcall b env)))))
	   ((*) (let ((a (compile-it (second exp) env))
		      (b (compile-it (third exp) env)))
		  (lambda (env)
		    (* (funcall a env) (funcall b env)))))))
	((symbolp exp) 
	 (let ((i (position exp env)))
	   (lambda (env) (svref env i))))
	(t (lambda (env) exp))))
This compiler is slightly slower than the specialized evaluator.
32 | Previous | Up | Next

Optimizing closure compiler

Check more special cases, like a constant arguments.

Use declarations to minimize overhead.

(defun compile-it (exp env)
  ;; 5.0063405
  (cond ((consp exp)
	 (case (car exp)
	   ((+)(gen-+ (second exp) (third exp) env))
	   ((*)(gen-* (second exp) (third exp) env))))
	((symbolp exp) (gen-lexical (position exp env)))
	(t (gen-constant exp))))

(defmacro %lambda (args &body body)
  `(lambda ,args (declare (optimize (speed 3) (safety 0))) ,@body)) 

(defmacro %funcall (f &rest args)
  `(funcall (the compiled-function ,f) ,@args))

(defmethod gen-+ ((a symbol) (b symbol) env)
  (let ((a (position a env))
	(b (position b env)))
    (%lambda (env) (+ (svref env a) (svref env b)))))

(defmethod gen-+ ((a symbol) b env)
  (let ((a (position a env))
	(b (compile-it b env)))
    (%lambda (env) (+ (svref env a) (%funcall b env)))))

(defmethod gen-+ (a (b symbol) env)
  (let ((a (compile-it a env))
	(b (position b env)))
    (%lambda (env) (+ (%funcall a env) (svref env b)))))

(defmethod gen-+ (a b env)
  (let ((a (compile-it a env))
	(b (compile-it b env)))
    (%lambda (env) (+ (%funcall a env) (%funcall b env)))))

(defmethod gen-* ((a symbol) (b symbol) env)
  (let ((a (position a env))
	(b (position b env)))
    (%lambda (env) (* (svref env a) (svref env b)))))

(defmethod gen-* ((a symbol) b env)
  (let ((a (position a env))
	(b (compile-it b env)))
    (%lambda (env) (* (svref env a) (%funcall b env)))))

(defmethod gen-* (a (b symbol) env)
  (let ((a (compile-it a env))
	(b (position b env)))
    (%lambda (env) (* (%funcall a env) (svref env b)))))

(defmethod gen-* (a b env)
  (let ((a (compile-it a env))
	(b (compile-it b env)))
    (%lambda (env) (* (%funcall a env) (%funcall b env)))))

(defun gen-lexical (i) (%lambda (env) (svref env i)))

(defun gen-constant (c) (%lambda (env) c))
33 | Previous | Up | Next

Compiler 3 - Query Optimization

Put it all together.

Typical of compiler transformations for a hight level language.

Imagine a simple database with two tables (relations):

class Ob { ...}        // Superclass of all domain objects.

class Tuple extends Ob { ... }
class Employee extends Tuple { 
  public Ob employeeNumber, salary, dept, status; 
}

class Paper extends Tuple { 
  public Ob employeeNumber, title, year; 
}

// Accessors
employeeNumber((Paper) p) = { p.employeeNumber } // ...

34 | Previous | Up | Next

Example query

Suppose we are interested in aswering a query like:

Find the names of all professors who published papers after 1980.
In SQL this might look like:
select e.name from emp e, paper p 
where e.status = 'PROF' 
and p.employeeNumber == e.employeenumber
and p.year >= 1980;
35 | Previous | Up | Next

Simple query language

using the following query language:

(scan table) - convert table into stream of tuples.

(filter predicate stream1) - stream filtered by predicate.

(project attributes stream) - project stream onto attribute list.

(join predicate stream1 stream2) - Nested loop join.  
  Tuples t1 and t2 from both streams are concatenated if
  predicate(t1 t2) is true.

The query would then look something like:

(project (name)
         (join employeeNumber 
               (filter(== status prof) (scan Employee))
               (filter (>= year   1980) (scan Paper))))

Such an expression could be constructed by a user of a graphical interface, say.

36 | Previous | Up | Next

What we'd like

We'd like this to be compiled into something like:

Cons query(Enumeration emp, Enumeration paper) {
  Cons result = null;
  while {emp.hasMoreElements()) {
    Employee t6 = emp.nextElement();
      if (t6.status() == Status.professor) {
        while (paper.hasMoreElements() {
          Paper t4 = paper.nextElement();
          if (t4.year() >= 1980)
            if (t6.employeeNumber() == t4.employeeNumber()) {
              Tuple v = new Tuple();
              v.addElement(t6.name);
              result = Cons.cons(v, result);
        }
  }
  return(result);
}
37 | Previous | Up | Next

Cookie Cutter Approach

Ob query(Enumeration emp, Enumeration paper,
         Ob init, P1 p1, P1 p2, P2 p3, Acc add) {
  Ob result = init;
  while {emp.hasMoreElements()) {
    Employee t6 = emp.nextElement();
      if (p1._(t6)) {
        while (paper.hasMoreElements() {
          Paper t4 = paper.nextElement();
          if (p2._(t4))
            if (p3._(t6, t4))
              result = add._(t6, result);
        }
  }
  return(result);
}

+ Generalized code.

+ Small additional overhead - 1 to 3 invokevirtual's in inner loop.

+ Proven approach - used in Synthesis Kernel.

- Cutters are hand made.

- Need many cutters - keep up with user demand.

- Can't cookie cut predicates - too many.

38 | Previous | Up | Next

Construct predicates

Use currying and composition to go from (== status prof) to a predicate like isProf.

 ||#

joinable = (f b) {
  fb = f(b)                      // Pull constant out of loop.
  (a) {f(a).equals(fb) }
}

EQ((Ob) x (Ob) y) { x.equals(y) }    // Primitive operations ...
NE((Ob) x (Ob) y) { x.notEquals(y) } Overhead: O(NE) = 1*M
// ... 

// O(rcurry(f x))  = 2*F + 1*M + O(f)
rcurry(f x) { (y) (f(y x)) }    

// O(compose(f g)) = 2*F + 1*M + O(f) + O(g)
compose(f g) { (x) { f(g(x)) }  

isProf = compose(rcurry(EQ)(prof) status)
yearGE1980 = compose(rcurry(GE)(1980) year)
// O(yearGE1980) = 4*F + 4*M

Overhead is 4 times what we thought!

39 | Previous | Up | Next

Smarter composition

Specialize rcurry() and compose() for each primitive operation.


rcurry(f x) { (y) (f(y x)) }    // General case.
rcurry((EQ) f x) { REQ (y) { y.equals(x) } }  // Specialized cases
rcurry((NE) f x) { RNE (y) { y.notEquals(x) } }  // O(RNE) = 1*F + 1*M
rcurry((LE) f x) { RLE (y) { y.lessThanOrEqual(x) } }
rcurry((GE) f x) { RGE (y) { y.greaterThanOrEqual(x) } }
rcurry((LT) f x) { RLT (y) { y.lessThan(x) } }
rcurry((GT) f x) { RGT (y) { y.greaterThan(x) } }

compose(f g) { (x) { f(g(x)) }  // General case
compose((REQ) f g) {            // Specialized cases ...
  x = f.x()
  (y) { g(y).equals(x) }        // O(CREQ) = 2*F + 1*M + O(g)
}
// ...

yearGE180 = compose(rcurry(GE 1980) year) 
// O(yearGE180) = 3*F + 2*M

Overhead reduced almost in half.

40 | Previous | Up | Next

Cup O' Loop

Any mapping function fits the following cookie cutter:

foldL = (done next rest acc when what sofar es) {
  if (done(es)) sofar
  e = next(es)
  foldL(done next rest acc when what 
           if (when(e)) acc(what(e) sofar)
           else sofar
        rest(es))
}

fold(done next rest acc when what sofar) {
  (es) { foldL(dones next rest acc when what sofar es) }
}

truthhood = ((Ob) x) { true }
identity = ((Ob) x) { x }

scan = 
  fold(notHasMoreElements nextElement truehood cons identity 
       truehood null)

filter(p) { fold(isNull car cdr cons p identity null) }

map(what) { fold(isNull car cdr cons truthhood what null) }

project(accessors) { map(projector(accessors)) }
41 | Previous | Up | Next

Join == Nested folds

 
//* Like foldL but when() and what() take 2 arguments.
jfoldL = (done next rest acc when what sofar es x) {
  if (done(es)) sofar
  e = next(es)
  jfoldL(done next rest acc when what 
         if (when(x e)) acc(what(x e) sofar)
         else sofar
        rest(es))
}

jfold = (done next rest acc when what es) {
  (x sofar) {
    jfoldL(done next rest acc when what es x sofar) {  
}}

join(p g f) {
   when = composeWhen(f.when compose(joinable(p) f.what))
   what = tupleJoin
  joinInternal(g what when)
}

joinInternal(g what when)
  (b) {
  // Inner Loop.
  accum = jfold(isNull car cdr cons when what b)
  }
  // Outer Loop.
  compose(fold(isNull car cdr accum truehood identity null) g) }
}
42 | Previous | Up | Next

Building the query function

A function that will execute the query can be constructed by:

query = compose(project(cons(name null))
                join(employeeNumber
                     compose(filter(statusEQprof) scan)
                     compose(filter(yearGE1980) scan)))

To execute the query, just provide arguments:

query(employee)(paper)
43 | Previous | Up | Next

Loop merging optimizations

compose((map) f (map g)) { map(compose(f.what g.what)) }

compose((fold) f (fold g)) {
  fold(g.done g.next g.rest f.acc 
       composeWhen(f.when g.when g.what)
       compose(f.what g.what)
       f.sofar)
}

compose((filter) f (scan) g) {
  fold(notHasMoreElements nextEelement identity cons f.p 
       identity null)
}

compose((map) f (filter) g) {
  fold(isNull car cdr cons f.p g.what null)
}

compose((map) m (join) j) {
  when = j.when
  what = compose(m.what j.what)
  joinInternal(j.g when what)
}
composeWhen(whenf wheng whatg) {
  (e) { wheng(e) && whenf(whatg(e)) }
}

composeWhen(whenf wheng (Identity) whatg) {
  (e) { wheng(e) && whenf(e) }
}

composeWhen((Truehood whenf) wheng whatg) { wheng }
// Similarly for other special cases ...

44 | Previous | Up | Next

Composing multiple argument functions

Functions with multiple arguments can be composed.

Can't write a function that writes this code.

You must write it for each case.


h(a b) { ... }
g(c d) { ... }
compose(h g) -> (c d b) { h(g(c d) b) }
45 | Previous | Up | Next

What is going on?

Composition rules move code fragments around for effective use.

query = compose(project(cons(name null))
                join(employeeNumber
                     compose(filter(statusEQprof) scan)
                     compose(filter(yearGE1980) scan)))

p = compose(filter(statusEQprof) scan)
  = fold(notHasMoreElements nextEelement identity cons 
         statusEQprof identity null)

t = compose(filter(yearGE1980) scan)
  = fold(notHasMoreElements nextEelement identity cons 
         yearGE1980 identity null)

j = join(employeeNumber p t)

j = {
  when = composeWhen(yearGE1980 compose(joinable(employeeNumber) 
                                        identity))
  what = tupleJoin
  (b) {
  accum = jfold(isNull car cdr cons when what b)
  }
  compose(fold(isNull car cdr accum truehood identity null) p) }
}

j = {
  when = composeWhen(yearGE1980 joinable(employeeNumber))
  what = tupleJoin
  (b) {
  accum = jfold(isNull car cdr cons when what b)
  }
  fold(notHasMoreElements nextEelement identity accum statusEQprof 
       identity null)
}

m = project(cons(name null))
q = projector(cons(name null)))
m = map(q)

query = compose(m j)

query = {
  when = composeWhen(yearGE1980 joinable(employeeNumber))
  what = compose(q tupleJoin)
  (b) {
  accum = jfold(isNull car cdr cons when what b)
  }
  fold(notHasMoreElements nextEelement identity accum statusEQprof 
       identity null)
}

46 | Previous | Up | Next

Reducing runtime composition

Nested loops as in a join require a function to be constructed that captures arguments (each professor in this example).

This function is constructed by composition.

The same operations will be done each time because the classes are constant.

Specialized classes like jfold reduce composition overhead.

47 | Previous | Up | Next

Go directly to compiled code

With about the same amount of work, maybe less, one can produce compilable code.

Here is the Lisp equivalent of the earlier hand optimized Java version produced by this code.

(LET ((RESULT 'NIL))
  (DOLIST (T6 EMP)
    (WHEN (STATUS==PROF T6)
      (DOLIST (T4 paper)
        (WHEN (YEAR>=1980 T4)
          (WHEN (SAME-EMP\# T6 T4)
            (SETQ RESULT (CONS (NAME-1 (CONS T6 T4)) RESULT)))))))
  RESULT)
48 | Previous | Up | Next

Summary

+ You don't need much to compute - f(x), S, and K.

+ A function object is a piece of behavior.

+ Function application is glue.

+ Currying captures state - new Object

+ Currying is partial computation - late binding.

+ Object composition is optimization - factory pattern.

- Classes and composition rules grow exponentially.

- Narrow compiler focus to be effective.

49 | Previous | Up | Next

References

Backus, J. Can programming be liberated from the von Neumann style? A functional style and its algebra of programs, CACM, 21, 8 (Aug. 1978), 613-641.

Codd, E.F., A relational model of data for large shared data banks, CACM 13, 6 (June 1970), 337-387.

Burstall, R.M. and Darlington, J. A transformation system for developing recursive programs, J. ACM 24,1 (Jan. 1977), 44-67.

Curry, H.B., Feys Robert, and William Craig, Combinatory Logic, North-Holland Publishing Co, Amsterdam, 1968.

Freytag, J.C. and Goodman N., On the translation of relational queries into iterative programs, ACM Trans. Database Sys., Vol 14, No. 1, March 1989, p. 1-27.

Friedman, D.P. and Wise, D.S. CONS should not evalutate its arguments, In Automata, Languages, and Programming, S. Michaelson and R. Milner, eds., Edinburg University Press, Edinburgh, 1976, 257-284.

Simon L. Peyton Jones, The Implemntation of Functional Programming Languages, Prentice-Hall, 1987.

Raymond Smullyan, To Mock a Mockingbird, Alfred A. Knopf, New York, 1985.

D.A. Turner, A new implementation technique for applicative languages, Software - Practice and Experience, vol 9, 31-49, 1979.

50 | Previous | Up | Next

Next time: Freeing the essence of computation