#|| Relational Query Optimizer

Relational Query Optimizer

We will how show how we can put together everything we've learned to produce an efficient implementation of an abstract description of a problem we want to solve. We will focus on a query optimization example that requires only a few transformations. However, those transformations are typical of those done when compiling high level languages.

Imagine a simple database with two tables (relations):

(deftable emp (emp# name salary dept status))
(deftable paper (emp# title year))

Suppose we are interested in aswering a query like:

"Find the names of all professors who published papers after 1980"

In SQL:
select e.name from emp e, paper p 
where e.status = "prof" and p.emp# == e.emp# and p.year >= 1980;

using the following query language:

(scan table) - convert table into stream of tuples.

(filter predicate stream1) - stream filtered by predicate.

(project attributes stream) - project stream onto list of attributes.

(join predicate stream1 stream2) - Nested loop join.  
  Tuples t1 and t2 from both streams are concatenated if
  (predicate t1 t2) is true.

Q1: User query expression:

The query would then look something like:

(project (name)
         (join emp# 
               (filter (== status 'prof) (scan emp))
               (filter (>= year   1980)  (scan paper))))

Such an expression could be constructed by a user of a graphical interface, say.


Construct predicates

We will need to go from expressions like (== status 'prof) to a predicate like (lambda (x) (== (status x) 'prof)). We'll assume this has already been done for us, perhaps using the closure compiling technique.

 ||#

(define (name-1 a)         (vector (name (car a))))
(define (same-emp# a b)    (== (emp# a) (emp# b)))
(define (status==prof a)   (== (status a) 'prof))
(define (year>=1980 a)     (>= (year a) 1980))

#||

Q2: Resulting query:

Given these predicates, the query looks like:

||#

(setq q2
      '(project name-1
         (join same-emp# 
               (filter status==prof (scan emp))
               (filter year>=1980  (scan paper)))))

#|| 

Q3: MAP* transformation

(MAP* f s) is a generalized mapping function where s is a stream and f is a function or a stream:

(map* #'(lambda (a) (* a a)) '(1 2 3 ...))
-> (1 4 9 ...)
When f is a stream of functions, the result is a stream of curried functions:
(map* (list #'+ #'*) '(1 2 ...))
-> (list (lambda (b) (+ 1 b))
         (lambda (b) (* 1 b))
         (lambda (b) (+ 2 b))
         (lambda (b) (* 2 b))
         ...))
The query language can be transformed into a MAP* expression:
(scan table)    == (map* id table)
(filter p s)    == (map* (pred p) s)
(project p s)   == (map* p s)
(join p s1 s2)  == (map* (map* (KCjoin p) s2) s1)
Where id is the identity function, and pred and KCjoin produce other functions. They can be defined as follows, though we will not actually use these definitions.
(define (id a) a)
(define ((pred p) a) (when (call p a) a))
(define ((KCjoin p) t2 t1) (when (call p t1 t2) (cons t1 t2)))
qep->map* does the transformation. It uses case-match a pattern matching macro from Paul Graham's book "On Lisp".
 ||#
(defun qep->map* (e)
  (case-match
   e
   (('scan ?table)     `(map* id ,?table))
   (('filter ?p ?s)    `(map* (pred ,?p) ,(qep->map* ?s)))
   (('project ?p ?s)   `(map* ,?p ,(qep->map* ?s)))
   (('join ?p ?s1 ?s2) `(map* (map* (KCjoin ,?p) ,(qep->map* ?s2)) 
                              ,(qep->map* ?s1)))))

(pprint (setq q3 (qep->map* q2)))

'(MAP* NAME-1
  (MAP* (MAP* (KCJOIN SAME-EMP\#)
              (MAP* (PRED YEAR>=1980) (MAP* ID paper)))
        (MAP* (PRED STATUS==PROF) (MAP* ID EMP))))

#|| 

MAP* simplification rules

We can use some simplification rules to reduce the number of map* expressions by composing functions with the function L:
f1 = (lambda (a1 a2 ...) ...)
f2 = (lambda (b1 b2 ...) ...)

(L f1 f2) -> (lambda (b1 b2 ... a2 a3 ...) (f1 (f2 b1 b2 ...) a2 a3 ...))
(L f1 id) -> f1
(L id f2) -> f2

So if f1 and f2 have n1 and n2 arguments, respectively, L returns a function with n1+n2-1 arguments.

(L name-1 (KCjoin same-emp#)) ->
(L name-1 (clambda (b a) (when (same-emp# a b) (cons a b)))) ->
(clambda (b a) (name-1 (when (same-emp# a b) (cons a b))))

The second two rules are simply specializations of the first that deal with the identity function, id

The map* transformations are:

(map* f1 (map* f2 s)) -> (map* (L f1 f2) str)

(L f1 (L f2 f3))      -> (L (L f1 f2) f3)

(L f1 (map* f2 s))    -> (map* (L f1 f2) s)
The first rule is vertical loop fusion. The second and third rule combine functions together into larger functions.

The code for these transformations looks like:

 ||#

(defun simplify-map* (e)
  (case-match e
   (('map* ?f1 ('map* ?f2 ?s))
    (let ((f1 (simplify-map* ?f1))
          (f2 (simplify-map* ?f2))
          (s (simplify-map* ?s)))
      (simplify-map* `(map* (L ,f1 ,f2) ,s))))
   (('L ?f1 ('L ?f2 ?f3))
    (let ((f1 (simplify-map* ?f1))
          (f2 (simplify-map* ?f2))
          (f3 (simplify-map* ?f3)))
      (simplify-map* `(L (L ,f1 ,f2) ,f3))))
   (('L ?f1 'id) (simplify-map* ?f1))
   (('L 'id ?f2) (simplify-map* ?f2))
   (('L ?f1 ('map* ?f2 ?s))
    (let ((f1 (simplify-map* ?f1))
          (f2 (simplify-map* ?f2))
          (s  (simplify-map* ?s)))
      (simplify-map* `(map* (L ,f1 ,f2) ,s))))
   (T e)))

(pprint (setq q4 (simplify-map* q3)))

'(MAP*
  (L (MAP* (L (L NAME-1 (KCJOIN SAME-EMP\#)) (PRED YEAR>=1980))
           paper)
     (PRED STATUS==PROF))
  EMP)
#||
This is now a doublely nested loop. In general there is one map* per table reference.

Simplifying lambda

Earlier, either we would let our language generate a lambda expression for us. For languages without this capability, we can define a data type (class) that would have the behavior we want to capture.


Simplifying an application


Transform map* to lambda

We will now transform the map* expressions into an efficient nested loop in terms of:

(dolist (item items) what) - iterate over a sequence of items.

(when condition what). - When condition is true do what.

(lambda args body) - Function construction.

(let bindings body) - Bind local variables.

The transformations will be written in terms of curried functions that we will manipulate symbolically.

The following functions will be used to construct and take appart the code.

||#

(define (make-lambda args body) (list 'lambda args body))
(define (lambda? x) (and (consp x) (eq (car x) 'lambda)))
(define (lambda-args x) (cadr x))
(define (lambda-body x) (caddr x))

(define (make-when condition body) (list 'when condition body))
(define (when? x) (and (consp x) (eq (car x) 'when)))
(define (when-condition x) (cadr x))
(define (when-body x) (caddr x))

(define (make-dolist binding body) (list 'dolist binding body))
(define (dolist? e) (and (consp e) (eq (car e) 'dolist)))
(define (dolist-binding e) (cadr e))
(define (dolist-body e) (caddr e))

(define (make-let bindings body) (list 'let bindings body))
(define (let? e) (and (consp e) (eq (car e) 'let)))
(define (let-bindings e) (cadr e))
(define (let-body e) (caddr e))

#||

The following transformation move when's from an inside expression to a surrounding one:

Function exchange:
(f ... (when t1 t2) ...) -> (when t1 (f ... t2 ...))

Deletion
(when t1 (f (when t1 t2))) -> (when t1 (f t2))

Distribution
(when (when t1 t2) t3) -> (when t1 (when t2 t3))

Loop:
(dolist (t1 t2) (when t3 t4)) -> (when t3 (dolist (t1 t2) t4))
when t1 does not occur in t3.
||#

(let ((count 0))
  (define (reset-arg-count) (setq count 0))
  (define (make-arg &optional (prefix "T"))
    (setq count (+ count 1))
    (intern (string-upcase (format nil "~a~d" prefix count)))))

(define (pred p)
  (let ((a (make-arg)))
    (make-lambda (list a)
                 (make-when 
                  (simplify-apply p (list a))
                  a))))

(define (KCjoin p)
  (let ((a (make-arg))
        (b (make-arg)))
    (make-lambda (list b a)
                 (make-when (simplify-apply p (list a b))
                            (simplify-apply 'cons (list a b))))))

(define (L f1 f2)
  (if (and (lambda? f1) (lambda? f2))
    ;; Both f1 and f2 are lambda expressions.
    (let ((a (lambda-args f1))
          (b (lambda-args f2)))
      (make-lambda (append b (cdr a))
                   (simplify-apply f1 (cons (lambda-body f2) (cdr a)))))
      ;; F2 is a lambda expression and f1 is a symbol naming a 
      ;; 1-argument function
      (make-lambda (lambda-args f2)
                   (simplify-apply f1 (list (lambda-body f2))))))
  
(define (map* f s)
  (let ((arg (make-arg))
        (args (cdr (lambda-args f))))
    (make-lambda args 
                 (make-dolist (list arg s)
                              (simplify-apply f (cons arg args))))))  

(define (simplify-apply f args)
  (let ((count (count-if #'when? args)))
    (if (= count 0)
      (if (lambda? f) 
        (simplify-apply-lambda f args)
        `(,f ,@args))
      (if (= count 1)
        (let ((cond (when-condition (find-if #'when? args))))
          `(when ,cond 
             ,(simplify-apply 
              f 
              (mapcar (lambda (x) (if (when? x) (when-body x) x))
                 args))))
        (error "Too many when's in ~a" args)))))

(define (simplify-apply-lambda f vals)
  (let ((args (lambda-args f))
        (body (lambda-body f)))
    (simplify-apply-lambda-1 body args vals '())))

(define (occurs arg form)
  (if (eq form arg) 1
      (if (consp form) (+ (occurs arg (car form))
                          (occurs arg (cdr form)))
          0)))

(define (simplify-apply-lambda-1 body args vals bindings)
  (if (null args)
    (if bindings (make-let bindings body)
        body)
    (if (or (<= (occurs (car args) body) 1)
            (symbolp (car vals)))
      (simplify-apply-lambda-1 (subst (car vals) (car args) body)
                               (cdr args)
                               (cdr vals)
                               bindings)
      (simplify-apply-lambda-1 body 
                               (cdr args)
                               (cdr vals)
                               (cons (list (car args) (car vals))
                                     bindings)))))

(define (lambdaify e)
  (case-match e
   (('pred ?p)    (pred (lambdaify ?p)))
   (('kcjoin ?p)  (kcjoin (lambdaify ?p)))
   (('L ?f1 ?f2)  (L (lambdaify ?f1) (lambdaify ?f2)))
   (('map* ?f ?s) (map* (lambdaify ?f) (lambdaify ?s)))
   (t e)))

(pprint (setq q5 (lambdaify q4)))

'(LAMBDA NIL
  (DOLIST (T12 EMP)
    (WHEN (STATUS==PROF T12)
      (DOLIST (T10 paper)
        (WHEN (YEAR>=1980 T10)
          (WHEN (SAME-EMP\# T12 T10)
            (NAME-1 (CONS T12 T10))))))))

#||

At this point, we have a nested loop, but still need to 
collect the result.

We add the following transformations:

make-result transformation:

(setq result (out (when t1 t2) result)
-> (when t1 (set result (out t2 result)))

(setq result (out (dolist (t1 t2) t3) result)
-> (dolist (t1 t2) (setq result (out t3 result)))

||#

(defun make-result (e)
  (assert (and (lambda? e) (null (lambda-args e))))
  `(let ((result '()))
     ,(make-result-1 (lambda-body e))
     result))

(defun make-result-1 (e)
  (if (dolist? e)
      `(dolist ,(dolist-binding e)
         ,(make-result-1 (dolist-body e)))
      (if (when? e)
        `(when ,(when-condition e) ,(make-result-1 (when-body e)))
        `(setq result (cons ,e result)))))

(pprint (setq q6 (make-result q5)))

'(LET ((RESULT 'NIL))
  (DOLIST (T6 EMP)
    (WHEN (STATUS==PROF T6)
      (DOLIST (T4 paper)
        (WHEN (YEAR>=1980 T4)
          (WHEN (SAME-EMP\# T6 T4)
            (SETQ RESULT (CONS (NAME-1 (CONS T6 T4)) RESULT)))))))
  RESULT)


(define name-1 '(lambda (a) (name (car a))))

(define same-emp# 'same-emp#)
(define year>=1980 'year>=1980)
(define status==prof 'status==prof)
(define paper 'paper)
(define emp 'emp)