# -*- Mode: Text -*- This file is a quick description of the compiler. ================================================================================ The phases of the compiler - and how each source file is named. 1) lisp_reader 2) transform 2.1) mbe 2.2) match 3) nodes 4) analyze 4.1) graph 5) typing 6) cps 7) backend 1) lisp_reader: Reads from a file or string a sequence of s-expressions. There are some slight differences here from a 'classical' Scheme/Lisp s-expression, but the basics are the same. (datatype field (:t symbol sexp) ) (datatype sexp (:list (list sexp)) (:symbol symbol) (:string string) (:char char) (:bool bool) (:int int) (:undef) (:vector (list sexp)) (:record (list field)) (:cons symbol symbol) ;; constructor ':' syntax (:attr sexp symbol) ;; attribute '.' syntax ) 2) transform: This performs source-level transformations on the input code, including the handling of some of the core forms like , , , etc... Bodies of forms like and are searched for definitions, datatypes, and macros here. During the pass over the source, the operator position of each application is checked for either hard-coded transforms, or for user macros, and then expanded as appropriate. 2.1) mbe: macro by example This expands macros when they are matched. The macro language used differs somewhat from Scheme's syntax-rules... I hope it is somewhat simpler. 2.2) match: match compiler This expands pattern-matching expressions into series of nvcase, pvcase, or conditional expressions. 3) nodes: AST/node conversion The source tree of s-expressions is now converted to a node tree. Each node consists of a record (holding mutable fields) with a 't' field holding the object of the node datatype. Each node is a record: { t : the type of the node subs : a list of all the sub-nodes of this node size : integer - the size of this node including all sub-nodes. id : integer - uniquely identifies each node type : the type of the node - initially an empty type. flags : integer - flags/properties of the node } The 't' field is the following datatype: (datatype node (:varref symbol) (:varset symbol) (:literal literal) (:cexp (list type) type string) ;; generic-tvars type template (:nvcase symbol (list symbol) (list int)) ;; datatype alts arities (:sequence) (:if) (:function symbol (list symbol)) ;; name formals (:call) (:let (list symbol)) (:fix (list symbol)) (:subst symbol symbol) (:primapp symbol sexp) ;; name params ) 4) analyze: optimizations, tree-shaking, etc... This repeatedly walks over the node tree, making decisions about inlining and various other optimizations. It also removes any dead code, and sets flags on both variables and nodes that are needed by later phases. 4.1) graph: builds a dependency graph, and also computes the strongly-connected-components of the call graph, which is needed by the typing phase. 5) typing: this does 'type reconstruction' (or 'type inference') on the resulting node tree, using a union-find algorithm along with Huet's unification algorithm. If the program passes this phase, then the final solved type is attacked to each node. [See also types.scm] Here's the 'type' datatype: (datatype type (:tvar int {parent=(maybe type) pending=bool moo=(maybe type)}) (:pred symbol (list type) {parent=(maybe type) pending=bool moo=(maybe type)}) ) Each type has a record inside to it to hold mutable data. The parent field is used by the union-find algorithm. The 'pending' and 'moo' fields are used in the management of recursive types. Types are either predicates or type variables. 'Base' types are implemented as simple empty predicates: for exampe the 'int' type is (pred 'int '() _). 6) cps The node tree is now translated into yet another language, a continuation-passing-style register transfer language. Each 'insn' contains meta-data about the instruction itself, possibly contains other sub-nodes, and a continuation. (datatype cont (:k int (list int) insn) ;; (:nil) ) A continutation is either empty (for an insn like 'return'), or contains the target register for the insn and a list of registers that are 'free' at that point in the code. ['free' here means something totally non-obvious if you're not a functional language person. see http://en.wikipedia.org/wiki/Free_variables_and_bound_variables for more info]. This phase keeps track of the 'tail' status of each node as it compiles, which makes it possible to implement proper tail recursion and thus high-performance looping. This phase is also responsible for collecting lists of literal constants, including things like symbols and pre-initialized data structures. Some insns (conditionals of various types) contain sub-trees of other insns. (datatype insn (:return int) ;; return register (:literal literal cont) ;; (:litcon int symbol cont) ;; (:cexp type type string (list int) cont) ;;