Lisp compiler in F#: Expression trees and .NET methods
June 01, 2009 at 10:36 PM | categories: Compiler | View CommentsThis is part 3 of a series of posts on my Lisp compiler written in F#. Previous entries: Introduction, Parsing with fslex and fsyacc | Browse the full source of the compiler on GitHub
Thanks to fslex, fsyacc and the LispVal
type, we've ended up with an expression tree that represents our program. To summarise, we've turned this:
(define (fact n) (if (= n 0) 1 (* n (fact (- n 1))))) (Console.WriteLine "6! = {0}" (fact 6)) (Console.WriteLine "What is your name?") (Console.WriteLine "Hello, {0}" (Console.ReadLine))
...into an F# data structure that looks like this:
[List [Atom "define"; List [Atom "fact"; Atom "n"]; List [Atom "if"; List [Atom "="; Atom "n"; Number 0]; Number 1; List [Atom "*"; Atom "n"; List [Atom "fact"; List [Atom "-"; Atom "n"; Number 1]]]]]; List [Atom "Console.WriteLine"; String "6! = {0}"; List [Atom "fact"; Number 6]]; List [Atom "Console.WriteLine"; String "What is your name?"]; List [Atom "Console.WriteLine"; String "Hello, {0}"; List [Atom "Console.ReadLine"]]]
We'd like to turn this data structure into actual IL, which we can execute. I'll assume some restrictions:
- We're going to compile, not interpret, and we're going to target .NET IL, not x86 machine code, LLVM, or anything else at this point
- Basic arithmetic (+, -, *, /) on integers
(- n 1)
- Equality comparisons
(= n 0)
if
statements
(if (= n 0) a b)
- Call static .NET methods (no
new
operator and no instance methods)
(Console.WriteLine "What is your name?")
- Define and call our own functions
(define (fact n) (if (= n 0) 1 (* n (fact (- n 1)))))
What we'll do is:
- Preprocess built-in forms such as arithmetic,
define
,if
andlambda
into specific nodes in the expression tree - Construct an instance of
System.Reflection.Emit.ILGenerator
. We could write an EXE or DLL file, but for experimentation, it's handy to target aDynamicMethod
- Emit IL opcodes that implement the expression tree
First, a couple of pattern matching functions to recognise the built-in forms. Strictly speaking, we could write a code generator which recognises the built-in forms directly, but turning them into first-class expression tree nodes early on will hopefully make it easier to apply compiler optimisations, which I hope to add at some point.
// Turn a LispVal into a function or variable name let extractAtom = function | Atom a -> a | v -> raise <| Compiler(sprintf "expected atom, got %A" v) // Note: insertPrimitives accepts a LispVal and returns a LispVal. // The function keyword combines function declaration with pattern matching. let rec insertPrimitives = function // Convert arithmetic operators into ListPrimitive | List (Atom "+" :: args) -> ListPrimitive (Add, args |> List.map insertPrimitives) | List (Atom "-" :: args) -> ListPrimitive (Subtract, args |> List.map insertPrimitives) | List (Atom "*" :: args) -> ListPrimitive (Multiply, args |> List.map insertPrimitives) | List (Atom "/" :: args) -> ListPrimitive (Divide, args |> List.map insertPrimitives) | List (Atom "=" :: args) -> ListPrimitive (Equal, args |> List.map insertPrimitives) | List (Atom "define" :: args) -> match args with | [ Atom name; v ] -> // Convert (define variableName value) into VariableDef VariableDef (name, insertPrimitives v) | [ List (Atom name :: names); body ] -> // Convert (define functionName (x y z) value) into a VariableDef wrapping a LambdaDef // This represents a named static .NET method VariableDef (name, LambdaDef (names |> List.map extractAtom, insertPrimitives body)) | _ -> // Note: "raise <| Compiler message" is equivalent to C# "throw new CompilerException(message)" raise <| Compiler "expected define name value" | List (Atom "if" :: args) -> match args with | [ testValue; thenValue; elseValue ] -> // Convert (if test then else) into IfPrimitive) IfPrimitive (insertPrimitives testValue, insertPrimitives thenValue, insertPrimitives elseValue) | _ -> raise <| Compiler "expected three items for if" | List (Atom "lambda" :: args) -> match args with | [ List names; body ] -> // Convert (lambda (x y z) value) into LambdaDef, without a VariableDef LambdaDef (names |> List.map extractAtom, insertPrimitives body) | _ -> raise <| Compiler "expected lambda names body" | List l -> // Apply insertPrimitives recursively on any function invokations l |> List.map insertPrimitives |> List | v -> v
The insertPrimitives
function turns our parsed expression tree into this:
[VariableDef ("fact", LambdaDef (["n"], IfPrimitive (ListPrimitive (Equal,[Atom "n"; Number 0]),Number 1, ListPrimitive (Multiply, [Atom "n"; List [Atom "fact"; ListPrimitive (Subtract,[Atom "n"; Number 1])]])))); List [Atom "Console.WriteLine"; String "6! = {0}"; List [Atom "fact"; Number 6]]; List [Atom "Console.WriteLine"; String "What is your name?"]; List [Atom "Console.WriteLine"; String "Hello, {0}"; List [Atom "Console.ReadLine"]]]
We're going to write an F# function that emits IL for one line in our program, and looks like this:
let rec compile (generator : ILGenerator) (defineMethod : string -> Type -> Type list -> #MethodInfo * ILGenerator) (env : Map<string, LispVal>) (value : LispVal) : (Map<string, LispVal>)
What this function signature tells us is:
- We have a recursive function called
compile
(by default, F# functions aren't allowed to call themselves, hence therec
keyword) - It takes the following parameters:
- An
ILGenerator
, i.e. the target of the IL we're going to generate - A function that accepts a
string
, aType
, alist
ofType
, and returns a tuple containing aMethodInfo
(or a type derived fromMethodInfo
, hence the #) and anotherILGenerator
. This will be the callback thatcompile
will call to create a new static method forlambda
: thestring
is a function name, theType
is a return type, and theType list
is a list of parameter types. - A
Map
ofstring
toLispVal
, i.e. the variables and functions defined by prior statements in the program - A
LispVal
representing the statement to generate code for
- An
- It returns
Map<string, LispVal>
, i.e. a copy ofenv
, possibly with some new variables or functions added
I'll cover the details of the compile
function itself in the next post. In this one I'd like to explain a couple of helper functions:
typeOf
, which returns the .NETType
denoted by aLispVal
lambdaIdent
, which retrieves aLambdaDef
We're using LambdaDef
nodes not only to define our own functions (like fact
in our example above, which calculates factorials), but also any .NET methods we call. typeOf
and lambdaIdent
call each other, so we have to define them together with F#'s and
keyword in between them:
typeOf
needs to calllambdaIdent
in order to determine the type returned by a function invocationlambdaIdent
needs to calltypeOf
when it looks at the types of function arguments when deciding which overload of a .NET method to call
let rec typeOf (env : Map<string, LispVal>) = function | ArgRef _ -> typeof<int> | Atom a -> a |> ident env |> typeOf env | Bool _ -> typeof<bool> | IfPrimitive (_, thenValue, elseValue) -> match typeOf env thenValue with | t when t = typeOf env elseValue -> t | _ -> raise <| Compiler("expected 'then' and 'else' branches to have same type") | LambdaDef (_, body) -> typeOf env body | LambdaRef (methodBuilder, _, _) -> methodBuilder.ReturnType | List (Atom a :: args) -> a |> lambdaIdent args env |> typeOf env | List (fn :: _) -> raise <| Compiler(sprintf "can't invoke %A" fn) | List [ ] -> raise <| Compiler("can't compile empty list") | ListPrimitive _ -> typeof<int> | Number _ -> typeof<int> | String _ -> typeof<string> | VariableDef _ -> typeof<Void> | VariableRef local -> local.LocalType
lambdaIdent
is moderately complicated: it needs to take the name of a function and a list of arguments and determine the correct .NET overload to call. (Even though I'm trying to keep this compiler simple, we need overload resolution in order to call Console.WriteLine
-- we can't write hello world without it.)
First, have we ourselves defined a function with the right name?
and lambdaIdent args env (a : string) = let envMatches = maybe { let! v = Map.tryFind a env let! r = match v with | LambdaRef _ -> Some v | _ -> None return r } |> Option.to_list
Note: the maybe
keyword isn't built into F#; we're using the F# equivalent of Haskell's Maybe monad, which a few other people have written about. Its purpose is to execute statements until one of them returns None
; the result of the maybe
block is determined by the return r
at the bottom.
At this point, envMatches
is a list of one or no LambdaRef
nodes, taken from our environment. Next: attempting to parse the method name as Namespace.Class.Method
. Again, note the use of maybe
to simplify the code that deals with Option
variables:
let clrTypeAndMethodName = maybe { let! (typeName, methodName) = match a.LastIndexOf('.') with | -1 -> None | n -> Some (a.Substring(0, n), a.Substring(n + 1)) let! clrType = referencedAssemblies |> List.map (fun assembly -> usingNamespaces |> List.map (fun usingNamespace -> (assembly, usingNamespace))) |> List.concat |> List.tryPick (fun (assembly, usingNamespace) -> option_of_nullable <| assembly.GetType(usingNamespace + "." + typeName)) return (clrType, methodName) }
referencedAssemblies
and usingNamespaces
are hard-coded equivalents to C#'s assembly references and using
statements. Next: a list of all the .NET methods with the right name, albeit maybe without the right parameter list:
let clrMatches = match clrTypeAndMethodName with | Some (clrType, methodName) -> clrType.GetMethods(BindingFlags.Public ||| BindingFlags.Static) |> List.of_array |> List.filter (fun m -> m.Name = methodName) |> List.map makeLambdaRef | None -> [ ]
A function that determines whether a function's parameter list of compatible with a set of arguments. The isParamArray
parameter indicates whether the .NET method has a variable parameter list (such as C#: void WriteLine(string format, params object[] args)
).
let argsMatchParameters = function | LambdaRef (_, isParamArray, parameterTypes) -> let rec argsMatchParameters' argTypes (parameterTypes : #Type list) = match argTypes, parameterTypes with | [ ], [ ] -> // No args and no parameters -> always OK true | [ ], [ _ ] -> // No args and one parameter -> OK only for params array methods isParamArray | [ ], _ -> // No args and two or more parameters -> never OK false | argType :: otherArgTypes, [ parameterType ] when isParamArray -> // One or more args and one parameter, in a params array method -> // OK if the types of the first arg and the params array are compatible, // and the rest of the args match the params array parameterType.GetElementType().IsAssignableFrom(argType) && argsMatchParameters' otherArgTypes parameterTypes | argType :: otherArgTypes, parameterType :: otherParameterTypes -> // One or more args and one or more parameters -> // OK if the types of the first arg and parameter are compatible, // and the rest of the args match the rest of the parameters parameterType.IsAssignableFrom(argType) && argsMatchParameters' otherArgTypes otherParameterTypes | _ :: _, [ ] -> // One or more args and no parameters -> never OK false argsMatchParameters' (List.map (typeOf env) args) parameterTypes | _ -> false
Finally, a combined list of all candidates (both from the environment and from .NET), the method overloads whose parameters are compatible with our arguments, and the chosen overload itself. When given more than one compatible overload we pick the first one we've given. (The ECMA C# spec defines detailed rules for picking the most appropriate method overload, but we've ignoring those in our language.)
let candidates = List.append envMatches clrMatches match candidates with | [ ] -> raise <| Compiler(sprintf "no method called %s" a) | _ -> () let allMatches = List.filter argsMatchParameters candidates match allMatches with | [ ] -> raise <| Compiler(sprintf "no overload of %s is compatible with %A" a args) | firstMatch :: _ -> firstMatch
We're now able to take a method name (as a string
) and a list of arguments (as LispVal
nodes), and decide what to call, whether it's one of our own functions or a method in a .NET library. We've done a large chunk of the work ahead of the next post, in which we'll finally get round to generating some useful IL.