Lisp compiler in F#: What's next?
June 09, 2009 at 11:12 PM | categories: Compiler
This is part 5 of a series of posts on my Lisp compiler written in F#. Previous entries: Introduction, Parsing with fslex and fsyacc, Expression trees and .NET methods, IL generation | Browse the full source of the compiler on GitHub
This post marks the end of the first series of Lisp compiler posts, since we're at the point where the code does something useful while still being compact enough to explain in a few blog posts. In the future it should make an interesting test bed for learning about compiler techniques, which I hope to cover here.
Here are some of the ideas I'd like to try out:
- Implement all the Lisp functionality from Jonathan Tang's Write Yourself a Scheme in 48 Hours tutorial. Jonathan explains how to implement a Scheme interpreter in Haskell, which is a similar goal to my Scheme-like compiler in F#. (It was actually his tutorial that first gave me the idea of doing it in F#.)
- .NET integration:
  - Ability to create new .NET objects, call instance methods, and access properties and events. Currently we're restricted to static methods.
  - Ability to define your own .NET classes. One thing I'd like to be able to do is implement the NUnit tests for this project directly in Lisp, which means the compiler needs to be able to generate instance methods with custom attributes applied.
  - A full System.CodeDom.Compiler implementation: Lisp on ASP.NET anyone?
- Optimisations:
  - Tail call optimisation: replace the call, ret sequence with tail. call, ret. This is less an optimisation and more a necessity, since recursion is currently the only way to implement looping, and we need tail calls to avoid overflowing the stack. The tail opcode prefix is recognised directly by the CLR; another approach would be for the compiler to implement a recursive function as a while loop. Tail calling on .NET is a moderately interesting topic in its own right: see the links from this Stack Overflow question to get an idea of the issues involved.
    Edit: as Paul points out, the trick is to optimise all tail calls, not just tail recursion. It's easy to come up with a pair of functions that call each other: if we just looked for recursive calls back to the same function, we'd blow up the stack in this situation. Luckily for us, the IL tail prefix is valid on any call, as long as it comes just before a ret, so we don't need to be too clever.
  - Arithmetic optimisations: even something as simple as simplifying constant expressions at compile time
- A command line compiler, with parameters similar to those of csc.exe or fsc.exe
- Don't generate IL directly to ILGenerator; assemble F# data structures representing the IL, so that we can apply IL optimisations using standard F# constructs such as pattern matching
- Implement flow control within the compiler on top of a control flow graph data structure, along the lines of basic blocks. This will make optimisations of program flow easier to implement.
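To give a feel for the arithmetic optimisation mentioned above, here is a minimal sketch of constant folding. The Expr type and the fold and simplify functions are hypothetical, made up for illustration; they are not the compiler's actual expression tree, and real Lisp arithmetic takes any number of arguments rather than exactly two.

```fsharp
// Hypothetical expression type for illustration only;
// the compiler's real AST is not shown here and may look quite different.
type Expr =
    | Int of int
    | Var of string
    | Add of Expr * Expr
    | Mul of Expr * Expr

// Collapse a single node whose operands are both already constants.
let simplify expr =
    match expr with
    | Add (Int x, Int y) -> Int (x + y)
    | Mul (Int x, Int y) -> Int (x * y)
    | other -> other

// Fold bottom-up: simplify the children first, then the node itself.
// Anything that still involves a variable is left in place.
let rec fold expr =
    match expr with
    | Add (a, b) -> simplify (Add (fold a, fold b))
    | Mul (a, b) -> simplify (Mul (fold a, fold b))
    | other -> other

// (+ 1 (* 2 3)) is entirely constant
let allConstant = fold (Add (Int 1, Mul (Int 2, Int 3)))

// (+ x (* 2 3)) can only be partly folded
let partlyConstant = fold (Add (Var "x", Mul (Int 2, Int 3)))
```

Here allConstant evaluates to Int 7, while partlyConstant becomes Add (Var "x", Int 6): the multiplication is done at compile time and only the addition involving x is left for run time.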
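The idea of representing IL as F# data combines nicely with the tail call item. The sketch below assumes a hypothetical, heavily cut-down Instr type (method names stand in for real method references) and a hypothetical markTailCalls pass that uses list pattern matching to add the tail prefix to any call that is immediately followed by ret. It deliberately ignores the other conditions the CLR places on tail calls, such as not leaving a try or catch block.

```fsharp
// Hypothetical, simplified instruction set for illustration;
// real IL has many more opcodes and operands.
type Instr =
    | LdArg of int
    | LdcI4 of int
    | Sub
    | Call of string   // a method name stands in for a real method reference
    | Tail             // the "tail." prefix
    | Ret

// Add the tail prefix to every call that is immediately followed by ret.
// Calls that already carry the prefix are left untouched.
// Note: this sketch is not itself tail recursive, which is fine for short method bodies.
let rec markTailCalls instrs =
    match instrs with
    | Tail :: Call m :: Ret :: rest -> Tail :: Call m :: Ret :: markTailCalls rest
    | Call m :: Ret :: rest -> Tail :: Call m :: Ret :: markTailCalls rest
    | instr :: rest -> instr :: markTailCalls rest
    | [] -> []

// Part of an imaginary isEven function that ends by calling isOdd (n - 1).
// The call goes to a different function, so this is a tail call
// but not tail recursion.
let isEvenBody = [ LdArg 0; LdcI4 1; Sub; Call "isOdd"; Ret ]

let optimised = markTailCalls isEvenBody
```

With this input, optimised is [LdArg 0; LdcI4 1; Sub; Tail; Call "isOdd"; Ret]. Because the pass only looks at the shape call followed by ret, it treats mutual recursion exactly like self recursion. Once the instructions have been rewritten, a separate step would walk the list and emit each one to ILGenerator.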
Let me know if you've found this series useful so far, or if you have any corrections or suggestions on what I've been writing.