Lisp compiler in F#: Expression trees and .NET methods
June 01, 2009 at 10:36 PM | categories: Compiler

This is part 3 of a series of posts on my Lisp compiler written in F#. Previous entries: Introduction, Parsing with fslex and fsyacc | Browse the full source of the compiler on GitHub
Thanks to fslex, fsyacc and the LispVal type, we've ended up with an expression tree that represents our program. To summarise, we've turned this:
    (define (fact n) (if (= n 0) 1 (* n (fact (- n 1)))))
    (Console.WriteLine "6! = {0}" (fact 6))
    (Console.WriteLine "What is your name?")
    (Console.WriteLine "Hello, {0}" (Console.ReadLine))
...into an F# data structure that looks like this:
    [List [Atom "define";
           List [Atom "fact"; Atom "n"];
           List [Atom "if";
                 List [Atom "="; Atom "n"; Number 0];
                 Number 1;
                 List [Atom "*"; Atom "n";
                       List [Atom "fact"; List [Atom "-"; Atom "n"; Number 1]]]]];
     List [Atom "Console.WriteLine"; String "6! = {0}"; List [Atom "fact"; Number 6]];
     List [Atom "Console.WriteLine"; String "What is your name?"];
     List [Atom "Console.WriteLine"; String "Hello, {0}"; List [Atom "Console.ReadLine"]]]
We'd like to turn this data structure into actual IL, which we can execute. I'll assume some restrictions:
- We're going to compile, not interpret, and we're going to target .NET IL, not x86 machine code, LLVM, or anything else at this point
- Basic arithmetic (+, -, *, /) on integers: (- n 1)
- Equality comparisons: (= n 0)
- if statements: (if (= n 0) a b)
- Call static .NET methods (no new operator and no instance methods): (Console.WriteLine "What is your name?")
- Define and call our own functions: (define (fact n) (if (= n 0) 1 (* n (fact (- n 1)))))
What we'll do is:
- Preprocess built-in forms such as arithmetic, define, if and lambda into specific nodes in the expression tree
- Construct an instance of System.Reflection.Emit.ILGenerator. We could write an EXE or DLL file, but for experimentation, it's handy to target a DynamicMethod
- Emit IL opcodes that implement the expression tree
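As a rough illustration of the last two steps (not the compiler's actual code), here is a minimal sketch that creates a DynamicMethod, emits the IL for (* 6 7) by hand and runs it. The method name "answer" and the helper compileAnswer are made up for this example.

```fsharp
open System
open System.Reflection.Emit

// Hypothetical helper: builds a parameterless method that returns 6 * 7
let compileAnswer () : Func<int> =
    let dynamicMethod = DynamicMethod("answer", typeof<int>, Type.EmptyTypes)
    let generator = dynamicMethod.GetILGenerator()
    generator.Emit(OpCodes.Ldc_I4, 6)   // push the constant 6
    generator.Emit(OpCodes.Ldc_I4, 7)   // push the constant 7
    generator.Emit(OpCodes.Mul)         // pop both values, push their product
    generator.Emit(OpCodes.Ret)         // return the value on top of the stack
    dynamicMethod.CreateDelegate(typeof<Func<int>>) :?> Func<int>

printfn "%d" (compileAnswer().Invoke())
```

Because a DynamicMethod lives only in memory, nothing has to be written to disk before the generated code can be called, which is what makes it convenient for experiments.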
First, a couple of pattern matching functions to recognise the built-in forms. Strictly speaking, we could write a code generator which recognises the built-in forms directly, but turning them into first-class expression tree nodes early on will hopefully make it easier to apply compiler optimisations, which I hope to add at some point.
    // Turn a LispVal into a function or variable name
    let extractAtom = function
        | Atom a -> a
        | v -> raise <| Compiler(sprintf "expected atom, got %A" v)

    // Note: insertPrimitives accepts a LispVal and returns a LispVal.
    // The function keyword combines function declaration with pattern matching.
    let rec insertPrimitives = function
        // Convert arithmetic operators into ListPrimitive
        | List (Atom "+" :: args) -> ListPrimitive (Add, args |> List.map insertPrimitives)
        | List (Atom "-" :: args) -> ListPrimitive (Subtract, args |> List.map insertPrimitives)
        | List (Atom "*" :: args) -> ListPrimitive (Multiply, args |> List.map insertPrimitives)
        | List (Atom "/" :: args) -> ListPrimitive (Divide, args |> List.map insertPrimitives)
        | List (Atom "=" :: args) -> ListPrimitive (Equal, args |> List.map insertPrimitives)
        | List (Atom "define" :: args) ->
            match args with
            | [ Atom name; v ] ->
                // Convert (define variableName value) into VariableDef
                VariableDef (name, insertPrimitives v)
            | [ List (Atom name :: names); body ] ->
                // Convert (define (functionName x y z) value) into a VariableDef wrapping a LambdaDef
                // This represents a named static .NET method
                VariableDef (name, LambdaDef (names |> List.map extractAtom, insertPrimitives body))
            | _ ->
                // Note: "raise <| Compiler message" is equivalent to C# "throw new CompilerException(message)"
                raise <| Compiler "expected define name value"
        | List (Atom "if" :: args) ->
            match args with
            | [ testValue; thenValue; elseValue ] ->
                // Convert (if test then else) into IfPrimitive
                IfPrimitive (insertPrimitives testValue, insertPrimitives thenValue, insertPrimitives elseValue)
            | _ -> raise <| Compiler "expected three items for if"
        | List (Atom "lambda" :: args) ->
            match args with
            | [ List names; body ] ->
                // Convert (lambda (x y z) value) into LambdaDef, without a VariableDef
                LambdaDef (names |> List.map extractAtom, insertPrimitives body)
            | _ -> raise <| Compiler "expected lambda names body"
        | List l ->
            // Apply insertPrimitives recursively on any function invocations
            l |> List.map insertPrimitives |> List
        | v -> v
The insertPrimitives function turns our parsed expression tree into this:
    [VariableDef
       ("fact",
        LambdaDef
          (["n"],
           IfPrimitive
             (ListPrimitive (Equal, [Atom "n"; Number 0]),
              Number 1,
              ListPrimitive
                (Multiply,
                 [Atom "n";
                  List [Atom "fact"; ListPrimitive (Subtract, [Atom "n"; Number 1])]]))));
     List [Atom "Console.WriteLine"; String "6! = {0}"; List [Atom "fact"; Number 6]];
     List [Atom "Console.WriteLine"; String "What is your name?"];
     List [Atom "Console.WriteLine"; String "Hello, {0}"; List [Atom "Console.ReadLine"]]]
We're going to write an F# function that emits IL for one line in our program, and looks like this:
    let rec compile
        (generator : ILGenerator)
        (defineMethod : string -> Type -> Type list -> #MethodInfo * ILGenerator)
        (env : Map<string, LispVal>)
        (value : LispVal) : (Map<string, LispVal>)
What this function signature tells us is:
- We have a recursive function called compile (by default, F# functions aren't allowed to call themselves, hence the rec keyword)
- It takes the following parameters:
  - An ILGenerator, i.e. the target of the IL we're going to generate
  - A function that accepts a string, a Type, a list of Type, and returns a tuple containing a MethodInfo (or a type derived from MethodInfo, hence the #) and another ILGenerator. This will be the callback that compile will call to create a new static method for lambda: the string is a function name, the Type is a return type, and the Type list is a list of parameter types.
  - A Map of string to LispVal, i.e. the variables and functions defined by prior statements in the program
  - A LispVal representing the statement to generate code for
- It returns Map<string, LispVal>, i.e. a copy of env, possibly with some new variables or functions added
I'll cover the details of the compile function itself in the next post. In this one I'd like to explain a couple of helper functions:
- typeOf, which returns the .NET Type denoted by a LispVal
- lambdaIdent, which retrieves a LambdaDef

We're using LambdaDef nodes not only to define our own functions (like fact in our example above, which calculates factorials), but also any .NET methods we call. typeOf and lambdaIdent call each other, so we have to define them together with F#'s and keyword in between them:
- typeOf needs to call lambdaIdent in order to determine the type returned by a function invocation
- lambdaIdent needs to call typeOf when it looks at the types of function arguments when deciding which overload of a .NET method to call
    let rec typeOf (env : Map<string, LispVal>) = function
        | ArgRef _ -> typeof<int>
        | Atom a -> a |> ident env |> typeOf env
        | Bool _ -> typeof<bool>
        | IfPrimitive (_, thenValue, elseValue) ->
            match typeOf env thenValue with
            | t when t = typeOf env elseValue -> t
            | _ -> raise <| Compiler("expected 'then' and 'else' branches to have same type")
        | LambdaDef (_, body) -> typeOf env body
        | LambdaRef (methodBuilder, _, _) -> methodBuilder.ReturnType
        | List (Atom a :: args) -> a |> lambdaIdent args env |> typeOf env
        | List (fn :: _) -> raise <| Compiler(sprintf "can't invoke %A" fn)
        | List [ ] -> raise <| Compiler("can't compile empty list")
        | ListPrimitive _ -> typeof<int>
        | Number _ -> typeof<int>
        | String _ -> typeof<string>
        | VariableDef _ -> typeof<Void>
        | VariableRef local -> local.LocalType
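For readers who haven't met the let rec ... and ... form before, here is a tiny standalone sketch of two functions that call each other. The isEven and isOdd names are illustrative only and have nothing to do with the compiler; the sketch assumes non-negative input.

```fsharp
// Two mutually recursive functions must be declared in a single
// "let rec ... and ..." group, otherwise isEven could not see isOdd.
let rec isEven n =
    if n = 0 then true else isOdd (n - 1)
and isOdd n =
    if n = 0 then false else isEven (n - 1)

printfn "%b %b" (isEven 10) (isOdd 7)
```

Without the and keyword, the F# compiler would reject the reference to isOdd inside isEven, because names are only visible after their definition.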
lambdaIdent is moderately complicated: it needs to take the name of a function and a list of arguments and determine the correct .NET overload to call. (Even though I'm trying to keep this compiler simple, we need overload resolution in order to call Console.WriteLine -- we can't write hello world without it.)
First, have we ourselves defined a function with the right name?
    and lambdaIdent args env (a : string) =
        let envMatches =
            maybe {
                let! v = Map.tryFind a env
                let! r =
                    match v with
                    | LambdaRef _ -> Some v
                    | _ -> None
                return r
            } |> Option.to_list
Note: the maybe keyword isn't built into F#; we're using the F# equivalent of Haskell's Maybe monad, which a few other people have written about. Its purpose is to execute statements until one of them returns None; the result of the maybe block is determined by the return r at the bottom.
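The post doesn't show how maybe is defined, so the following is only a minimal sketch of what such a builder can look like, assuming nothing beyond let! and return is needed. MaybeBuilder and tryDivide are names invented for this example.

```fsharp
// A minimal Maybe builder: just enough to support let! and return
type MaybeBuilder() =
    // let! x = value: continue with f only if value is Some
    member this.Bind(value : 'a option, f : 'a -> 'b option) : 'b option =
        Option.bind f value
    // return x: wrap the final result
    member this.Return(value : 'a) : 'a option = Some value

let maybe = MaybeBuilder()

// Hypothetical helper for the example: division that fails on zero
let tryDivide x y = if y = 0 then None else Some (x / y)

let ok =
    maybe {
        let! a = tryDivide 100 5
        let! b = tryDivide a 2
        return a + b
    }

let failed =
    maybe {
        let! a = tryDivide 100 0   // None here, so the rest of the block is skipped
        let! b = tryDivide a 2
        return a + b
    }

printfn "%A %A" ok failed
```

Each let! only continues when its right-hand side is Some, which is why the second block never reaches its return.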
At this point, envMatches is a list of one or no LambdaRef nodes, taken from our environment. Next: attempting to parse the method name as Namespace.Class.Method. Again, note the use of maybe to simplify the code that deals with Option variables:
        let clrTypeAndMethodName =
            maybe {
                let! (typeName, methodName) =
                    match a.LastIndexOf('.') with
                    | -1 -> None
                    | n -> Some (a.Substring(0, n), a.Substring(n + 1))

                let! clrType =
                    referencedAssemblies
                    |> List.map (fun assembly ->
                        usingNamespaces
                        |> List.map (fun usingNamespace -> (assembly, usingNamespace)))
                    |> List.concat
                    |> List.tryPick (fun (assembly, usingNamespace) ->
                        option_of_nullable <| assembly.GetType(usingNamespace + "." + typeName))

                return (clrType, methodName)
            }
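The two lookups above can be tried in isolation. This is a sketch written against current F# library functions (List.allPairs, Option.ofObj) rather than the 2009 CTP, and it hard-codes a single assembly and namespace as an assumption; splitMethodName and findType are hypothetical names.

```fsharp
open System

// Split "Console.WriteLine" into ("Console", "WriteLine") at the last dot
let splitMethodName (name : string) =
    match name.LastIndexOf('.') with
    | -1 -> None
    | n -> Some (name.Substring(0, n), name.Substring(n + 1))

// Try every (assembly, namespace) pair until one of them contains the type.
// The lists below stand in for referencedAssemblies and usingNamespaces.
let findType (typeName : string) =
    let assemblies = [ typeof<Console>.Assembly ]
    let namespaces = [ "System" ]
    List.allPairs assemblies namespaces
    |> List.tryPick (fun (assembly, ns) ->
        // Assembly.GetType returns null when the type is missing
        Option.ofObj (assembly.GetType(ns + "." + typeName)))

printfn "%A" (splitMethodName "Console.WriteLine")
printfn "%A" (findType "Console")
```

A name without a dot, such as fact, yields None from splitMethodName, so the search falls back to functions defined in the environment.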
referencedAssemblies and usingNamespaces are hard-coded equivalents to C#'s assembly references and using statements. Next: a list of all the .NET methods with the right name, albeit maybe without the right parameter list:
        let clrMatches =
            match clrTypeAndMethodName with
            | Some (clrType, methodName) ->
                clrType.GetMethods(BindingFlags.Public ||| BindingFlags.Static)
                |> List.of_array
                |> List.filter (fun m -> m.Name = methodName)
                |> List.map makeLambdaRef
            | None -> [ ]
Next, a function that determines whether a function's parameter list is compatible with a set of arguments. The isParamArray parameter indicates whether the .NET method has a variable parameter list (such as C#: void WriteLine(string format, params object[] args)).
        let argsMatchParameters = function
            | LambdaRef (_, isParamArray, parameterTypes) ->
                let rec argsMatchParameters' argTypes (parameterTypes : #Type list) =
                    match argTypes, parameterTypes with
                    | [ ], [ ] ->
                        // No args and no parameters -> always OK
                        true
                    | [ ], [ _ ] ->
                        // No args and one parameter -> OK only for params array methods
                        isParamArray
                    | [ ], _ ->
                        // No args and two or more parameters -> never OK
                        false
                    | argType :: otherArgTypes, [ parameterType ] when isParamArray ->
                        // One or more args and one parameter, in a params array method ->
                        // OK if the types of the first arg and the params array are compatible,
                        // and the rest of the args match the params array
                        parameterType.GetElementType().IsAssignableFrom(argType)
                        && argsMatchParameters' otherArgTypes parameterTypes
                    | argType :: otherArgTypes, parameterType :: otherParameterTypes ->
                        // One or more args and one or more parameters ->
                        // OK if the types of the first arg and parameter are compatible,
                        // and the rest of the args match the rest of the parameters
                        parameterType.IsAssignableFrom(argType)
                        && argsMatchParameters' otherArgTypes otherParameterTypes
                    | _ :: _, [ ] ->
                        // One or more args and no parameters -> never OK
                        false

                argsMatchParameters' (List.map (typeOf env) args) parameterTypes
            | _ -> false
Finally, a combined list of all candidates (both from the environment and from .NET), the method overloads whose parameters are compatible with our arguments, and the chosen overload itself. When given more than one compatible overload we pick the first one we're given. (The ECMA C# spec defines detailed rules for picking the most appropriate method overload, but we're ignoring those in our language.)
        let candidates = List.append envMatches clrMatches
        match candidates with
        | [ ] -> raise <| Compiler(sprintf "no method called %s" a)
        | _ -> ()

        let allMatches = List.filter argsMatchParameters candidates
        match allMatches with
        | [ ] -> raise <| Compiler(sprintf "no overload of %s is compatible with %A" a args)
        | firstMatch :: _ -> firstMatch
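To see the "first compatible overload" rule in action outside the compiler, here is a standalone sketch that works directly on reflection types instead of LambdaRef nodes. The post doesn't show makeLambdaRef, so detecting a params array via ParamArrayAttribute is my assumption about how isParamArray could be computed; isParamArray, argsMatch and firstCompatible are hypothetical names.

```fsharp
open System
open System.Reflection

// Assumption: a method takes a params array if its last parameter carries ParamArrayAttribute
let isParamArray (m : MethodInfo) =
    match m.GetParameters() with
    | [||] -> false
    | ps -> ps.[ps.Length - 1].IsDefined(typeof<ParamArrayAttribute>, false)

// Same matching rules as argsMatchParameters, on plain Type lists
let rec argsMatch paramArray (argTypes : Type list) (parameterTypes : Type list) =
    match argTypes, parameterTypes with
    | [], [] -> true
    | [], [ _ ] -> paramArray
    | [], _ -> false
    | argType :: rest, [ parameterType ] when paramArray ->
        // Remaining args must all fit the element type of the params array
        parameterType.GetElementType().IsAssignableFrom(argType)
        && argsMatch paramArray rest parameterTypes
    | argType :: rest, parameterType :: otherParameters ->
        parameterType.IsAssignableFrom(argType)
        && argsMatch paramArray rest otherParameters
    | _ :: _, [] -> false

// Pick the first public static overload whose parameters accept the argument types
let firstCompatible (clrType : Type) (name : string) (argTypes : Type list) =
    clrType.GetMethods(BindingFlags.Public ||| BindingFlags.Static)
    |> Array.filter (fun m -> m.Name = name)
    |> Array.tryFind (fun m ->
        let parameterTypes = [ for p in m.GetParameters() -> p.ParameterType ]
        argsMatch (isParamArray m) argTypes parameterTypes)

// (Console.WriteLine "6! = {0}" (fact 6)) passes a string and an int
match firstCompatible typeof<Console> "WriteLine" [ typeof<string>; typeof<int> ] with
| Some m -> printfn "chose %O" m
| None -> printfn "no compatible overload"
```

Which overload gets printed depends on the order in which GetMethods returns them, which is exactly the weakness of picking the first match instead of following the C# rules.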
We're now able to take a method name (as a string) and a list of arguments (as LispVal nodes), and decide what to call, whether it's one of our own functions or a method in a .NET library. We've done a large chunk of the work ahead of the next post, in which we'll finally get round to generating some useful IL.
Lisp compiler in F#: Parsing with fslex and fsyacc
May 31, 2009 at 05:12 PM | categories: Compiler

This is part 2 of a series of posts on my Lisp compiler written in F#. Previous entry: Introduction | Browse the full source of the compiler on GitHub
In this post I'll talk about the fslex and fsyacc tools, which, as their names suggest, are similar to the widely used lex and yacc tools, which generate lexical scanners and parsers respectively.
These tools generate two F# modules that work together:
- fslex generates code that splits a string into tokens, according to a set of rules similar to regular expressions
- fsyacc generates code that recognises sequences of tokens, and does something with them, such as building an expression tree
fslex and fsyacc are both standalone executables that are installed in the bin subdirectory of an F# distribution. Although you can invoke them by hand, their command line parameters and error handling are fairly basic. Luckily F# also ships with a pair of MSBuild tasks that you can put in your .fsproj file.
- Open your .fsproj file in a text editor (or, in Visual Studio, unload the project, then click 'Edit ProjectName.fsproj' on the unloaded project's right click menu)
- Towards the bottom you should see a line as follows:
  <Import Project="$(MSBuildExtensionsPath)\FSharp\1.0\Microsoft.FSharp.Targets" />
  Under this line, paste the following:
  <Import Project="$(MSBuildExtensionsPath)\FSharp\1.0\FSharp.PowerPack.Targets" />
- If you have the F# May 2009 CTP (version 1.9.6.16) or newer, remove the Microsoft.FSharp.Targets line. If you have the September 2008 CTP (version 1.9.6.2) or older, leave the Microsoft.FSharp.Targets line alone. (The May version of FSharp.PowerPack.Targets has its own import of Microsoft.FSharp.Targets, so leaving the import in the .fsproj file causes an MSBuild error.)
- Slightly higher up in the .fsproj file, you'll see an <ItemGroup> element that contains your source files. Add the following lines; put them at the top of the list so that fslex and fsyacc get run before any of the compilation steps:
  <FsYacc Include="YourParser.fsy" />
  <FsLex Include="YourScanner.fsl" />
- Save and close the .fsproj file in the text editor, then reload it in Visual Studio to ensure it still loads OK. (You did keep the original version in source control, right?) The .fsy and .fsl files will be missing at this point, but I'm going to discuss them in a moment.
- Extra step for the May CTP: The DLL that implements the <FsLex> and <FsYacc> tasks is installed to the wrong directory. You'll need to follow this set of instructions and copy the FSharp.PowerPack.Build.Tasks.dll file to the right place.
Once you've successfully run fslex and fsyacc and generated F# source for the first time, you'll need to include these F# source files in your project. Because the file generated by fslex relies on fsyacc's token definitions, you'll need to include the generated fsyacc file first -- remember that build order is significant in F#, unlike in C#. Feel free to check this generated source into source control, but remember to check it out before making changes to the .fsl and .fsy sources; these tools throw an access denied exception if the generated source file is read only.
The .fsy and .fsl files should look familiar if you've used lex and yacc before. Here are the rules I'm using to parse Lisp s-expressions:
FSYacc.fsy -- generates FSYacc.fsi and FSYacc.fs
    %{
    open Tim.Lisp.Core
    %}

    %start parse
    %token <string> Identifier
    %token <string> Text
    %token <int> Digits
    %token Apostrophe LeftParen RightParen Eof
    %type <Tim.Lisp.Core.LispVal list> parse

    %%

    Expr: Identifier { Atom $1 }
        | Text { String $1 }
        | Digits { Number $1 }
        | LeftParen ExprList RightParen { $2 |> List.rev |> List }
        | Apostrophe Expr { List [ Atom "quote"; $2 ] }

    ExprList: Expr { [ $1 ] }
            | ExprList Expr { $2 :: $1 }

    parse: ExprList Eof { List.rev $1 }
Note that the fsyacc source defines the list of tokens that it expects the fslex source to emit. Because this is F#, the list of %token statements above results in the generation of one of our old friends, the discriminated union:
FSYacc.fsi -- generated from FSYacc.fsy
    // Signature file for parser generated by fsyacc
    #light
    type token =
      | Apostrophe
      | LeftParen
      | RightParen
      | Eof
      | Digits of (int)
      | Text of (string)
      | Identifier of (string)
    // snip rest of generated code
The rest of the .fsy file consists of BNF-like rules; here I'm recognising the key parts of s-expression syntax and emitting an expression tree in the form of my own LispVal data structure, which is another discriminated union.
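To make the grammar rules more concrete, here is a hand-written recursive-descent sketch of roughly what they describe. It is not the code fsyacc generates; the Token and LispVal types are cut-down copies declared just for this example, and unlike the grammar above this version also accepts an empty list ().

```fsharp
// Cut-down types for this example only
type LispVal =
    | Atom of string
    | String of string
    | Number of int
    | List of LispVal list

type Token =
    | Apostrophe | LeftParen | RightParen | Eof
    | Digits of int
    | Text of string
    | Identifier of string

// Expr: returns the parsed expression and the tokens that follow it
let rec parseExpr (tokens : Token list) : LispVal * Token list =
    match tokens with
    | Identifier s :: rest -> Atom s, rest
    | Text s :: rest -> String s, rest
    | Digits n :: rest -> Number n, rest
    | Apostrophe :: rest ->
        let quoted, rest' = parseExpr rest
        List [ Atom "quote"; quoted ], rest'
    | LeftParen :: rest -> parseList [] rest
    | t :: _ -> failwithf "unexpected token %A" t
    | [] -> failwith "unexpected end of input"

// ExprList inside parentheses: accumulate in reverse, flip at the closing paren
and parseList acc tokens =
    match tokens with
    | RightParen :: rest -> List (List.rev acc), rest
    | _ ->
        let expr, rest = parseExpr tokens
        parseList (expr :: acc) rest

// parse: a sequence of expressions followed by Eof
let parse (tokens : Token list) : LispVal list =
    let rec loop acc tokens =
        match tokens with
        | [ Eof ] | [] -> List.rev acc
        | _ ->
            let expr, rest = parseExpr tokens
            loop (expr :: acc) rest
    loop [] tokens

printfn "%A" (parse [ LeftParen; Identifier "fact"; Digits 6; RightParen; Eof ])
```

As in the fsyacc rules, lists are built in reverse and flipped once at the end, which keeps each step a cheap cons onto the front of the list.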
The .fsl file takes text, applies regular expressions to it, and returns a stream of tokens:
FSLex.fsl -- generates FSLex.fs
    {
    open System
    open System.Text
    open Microsoft.FSharp.Text.Lexing
    open FSYacc
    }

    let digit = ['0'-'9']
    let alpha = ['a'-'z' 'A'-'Z']
    let whitespace = [' ' '\t']
    let newline = ('\n' | '\r' '\n')
    let identifier = [^'"' '0'-'9' '(' ')' ' ' '\t' '\n' '\r']

    rule tokenize = parse
    | whitespace { tokenize lexbuf }
    | newline { lexbuf.EndPos <- lexbuf.EndPos.NextLine; tokenize lexbuf }
    | ['-']?digit+ { Digits <| Int32.Parse(Encoding.UTF8.GetString(lexbuf.Lexeme)) }
    | '(' { LeftParen }
    | ')' { RightParen }
    | '\'' { Apostrophe }
    | '"' [^'"']* '"' { let s = Encoding.UTF8.GetString(lexbuf.Lexeme) in Text <| s.Substring(1, s.Length - 2) }
    | eof { Eof }
    | identifier+ { Identifier <| Encoding.UTF8.GetString(lexbuf.Lexeme) }
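For comparison, here is a hand-written sketch of a tokenizer that produces the same kind of token stream from a plain string. It is a simplification, not what fslex generates: it doesn't track line positions, and it decides between Digits and Identifier by trying Int32.TryParse on each word rather than by separate regular expressions. The Token type is redeclared so that the example stands alone.

```fsharp
open System

type Token =
    | Apostrophe | LeftParen | RightParen | Eof
    | Digits of int
    | Text of string
    | Identifier of string

let tokenize (input : string) : Token list =
    // Characters that may appear inside a number or identifier
    let isWordChar c =
        not (Char.IsWhiteSpace c) && c <> '(' && c <> ')' && c <> '"'
    let rec loop i acc =
        if i >= input.Length then List.rev (Eof :: acc)
        else
            match input.[i] with
            | c when Char.IsWhiteSpace c -> loop (i + 1) acc
            | '(' -> loop (i + 1) (LeftParen :: acc)
            | ')' -> loop (i + 1) (RightParen :: acc)
            | '\'' -> loop (i + 1) (Apostrophe :: acc)
            | '"' ->
                // String literal: everything up to the next double quote
                let close = input.IndexOf('"', i + 1)
                if close < 0 then failwith "unterminated string literal"
                loop (close + 1) (Text (input.Substring(i + 1, close - i - 1)) :: acc)
            | _ ->
                // Scan to the end of the word, then classify it
                let mutable j = i
                while j < input.Length && isWordChar input.[j] do
                    j <- j + 1
                let word = input.Substring(i, j - i)
                let token =
                    match Int32.TryParse(word) with
                    | true, n -> Digits n
                    | _ -> Identifier word
                loop j (token :: acc)
    loop 0 []

printfn "%A" (tokenize "(fact 6)")
```

Writing this by hand is manageable for s-expressions, but the fslex rules stay much closer to a description of the syntax.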
fslex regular expressions aren't entirely standard:
- White space is ignored
- Literal characters are enclosed in single quotes
- You're allowed to use let to define aliases (see digit, alpha, whitespace, newline and identifier above)
Each rule block defines a function in the generated F# source that accepts a Microsoft.FSharp.Text.Lexing.LexBuffer<byte> and returns the token type defined by the generated fsyacc source. Although LexBuffer is a generic type, fslex prior to May only generates ASCII parsers, hence the calls to Encoding.UTF8.GetString above. The May CTP adds a --unicode command line parameter, which appears to enable the use of LexBuffer<char>, but I haven't found how to enable this option via the MSBuild task.
Given the generated source code, it's relatively straightforward to write a facade around it. Note that you'll need to add a reference to Microsoft.FSharp.PowerPack.dll in order to use the types in the Microsoft.FSharp.Text.Lexing namespace.
    #light
    open System.Text
    open Microsoft.FSharp.Text.Lexing

    module Parser =
        // Accepts a string and either returns a list of LispVal objects or throws an exception
        let parseString (s : string) =
            FSYacc.parse FSLex.tokenize <| LexBuffer<_>.FromBytes(Encoding.UTF8.GetBytes(s))
In the next post, I'll take the LispVal objects returned by the parseString function and use them to generate actual IL.
Lisp compiler in F#: Introduction
May 30, 2009 at 10:34 PM | categories: Compiler

Browse the full source of the compiler on GitHub
I started learning functional programming in F# and Haskell around 6 months ago, and one of the programs I've been writing in order to learn F# is a small Lisp compiler, with syntax similar to Scheme:
    (define (fact n) (if (= n 0) 1 (* n (fact (- n 1)))))
    (Console.WriteLine "6! = {0}" (fact 6))
    (Console.WriteLine "What is your name?")
    (Console.WriteLine "Hello, {0}" (Console.ReadLine))
I'm planning on posting a series of articles explaining how the compiler demonstrates some useful F# features. By way of an introduction, I'd like to talk about two of my favourite features from functional languages: discriminated unions and pattern matching.
At their simplest, discriminated unions are equivalent to .NET enums, since they allow a strongly-typed variable to hold one of a fixed set of values:
    (*
     * Rough C# equivalent:
     * public enum ListOp { Add, Subtract, Multiply, Divide, Equal }
     *)
    type ListOp = Add | Subtract | Multiply | Divide | Equal
But each of these possible values can be tagged with a data structure, which means that discriminated unions are useful where an object-orientated language would need a hierarchy of classes:
    (*
     * Rough C# equivalent:
     * public abstract class LispVal { }
     *
     * public sealed class ArgRef : LispVal
     * {
     *     public ArgRef(int index)
     *     {
     *         Index = index;
     *     }
     *
     *     public int Index { get; private set; }
     * }
     *
     * public sealed class Atom : LispVal
     * {
     *     public Atom(string text)
     *     {
     *         Text = text;
     *     }
     *
     *     public string Text { get; private set; }
     * }
     *
     * etc., for the other cases
     *)
    type LispVal =
        | ArgRef of int
        | Atom of string
        | Bool of bool
        | IfPrimitive of LispVal * LispVal * LispVal
        | LambdaDef of string list * LispVal
        | LambdaRef of MethodInfo * bool * Type list
        | List of LispVal list
        | ListPrimitive of ListOp * LispVal list
        | Number of int
        | QuotePrimitive of LispVal
        | String of string
        | VariableDef of string * LispVal
        | VariableRef of LocalBuilder
When faced with the task of writing code that understands the C# class hierarchy above, I'd implement the Visitor pattern: I'd define an interface called ILispValVisitor, and every time I needed to process a LispVal somehow, I'd declare a nested class that implemented the right ILispValVisitor methods. Fairly straightforward, right? -- a boilerplate class implementation in every piece of code that needs to look inside a LispVal.
Functional languages have a much more elegant alternative, in the form of pattern matching operators. There's no real equivalent to pattern matching in C#, although the concept could be similar to if, switch or the visitor pattern depending on the situation:
    (*
     * Rough C# equivalent: an implementation of the visitor pattern
     *)
    let rec typeOf (env : Map<string, LispVal>) = function
        | ArgRef _ -> typeof<int>
        | Atom a -> a |> ident env |> typeOf env
        | Bool _ -> typeof<bool>
        | IfPrimitive (_, thenValue, elseValue) ->
            match typeOf env thenValue with
            | t when t = typeOf env elseValue -> t
            | _ -> raise <| Compiler("expected 'then' and 'else' branches to have same type")
        | LambdaDef (_, body) -> typeOf env body
        | LambdaRef (methodBuilder, _, _) -> methodBuilder.ReturnType
        | List (Atom a :: args) -> a |> lambdaIdent args env |> typeOf env
        | List (fn :: _) -> raise <| Compiler(sprintf "can't invoke %A" fn)
        | List [ ] -> raise <| Compiler("can't compile empty list")
        | ListPrimitive _ -> typeof<int>
        | Number _ -> typeof<int>
        | String _ -> typeof<string>
        | QuotePrimitive _ -> typeof // type argument not shown in the published listing
        | VariableDef _ -> typeof<Void>
        | VariableRef local -> local.LocalType
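A smaller, self-contained illustration of the same idea: a discriminated union for arithmetic expressions and a pattern-matching function that evaluates it. The Expr type, apply and eval are invented for this sketch and are much simpler than the compiler's LispVal.

```fsharp
type ListOp = Add | Subtract | Multiply | Divide

// Hypothetical, cut-down expression type for this example only
type Expr =
    | Number of int
    | Primitive of ListOp * Expr list

// Apply one binary operator to two integers
let apply op x y =
    match op with
    | Add -> x + y
    | Subtract -> x - y
    | Multiply -> x * y
    | Divide -> x / y

// Evaluate an expression tree by matching on its shape
let rec eval expr =
    match expr with
    | Number n -> n
    | Primitive (_, []) -> failwith "expected at least one argument"
    | Primitive (op, args) -> args |> List.map eval |> List.reduce (apply op)

// (* 10 (+ 1 1))
let result = eval (Primitive (Multiply, [ Number 10; Primitive (Add, [ Number 1; Number 1 ]) ]))
printfn "%d" result
```

The whole "visitor" is the eval function: adding a new case to Expr makes the compiler warn about every match that doesn't handle it yet.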
The Lisp compiler I wrote relies on a discriminated union, LispVal, for storing expression trees, and pattern matching for most of its processing. I'll post more in-depth articles covering the source code in detail, including:
- Parsing with fslex and fsyacc
- Code generation using System.Reflection.Emit
- Calling the compiler from C#
Eric Lippert on infoof()
May 22, 2009 at 06:42 PM | categories: Uncategorized

I previously mentioned some hypothetical C# fieldof and methodof operators, which would obtain FieldInfo and MethodInfo objects, in the same way that typeof retrieves Type at runtime.
Eric Lippert explains the pros and cons of a combined infoof operator, which mainly centre around how hard it would be to pick the right method overload each time.
In the comments, Jonathan Pryor points out my favourite replacement for methodof:
Fastest by far in my testing -- faster than string-based Reflection, in fact -- is Mike's "Poor's man infoof" using the Delegate.Method property. This effectively bypasses most of the Reflection infrastructure (no Type.GetMember(), etc.), and is the closest we can get to IL member lookup resolution.
Inline IL assembly
April 23, 2009 at 09:05 PM | categories: Uncategorized

Writing about C# and IL made me wonder how hard it would be to give inline assembly capabilities to, say, C#. (F# already has inline IL; I'm not sure about other .NET languages.)
It turns out to be surprisingly easy to write a tool that injects IL into a .NET assembly, after the compiler has finished, using Mono Cecil. In fact, the example on the Cecil FAQ injects WriteLine calls at the start of each method.
I put together a proof of concept that lets you write code like this:
    public static class Program
    {
        private static int Calculate()
        {
            IL.Push(10, 1, 1);
            IL.Emit("add");
            IL.Emit("mul");
            IL.Emit("ret");
            return 0;
        }

        public static void Main()
        {
            Console.WriteLine("(1 + 1) * 10 = {0}", Calculate());
        }
    }
The methods on the IL class are placeholders that get replaced by a separate post-processor:
- IL.Push: Removed by the post-processor, leaving the arguments behind on the virtual machine stack
- IL.Emit: Replaced with the appropriate opcode by the post-processor
- IL.Pop: Removed by the post-processor, allowing a value on the VM stack to be consumed by regular C# code
The IL class needs to be defined as part of your application. Although the code inside this class isn't important -- it'll never be executed -- the way the methods are declared is important:
- IL.Push needs separate generic overloads taking 1, 2, 3, etc. parameters. If we defined one method, taking params object[], then the compiler would construct an array. If we defined overloads taking object instead of generic types, then the compiler would box value types.
- IL.Emit is straightforward, taking a single string parameter.
- IL.Pop returns a generic value. It's up to you to specify this generic type according to what's on the stack.
One limitation with this demo is that the IL.Emit method just takes a string, so no MethodInfo, Label etc., and no calls, branching, etc. I haven't thought of a decent way of specifying this optional parameter in a call to IL.Emit: maybe another string, suitably encoded?
Here's the code for the post processor, conveniently provided as an NUnit test fixture. Its purpose is to load an assembly using Mono Cecil, loop through each method, recognise the IL emitted by the C# compiler for each of the three IL methods, replace those calls with chunks of real IL, and save a new assembly. A word of advice: peverify is essential for testing: it's easy to generate unverifiable IL this way.
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Reflection;
    using Mono.Cecil;
    using Mono.Cecil.Cil;
    using NUnit.Framework;

    [TestFixture]
    public class InlineIL
    {
        private static bool ReplaceInstructions(MethodDefinition method)
        {
            CilWorker worker = method.Body.CilWorker;
            foreach (Instruction instruction in method.Body.Instructions)
            {
                if (instruction.OpCode == OpCodes.Ldstr)
                {
                    Instruction next = instruction.Next;
                    if (next == null)
                        continue;

                    if (next.OpCode != OpCodes.Call)
                        continue;

                    MethodReference operand = (MethodReference) next.Operand;
                    if (operand.DeclaringType.Name != "IL")
                        continue;

                    switch (operand.Name)
                    {
                        case "Emit":
                            string asm = ((string) instruction.Operand).Replace('.', '_');
                            FieldInfo field = typeof(OpCodes).GetField(
                                asm,
                                BindingFlags.Public | BindingFlags.Static | BindingFlags.IgnoreCase);
                            if (field == null)
                                throw new InvalidOperationException("Unrecognised opcode " + asm + ".");

                            OpCode opCode = (OpCode) field.GetValue(null);
                            Instruction replacement = worker.Create(opCode);
                            worker.Replace(instruction, replacement);
                            worker.Remove(next);
                            return true;
                    }
                }
                else if (instruction.OpCode == OpCodes.Call)
                {
                    MethodReference operand = (MethodReference) instruction.Operand;
                    if (operand.DeclaringType.Name != "IL")
                        continue;

                    switch (operand.Name)
                    {
                        case "Push":
                        case "Pop":
                            worker.Remove(instruction);
                            return true;
                    }
                }
            }

            return false;
        }

        [Test]
        public void ExpandInlineAsm()
        {
            AssemblyDefinition assembly = AssemblyFactory.GetAssembly("Temp.exe");
            IEnumerable<MethodDefinition> methods =
                from module in assembly.Modules.Cast<ModuleDefinition>()
                from type in module.Types.Cast<TypeDefinition>()
                from method in type.Methods.Cast<MethodDefinition>()
                select method;

            foreach (var method in methods)
            {
                while (ReplaceInstructions(method))
                {
                }
            }

            AssemblyFactory.SaveAssembly(assembly, "Temp.new.exe");
        }
    }