Extending the pharo compiler

Posted on March 16, 2020 by yokodake

I learned Pharo at univeristy for a metaprogamming class. I believe it to be a very important milestone in my programming education. On paper, Pharo was everything I hated in a programming language: OO, dynamicly typed, interpreted, very small flexibility wrt tooling (can’t use your own version control, text editor/IDE, console, etc.). But this ended up being my favourite class, and I have fond memories of the nights I spent writing pharo.
The language made me realize that the quality of a programming language isn’t mainly defined by its features (although I’m still a big proponent of statically typed languages); rather a principled and consistent approach to the language design is way more important in my opinion!

One of the cool things of Smalltalk is that everything is an object. Really everything, not Java-tier everything. Classes are objects, metaclasses are objects, the compiler itself is an object!
In pharo the frontier between compiletime and runtime is kinda blurry, and it allows us to do really cool things, such as recompiling, modifying, creating, removing methods or classes.
This also brings us quite a cool TDD experience: just write your tests, run them before you wrote the code. Exceptions should be raised for Objects, classes or messages that haven’t been implemented. You click on the create button, implement it, click continue and repeat.
And this isn’t some magical feature, Pharo exposes all the facilities for us to implement this ourselves if we wanted. Almost everything in the language is reifiable (including the stack itself)!

Of course, business applications don’t really need this level of reflexivity. And that’s probably why there aren’t a lot of articles or talks about those features. The size of a community also plays a role and it doesn’t seem to be as interrested in PLT as others either.

Nonetheless, I decided to make this post where I go over how to implement two features: conditional compilation (similar to CPP’s #if, #else) and loop unrolling. The first is as an AST transformation and the second by generating IR.

Conditional Compilation

For simplicity we’ll only support an #__ifTrue: message (underscore is the only symbol allowed in keyword message identifiers). Supporting #__ifFalse: and #__ifTrue:ifFalse: is trivial. We’ll also only support blocks as receiver.

A quick example of usage:

[ Debugging ] __ifTrue: [ GrowlMorph openWithLabel: 'Debug' contents: 'foo= ' , foo asString. ]

the opal compiler

The whole compilation process of the Opal compiler (pharo’s compiler) is reified, the four big passes are: 1. Parsing which produces an AST
see RBProgramNode in AST-Core package 2. Semantic analysis which adds information to the AST (such as scoping)
see OCASTSemanticAnalyzer in OpalCompiler-Core 3. IR generation to build an Intermediate Representation
see OCASTTranlator and IR-Nodes in OpalCompiler-Core 4. Bytecode generation
see IRTranslator

Every one of these transformation is implemented as a visitor (IRVisitorV2 and RB*NodeVisitor). Since compile time conditionals are supposed to determine what is to be compiled, we’ll work on the AST.
The OpalCompiler has a plugin system which allow us to transform the AST before IR generation and after semantic analysis. That’s how we’ll implement our feature.

A simple way to grab a compiler with sensible defaults for extension is with Smalltalk compiler. Although every class supports the compiler message (defined in Behavior).

oc := Smalltalk compiler
   noPattern: true; "set noPattern if you plan on compiling code snippets instead of methods"
   addPlugin: SomePlugin;
   yourself.

Use the compile: or evaluate: messages (each taking a string as argument) to compile and evaluate code.
compile: returns a CompiledMethod if noPattern was set it will be named #DoIt. The CompiledMethod holds the AST, the IR and the bytecode, useful for debugging!
Executing a CompiledMethod is done with the valueWithReceiver:arguments: message. This is because every method is a message to an object’s instance, therefore to pass the receiver as an argument. arguments: expects an Array.

This means it is possible to use class and instance variables (of the receiver) in the code we pass to compile:. Either grab the class’s compiler (ClassName compiler) or set it in the CompilationContext.

For example, with Association which has a key and value instance variable:

oc := Smalltalk compiler noPattern: true; yourself.
oc compilationContext class: Association.
(oc compile 'value + 2') valueWithReceiver: (#key -> 3) arguments: #(). "5"

API discovery is really easy in Pharo, and I really encourage to browse the different classes and packages mentionned. Don’t forget to use the spotter (Shift + Enter) which is an invaluable tool!

transforming the AST

We start by creating our plugin:

OCCompilerASTPlugin subclass: #MyCompilerASTPlugin
   instanceVariableNames: ''
   classVariableNames: ''
   package: 'Blog'

The OCCompilerASTPlugin is an abstract class which requires two methods to be overriden. transform on the instance side, and priority on the class side. priority should return a number between 0 and 100. We’ll return 100 so our transformation gets the highest priority.

The transformation isn’t very hard because the AST has a very easy API to use.

MyCompilerASTPlugin>> transform
   | nodes |
   "we grab all the nodes which are a `#__ifTrue` message sent to a block"
   nodes := ast copy sendNodes
      select: [ :n |
         n receiver isBlock
            and: (n selector = #__ifTrue)].
   nodes
      do: [ :n |
         | pred br |
         "we must first evaluate the predicate"
         pred := self eval: n receiver.
         "simple error handling in case the predicate doesn't evaluate to a boolean"
         pred class superclass = Boolean
            ifFalse: [ Error signal: 'predicate not a Boolean' ]
         "the first argument is the code we wanted to conditionnally compile
            we don't want to compile the block itself, but its body (a sequence)"
         br = pred
            ifTrue: [ (n argument at: 1) body ]
            ifFalse: [ RBSequenceNode statements: {} ] "empty sequence otherwise"
         n replaceWith: br ].

The transform method is pretty straightforward: we get a list of all the send nodes, filter on message selector (#__ifTrue:) and receiver (Block), evaluate the receiver (our predicate), and replace the entire node with either the branch or and empty sequence.

compile-time evaluation

I simply abstracted the evaluation to keep transform more readable.

MyCompilerASTPlugin>> eval: aRBBlockNode
   | body mast cm |
   body := aRBBlockNode copy body
      addReturn;
      yourself.

   mast := (RBMethodNode selector: #DoIt body: body) doSemanticAnalysis.
   (mast variableNodes allSatisfy: [:n | n isGlobal or: n isTemp ])
      ifFalse: [ Error signal: 'cannot resolve at compile time' ]

   ^ mast generateWithSource valueWithReceiver: nil arguments: #()

We could just pretty print the ast, and then like mentionned earlier evaluate: it. But It’d be a waste to output code just for it to be parsed again. So why not just transform our block’s body like evaluate would: wrap our block’s body into a DoIt method and return the last statement. addReturn does the latter, for the former we just create an RBMethodNode.
There’s one catch though: We absolutely have to do another semantic analysis pass over our block!

Consider the following code:

foo
   |x|
   "..."
   [ (x + 2) = 4 ] __ifTrue: [ 'foo' ]
   "..."

x in foo is a temporary variable, but that’s only true inside the method foo! If we take the block’s body on its own (x + 2) = 4, x now has a larger scope (maybe global or instance/class) but the SemanticAnalyzer. annoted it while parsing the entire method foo. Since we’re going to evaluate the block while compiling the method, we cannot depend on runtime values (such as temporaries with a scope larger than the block itself). That’s why we have to do a new semantic analysis pass.

improvements

Our conditional compilation is pretty barebones.

All in all, it’s probably better to not implement this sort of feature in a stage as late as the AST transformation pass. In languages that support this sort of feature, it happens before lexing or
parsing. Although we want to do conditional compilation, we have to be as syntactically correct as normal keyword message pass. Code like a [ cond ] __ifTrue [ + ] __ifFalse [ - ] b aren’t possible because the parser would rejected it long before we reach our ast transformation.

Our CompiledMethod is sent on nil, but we could very well make a specific class for this compile-time evaluation (don’t forget to also change the compilation context). One with variables of interest for conditional compilation for example.

An #someSymbol __ifDef: [ ... ] could be simply desugared as follows:

[ CompileTimeEval class varNames includes: #someSymbol ] __ifTrue: [ ... ]

Another way to expose variables for the evaluation is using the bindings: message on the OpalCompiler.

Finally, there’s also the problem of error reporting which I’ve completely ignored here. Using the compiler’s facilities to report errors more precisely instead of relying on exceptions would be much much better.

Loop Unrolling

We’ll unroll our loops by generating Duff’s Devices in IR.

Pharo’s VM is like a stack machine. It’s a fairly simple way to execute bytecode. The idea is that we have a stack (duh!), and when we want to call a function, we fetch the method’s arguments by popping them from the stack in (reverse order), and then push the method’s return value on the stack.

For example 3 + 2 - (1 + 8):

push 3  ;; stack: 3 ..
push 2  ;; stack: 2 . 3 ..
call +  ;; stack: 5 ..
push 1  ;; stack: 1 . 5 ..
push 8  ;; stack: 8 . 1 . 5 ..
call +  ;; stack: 9 . 5 ..
call -  ;; stack: -4 ..

setup

Again, we have to start by grabbing a compiler and adding our extensions to it. This time we’ll use inheritance to override the default behavior. It’s really not much harder than adding a plugin, it just requires a little more boilerplate (for defining our subclasses).

Getting a custom compiler:

oc := Smalltalk compiler yourself.
oc compilationContext astTranslatorClass: MyOCASTTranlator.
^ oc.

OCASTTranlator is the visitor emitting IR, and that’s the one we’ll need to subclass:

OCASTTranlator subclass: #MyOCASTTranlator
   instanceVariableNames: ''
   classVariableNames: ''
   package: 'blog'

but that won’t be enough, indeed the class responsible for inlining #to:do: is OCASTTranlatorForEffect:

OCASTTranslatorForEffect subclass: #MyOCASTTranslatorForEffect
   instanceVariableNames: ''
   classVariableNames: ''
   package: 'blog'

That’s because while emitToDo (the method we’re interrested in overriding) is defined in OCASTTranslator it will be sent to the instance’s variable effectTranslator. It is the translator used for translating MethodNode’s and our standalone code is compiled as a dummy DoIt method.
The effectTranslator is set in OCASTTranslator>>#initialize with OCASTTranslator>>#classForEffect. The best choice is probably to override the latter.

MyOCASTTranslator>> classForEffect
  ^ MyOCASTTranslatorForEffect

emitting IR

IR generation is done through the IRBuilder class. In my opinion the best way to learn/discover is to simply look at the existing code. Since our loop unrolling will be an override of the emitToDo method, we might as well start there.

OCASTTranslator>> emitToDo: aMessageNode
   | limit block iterator limitEmit |
   "grab the ast's nodes"
   limit := aMessageNode arguments first.
   block := aMessageNode arguments last.
   iterator := block arguments first binding.

   limitEmit := [ valueTranslator visitNode: limit ].
   "to reduce space usage in some cases we don't make a local variable for the limit"
   limit isLiteralNode | limit isSelf | limit isSuper | limit isArgument
      ifFalse: [ valueTranslator visitNode: limit.
         methodBuilder addTemp: iterator name , #limit.
         methodBuilder storeTemp: iterator name , #limit.
         methodBuilder popTop.
         limitEmit := [ methodBuilder pushTemp: iterator name , #limit ] ].
   "limitEmit is a closure that pushes its value on the stack"

   "push start. allocate and initialize iterator"
   valueTranslator visitNode: aMessageNode receiver.
   iterator emitStore: methodBuilder.
   self isEffectTranslator
      ifTrue: [ methodBuilder popTop ].

   methodBuilder jumpBackTarget: #start.
   "loop's predicate"
   iterator emitValue: methodBuilder.
   limitEmit value.
   methodBuilder send: #<=.
   methodBuilder jumpAheadTo: #done if: false.

   "we inline the block's code"
   effectTranslator visitInlinedBlockNode: block.

   "iterator incrementation"
   iterator emitValue: methodBuilder.
   methodBuilder pushLiteral: 1.
   methodBuilder send: #+.
   iterator emitStore: methodBuilder.
   methodBuilder popTop.

   "jump back to the loop's predicate"
   methodBuilder jumpBackTo: #start.
   methodBuilder jumpAheadTarget: #done

In reality emitToDo is a simple wrapper around emitToDo:step: where step = 1 though. So this is the code if it was inlined inside emitToDo.

The only major differences with duff’s device are inlining the block multiple times and the switch-case. Before we can jump to the right case we need to compute the remainder of the loop. So let’s make a small abstraction to emit that remainder:

MyOCASTTranslatorForEffect>> emitRemFrom: startEmit to: limitEmit by: unrollFactor
   limitEmit value.              "push limit on stack"
   startEmit value.              "push start on stack"
   methodBuilder send: #-.       "compute difference"
   methodBuilder pushLiteral: 1. 
   methodBuilder send: #+.       "+1 because #to:do: works as closed interval"
   methodBuilder pushLiteral: unrollFactor.
   methodBuilder send: #%.       "modulo unroll factor"
   methodBuilder addTemp: #remainder.
   methodBuilder storeTemp: #remainder.
   methodBuilder popTop.         "storeTemp keeps the value on the stack"
   ^ [ methodBuilder pushTem: #remainder ]

while we’re at it, let’s also write one for incrementing the iterator to make it easier for us later :

MyOCASTTranslatorForEffect>> emitIterIncrement: iterator
   iterator emitValue: methodBuilder.
   methodBuilder pushLiteral: 1.
   methodBuilder send: #+
   iterator emitStore: methodBuilder.
   methodBuilder popTop

Our switch is composed of two parts: testing the remainder with jumping to the right place and the jump targets (case:) with the inline body.

Generating the first part:

caseLabel := [ :x | #case , x asString asSymbol ] "just concat a number to #case symbol"

0 to: unrollFactor - 1 do: [:i |
   remEmit value.
   methodBuilder pushLiteral: i.
   methodBuilder send: #=.
   methodBuilder jumpAheadTo: (caseLabel value: i) if: true ].
]

For the body and the targets we have to take into account a couple a things. The first is the iterator should be incremented for every inlined body, BUT not for the very first jump! The way we’ll handle this is by emitting the iterator incrementation before the jumpTarget, except for the very first one (case 0). We’ll also increment at the very end, before our loop’s predicate gets re-evaluated.

Secondly, there’s a particular order in which the targets must be emitted. The first target is if there’s no loop remainder: we’ll want to execute every inlined body. For a remainder different than 0 we want to execute the body, during the first iteration, as much times as what remains! So we start with the highest value (unrollFactor - 1) and decrement until 1.

In code we obtain:

methodBuilder jumpAheadTarget: (caseLabel value: 0)
effectTranslator visitInlinedBlockNode: block.
unrollFactor - 1 to: 1 by: -1 do: [: i |
   self emitIterIncrement: iterator.
   methodBuilder jumpAheadTarget: (caseLabel value: i).
   effectTranslator visitInlinedBlockNode: block ].
self emitIterIncrement: iterator.

Keeping a similar behavior as to:do:

There is one thing we haven’t discussed yet: what if start > limit. In the classic to:do: this isn’t a problem, when we enter the code, we test and simply jump to #done. But since we have our switch before the #start we have to explicitly handle this case somehow. There’s really various ways to do it, but I’ll go for the most simple: test and short-circuit before the switch.

One important thing to know though is that the IRBuilder allows for only one jump to a target! Skimming through the IRBuilder code, I conclude that when we generate a jump, we pop the target’s label from a stack. An exception is raised if there’s a jumpAheadTarget without a jumpAhead, so the stack is probably used to check that. It is apparently not the case for jumpBackTargets, but the IR doesn’t support conditional jumpBacks.

Anyways, that’s the reason why we have to introduce a new jumpAheadTarget aside from #done.

Stiching it together

And here’s our entire method:

MyOCASTTranlatorForEffect>> emitToDo: aMessageNode
    | unrollFactor start caseLabel limit block iterator limitEmit remEmit |
    start := aMessageNode receiver.
    limit := aMessageNode arguments first.
    block := aMessageNode arguments last.
    iterator := block arguments first binding.
    unrollFactor := 8.
    caseLabel := [ :n | #case , n asString asSymbol ].

    "limitEmit pushes the limit on the stack "
    limitEmit := [ valueTranslator visitNode: limit ].
    "we avoid spare an allocation in some cases"
    limit isLiteralNode | limit isSelf | limit isSuper | limit isArgument
        ifFalse: [ valueTranslator visitNode: limit.
            methodBuilder addTemp: iterator name , #limit.
            methodBuilder storeTemp: iterator name , #limit.
            methodBuilder popTop.
            limitEmit := [ methodBuilder pushTemp: iterator name , #limit ] ].

    "emit start and iterator"
    valueTranslator visitNode: start.
    iterator emitStore: methodBuilder.
    self isEffectTranslator
        ifTrue: [ methodBuilder popTop ].

    "short circuit if start > limit"
    iterator emitValue: methodBuilder.
    limitEmit value.
    methodBuilder send: #<=.
    methodBuilder jumpAheadTo: #shortexit if: false.

    "compute remainder for first jump target"
    remEmit := self
        emitRemFrom: [ iterator emitValue: methodBuilder ]
        to: limitEmit
        by: unrollFactor.

    "switch conditional jumps"
    0 to: unrollFactor - 1 do: [ :i |
        remEmit value.
        methodBuilder pushLiteral: i.
        methodBuilder send: #=.
        methodBuilder jumpAheadTo: (caseLabel value: i) if: true ].

    "loop predicate"
    methodBuilder jumpBackTarget: #start.
    iterator emitValue: methodBuilder.
    limitEmit value.
    methodBuilder send: #<=.
    methodBuilder jumpAheadTo: #done if: false.

    "first body"
    methodBuilder jumpAheadTarget: (caseLabel value: 0).
    effectTranslator visitInlinedBlockNode: block.
    "targets + inlined bodies"
    unrollFactor - 1 to: 1 by: -1 do: [ :i |
        self emitIterIncrement: iterator.
        methodBuilder jumpAheadTarget: (caseLabel value: i).
        effectTranslator visitInlinedBlockNode: block ].

    self emitIterIncrement: iterator.
    methodBuilder jumpBackTo: #start.

    methodBuilder jumpAheadTarget: #done.
    methodBuilder jumpAheadTarget: #shortexit

Improvements

Here again I have a little bit of a lacking implementation, but it is for simplicity’s sake. Afterall this post is more an amuse-bouche than a real case study.

The First obvious improvement is that we do not necessarily want to unroll every loop, and certainly not with the same unroll factor. The bytecode size of the image would explose otherwise! It’d be much better to let the caller control the unroll factor (after benchmarking for example). We could use a different message (e.g. to:unrollBy:do:) and keep to:do: untouched.
While the implementation wouldn’t be much more complicated than the previous one (just remember that the unroll factor must be evaluated at compile-time), we have to remember pharo’s semantics: every “method call” is a message sent to an object! That includes to:do:.
So how come we generate specific code for to:do: instead of a generic message send? Well for performance reasons some messages get inlined. That’s what we’ve overridden, the IR generation for an inlined to:do:. This also means that simply inheritting the OCASTTranslator won’t be enough to support our special message for unrolling! Unfortunately the decision whether a message should be inlined or not isn’t as decoupled from the compiler as what we’ve seen so far.
First, the decision whether to inline a message or not depends on the RBMessageNode>>#isInline method, which itself relies on RBMessageNode>>#isInlineToDo. So we’d have to intervene in there, which wouldn’t be trivial with inheritance, as this implies changing the AST itself. The easiest would probably be to override OCASTTranslator>>#visitMessageNode: and make a special case for our unrolling selector.
Finally, as we can see in OCASTTranslator>>#visitMessageNode, the method for inlining (e.g. emitToDo) is fetch form a Dictionnary, that maps message selectors with their emitMethod. This dictionary is stored in a class variable of OCASTTranslator and would need to be extended so that we can emit our unrolled loop for our custom message.

Going much farther

The metaprogramming facilities of Pharo are really extraordinary! There’s much much more that is possible.
But I wanted to show how elegant the model is: every step of the compiler is reified in Pharo’s very simple model (everything is an object, even the compiler). Sure, being able to grab the compiler, extend it, and modify existing objects in the system might not be the most useful for developping your CRUD app… But I hope I was able to show how relatively simple it is to use one of those facilities: the compiler itself.