I learned Pharo at univeristy for a metaprogamming class. I believe it to be a very important milestone in my programming education. On paper, Pharo was everything I hated in a programming language: OO, dynamicly typed, interpreted, very small flexibility wrt tooling (can’t use your own version control, text editor/IDE, console, etc.). But this ended up being my favourite class, and I have fond memories of the nights I spent writing pharo.
The language made me realize that the quality of a programming language isn’t mainly defined by its features (although I’m still a big proponent of statically typed languages); rather a principled and consistent approach to the language design is way more important in my opinion!
One of the cool things of Smalltalk is that everything is an object. Really everything, not Java-tier everything. Classes are objects, metaclasses are objects, the compiler itself is an object!
In pharo the frontier between compiletime and runtime is kinda blurry, and it allows us to do really cool things, such as recompiling, modifying, creating, removing methods or classes.
This also brings us quite a cool TDD experience: just write your tests, run them before you wrote the code. Exceptions should be raised for Objects, classes or messages that haven’t been implemented. You click on the create button, implement it, click continue and repeat.
And this isn’t some magical feature, Pharo exposes all the facilities for us to implement this ourselves if we wanted. Almost everything in the language is reifiable (including the stack itself)!
Of course, business applications don’t really need this level of reflexivity. And that’s probably why there aren’t a lot of articles or talks about those features. The size of a community also plays a role and it doesn’t seem to be as interrested in PLT as others either.
Nonetheless, I decided to make this post where I go over how to implement two features: conditional compilation (similar to CPP’s #if
, #else
) and loop unrolling. The first is as an AST transformation and the second by generating IR.
Conditional Compilation
For simplicity we’ll only support an #__ifTrue:
message (underscore is the only symbol allowed in keyword message identifiers). Supporting #__ifFalse:
and #__ifTrue:ifFalse:
is trivial. We’ll also only support blocks as receiver.
A quick example of usage:
[ Debugging ] __ifTrue: [ GrowlMorph openWithLabel: 'Debug' contents: 'foo= ' , foo asString. ]
the opal compiler
The whole compilation process of the Opal compiler (pharo’s compiler) is reified, the four big passes are: 1. Parsing which produces an AST
see RBProgramNode
in AST-Core
package 2. Semantic analysis which adds information to the AST (such as scoping)
see OCASTSemanticAnalyzer
in OpalCompiler-Core
3. IR generation to build an Intermediate Representation
see OCASTTranlator
and IR-Nodes
in OpalCompiler-Core
4. Bytecode generation
see IRTranslator
Every one of these transformation is implemented as a visitor (IRVisitorV2 and RB*NodeVisitor). Since compile time conditionals are supposed to determine what is to be compiled, we’ll work on the AST.
The OpalCompiler has a plugin system which allow us to transform the AST before IR generation and after semantic analysis. That’s how we’ll implement our feature.
A simple way to grab a compiler with sensible defaults for extension is with Smalltalk compiler
. Although every class supports the compiler message (defined in Behavior
).
oc := Smalltalk compiler
noPattern: true; "set noPattern if you plan on compiling code snippets instead of methods"
addPlugin: SomePlugin;
yourself.
Use the compile:
or evaluate:
messages (each taking a string as argument) to compile and evaluate code.
compile:
returns a CompiledMethod
if noPattern
was set it will be named #DoIt
. The CompiledMethod
holds the AST, the IR and the bytecode, useful for debugging!
Executing a CompiledMethod
is done with the valueWithReceiver:arguments:
message. This is because every method is a message to an object’s instance, therefore to pass the receiver as an argument. arguments:
expects an Array
.
This means it is possible to use class and instance variables (of the receiver) in the code we pass to compile:
. Either grab the class’s compiler (ClassName compiler
) or set it in the CompilationContext
.
For example, with Association
which has a key
and value
instance variable:
oc := Smalltalk compiler noPattern: true; yourself.
oc compilationContext class: Association.
(oc compile 'value + 2') valueWithReceiver: (#key -> 3) arguments: #(). "5"
API discovery is really easy in Pharo, and I really encourage to browse the different classes and packages mentionned. Don’t forget to use the spotter (Shift + Enter
) which is an invaluable tool!
transforming the AST
We start by creating our plugin:
OCCompilerASTPlugin subclass: #MyCompilerASTPlugin
instanceVariableNames: ''
classVariableNames: ''
package: 'Blog'
The OCCompilerASTPlugin
is an abstract class which requires two methods to be overriden. transform
on the instance side, and priority
on the class side. priority
should return a number between 0 and 100. We’ll return 100
so our transformation gets the highest priority.
The transformation isn’t very hard because the AST has a very easy API to use.
MyCompilerASTPlugin>> transform
| nodes |
"we grab all the nodes which are a `#__ifTrue` message sent to a block"
nodes := ast copy sendNodes
select: [ :n |
n receiver isBlock
and: (n selector = #__ifTrue)].
nodes
do: [ :n |
| pred br |
"we must first evaluate the predicate"
pred := self eval: n receiver.
"simple error handling in case the predicate doesn't evaluate to a boolean"
pred class superclass = Boolean
ifFalse: [ Error signal: 'predicate not a Boolean' ]
"the first argument is the code we wanted to conditionnally compile
we don't want to compile the block itself, but its body (a sequence)"
br = pred
ifTrue: [ (n argument at: 1) body ]
ifFalse: [ RBSequenceNode statements: {} ] "empty sequence otherwise"
n replaceWith: br ].
The transform method is pretty straightforward: we get a list of all the send nodes, filter on message selector (#__ifTrue:
) and receiver (Block), evaluate the receiver (our predicate), and replace the entire node with either the branch or and empty sequence.
compile-time evaluation
I simply abstracted the evaluation to keep transform
more readable.
MyCompilerASTPlugin>> eval: aRBBlockNode
| body mast cm |
body := aRBBlockNode copy body
addReturn;
yourself.
mast := (RBMethodNode selector: #DoIt body: body) doSemanticAnalysis.
(mast variableNodes allSatisfy: [:n | n isGlobal or: n isTemp ])
ifFalse: [ Error signal: 'cannot resolve at compile time' ]
^ mast generateWithSource valueWithReceiver: nil arguments: #()
We could just pretty print the ast, and then like mentionned earlier evaluate:
it. But It’d be a waste to output code just for it to be parsed again. So why not just transform our block’s body like evaluate
would: wrap our block’s body into a DoIt
method and return the last statement. addReturn
does the latter, for the former we just create an RBMethodNode
.
There’s one catch though: We absolutely have to do another semantic analysis pass over our block!
Consider the following code:
foo
|x|
"..."
[ (x + 2) = 4 ] __ifTrue: [ 'foo' ]
"..."
x
in foo is a temporary variable, but that’s only true inside the method foo! If we take the block’s body on its own (x + 2) = 4
, x now has a larger scope (maybe global or instance/class) but the SemanticAnalyzer. annoted it while parsing the entire method foo
. Since we’re going to evaluate the block while compiling the method, we cannot depend on runtime values (such as temporaries with a scope larger than the block itself). That’s why we have to do a new semantic analysis pass.
improvements
Our conditional compilation is pretty barebones.
All in all, it’s probably better to not implement this sort of feature in a stage as late as the AST transformation pass. In languages that support this sort of feature, it happens before lexing or
parsing. Although we want to do conditional compilation, we have to be as syntactically correct as normal keyword message pass. Code like a [ cond ] __ifTrue [ + ] __ifFalse [ - ] b
aren’t possible because the parser would rejected it long before we reach our ast transformation.
Our CompiledMethod is sent on nil
, but we could very well make a specific class for this compile-time evaluation (don’t forget to also change the compilation context). One with variables of interest for conditional compilation for example.
An #someSymbol __ifDef: [ ... ]
could be simply desugared as follows:
[ CompileTimeEval class varNames includes: #someSymbol ] __ifTrue: [ ... ]
Another way to expose variables for the evaluation is using the bindings:
message on the OpalCompiler
.
Finally, there’s also the problem of error reporting which I’ve completely ignored here. Using the compiler’s facilities to report errors more precisely instead of relying on exceptions would be much much better.
Loop Unrolling
We’ll unroll our loops by generating Duff’s Devices in IR.
Pharo’s VM is like a stack machine. It’s a fairly simple way to execute bytecode. The idea is that we have a stack (duh!), and when we want to call a function, we fetch the method’s arguments by popping them from the stack in (reverse order), and then push the method’s return value on the stack.
For example 3 + 2 - (1 + 8)
:
push 3 ;; stack: 3 ..
push 2 ;; stack: 2 . 3 ..
call + ;; stack: 5 ..
push 1 ;; stack: 1 . 5 ..
push 8 ;; stack: 8 . 1 . 5 ..
call + ;; stack: 9 . 5 ..
call - ;; stack: -4 ..
setup
Again, we have to start by grabbing a compiler and adding our extensions to it. This time we’ll use inheritance to override the default behavior. It’s really not much harder than adding a plugin, it just requires a little more boilerplate (for defining our subclasses).
Getting a custom compiler:
oc := Smalltalk compiler yourself.
oc compilationContext astTranslatorClass: MyOCASTTranlator.
^ oc.
OCASTTranlator
is the visitor emitting IR, and that’s the one we’ll need to subclass:
OCASTTranlator subclass: #MyOCASTTranlator
instanceVariableNames: ''
classVariableNames: ''
package: 'blog'
but that won’t be enough, indeed the class responsible for inlining #to:do:
is OCASTTranlatorForEffect
:
OCASTTranslatorForEffect subclass: #MyOCASTTranslatorForEffect
instanceVariableNames: ''
classVariableNames: ''
package: 'blog'
That’s because while emitToDo
(the method we’re interrested in overriding) is defined in OCASTTranslator
it will be sent to the instance’s variable effectTranslator
. It is the translator used for translating MethodNode’s and our standalone code is compiled as a dummy DoIt
method.
The effectTranslator
is set in OCASTTranslator>>#initialize
with OCASTTranslator>>#classForEffect
. The best choice is probably to override the latter.
MyOCASTTranslator>> classForEffect
^ MyOCASTTranslatorForEffect
emitting IR
IR generation is done through the IRBuilder
class. In my opinion the best way to learn/discover is to simply look at the existing code. Since our loop unrolling will be an override of the emitToDo
method, we might as well start there.
OCASTTranslator>> emitToDo: aMessageNode
| limit block iterator limitEmit |
"grab the ast's nodes"
limit := aMessageNode arguments first.
block := aMessageNode arguments last.
iterator := block arguments first binding.
limitEmit := [ valueTranslator visitNode: limit ].
"to reduce space usage in some cases we don't make a local variable for the limit"
limit isLiteralNode | limit isSelf | limit isSuper | limit isArgument
ifFalse: [ valueTranslator visitNode: limit.
methodBuilder addTemp: iterator name , #limit.
methodBuilder storeTemp: iterator name , #limit.
methodBuilder popTop.
limitEmit := [ methodBuilder pushTemp: iterator name , #limit ] ].
"limitEmit is a closure that pushes its value on the stack"
"push start. allocate and initialize iterator"
valueTranslator visitNode: aMessageNode receiver.
iterator emitStore: methodBuilder.
self isEffectTranslator
ifTrue: [ methodBuilder popTop ].
methodBuilder jumpBackTarget: #start.
"loop's predicate"
iterator emitValue: methodBuilder.
limitEmit value.
methodBuilder send: #<=.
methodBuilder jumpAheadTo: #done if: false.
"we inline the block's code"
effectTranslator visitInlinedBlockNode: block.
"iterator incrementation"
iterator emitValue: methodBuilder.
methodBuilder pushLiteral: 1.
methodBuilder send: #+.
iterator emitStore: methodBuilder.
methodBuilder popTop.
"jump back to the loop's predicate"
methodBuilder jumpBackTo: #start.
methodBuilder jumpAheadTarget: #done
In reality emitToDo
is a simple wrapper around emitToDo:step:
where step = 1 though. So this is the code if it was inlined inside emitToDo
.
The only major differences with duff’s device are inlining the block multiple times and the switch-case. Before we can jump to the right case we need to compute the remainder of the loop. So let’s make a small abstraction to emit that remainder:
MyOCASTTranslatorForEffect>> emitRemFrom: startEmit to: limitEmit by: unrollFactor
limitEmit value. "push limit on stack"
startEmit value. "push start on stack"
methodBuilder send: #-. "compute difference"
methodBuilder pushLiteral: 1.
methodBuilder send: #+. "+1 because #to:do: works as closed interval"
methodBuilder pushLiteral: unrollFactor.
methodBuilder send: #%. "modulo unroll factor"
methodBuilder addTemp: #remainder.
methodBuilder storeTemp: #remainder.
methodBuilder popTop. "storeTemp keeps the value on the stack"
^ [ methodBuilder pushTem: #remainder ]
while we’re at it, let’s also write one for incrementing the iterator to make it easier for us later :
MyOCASTTranslatorForEffect>> emitIterIncrement: iterator
iterator emitValue: methodBuilder.
methodBuilder pushLiteral: 1.
methodBuilder send: #+
iterator emitStore: methodBuilder.
methodBuilder popTop
Our switch is composed of two parts: testing the remainder with jumping to the right place and the jump targets (case:
) with the inline body.
Generating the first part:
caseLabel := [ :x | #case , x asString asSymbol ] "just concat a number to #case symbol"
0 to: unrollFactor - 1 do: [:i |
remEmit value.
methodBuilder pushLiteral: i.
methodBuilder send: #=.
methodBuilder jumpAheadTo: (caseLabel value: i) if: true ].
]
For the body and the targets we have to take into account a couple a things. The first is the iterator should be incremented for every inlined body, BUT not for the very first jump! The way we’ll handle this is by emitting the iterator incrementation before the jumpTarget, except for the very first one (case 0). We’ll also increment at the very end, before our loop’s predicate gets re-evaluated.
Secondly, there’s a particular order in which the targets must be emitted. The first target is if there’s no loop remainder: we’ll want to execute every inlined body. For a remainder different than 0 we want to execute the body, during the first iteration, as much times as what remains! So we start with the highest value (unrollFactor - 1) and decrement until 1.
In code we obtain:
methodBuilder jumpAheadTarget: (caseLabel value: 0)
effectTranslator visitInlinedBlockNode: block.
unrollFactor - 1 to: 1 by: -1 do: [: i |
self emitIterIncrement: iterator.
methodBuilder jumpAheadTarget: (caseLabel value: i).
effectTranslator visitInlinedBlockNode: block ].
self emitIterIncrement: iterator.
Keeping a similar behavior as to:do:
There is one thing we haven’t discussed yet: what if start > limit
. In the classic to:do:
this isn’t a problem, when we enter the code, we test and simply jump to #done
. But since we have our switch before the #start
we have to explicitly handle this case somehow. There’s really various ways to do it, but I’ll go for the most simple: test and short-circuit before the switch.
One important thing to know though is that the IRBuilder allows for only one jump to a target! Skimming through the IRBuilder
code, I conclude that when we generate a jump, we pop the target’s label from a stack. An exception is raised if there’s a jumpAheadTarget
without a jumpAhead
, so the stack is probably used to check that. It is apparently not the case for jumpBackTargets
, but the IR doesn’t support conditional jumpBack
s.
Anyways, that’s the reason why we have to introduce a new jumpAheadTarget
aside from #done
.
Stiching it together
And here’s our entire method:
MyOCASTTranlatorForEffect>> emitToDo: aMessageNode
| unrollFactor start caseLabel limit block iterator limitEmit remEmit |
start := aMessageNode receiver.
limit := aMessageNode arguments first.
block := aMessageNode arguments last.
iterator := block arguments first binding.
unrollFactor := 8.
caseLabel := [ :n | #case , n asString asSymbol ].
"limitEmit pushes the limit on the stack "
limitEmit := [ valueTranslator visitNode: limit ].
"we avoid spare an allocation in some cases"
limit isLiteralNode | limit isSelf | limit isSuper | limit isArgument
ifFalse: [ valueTranslator visitNode: limit.
methodBuilder addTemp: iterator name , #limit.
methodBuilder storeTemp: iterator name , #limit.
methodBuilder popTop.
limitEmit := [ methodBuilder pushTemp: iterator name , #limit ] ].
"emit start and iterator"
valueTranslator visitNode: start.
iterator emitStore: methodBuilder.
self isEffectTranslator
ifTrue: [ methodBuilder popTop ].
"short circuit if start > limit"
iterator emitValue: methodBuilder.
limitEmit value.
methodBuilder send: #<=.
methodBuilder jumpAheadTo: #shortexit if: false.
"compute remainder for first jump target"
remEmit := self
emitRemFrom: [ iterator emitValue: methodBuilder ]
to: limitEmit
by: unrollFactor.
"switch conditional jumps"
0 to: unrollFactor - 1 do: [ :i |
remEmit value.
methodBuilder pushLiteral: i.
methodBuilder send: #=.
methodBuilder jumpAheadTo: (caseLabel value: i) if: true ].
"loop predicate"
methodBuilder jumpBackTarget: #start.
iterator emitValue: methodBuilder.
limitEmit value.
methodBuilder send: #<=.
methodBuilder jumpAheadTo: #done if: false.
"first body"
methodBuilder jumpAheadTarget: (caseLabel value: 0).
effectTranslator visitInlinedBlockNode: block.
"targets + inlined bodies"
unrollFactor - 1 to: 1 by: -1 do: [ :i |
self emitIterIncrement: iterator.
methodBuilder jumpAheadTarget: (caseLabel value: i).
effectTranslator visitInlinedBlockNode: block ].
self emitIterIncrement: iterator.
methodBuilder jumpBackTo: #start.
methodBuilder jumpAheadTarget: #done.
methodBuilder jumpAheadTarget: #shortexit
Improvements
Here again I have a little bit of a lacking implementation, but it is for simplicity’s sake. Afterall this post is more an amuse-bouche than a real case study.
The First obvious improvement is that we do not necessarily want to unroll every loop, and certainly not with the same unroll factor. The bytecode size of the image would explose otherwise! It’d be much better to let the caller control the unroll factor (after benchmarking for example). We could use a different message (e.g. to:unrollBy:do:
) and keep to:do:
untouched.
While the implementation wouldn’t be much more complicated than the previous one (just remember that the unroll factor must be evaluated at compile-time), we have to remember pharo’s semantics: every “method call” is a message sent to an object! That includes to:do:
.
So how come we generate specific code for to:do:
instead of a generic message send? Well for performance reasons some messages get inlined. That’s what we’ve overridden, the IR generation for an inlined to:do:
. This also means that simply inheritting the OCASTTranslator
won’t be enough to support our special message for unrolling! Unfortunately the decision whether a message should be inlined or not isn’t as decoupled from the compiler as what we’ve seen so far.
First, the decision whether to inline a message or not depends on the RBMessageNode>>#isInline
method, which itself relies on RBMessageNode>>#isInlineToDo
. So we’d have to intervene in there, which wouldn’t be trivial with inheritance, as this implies changing the AST itself. The easiest would probably be to override OCASTTranslator>>#visitMessageNode:
and make a special case for our unrolling selector.
Finally, as we can see in OCASTTranslator>>#visitMessageNode
, the method for inlining (e.g. emitToDo
) is fetch form a Dictionnary, that maps message selectors with their emitMethod. This dictionary is stored in a class variable of OCASTTranslator
and would need to be extended so that we can emit our unrolled loop for our custom message.
Going much farther
The metaprogramming facilities of Pharo are really extraordinary! There’s much much more that is possible.
But I wanted to show how elegant the model is: every step of the compiler is reified in Pharo’s very simple model (everything is an object, even the compiler). Sure, being able to grab the compiler, extend it, and modify existing objects in the system might not be the most useful for developping your CRUD app… But I hope I was able to show how relatively simple it is to use one of those facilities: the compiler itself.