Implementing "For each" in Bytecode
While implementing Owlscript I didnt want to crowd the grammar by having two different types of control flow which only differed by their syntax. This is the case in C-like languages were for loops serve more as syntactic sugaring of the while loop construct. Simultaneously, I've grown quite fond of the the "enhanced for loops" (foreach) found in modern C++ and Java and wanted a similar construct for iterating over lists and strings in owlscript.
I decided to take the "best of both" approach by implementing "regular" while loops which are familiar to any programmer for normal looping tasks:
let counter := 0;
while (counter < 10) {
println counter;
counter++;
}
And additionally adding a foreach loop to allow for quickly iterating over the built-in list and string types:
for (i in [ 1 .. 5 ]) {
println i+i;
}
While syntactically simple in appearance, there is actually alot going on "behing the scenes" in the truest meaning of syntactic sugaring. Todays post is going to talk about the process of encapsulating the traversal mechanics of owlscripts built-in list type and how it is translated to bytecode by the compiler.
A Quick Recap of Looping
As a control structure loops are made up of two main parts: The test expression and the loop body. The test expression is used to determine If the loop body should be executed. The last statement of the loop body is an unconditional jump back to the test expression. If the test expression fails, execution jumps to the first statement after the loop body.
1) Evaluate the test expression
2) If the test expression evaluates to false, we jump to the first statement after the loop body, exiting the loop.
3) if the result of the test expression is true, execute the statements in the loops body
4) jump to the test Expression
5) repeat from step one.
Encapsulating Iteration
Iteration is the process of visiting every element of a collection in an order determined by the collection type. In order to perform this traversal we need the ability to know a) what our current position in the collection is, b) what the next position in the collection is and c) what the last position in the sequence is. With these three pieces of information, we can traverse any sequence using the following idom:
let itr = list.first();
while (itr != list.end()) {
process(itr.get());
itr.next();
}
By separating the traversal from the container, It doesn't matter if the underlying implementation is an array or a linked list, the same strategy holds:
int a[] = {1,2,3,4,5};
int i = 0, len = 5;
while (i < len) {
printf("%d\n", a[i]);
i++;
}
node* it = head;
while (it != NULL) {
printf("%d\n", it->info);
it = it->next;
}
This is the idea underlying the iterator pattern.
Initialization
We initialize an iterator object - in the array case, an integer to index into the array, for link lists, a pointer to a node.
Test for Completion
the test expression is a bounds check on the iterator objects current possition.
Processing Elements
If the current possition is valid, we process the current position accessed by the iterator object.
Traversal
Finally we move to the next position in the sequence before executing our test expression again.
The Game Plan
It helps to think of how we would implement it using only the constructs currently available to us in owlscript to help guide our instruction selection. In essence, we our compiler to read in this block of code:
for (i of [1 .. 5]) {
println i;
}
and output the bytecode as if had read this block of code:
{
let idx := 0;
let list := [1 .. 5];
while (idx < list.size()) {
let i := list[idx];
println i;
idx++;
}
}
Translating to Bytecode
As can be seen from the transliteration above, each foreach loop takes place in its own block scope, and so the first instructions performed are those to open a new scope.
void ByteCodeGenerator::emitForeach(astnode* n) {
//foreach loops take place in there own block scope
emit(Instruction(entblk));
symTable.openFunctionScope(n->token.getString(), -1);
Once in the new scope we lookup the addresses allocated during the resolution phase for the "hidden" variables for the index counter and list alias.
int IDX = symTable.lookup("itr").addr;
int SEQ = symTable.lookup("clti").addr;
// set up Iterator object
astnode* itexpr = n->left;
emitIterator(itexpr, IDX, SEQ);
void ByteCodeGenerator::emitIterator(astnode* n, int IDX, int SEQ) {
//sets current index to 0
emit(Instruction(ldconst, StackItem(0.0)));
emit(Instruction(ldaddr, IDX));
emit(Instruction(stlocal));
//evaluate list expression and assign the result a temp name
//this way if passed a list constructor the list only gets built once.
//as we use this temporary name to refer to the list moving forward.
genExpression(n->right, false);
emit(Instruction(ldaddr, SEQ));
emit(Instruction(stlocal));
}
// loop test expr
int P1 = skipEmit(0);
emit(Instruction(ldlocal, IDX)); //current index into list
emit(Instruction(ldlocal, SEQ)); //current list to iterate
emit(Instruction(list_len)); // obtain its length
emit(Instruction(binop, VM_LT)); //more to go?
The start of loop body, at the beginning of each iteration we push the value at the current index of the list being iterated on to the stack it's value is then stored by the name supplied by user for iterator object.
int L1 = skipEmit(0);
skipEmit(1);
emit(Instruction(ldlocal, SEQ)); //current list were iterating
emit(Instruction(ldlocal, IDX)); // index of current position
emit(Instruction(ldidx)); // get data at that index
emit(Instruction(ldaddr, symTable.lookup(itexpr->left->token.getString()).addr)); //address of runtime iterator
emit(Instruction(stlocal)); //store data to iterator
The user supplied loop body gets squeezed in here after making sure that our hidden variables resolve.
//whatever code user wants to perform
genStatement(n->right, false);
After processing the user code, we move the iterator one place forward and jump back to the test expression for the next iteration.
//get us ready for next run through
emit(Instruction(ldlocal, IDX)); //get value of current index
emit(Instruction(incr)); //increment it by 1
emit(Instruction(ldaddr, IDX)); //save new index
emit(Instruction(stlocal));
emit(Instruction(jump, P1)); //jump back up to test expression
We need to backpatch the branch-false instruction which dictartes whether to fall through the loop body or not. Now that we know the code address of the last statement in the loop body, we can set the "on-false" condition to jump to that address.
//backpatch test-expression failure branch
int L2 = skipEmit(0);
skipTo(L1);
emit(Instruction(brf, L2));
restore();
Finally, when we exit the loop we need to close the scope we opened for it.
//close scope we opened for loop
emit(Instruction(retblk));
symTable.closeScope();
}
-
Implementing "For each" in Bytecode
-
Top-Down Deletion for Red/Black Trees
-
Implementing Closures in Bytecode VMs: Heap Allocated Activation Records & Access Links
-
Pascal & Bernoulli & Floyd: Triangles
-
A Quick tour of MGCLex
-
Compiling Regular Expressions for "The VM Approach"
-
Composable Linked Digraphs: An efficient NFA Data Structure for Thompsons Construction
-
Improving the Space Efficiency of Suffix Arrays
-
Augmenting B+ Trees For Order Statistics
-
Top-Down AST Construction of Regular Expressions with Recursive Descent
Leave A Comment