Implementing "For each" in Bytecode

mgcadmin 02-07-2026

While implementing Owlscript I didnt want to crowd the grammar by having two different types of control flow which only differed by their syntax. This is the case in C-like languages were for loops serve more as syntactic sugaring of the while loop construct. Simultaneously, I've grown quite fond of the the "enhanced for loops" (foreach) found in modern C++ and Java and wanted a similar construct for iterating over lists and strings in owlscript.

I decided to take the "best of both" approach by implementing "regular" while loops which are familiar to any programmer for normal looping tasks:

let counter := 0;
while (counter < 10) {
       println counter;
       counter++;
}

And additionally adding a foreach loop to allow for quickly iterating over the built-in list and string types:

for (i in [ 1 .. 5 ]) {
    println i+i;
}

While syntactically simple in appearance, there is actually alot going on "behing the scenes" in the truest meaning of syntactic sugaring. Todays post is going to talk about the process of encapsulating the traversal mechanics of owlscripts built-in list type and how it is translated to bytecode by the compiler.

A Quick Recap of Looping

As a control structure loops are made up of two main parts: The test expression and the loop body. The test expression is used to determine If the loop body should be executed. The last statement of the loop body is an unconditional jump back to the test expression. If the test expression fails, execution jumps to the first statement after the loop body.

1) Evaluate the test expression

2) If the test expression evaluates to false, we jump to the first statement after the loop body, exiting the loop.

3) if the result of the test expression is true, execute the statements in the loops body

4) jump to the test Expression

5) repeat from step one.

Encapsulating Iteration

Iteration is the process of visiting every element of a collection in an order determined by the collection type. In order to perform this traversal we need the ability to know a) what our current position in the collection is, b) what the next position in the collection is and c) what the last position in the sequence is. With these three pieces of information, we can traverse any sequence using the following idom:

let itr = list.first();
while (itr != list.end()) {
      process(itr.get());
      itr.next();
}

By separating the traversal from the container, It doesn't matter if the underlying implementation is an array or a linked list, the same strategy holds:

int a[] = {1,2,3,4,5};
int i = 0, len = 5;
while (i < len) {
     printf("%d\n", a[i]);
     i++;
}

node* it = head;
while (it != NULL) {
      printf("%d\n", it->info);
      it = it->next;
}

This is the idea underlying the iterator pattern.

Initialization

We initialize an iterator object - in the array case, an integer to index into the array, for link lists, a pointer to a node.

Test for Completion

the test expression is a bounds check on the iterator objects current possition.

Processing Elements

If the current possition is valid, we process the current position accessed by the iterator object.

Traversal

Finally we move to the next position in the sequence before executing our test expression again.

The Game Plan

It helps to think of how we would implement it using only the constructs currently available to us in owlscript to help guide our instruction selection. In essence, we our compiler to read in this block of code:

for (i of [1 .. 5]) {
    println i;
}

and output the bytecode as if had read this block of code:

{
    let idx := 0;
    let list := [1 .. 5];
    while (idx < list.size()) {
              let i := list[idx];
              println i;
              idx++;
    }
}

Translating to Bytecode

As can be seen from the transliteration above, each foreach loop takes place in its own block scope, and so the first instructions performed are those to open a new scope.

void ByteCodeGenerator::emitForeach(astnode* n) {
    //foreach loops take place in there own block scope
    emit(Instruction(entblk));
    symTable.openFunctionScope(n->token.getString(), -1);

Once in the new scope we lookup the addresses allocated during the resolution phase for the "hidden" variables for the index counter and list alias.

   int IDX = symTable.lookup("itr").addr;    
    int SEQ = symTable.lookup("clti").addr;

    // set up Iterator object
    astnode* itexpr = n->left;
    emitIterator(itexpr, IDX, SEQ);

void ByteCodeGenerator::emitIterator(astnode* n, int IDX, int SEQ) {
    //sets current index to 0
    emit(Instruction(ldconst, StackItem(0.0)));
    emit(Instruction(ldaddr, IDX));
    emit(Instruction(stlocal));
    //evaluate list expression and assign  the result a temp name
    //this way if passed a list constructor the list only gets built once.
    //as we use this temporary name to refer to the list moving forward.
    genExpression(n->right, false);
    emit(Instruction(ldaddr, SEQ));
    emit(Instruction(stlocal));
}

// loop test expr
    int P1 = skipEmit(0);
    emit(Instruction(ldlocal, IDX)); //current index into list
    emit(Instruction(ldlocal, SEQ)); //current list to iterate
    emit(Instruction(list_len));                              // obtain its length
    emit(Instruction(binop, VM_LT));                          //more to go?

The start of loop body, at the beginning of each iteration we push the value at the current index of the list being iterated on to the stack it's value is then stored by the name supplied by user for iterator object.

    int L1 = skipEmit(0); 
    skipEmit(1);
    emit(Instruction(ldlocal, SEQ)); //current list were iterating
    emit(Instruction(ldlocal, IDX));  // index of current position
    emit(Instruction(ldidx));                                 // get data at that index
    emit(Instruction(ldaddr, symTable.lookup(itexpr->left->token.getString()).addr)); //address of runtime iterator
    emit(Instruction(stlocal));                                                       //store data to iterator

The user supplied loop body gets squeezed in here after making sure that our hidden variables resolve.

 //whatever code user wants to perform 
    genStatement(n->right, false);

After processing the user code, we move the iterator one place forward and jump back to the test expression for the next iteration.

    //get us ready for next run through
    emit(Instruction(ldlocal, IDX)); //get value of current index
    emit(Instruction(incr));        //increment it by 1
    emit(Instruction(ldaddr, IDX));  //save new index
    emit(Instruction(stlocal));
    emit(Instruction(jump, P1));  //jump back up to test expression

We need to backpatch the branch-false instruction which dictartes whether to fall through the loop body or not. Now that we know the code address of the last statement in the loop body, we can set the "on-false" condition to jump to that address.

    //backpatch test-expression failure branch
    int L2 = skipEmit(0);
    skipTo(L1);
    emit(Instruction(brf, L2));
    restore();

Finally, when we exit the loop we need to close the scope we opened for it.

    //close scope we opened for loop
    emit(Instruction(retblk));
    symTable.closeScope();
}

MaxGCoding.com

For the love of programming

Implementing "For each" in Bytecode

A Quick Recap of Looping

Encapsulating Iteration

Initialization

Test for Completion

Processing Elements

Traversal

The Game Plan

Translating to Bytecode

Leave A Comment

Implementing "For each" in Bytecode

Top-Down Deletion for Red/Black Trees

Implementing Closures in Bytecode VMs: Heap Allocated Activation Records & Access Links

Pascal & Bernoulli & Floyd: Triangles

A Quick tour of MGCLex

Compiling Regular Expressions for "The VM Approach"

Composable Linked Digraphs: An efficient NFA Data Structure for Thompsons Construction

Improving the Space Efficiency of Suffix Arrays

Augmenting B+ Trees For Order Statistics

Top-Down AST Construction of Regular Expressions with Recursive Descent