Access Links, Activation Records, And Bytecode: Implementing Lexical Scoping & Closures

Virtually every modern programming language capable of expressing non-trivial programs is expected to support lexical scoping by default. Some legacy languages, and even a few modern scripting languages, still make use of dynamic scoping (Perl 5, awk), but it is becoming increasingly rare - at least as the default strategy, and certainly as the only one.

Lexical scoping determines the visibility and lifetime of declared variables based on their lexical location in the source text. Essentially, a variable's lifetime and visibility depend on where it first appears in the code, whether it has been shadowed in another scope, and whether its defining scope has closed. This makes it much easier to reason about what code should output just by reading it - something impossible, or at least extremely difficult, in dynamically scoped environments. Importantly, this information can be determined statically at compile time, which allows us to make certain assumptions about the layout of the call stack at run time.

In today's post I'm going to cover some of my experience with implementing lexical scoping in a bytecode interpreter, a topic for which I have found very little information in the way of implementation details. This one goes off in the weeds a bit, so maybe grab something to drink. Anyway, let's get to it.

The Different Flavors of Lexical Scoping & The FunArg Problem

Grab any textbook on compilers and open to the section on run-time environments and symbol tables (interestingly, it happens to be chapter 7 in a startling number of books on the subject) and you will encounter a discussion of lexical scoping and how it influences activation records, and thus the entire calling sequence for functions and procedures. There are three main "flavors" of lexical scoping, each building upon the functionality inherent in the previous, and each requiring modification to the structure of the call frame - and possibly the entire stack itself.

The simplest form of lexical scoping is "Lexical Scoping with No Nested Procedure Definitions" - this is the type found in C, where all procedures/functions are declared at the global level. We then move to "Lexical Scoping with Nested Procedure Definitions", which, as the name implies, allows procedures and functions to be defined within the local scope of other procedures/functions. This can be further broken down into two subtypes:

  1. Nested procedures which cannot outlive their enclosing scope's lifetime, as found in Pascal.
  2. Nested procedures which can outlive their enclosing scope by forming closures, as in Scheme, Lua, etc.

The difference between the two ultimately comes down to how, and consequently to what degree, they address the "funarg problem": the difficulty of implementing first-class functions. This is further broken down into the passing of functions as arguments to procedures, termed "the downward funarg problem", and the returning of functions as the result of other functions, termed "the upward funarg problem".

The Call Stack & First Class Functions

Languages like Pascal and Ada, which fall into the first category, can handle the downward funarg problem but not the upward one. They do this by adding a pointer to their activation records called an "access link" or "static link", which points to the activation record of the defining environment, while the "control link" or "dynamic link" points to the calling environment.


Any language supporting recursion, regardless of its scoping strategy, will maintain a stack of call frames connected by the dynamic link, the management of which is dead simple: it points to the frame directly below. It is the static link which is a bit trickier to sort out, and it is the focus of most of the rest of this post. Languages with support for full lexical closures need a different kind of stack, so that an activation record can survive its defining procedure going out of scope. This generally requires heap-allocated activation records in the form of a cactus stack. One downside of this is that pushing and popping scopes is now more involved than updating an index register, as we maintain a tree of stacks instead of a single unified call stack.

A cactus stack is a parent pointer tree.

I would be remiss not to mention the hybrid approach, which makes use of a totally different data structure called "upvalues", originally developed for a bytecode-interpreted version of Scheme and subsequently made popular in the imperative world by Lua. I won't be explaining this last method, as Bob Nystrom already did a fantastic job in his book Crafting Interpreters, which is freely available on his website. Suffice it to say, upvalues function completely differently.

I will instead be explaining the "textbook" solution for implementing closures: heap-allocating the call frames to form a cactus stack, which also supports the implementation of coroutines and threads (with the caveat that you will need some type of garbage collection for the activation records themselves).

Calling Conventions & VM Design

Proper entry to and exit from subroutines depends on following the calling convention of whatever instruction set is being used. The virtual machine design and overall instruction set I will be using is a stack machine architecture adapted from the example in Terence Parr's "Language Implementation Patterns"[1].

One design choice with an outsized impact on the rest of the design is whether to use a unified stack, in which the call stack and operand stack share a common space, as seen in the Pascal P-machine and similar to how the stack is managed in assembly. Alternatively, we could use a segregated or disjoint call stack, which makes use of an explicit "CallFrame" data structure as opposed to the abstract call frames of the unified stack model.

The disjoint call stack model greatly simplifies managing call frames, as registers are no longer computed as offsets from a base pointer but are instead named fields in a struct. It also makes the implementation of heap-allocated activation records (another name for call frames) much easier.

struct ActivationRecord {
    int cp_index;                 // constant pool index of the function
    int num_args;
    int num_locals;
    int ret_addr;                 // where to resume in the caller
    StackItem locals[MAX_LOCAL];
    ActivationRecord* control;    // dynamic link: the calling environment
    ActivationRecord* access;     // static link: the defining environment
    ActivationRecord(int idx = -1, int ra = 0, int args = 0,
                     ActivationRecord* calling = nullptr,
                     ActivationRecord* defining = nullptr) {
        cp_index = idx;
        ret_addr = ra;
        num_args = args;
        num_locals = args;        // parameters occupy the first local slots
        control = calling;
        access = defining;
    }
};

During a procedure call, the work of maintaining the call stack is divided between the caller and the callee. To control procedure calls our VM has the instructions defun, call, retfun, and mkclosure. defun denotes the entry point of a procedure, with its operand being the number of arguments the function expects.

A Closer Look at Procedure Calls

We'll use the following bit of code to demonstrate how the calling convention of the Glaux VM (and, in general, many others) works during a procedure call. The snippet defines a function taking a single argument, which it concatenates onto a string and then prints.

def sayHi(let name) {
      println "hi " + name;
}
sayHi("world");

As a result of compilation, the four lines of higher-level code become fourteen instructions in our lower-level bytecode (addresses 0 through 13). A ratio of three to four instructions per line of code is quite normal:

Compiled Bytecode: 
 0: [0x18 jump 7 . ]
 1: [0x00 defun . 1 ]
 2: [0x04 ldconst 1 . ]
 3: [0x06 ldlocal 1 . ]
 4: [0x23 binop 1 . ]
 5: [0x32 print . . ]
 6: [0x15 retfun . . ]
 7: [0x25 mkclosure 0 0 ]
 8: [0x08 ldglobaladdr 1 . ]
 9: [0x11 stglobal . . ]
10: [0x04 ldconst 2 . ]
11: [0x05 ldglobal 1 . ]
12: [0x14 call 0 1 -1 ]
13: [0x33 halt 0 . ]

You can see the function body on lines 1 through 6.

 1: [0x00 defun . 1 ]
 2: [0x04 ldconst 1 . ]
 3: [0x06 ldlocal 1 . ]
 4: [0x23 binop 1 . ]
 5: [0x32 print . . ]
 6: [0x15 retfun . . ]

On line 2, ldconst loads the "hi " string by index from its location in the constant table and places it on the stack. Next, ldlocal loads the parameter from its slot in the call frame. On line 4, the binop 1 instruction indicates addition, which performs string concatenation when either operand is a string, converting the non-string operand as necessary. The print instruction on line 5 pops the string off the stack and writes it to stdout, and retfun ends the procedure call.

What we're really concerned with, though, is how we set the environment up to make the call to that function in the first place. How does the parameter 'name' end up in local slot 1? That brings us back to the calling convention. The very first bytecode instruction is an unconditional jump to address 7, which neatly jumps over the function definition to the first part of our calling convention: mkclosure. The code in closeOver() handles the mkclosure instruction:

void closeOver(Instruction& inst) {
    int func_id = inst.operand[0].intval;
    auto closure = constPool.get(func_id).objval->closure;
    // Capture the defining environment for this function.
    closure->env = mostRecentAR(func_id);
    // Wrap the function and its environment, and push the result.
    opstk[++sp] = StackItem(gc.alloc(new Closure(closure->func, closure->env)));
}

This instruction takes as its operand the constant pool index of the function being called. It wraps the function in a closure object and puts it on the operand stack: this is where we capture a pointer to the defining environment, which in our VM model happens to be the most recent activation record of the function being called. If there is no previous activation record on the stack, we default to the one on top. The next two instructions, on lines 8 and 9, store the newly created closure at an address in the environment so it can be passed around (first-class functions).

 7: [0x25 mkclosure 0 0 ]
 8: [0x08 ldglobaladdr 1 . ]
 9: [0x11 stglobal . . ]
10: [0x04 ldconst 2 . ]
11: [0x05 ldglobal 1 . ]
12: [0x14 call 0 1 -1 ]

Starting on line 10, the ldconst 2 instruction places the string at constant pool index 2 on the operand stack: this is the argument for the procedure call. Next, line 11 loads the closure object we created and stored earlier from the environment and places it on the stack, after which we issue the call instruction.

void callProcedure(Instruction& inst) {
    int returnAddr = ip;                 // ip already points past the call
    int numArgs = inst.operand[1].intval;
    int cpIdx = inst.operand[0].intval;
    Closure* close = opstk[sp--].objval->closure;
    // Push a new frame: control = the caller, access = the defining environment.
    callstk = new ActivationRecord(cpIdx, returnAddr, numArgs, callstk, close->env);
    // Move the arguments from the operand stack into the frame's local slots.
    for (int i = numArgs; i > 0; i--) {
        callstk->locals[i] = opstk[sp--];
    }
    ip = close->func->start_ip;
}

The call instruction pops the closure object off the stack, using its environment pointer to set the access link in the new activation record, while the instruction pointer (ip) is used to set the return address. With the new activation record on the top of the stack, we move the function arguments from the operand stack to their positions as locals in the call frame. Finally, we set the instruction pointer to the first instruction of the function being called. This lands us at the defun instruction on line 1, which we previously jumped over.

void retProcedure() {
    ip = callstk->ret_addr;      // resume execution in the caller
    callstk = callstk->control;  // pop the frame via the dynamic link
}

Upon hitting the retfun instruction on line 6, we pop the current activation record off the stack, using the control link to restore the previous environment. The popped activation record is not freed, though, as it might still be referenced through the access link of a procedure which is still in scope. This is the reason for heap-allocating our call frames in the first place, and subsequently why you need some other form of garbage collection for them.

Accessing Variables from Enclosing Scopes

Interacting with memory from the stack is accomplished in the Glaux VM using load instructions for reading from memory and store instructions for writing to it. Global variables are managed with ldglobal and stglobal, while local variables are accessed through ldlocal and stlocal. The ldaddr instruction works for both local and global variables and places the address of a variable on the stack.

void loadUpval(Instruction& inst) {
    // operand[0] is the local slot, operand[1] is the scope depth.
    opstk[++sp] = walkChain(inst.operand[1].intval)->locals[inst.operand[0].intval];
    if (verbLev > 1)
        cout << "loaded upval: " << opstk[sp].toString() << " from " << inst.operand[0].intval
             << " of scope " << inst.operand[1].intval << endl;
}
void storeUpval(Instruction& inst) {
    // operand[0] is the scope depth; the slot and value come off the stack.
    StackItem t = opstk[sp--];
    StackItem val = opstk[sp--];
    walkChain(inst.operand[0].intval)->locals[t.intval] = val;
    if (verbLev > 1)
        cout << "stored upval at " << t.intval << " in scope " << inst.operand[0].intval << endl;
}

Variables which are local to an enclosing scope are accessed through the ldupval and stupval instructions, each taking an offset giving the number of enclosing scopes out from the currently active scope. We reach these values by walking the chain of access links.

ActivationRecord* walkChain(int d) {
    if (d == GLOBAL_SCOPE) return globals;  // sentinel: the global frame
    if (d == LOCAL_SCOPE) return callstk;   // sentinel: the current frame
    auto x = callstk;
    while (x != nullptr && d > 0) {         // follow d access links outward
        x = x->access;
        d--;
    }
    return x;
}

Further Reading

[1] Parr, Terence. "Language Implementation Patterns". Pragmatic Bookshelf, 2010.

[2] Aho, Sethi, Ullman. "Compilers: Principles, Techniques, and Tools". Addison-Wesley, 1986.

[3] Louden, Kenneth. "Compiler Construction: Principles and Practice". PWS Publishing, 1997.

[4] Scott, Michael L. "Programming Language Pragmatics", 3rd ed. Morgan Kaufmann, 2009.

[5] "JVM Internals" - https://blog.jamesdbloom.com/JVMInternals.html

