The Owl Runtime Environment: The Call Stack

mgcadmin 08-03-2024

Programs rarely exist in isolation. With a few exceptions, a program requires a supporting environment within which to run. For a compiled language, this can be handled by the operating system or in more extreme cases like microcontrollers and embedded applications, they will setup their own environment.

An interpreted language requires an auxiliary environment on top of the host operating system, often called a "virtual machine". Java's JVM is probably the most well known example. Owl also makes use of a virtual machine to run in, the design of which is the topic of todays post.

*This post is a little different from my usual format, its less a tutorial and more a "how I did it" piece, as such, its a bit light on code examples and focuses a bit more on theory.

Interpreters, Virtual Machines, and Intermediate Representations

Generally when one things of a virtual machine for an interpreted language, bytecode interpreters are what come to mind. Bytecode based stack machines like the one used by UCSD Pascal, or more recently the Java Virtual Machine are the most popular, with register machines coming in a close second. Stack machines allow for the use of a simple but powerful "assembly" language that is further compiled to a compact representation called "bytecode".

A tree walking interpreter differs from this model in that it uses the augmented abstract syntax tree emitted by the parser directly as the "instructions" for the virtual machine. Both models of virtual machine are state machines. The way they transition between states is vastly different. Despite these differences, which are the result of translation and execution happening at a higher level in tree walking interpreters as compared to bytecode VMs, there are a striking number of similarities between the two, especially in how procedures and recursion is managed.

The Call Stack

Despite not being a stack machine, the OwlVM does still make use of a stack, its just not the center piece of the memory model and operations. Regardless of your execution model, be it a register machine, stack machine, or tree walking interpreter you need a way to manage branching, procedure calls and scope. The simplest and most idiomatic way of accomplishing this is with a stack.

 class RuntimeStack {
    private:
        StackFrame* rtStack[MAX_RT_STACK];
        int rtsp;
    public:
        RuntimeStack();
        int size();
        bool empty();
        void push(StackFrame* sf);
        void pop();
        StackFrame* top();
};

The call stack stores "stack frames", though stack frame is a bit of a gross simplification of this particular data structure - it serves the same purpose. This setup gives a straight forward way to manage scope, with each stack frame getting its own symbol table.

struct StackFrame {
    map<string, int> symbolTable;
    ASTNode* body;
    ASTNode* params;
    StackFrame* staticLink;
    Object returnVal;
};

The members of the stack frame structure are:

symbolTable is a map containing the memory addresses of any local variables or parameters used by the procedure.
body is a pointer to the beginning of the procedure body node in the AST, likewise
params is a pointer to the list of the procedures parameters.
staticLink is set by the runtime stack's push() method, and points to the stack frame on the stack below the current frame.
returnVal stores the value returned by the procedure call this stackframe represents.

Each procedure has a "master stack frame", which is created by the interpreter from procedure declarations and are used as a "stub" for execution.

Calling Convention

When a procedure is called we push updated copies of the master stack frame on to the call stack. The new stack frames are setup by a call to the prepStackFrame() method from inside the Dispatch() method of the interpreter.

 
Object Interpreter::Dispatch(ASTNode* node) {
    say("Dispatch: " + node->attribute.name);
    Object retVal;
    if (procedures.find(node->attribute.name) != procedures.end()) {
        StackFrame* nsf = prepStackFrame(procedures[node->attribute.name]);
        auto paramIt = nsf->params;
        auto argIt = node->child[1];
        //now we want to assign the parameters to their correct symbol tables.
        while (paramIt && argIt) {
            Object obj = interpretExpression(argIt);
            memStore.store(nsf->symbolTable[paramIt->attribute.name], obj);
            paramIt = paramIt->sibling;
            argIt = argIt->sibling;
        }
        callStack.push(nsf);
        Execute(callStack.top()->body);

During preparation of the stack frame, it allocates any memory needed during execution of the procedure such as for parameters or local variables, saving the address in the stack frames symbol table.

StackFrame* Interpreter::prepStackFrame(StackFrame* curentFrame) {
    StackFrame* nextFrame = new StackFrame;
    nextFrame->body = curentFrame->body;
    nextFrame->params = curentFrame->params;
    nextFrame->returnVal = curentFrame->returnVal;
    for (auto m : curentFrame->symbolTable) {
        nextFrame->symbolTable[m.first] = memStore.allocate(1);
    }
    return nextFrame;
}

Once we have our stack frame prepared we call the Execute() method of the interpreter on the stack frames body pointer. When the procedure completes, it's return value (if it has one) is available before the stack frame is popped off of the runtime stack.

        retVal = callStack.top()->returnVal;
        for (auto addr : callStack.top()->symbolTable) {
            memStore.free(addr.second);
        }
        callStack.pop();
        say("value returned: " + retVal.toString());
        return retVal;
    } else {
        say("Hoot! No such procedure: " + node->attribute.name);
    }
    onExit("Dispatch");
    return retVal;
}

The return value is set in a different procedure, doReturn():

void Interpreter::doReturnStatement(ASTNode* node) {
    onEnter("Return Statement.");
    Object retVal =  interpretExpression(node->child[0]);
    callStack.top()->returnVal = retVal;
    say(retVal.toString() + " saved as return value.");
    onExit();
}

Recursive Procedure Calls In Action

If we want to support procedure calls, without recursion a la early FORTRAN, we could get away without a call stack, but we'd be left with a static environment that is more trouble than its worth without gaining anything. Also, recursion is pretty important and was one of firm requirements for Owl. So with that being said, let's take a look at how Owl executes a simple recursive procedure using the call stack just described.

Well use the following Owl program as an example:

program 'sumExample';
begin
    let m: int := 1;
    func sum(n: int) begin
        if (n == 0) then
            return 0;
        else
            return n + sum(n - 1);
        end;
    end
    print "Enter a number: ";
    input m;
    print (sum(m) + "\n");
end

The program prompts the user for a number, and then returns the sum of the series of that number. While it's very simple program, it exercises several areas of the owl interpreter that must together correctly in order to execute correctly. We are primarily interested in the recursive call to sum in the else clause. Just a quick check the program works as expected before we continue:

max@MaxGorenLaptop:~/GitHub/OwlInterpreter$ ./owlcli owlcode/sum.owl
 Enter a number: 5
 15
 max@MaxGorenLaptop:~/GitHub/OwlInterpreter$ ./owlcli owlcode/sum.owl
 Enter a number: 3
 6
 max@MaxGorenLaptop:~/GitHub/OwlInterpreter$ ./owlcli owlcode/sum.owl
 Enter a number: 8
 36
 max@MaxGorenLaptop:~/GitHub/OwlInterpreter$

Now that we know the program works as expected, lets take a look a the AST that was created by the parser, so we have a better understanding of what's happening behind the scenes.

[VARDECL] let
     [ID_EXPR] m
     [CONST_EXPR] 1
   [FUNCDECL] sum
     [IFSTM] if
       [OP_EXPR] = EQUAL
         [ID_EXPR] n
         [CONST_EXPR] 0
       [RETURNSTM] return
         [CONST_EXPR] 0
       [RETURNSTM] return
         [OP_EXPR] + PLUS
           [ID_EXPR] n
           [PROCDCALL] sum
             [OP_EXPR] - MINUS
               [ID_EXPR] n
               [CONST_EXPR] 1
     [PARAM_EXPR] n
   [PRINTSTMT] print
     [CONST_STR] Enter a number: 
   [READSTM] input
     [ID_EXPR] m
   [PRINTSTMT] print
     [OP_EXPR] + PLUS
       [PROCDCALL] sum
         [ID_EXPR] m
       [CONST_STR] <newline>

When the interpreter is traversing the AST and arrives at the FUNCDECL node for the sum procedure, it creates an entry in the 'procedures' symbol table, as mentioned above, which is the "master stack frame" used as a template for procedure calls.

The entry in procedures symbol table for sum will have the body pointer point to this sub tree of the AST:

[IFSTM] if
    [OP_EXPR] = EQUAL
       [ID_EXPR] n
       [CONST_EXPR] 0
     [RETURNSTM] return
         [CONST_EXPR] 0
     [RETURNSTM] return
         [OP_EXPR] + PLUS
           [ID_EXPR] n
           [PROCDCALL] sum
             [OP_EXPR] - MINUS
               [ID_EXPR] n
               [CONST_EXPR] 1

The params pointer will point to this sub tree:

     [PARAM_EXPR] n

The stackFrame structure prepared and saved, the interpreter continues, and will encounter the print statement who's expression will initiate the call to sum.

[PRINTSTMT] print
     [OP_EXPR] + PLUS
       [PROCDCALL] sum
         [ID_EXPR] m
       [CONST_STR] <newline>

Upon traversing the PROCDCALL node the interpreter will fetch the master stack frame and make a copy of it in prepStackFrame() as described above. pushes it onto the stack, and then calculates the parameters, and then executes the procedure by traversing the subtree pointed to by the stackFrames body pointer. Let's see what a trace of that would look like:

(0) Declaring Variable: m with value 1 type: INTEGER
  (1) Procedure Declaration: sum
  (1) Parameter: n added to procedures symbol table.
  (1) [PRINT]
 Enter a number: 3
  (1) [PRINT]
   (2) Expression OP_EXPR
    (3) eval: PLUS
     (4) Expression PROCDCALL
     (4) Dispatch: sum
      (5) Expression ID_EXPR
       (6) retrieveFromMemoryByName
       (6) ID: m, Address: 1, offset: 0, value: 3, type: INTEGER

At this point in the trace we are just about to execute the first statement in the procedure, the if statement. which will determine we should take the else clause branch, leading to the first recursive procedure call:

(6) Expression OP_EXPR
        (7) eval: EQUAL
         (8) Expression ID_EXPR
          (9) retrieveFromMemoryByName
          (9) ID: n, Address: 3, offset: 0, value: 3, type: INTEGER
         (8) Expression CONST_EXPR
         (8) CONST_EXPR value: 0
         (8) 3.000000 EQUAL 0.000000
          (9) rel Op
          (9) eval result: 0
         (8) value: false
        (7) else clause
         (8) Return statement.
          (9) Expression OP_EXPR
           (10) eval: PLUS
            (11) Expression ID_EXPR
             (12) retrieveFromMemoryByName
             (12) ID: n, Address: 3, offset: 0, value: 3, type: INTEGER
            (11) Expression PROCDCALL
            (11) Dispatch: sum

The call to sum proceeds much the same way it just did. It's not actually the recursion, or the procedure calls that require the stack. It's returning the from the procedures that does. Once the recursion reaches its base case, it then has to "unwind", and this is where the stack comes in to play.

                              (29) Return Statement.
                               (30) Expression CONST_EXPR
                               (30) CONST_EXPR value: 0
                               (30) 0 saved as return value.
                              (29) value returned: 0
                              (29) 1.000000 PLUS 0.000000
                               (30) math Op
                              (29) eval result: 1
                             (28) value: 1
                            (27) 1 saved as return value.
                           (26) value returned: 1
                           (26) 2.000000 PLUS 1.000000
                            (27) math Op
                           (26) eval result: 3
                          (25) value: 3
                         (24) 3 saved as return value.
                        (23) value returned: 3
                        (23) 3.000000 PLUS 3.000000
                         (24) math Op
                        (23) eval result: 6
                       (22) value: 6
                      (21) 6 saved as return value.
                     (20) value returned: 6
                     (20) Expression CONST_STR
                     (20) 6.000000 PLUS 0.000000
                      (21) String Operation
                     (20) value: 6
 6

Wrapping Up

That's all I've got for today. The other major component of the run time environment is the memory manager, and will be the topic of it's own post. Until next time, happy hacking!

The code for Owl is available on my Github: https://github.com/maxgoren/OwlInterpreter

MaxGCoding.com

For the love of programming

The Owl Runtime Environment: The Call Stack

Interpreters, Virtual Machines, and Intermediate Representations

The Call Stack

Calling Convention

Recursive Procedure Calls In Action

Wrapping Up

Leave A Comment

Procedural Map Generation with Binary Space Partitioning

Exact Match String Searching: The Knuth-Morris-Pratt Algorithm

The visitor pattern: OOP takes on the Expression Problem

Improving mgcLisp's define syntax

Separating the Men from the Boys, Knuth Style

Reducing rotations during deletion from AVL trees

Parsing Lisp: From Data To Code

Swiz Heaps: A deterministic modification of Randomized Meldable Heaps

Let's talk Eval/Apply

BST Deletion: Removal By Merge