Separating the Men from the Boys, Knuth Style
When I set out to develop my own language, there were a several "milestones" I had set for myself along the way. Goals I could tick off to gauge my progress. One such goal I've been working towards for some time was to be able to pass Knuth's "Man or Boy test". I'm happy to say, that as of this afternoon, Owlscript can successfully execute (an owlscript implementation of) knuths test program, arriving at the correct output. The first 15 values at least, then it runs out of stack space - but thats to be expected.
To mark the occasion I figured I would write a post about some of the difficulties faced in getting my interpreter to properly run the test, and how I addressed them. But first, here is the code for Knuths man or boy test implemented in owlscript:
def C(var i) {
return &() -> i;
}
def A(var k, var x1, var x2, var x3, var x4, var x5) {
def B() {
k--;
return A(k, B, x1, x2, x3, x4);
}
if (k <= 0) {
return x4()+x5();
}
return B();
}
def main() {
let i := 0;
while (i < 15) {
print i + ": ";
println A(i, C(1), C(-1), C(-1), C(1), C(0));
i := i + 1;
}
}
And the output:
max@MaxGorenLaptop:~/owlscript$ owlscript manorboy.owl
0: 1
1: 0
2: -2
3: 0
4: 1
5: 0
6: 1
7: -1
8: -10
9: -30
10: -67
11: -138
12: -291
13: -642
14: -1446
I suppose before we get too much further I should talk a little bit about the actual test, and what is supposed to accomplish.
Knuth's Man Or Boy Test
In the early 1960s, as various implementations of the Algol 60 language came into existence it became clear that some of algol's more novel features were not trivial to implement, leading to compilers of varying quality. Knuth termed those which were correctly implemented "man compilers", while dubbing those of a more dubious quality "boy compilers". It was a different time.
To quote Knuth himself:
"There are quite a few ALGOL60 translators in existence which have been designed to handle recursion and non-local references properly, and I thought perhaps a little test-program may be of value. Hence I have written the following simple routine, which may separate the man-compilers from the boy-compilers." - Donald Knuth, 1964
His original implementation in Algol 60 was thus:
begin
real procedure A(k, x1, x2, x3, x4, x5);
value k; integer k;
real x1, x2, x3, x4, x5;
begin
real procedure B;
begin k := k - 1;
B := A := A(k, B, x1, x2, x3, x4)
end;
if k ≤ 0 then A := x4 + x5 else B
end;
outreal(1, A(10, 1, -1, -1, 1, 0))
end
You might take a look at that snippet of code and think "huh? THAT is the code that is so hard to properly execute?" and at first glance you'd be excused for thinking so. However, hidden in that dozen lines of code is enough complexity to give any interpreter or compiler a good thrashing. Packed into routine we have recursion, nested procedures, conditionals, call by value AND call by name parameters, as well as procedure and non-local references.
It should be remembered that while we take things like being able to implement recursive algorithms for granted, when algol was invented it was the first high level language to support doing so. It took some figuring out to get it right. And not only that, the same can be said for nested procedures, non local references, procedure references, call by name parameters, and the list goes on.
So what does this program actually do? The programs execution causes a proliferation of recursive calls, which, because of algols lexical scoping and call by name parameter passing results in a very convoluted tree of activation records. With a starting value of 5, we end up with the following trace:
B() From A, k: 5
A() From B, k: 4
| B() From A, k: 4
| A() From B, k: 3
| | B() From A, k: 3
| | A() From B, k: 2
| | | B() From A, k: 2
| | | A() From B, k: 1
| | | | B() From A, k: 1
| | | | A() From B, k: 0
| | | | | x4() + x5()
| A() From B, k: 2
| | B() From A, k: 2
| | A() From B, k: 1
| | | B() From A, k: 1
| | | A() From B, k: 0
| | | | x4() + x5()
A() From B, k: 3
| B() From A, k: 3
| A() From B, k: 2
| | B() From A, k: 2
| | A() From B, k: 1
| | | B() From A, k: 1
| | | A() From B, k: 0
| | | | x4() + x5()
A() From B, k: 2
| B() From A, k: 2
| A() From B, k: 1
| | B() From A, k: 1
| | A() From B, k: 0
| | | x4() + x5()
A() From B, k: 1
| B() From A, k: 1
| A() From B, k: 0
| | x4() + x5()
If the implementer of the compiler or interpreter executing the program was careless in managing scope and variable shadowing it would be very easy to access the incorrect instance of a variable, resulting in an incorrect answer.
Challenges In Language Implementation
Call By Name
Implementing an interpreter capable of computing knuths procedure requires either support for call by name, or some way of emulating its behavior. In chapter 3 of The Structure and Interpretation of Computer Programs it is disclosed that scheme's keywords delay and force used for implementing delayed evaluation, a form of call by name, is syntactic sugar for wrapping a value in a lambda which returns that value when evaluated:
(define delay (lambda (x) (lambda (x) x)))
(define force (lambda (x xs) (apply x xs)))
Owlscript doesn't have call by name, but it does have anonymous functions and closures, and so the wrapping is done explicitly:
def C(var i) {
return &() -> i;
}
println A(i, C(1), C(-1), C(-1), C(1), C(0));
Procedure References
Owlscript was designed with first class functions in mind, so procedure references was less of a challenge and more an implementation detail. By supporting first class functions, nested procedures become easily implemented as well, the complicating factor being if you want to allow those procedures to outlive their enclosing procedures (closures).
Lexical Scoping And Non-local Variables (The Funarg Problem)
This is where a particularly cool datastructure comes into play: the cactus stack. The stack of activation records used to implement procedure calls and recursion make use of both a static link _and_ dynamic link, allowing the implementation of Lexical scoping with nested procedures.
A cactus stack uses parent pointers to the enclosing scope
This data structure is constructed by using a stack of the data structure shown below. Each stack frame maintains two pointers, a dynamic link and a static link. The static link is a pointer to the functions defining environment.
typedef unordered_map<string, Object> Environment;
struct StackFrame {
Environment bindings;
StackFrame* dynamicLink;
StackFrame* staticLink;
StackFrame(StackFrame* defining = nullptr, StackFrame* calling = nullptr) : staticLink(defining), dynamicLink(calling) {
}
};
Whenever a procedure/function/lambda is defined, it obtains a pointer to the environment in which it is defined. When the procedure/function/lambda is called, we use that pointer to populate the parent pointer of the environment being set up for the function call. Once populated, that environment is then pushed onto the call stack.
void lambdaExpression(astnode* node) {
Function* func = new Function(node->child[0], node->child[1]); //params & body
func->name = "(lambda)";
func->closure = cxt.getStack(); //defining environment
push(cxt.getAlloc().makeFunction(func));
}
void funcExpression(Function* func, astnode* params) {
StackFrame* env = new StackFrame(func->closure, cxt.getStack()); //use functions pointer to defining env to populate dynamic link
evalFunctionArguments(params, func->params, env);
cxt.openScope(env);
exec(func->body);
bailout = false;
cxt.closeScope();
}
This results in a setup where we have (for example) 2 environments on the stack, one directly above the other, but if we follow the parent pointer of the top environment, you end up in it's defining environment, which is not necessarily the environment directly below it on the stack. By keeping a reference to the defining environment, we also keep said environment alive should it's function go out of scope before the function referencing it does. Without the pointer to the defining environment, we wouldn't be able to resolve the correct variable names during the man or boy test.
That's what I've got for you today, I know this post was a bit different than usual, but hopefully someone else will get something from it. Anyways, until next time. Happy Hacking!
-
Improving mgcLisp's define syntax
-
Separating the Men from the Boys, Knuth Style
-
Reducing rotations during deletion from AVL trees
-
Parsing Lisp: From Data To Code
-
Swiz Heaps: A deterministic modification of Randomized Meldable Heaps
-
Let's talk Eval/Apply
-
BST Deletion: Removal By Merge
-
Dictionary Based Compression: The LZW Algorithm
-
Taking Action: Compiling Procedures to P-Code
-
Making Decisions: Compiling If Statements to P-Code
Leave A Comment