When it comes to parsing expressions many, many books will cover addition, subtraction, and multiplication, often times leaving the implementation of division, exponents and square root “as an exercise for the reader”. This is done to keep texts concise, and also because most books pay the aformentioned lip service to recursive descent before focusing on LR parsing algorithms. This is, to put it mildly, a shame. If the big names in modern compilers (GCC, LLVM, Java) have anything to say about it recursive descent is the way to implement a parser – and I tend to agree. So, allow me to share a bit of the secret sauce: how to parse right associative operators top down with recursive descent.
In todays post I will be implementing both sqrt and pow, as they have the same precedence, but one is a binary operator and the other unary. This should give sufficient breadth to put you on your way. Though the emphasis of todays post is on the aforementioned right associative operators, I will be implementing the following grammar for arithmetic expressions:
<expr> := <term> { ("+"|"-") <term> }
<term> := <factor> { ("*"|"/") <factor> }
<factor> := "sqrt" <primary> | <primary> { ("^") <factor> }
<primary> := ID | NUM | "("<expr>")" | "-"<expr>
In recursive descent parsing left associative operators are handled by replacing the recursive call with a loop a’la tail recursion removal, which can be seen in their “standard” implementations, (code for currSym() and match() have been left out for brevity):
astnode* expr() {
astnode* node = term();
while (currSym() == TK_ADD || currSym() == TK_SUB) {
astnode* m = makeExprNode(BINARYOP_EXPR, current);
match(currSym());
m->child[0] = node;
node = m;
node->child[1] = term();
}
return node;
}
astnode* term() {
astnode* node = factor();
while (currSym() == TK_MUL || currSym() == TK_DIV) {
astnode* m = makeExprNode(BINARYOP_EXPR, current);
match(currSym());
m->child[0] = node;
node = m;
node->child[1] = factor();
}
return node;
}
Pay careful attention to the pointer swapping taking place on the inner loop: this is crucial to building proper syntax trees. Speaking of building correct parse trees, we need to proceed differently for right associative operators. The solution is straight forward: instead of looping, we use explicit right recursion! We’ll start with exponentiation since it is a binary operator.
astnode* Parser::factor() {
astnode* node = var();
if (currSym() == TK_POW) {
astnode* m = makeExprNode(BINARYOP_EXPR, current);
match(TK_POW);
m->child[0] = node;
m->child[1] = factor();
return m;
}
return node;
}
Sqrt is right associative like exponentiation, but a unary operator like the negative sign. However, if we treated sqrt the same as unary minus, we’d end up with a pretty funky syntax tree (guess how i know) because it has both a higher precedence and different associativity. So we need to figure out how to stick into factor(), and it turns out that “just sticking it in factor” is exactly what needs to be done!
astnode* Parser::factor() {
if (currSym() == TK_SQRT) {
astnode* m = makeExprNode(UNARYOP_EXPR, current);
match(TK_SQRT);
m->child[0] = factor();
return m;
}
astnode* node = var();
if (currSym() == TK_POW) {
astnode* m = makeExprNode(BINARYOP_EXPR, current);
match(TK_POW);
m->child[0] = node;
m->child[1] = factor();
return m;
}
return node;
}
Tada! And to round our last precedence level:
astnode* Parser::var() {
astnode* node;
switch (currSym()) {
case TK_ID: {
node = makeExprNode(ID_EXPR, current);
match(TK_ID);
}
break;
case TK_NUM: {
node = makeExprNode(CONST_EXPR, current);
match(TK_NUM);
}
break;
case TK_REALNUM: {
node = makeExprNode(CONST_EXPR, current);
match(TK_REALNUM);
}
break;
case TK_LPAREN: {
match(TK_LPAREN);
node = simpleExpr();
match(TK_RPAREN);
}
default:
break;
}
return node;
}
And with that, you now have what you need to go forth and descend recursively, confident that no matter the expressions operators precedence or associativity, unary or binary, you WILL construct the correct syntax tree! Until next time, Happy hacking,