Resolving Shift/Reduce Conflicts With Operator Precedence
One of the major selling points of LR parsing is the ability to write expression grammars with a higher degree of ambiguity than would otherwise be allowed. When designing an expression grammar there are ways to encode the operators precedence and associativity right into the production rules.
There are situations where it may be desireable to have a "flat" expression grammar. A "flat" expression grammar uses a single non-terminal for an arguably more human-friendly representation.
E ::= E + E | E - E | E * E | E / E
E ::= -E
E ::= NUM | ( E )
This grammar when fed to a parser generator will spit out all kinds of shift/reduce conflicts and not be able to create useable tables.
So if that's the case than what am I even talking about this for? Well, I mention it because the grammar above can't be used as-is but it can be used: we just need to give it a bit more information. In todays post i'm going to talk about doing just that: Using operator precedence to resolve shift/reduce conflicts during LR parser construction.
Controlled Ambiguity
If we take our "flat" expression grammar, and format it in the style of BNF expected by MGCPGen (one production rule per line), we get the following, ambiguous grammar spec(dont worry about the action symbols).
sp ::= STMT
STMT ::= TK_PRINT EXP @mkprint
STMT ::= EXP @mkexprstmt
EXP ::= EXP TK_PLUS EXP @binop
EXP ::= EXP TK_MINUS EXP @binop
EXP ::= EXP TK_MUL EXP @binop
EXP ::= EXP TK_DIV EXP @binop
EXP ::= TK_LPAREN EXP TK_RPAREN @pass
EXP ::= TK_MINUS EXP @unary
EXP ::= TK_NUM @num
If we feed MGCPGen the grammar as is, we're left with the following:
max@laptop:~/lr_gen/ex$ mgcpgen -slr ambig_exp.mgrm
No way to resolve Shift/Reduce conflict: [13][TK_DIV] s11
No way to resolve Shift/Reduce conflict: [13][TK_MINUS] s8
No way to resolve Shift/Reduce conflict: [13][TK_MUL] s9
No way to resolve Shift/Reduce conflict: [13][TK_PLUS] s10
No way to resolve Shift/Reduce conflict: [14][TK_DIV] s11
No way to resolve Shift/Reduce conflict: [14][TK_MINUS] s8
No way to resolve Shift/Reduce conflict: [14][TK_MUL] s9
No way to resolve Shift/Reduce conflict: [14][TK_PLUS] s10
No way to resolve Shift/Reduce conflict: [15][TK_DIV] s11
No way to resolve Shift/Reduce conflict: [15][TK_MINUS] s8
No way to resolve Shift/Reduce conflict: [15][TK_MUL] s9
No way to resolve Shift/Reduce conflict: [15][TK_PLUS] s10
No way to resolve Shift/Reduce conflict: [16][TK_DIV] s11
No way to resolve Shift/Reduce conflict: [16][TK_MINUS] s8
No way to resolve Shift/Reduce conflict: [16][TK_MUL] s9
No way to resolve Shift/Reduce conflict: [16][TK_PLUS] s10
No way to resolve Shift/Reduce conflict: [17][TK_DIV] s11
No way to resolve Shift/Reduce conflict: [17][TK_MINUS] s8
No way to resolve Shift/Reduce conflict: [17][TK_MUL] s9
No way to resolve Shift/Reduce conflict: [17][TK_PLUS] s10
max@laptop:~/lr_gen/ex$
As you can see, and as would be expected, the generator can't decide what to do when it encounters an operator. Using the `set_token_prec` declaration in MGCPGen, (similar to Yacc/Bisons %prec) is used to assign an explicit precedence level and associativity to the designated token.
set_token_prec TK_MINUS 5 left
set_token_prec TK_PLUS 5 left
set_token_prec TK_MUL 10 left
set_token_prec TK_DIV 10 left
sp ::= STMT
STMT ::= TK_PRINT EXP @mkprint
STMT ::= EXP @mkexprstmt
EXP ::= EXP TK_PLUS EXP @binop
EXP ::= EXP TK_MINUS EXP @binop
EXP ::= EXP TK_MUL EXP @binop
EXP ::= EXP TK_DIV EXP @binop
EXP ::= TK_LPAREN EXP TK_RPAREN @pass
EXP ::= TK_MINUS EXP @unary
EXP ::= TK_NUM @num
Running the grammar once again through pgen now generates a much different output:
Shift/Reduce conflict: [13][TK_DIV] s11: Keeping the shift
Shift/Reduce conflict: [13][TK_MINUS] s8: Reduced on associativity
Shift/Reduce conflict: [13][TK_MUL] s9: Keeping the shift
Shift/Reduce conflict: [13][TK_PLUS] s10: Reduced on associativity
Shift/Reduce conflict: [14][TK_DIV] s11: Keeping the shift
Shift/Reduce conflict: [14][TK_MINUS] s8: Reduced on associativity
Shift/Reduce conflict: [14][TK_MUL] s9: Keeping the shift
Shift/Reduce conflict: [14][TK_PLUS] s10: Reduced on associativity
Shift/Reduce conflict: [15][TK_DIV] s11: Reduced on associativity
Shift/Reduce conflict: [15][TK_MINUS] s8: Made to reduce on precedence for TK_MINUS
Shift/Reduce conflict: [15][TK_MUL] s9: Reduced on associativity
Shift/Reduce conflict: [15][TK_PLUS] s10: Made to reduce on precedence for TK_PLUS
Shift/Reduce conflict: [16][TK_DIV] s11: Keeping the shift
Shift/Reduce conflict: [16][TK_MINUS] s8: Reduced on associativity
Shift/Reduce conflict: [16][TK_MUL] s9: Keeping the shift
Shift/Reduce conflict: [16][TK_PLUS] s10: Reduced on associativity
Shift/Reduce conflict: [17][TK_DIV] s11: Reduced on associativity
Shift/Reduce conflict: [17][TK_MINUS] s8: Made to reduce on precedence for TK_MINUS
Shift/Reduce conflict: [17][TK_MUL] s9: Reduced on associativity
Shift/Reduce conflict: [17][TK_PLUS] s10: Made to reduce on precedence for TK_PLUS
Instead of unresolvable shift/reduce conflicts, mgcpgen can take the appropriate action to resolve the conflicts it encounters!
Adding Operator Precedence to SLR(1) Table Construction
Were going to add a new data structure to the CFG structure used for holding the information in the `set_token_prec` declarations from above. This information will be used to (hopefully) resolve any conflicts which arise when encountering the specified tokens.
struct OpPrec {
Symbol sym;
int prec_level;
string assoc;
OpPrec(Symbol s = "", int p = 0, string a = 0) : sym(s), prec_level(p), assoc(a) { }
};
struct Grammar {
/* .... */
map<Symbol, OpPrec> precedenceMap;
};
It should be fairly obvious how the `set_token_prec` directive maps to the above structure when reading the grammar file. Great, so now that we have the needed information, how do we use it? The answer to that question lies in the refactoring of our action table generation procedure.
Using associativity and precedence to resolve conflicts
In order to resolve a conflict between to tokens using precedence rules, then there needs to be an existing relationship between the two symbols. Meaning, If we get a shift/reduce conflict betwen '+' and '/' but we only have precedence information for '+', then we might as well not have any at all. With that in mind, the first thing we do when encountering a conflict is to check the precedence map for both symbols, only proceeding if both exist.
for (Symbol a : G.follow.at(p.lhs)) {
if (!tab[s].count(a)) {
tab[s][a] = "r"+to_string(p.pid);
} else {
string existing = tab[s][a];
if (existing[0] == 's') {
Symbol prod_sym = get_production_precedence_symbol(p, G);
if (!G.precedenceMap.count(prod_sym) || !G.precedenceMap.count(a)) {
cout<<"No way to resolve Shift/Reduce conflict: ["<<s<<"]["<<a<<"] "<<existing<<endl;
} else {
But where do we get the two symbols from? One of the conflicting symbols is from the current production being processed while the other symbol is from the follow set we generate lookaheads from. To obtain the correct symbol from the production we implement a procedure called 'get_production_precdence_symbol()' who's name should tell you it's purpose. But how does it know which symbol to return? That's actually simpler than you would think: a symbols leftmost terminal symbol is responsible for the productions precedence.
Symbol get_production_precedence_symbol(const Production& p, const Grammar& G) {
for (Symbol sym : p.rhs) {
if (G.terminals.count(sym))
return sym;
}
return "";
}
If we do have the needed precedence information, then we have a simple set of rules for resolving conflicts.
If the lookaheads precedence level is greater than the production's, we keep the existing shift. If the productions precedence level is greater, we replace it with a reduction. If the two symbols have the same precedence, than we turn to associativity.
If the lookahead symbol is left associative we reduce, otherwise we prefer the existing shift.
OpPrec pr = G.precedenceMap[prod_sym];
OpPrec la = G.precedenceMap[a];
cout<<"Shift/Reduce conflict: ["<<s<<"]["<<a<<"] "<<existing<<": ";
if (la.prec_level > pr.prec_level) {
cout<<"Keeping existing shift"<<endl;
} else if (pr.prec_level > la.prec_level) {
tab[s][a] = "r" + to_string(p.pid);
cout<<"Reduce on precedence for "<<a<<endl;
} else {
if (la.assoc == "left") {
tab[s][a] = "r" + to_string(p.pid);
cout<<"Reduced on associativity"<<endl;
} else {
cout<<"Keep existing shift."<<endl;
}
}
}
} else {
cout<<"Reduce/Reduce conflict: ["<<s<<"]["<<a<<"] "<<existing<<endl;
}
}
}
After implementing the above rules, we can now resolve shift/reduce conflicts which previously would have put a stop to table generation. Thats what i've got for today, Happy Hacking!
-
Resolving Shift/Reduce Conflicts With Operator Precedence
-
Squeezing DFAs with Pair Compression
-
Designing Abstract Syntax Trees: Homogenous vs. Heterogenous Node Structures
-
From LR Items to LR States
-
Calculating Follow Sets of a Context Free Grammar
-
Streaming Images from ESP32-CAM for viewing on a CYD-esp32
-
Constructing the Firsts Set of a Context Free Grammar
-
Inchworm Evaluation, Or Evaluating Prefix-Expressions with a Queue
-
Data Structures For Representing Context Free Grammar
-
A B Tree of Binary Search Trees
Leave A Comment