Implementing Semantic Backtracking

Posted 14 years ago by Kevin McCormick

Version: 4.0.0289

We are currently developing a JavaScript (Ecma-262) editor, including a Dynamic lexical parser and a semantic parser generated by the SyntaxEditor Parser Generator.

We're about 90% through implementing the language and we hit a couple of interesting snags in the language:

- Automatic Semicolon Insertion (ASI)
- Various Non-Terminal Ambiguities
- Left recursion

Most of the ambiguities and recursion issues we were able to factor out by modifying the grammar accordingly. We have two remaining ambiguities, as well as ASI left to implement in the grammar.

We determined that the best approach for implementing both ASI and our two remaining ambiguities would be through simple backtracking (i.e. try one alternation, and if it fails, backtrack to the start point of that non-terminal, and try the second alternation).

We're a little puzzled how we can implement backtracking from within the semantic parser. It would seem that ReadReverse would do the trick, but the stream is not accessible from within the semantic parser.

Any tips?

Comments (3)

Posted 14 years ago by Kevin McCormick

After doing a little more research, I have noticed two mysterious methods: StartPeek and StopPeek. Will calling StopPeek rewind the stream to the point where StartPeek was called?

Posted 14 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Kevin,

Sorry but due to how the grammar manages tokens and state transitions, is forward-only and you can't go back.

What you can do, however, is use peek operations (like your found) to do 'n' look-aheads like:

this.StartPeek();
try {
    // Loop and use this.Peek() to move forward and examine the this.PeekToken as needed
}
finally {
    this.StopPeek();
}

By using that sort of code, you can examine the next 'n' tokens and determine which production to consume next based on what you find out. StopPeek() will return the reader back to the point from which you called StartPeek(). But note that while in a peek op, you have to look at PeekToken, not LookAheadToken.

Actipro Software Support

Posted 14 years ago by Kevin McCormick

Hi,

Thanks for the quick response. I decided against the Peek method as in the case of JavaScript, determining a non-match for a non-terminal of arbitrary length would basically require rewriting a large portion of the parser by hand.

I did find an alternate solution, though I definitely would put in my support for backtracking support in future versions of the parser generator. The Ecma standard says to use the semantic analysis that "makes the most sense", so picking a non-terminal from a list of ordered alternations would be helpful.

That being said, my solution was twofold:

1) To implement Automatic Semicolon Insertion (ASI), I was able to leverage the LookAheadTokenIsOnDifferentLine property of the parser to determine whether the next token qualifies for automatic semicolon insertion. Huge lifesaver.

2) Another ambiguity existed for the IterationStatement (for loops) non-terminal. The ambiguity was that the both nonterminals LeftHandSideExpression and ExpressionNoIn are part of the first set of IterationStatement. Unfortunately, ExpressionNoIn contains LeftHandSideExpression and thus both have the same first-set. LeftHandSideExpression must be followed by InKeyword, while ExpressionNoIn (or LeftHandSideExpression) must be followed by Semicolon. My solution was to first match LeftHandSideExpression, and if that fails, then to match ExpressionNoIn instead. Once that is handled, I can look for the next token (in or ;), and handle the rest of the grammar safely. My solution is somewhat similar to the one found at http://research.xebic.com/es3/CSharp/ES3.g3

Best,
Kevin

The latest build of this product (v25.1.0) was released 2 months ago, which was after the last post in this thread.

Comments (3)

Add Comment