Resolving Lua Grammar Ambiguities with Can-Match Callbacks

SyntaxEditor for WPF Forum

Posted 2 years ago by Will Gauthier
Version: 21.1.3
Avatar

I'm writing a Lua grammar, but what I have so far suffers from multiple ambiguities. I'm trying to resolve them using can-match callbacks, but I have questions about how they work/should be implemented even after reading the documentation.

The first ambiguity I'm addressing is that both of the variableOrExpression terminal's productions start with an openParenthesis:

variableOrExpression.Production = variable
| @openParenthesis + expression + @closeParenthesis;

variable.Production = (@identifier | @openParenthesis + expression + @closeParenthesis + variableSuffix) + variableSuffix.ZeroOrMore();

This is my can-match callback attempt:

bool CanMatchVariable(IParserState state)
{
    if (state.TokenReader.LookAheadToken.Id == LuaTokenId.Identifier)
    {
        return true;
    }
    else if (state.TokenReader.LookAheadToken.Id == LuaTokenId.OpenParenthesis)
    {
        state.TokenReader.Push();
        try
        {
            state.TokenReader.Advance();
            if (expression.CanMatch(state) && state.TokenReader.LookAheadToken.Id == LuaTokenId.CloseParenthesis)
            {
                state.TokenReader.Advance();
                if (variableSuffix.CanMatch(state))
                {
                    return true;
                }
            }
         }
         finally
         {
            state.TokenReader.Pop();
         }
     }

      return false;
}

My questions are:

  1. Do I have to worry about the variableSuffix.ZeroOrMore(), or is it sufficient to match only as few terminals and non-terminals as necessary for uniqueness?
  2. Am I correct in assuming that calling CanMatch on a terminal like expression will consume tokens, so I need the TokenReader in a Push/Pop block and LookAheadToken should indeed be CloseParenthesis?

I'm also working on a second ambiguity where the statement terminal includes two productions that start with the forKeyword and two with the localKeyword:

statement.Production = @semicolon
| variableList + @assignment + expressionList
| functionCall
| label
| @breakKeyword
| @gotoKeyword + @identifier
| @doKeyword + block + @endKeyword
| @whileKeyword + expression + @doKeyword + block + @endKeyword
| @repeatKeyword + block + @untilKeyword + expression
| @ifKeyword + expression + @thenKeyword + block + (@elseifKeyword + expression + @thenKeyword + block).ZeroOrMore() + (@elseKeyword + block).Optional() + @endKeyword
| @forKeyword + @identifier + @assignment + expression + @comma + expression + (@comma + expression).Optional() + @doKeyword + block + @endKeyword
| @forKeyword + identifierList + @inKeyword + expressionList + @doKeyword + block + @endKeyword
| @functionKeyword + functionName + functionBody
| @localKeyword + @functionKeyword + @identifier + functionBody
| @localKeyword + attributeIdentifierList + (assignment + expressionList).Optional();

Since there are so many alternations, will the can-match callback have to handle all of them? Effectively, does it wholly replace non-ambiguous things like label, or do I only need to make it handle the specific problem starting non-terminals?

[Modified 2 years ago]

Comments (4)

Answer - Posted 2 years ago by Actipro Software Support - Cleveland, OH, USA
Avatar

Hi Will,

The problem there is that both the "variable" and the second "variableOrExpression" alternation are capable of starting with "openParenthesis + expression + @closeParenthesis".  Parsing over an "expression" would be too much to do in a can-match.

If I had this grammar, I'd probably change it to something like:

variableOrExpression.Production =
  @openParenthesis + expression + @closeParenthesis + variableSuffix.ZeroOrMore()
  | @identifier + variableSuffix.ZeroOrMore();

I believe that handles all the scenarios in your original "variableOrExpression" and doesn't require a can-match.

Can-match callbacks are typically just going to match the bare minimum number of tokens to determine a path (an alternation option) to take for an alternation that has has ambiguities.  They replace the "first set" of tokens for a production and let you programmatically determine if a particular non-terminal can be matched.  You generally want to avoid them if possible though and try to find other ways like the options given in this reply to remove ambiguity.

Yes, can-match callbacks will consume tokens since you have direct access to the TokenReader.  So you must use Push()/Pop() around any changes to TokenReader.

For the second ambiguity, I'd do something like there where you split out the common "localKeyword" into a separate production:

localStatement.Production = @localKeyword + 
  (
    @functionKeyword + @identifier + functionBody
    | attributeIdentifierList + (assignment + expressionList).Optional()
  );

And call that from "statement" as a single alternation option in place of the two alternation options that begin with "localKeyword".

No can-match is needed there either then.


Actipro Software Support

Posted 2 years ago by Will Gauthier
Avatar

Thank you for the thorough and helpful response. It's good information to have that rewriting productions is the preferred technique over employing can-match callbacks. I'll add substituting and expanding terminals, and factoring out initial keywords into a separate production, as techniques in my ambiguity-resolving toolkit.

Unfortunately, I must ask you for further help resolving a specific case of ambiguity that fixing the variableOrExpression production revealed.

As can be seen in my initial post, the second and third alternation of the statement terminal start with variableList and functionCall, respectively.

The variableList terminal unsurprisingly looks like:

variableList.Production = variable + (@comma + variable).ZeroOrMore();

And this is the functionCall terminal:

functionCall.Production = variableOrExpression + identifierAndArguments.OneOrMore();

Both variableList and functionCall indirectly start with both @identifier + variableSuffix.ZeroOrMore() and @openParenthesis + expression + @closeParenthesis. The crux of the problem seems to be that variable and variableOrExpression are nearly identical except for the fact that variable has one or more variableSuffix instead of zero or more after the @openParenthesis + expression + @closeParenthesis, which is preventing me from recombining the productions around a common term like you showed me.

Posted 2 years ago by Actipro Software Support - Cleveland, OH, USA
Avatar

You could try "widening" the allowed syntax so that you have something like:

variableListOrFunctionCall.Production = variableOrExpression + (
  (@comma + variable).ZeroOrMore()
  | identifierAndArguments.OneOrMore()
);

Then in a custom tree construction node or OnComplete callback, you could report a syntax error if it ends up being an Expression at the start but the first child alternation is has match results.


Actipro Software Support

Posted 2 years ago by Will Gauthier
Avatar

Thanks for answering my follow-up question. I was able to successfully remove all the ambiguities from my grammar and start running the parser debugger on sample code. I appreciate your help.

The latest build of this product (v24.1.1) was released 2 months ago, which was after the last post in this thread.

Add Comment

Please log in to a validated account to post comments.