How to advance the textreader to the end of the non-terminal in a CanMatch

SyntaxEditor for WPF Forum

Posted 6 years ago by Nassim Farhat
Version: 18.1.0672

Hi Team (Using: V2017.1 Build 0651),

I have a question about fixing ambiguity between 2 of my production rules using CanMatch Callbacks.

In the following CanMatch callback, my problem is that I cannot advance the TokenReader to the end of the non-terminal. I can only advance token by token, so I never know where my complex variableExpression non-terminal ends.

Yet the call to "variableExpression.CanMatch(state)" actually goes through the token reader to the end without a problem, but it does not return any information about the end offset or the token located at the end of the non-terminal. This information is critical for me, because I need it to keep advancing my TokenReader to the correct token.

        private bool CanMatchAssignmentStatement(IParserState state)
        {
            state.TokenReader.Push();
            try
            {
                // My variableExpression is a non-terminal
                if (variableExpression.CanMatch(state))
                {
                    // This only advances to the next token within the variableExpression non-terminal.
                    // What I need is to advance to the last token at the end of the matched
                    // "variableExpression" non-terminal.
                    state.TokenReader.Advance();

                    if (state.TokenReader.LookAheadToken.Id == StTokenId.EqualityOperatorEqual)
                    {
                        state.TokenReader.Advance();

                        if (expression.CanMatch(state))
                        {
                            return true;
                        }
                    }
                }

                return false;
            }
            finally
            {
                state.TokenReader.Pop();
            }
        } 

Ideally, what I am looking for is something like "state.TokenReader.Advance("non-terminal");", which would advance the token reader to the last token at the end of the matching non-terminal.

*Note: variableExpression is way too complex for me to break up into token IDs and advance over those, so please don't suggest advancing by searching for token IDs.

Thank you 

Nassim F.

[Modified 6 years ago]

Comments (4)

Posted 6 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Nassim,

The CanMatch method you mentioned only does a quick lookup to see if the current token is one that can start the symbol, and if there are can-match callbacks defined, it will also execute those.  It doesn't run all the way through the production to verify that it fully matches.  The intention is that you check whether the current token (and possibly the next tokens, if needed) can start to match the symbol, and if so, the parser will start to parse that symbol, reporting errors along the way.  Can-matches aren't really meant to scan the entire production.

The parser is normally an LL(1) (one token look-ahead) grammar, but you can make it LL(*) via the can-match callbacks to scan forward as far as you need.  That being said, we unfortunately don't have any methods like what you are trying to do.
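
As a rough illustration of the look-ahead idea described above, here is a minimal, self-contained sketch. ToyTokenReader and LookAheadDemo are hypothetical stand-ins written for this example, not Actipro types. They only mimic the member names seen in this thread (Push, Pop, Advance, LookAheadToken) and assume that Pop rewinds to the position saved by Push.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a token reader; NOT the Actipro type.
public sealed class ToyTokenReader
{
    private readonly string[] tokens;
    private readonly Stack<int> saved = new Stack<int>();
    private int position;

    public ToyTokenReader(params string[] tokens)
    {
        this.tokens = tokens;
    }

    // The token at the current position, or "<EOF>" past the end.
    public string LookAheadToken =>
        position < tokens.Length ? tokens[position] : "<EOF>";

    // Moves one token forward (no-op at the end of input).
    public void Advance()
    {
        if (position < tokens.Length) position++;
    }

    // Remembers the current position.
    public void Push()
    {
        saved.Push(position);
    }

    // Rewinds to the most recently remembered position.
    public void Pop()
    {
        position = saved.Pop();
    }
}

public static class LookAheadDemo
{
    // Scans two tokens ahead and always restores the reader afterwards,
    // so the caller can still parse from the original position.
    public static bool LooksLikeAssignment(ToyTokenReader reader)
    {
        reader.Push();
        try
        {
            if (reader.LookAheadToken != "id") return false;
            reader.Advance();
            return reader.LookAheadToken == ":=";
        }
        finally
        {
            reader.Pop();
        }
    }
}
```

The try/finally mirrors the pattern in the original post: whatever the scan decides, the reader ends up where it started.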

If the can-match is getting too tough in your scenario, perhaps you can restructure the assignment statement.  Here's what we do in our C# grammar:

expressionCore.Production = nonAssignmentExpression["leftexp"] + (assignmentOperator["op"] + expression["rightexp"].OnErrorContinue()).Optional()
	> AstConditional<AssignmentExpression>(AstFrom("leftexp"), AstFrom("rightexp"))
		.SetProperty(e => e.Operator, AstFrom("op"))
		.SetProperty(e => e.LeftExpression, AstFrom("leftexp"))
		.SetProperty(e => e.RightExpression, AstFrom("rightexp"));

That basically returns the core nonAssignmentExpression if there is no assignment operator right afterward.  If one is found, it will make an AssignmentExpression AST node with the nonAssignmentExpression as the left expression.


Actipro Software Support

Posted 6 years ago by Nassim Farhat

Thank you for your reply.

Too bad I cannot traverse a whole production rule using LL(*) inside a CanMatch :(.

But let me describe my real issue. I have tried to apply your modifications to the assignment expression, but it still does not work out. Maybe you can shed a little more light on this once you know the full story.

So our language has a basic assignment statement that looks like this:

var1 := var2 + 1; OR var1 := 1;

And we have a CASE statement that looks like this:

CASE A OF
DefWord1:
var1 := 1;
DefWord2:
var2 := 2;
END_CASE;

*DefWord1: and DefWord2: are the cases of the CASE statement. They stand in for literal constants: DefWord1 and DefWord2 actually get resolved to literal constants by our compiler.

Here is my production rule representing my CASE Statement:

            caseStatement.Production = @conditionalStatementCase["case"]
                                       + expression["exp"]
                                       + @conditionalStatementOf["of"]
                                       + caseElement.OneOrMore().SetLabel("cases")
                                       + ((@conditionalStatementElse + statements["elsestms"]).Optional() > Ast("ELSE", AstFrom("elsestms"))).SetLabel("coucou")
                                       + @conditionalStatementEndCase["endcase"]
                                       > Ast("CaseStatement",
                                           Ast("CASE", AstFrom("exp")),
                                           Ast("OF", AstChildrenFrom("cases")),
                                           AstConditionalFrom("ELSE", "coucou"),
                                           Ast("ENDCASE"));

            caseElement.Production = caseList + (@punctuationColumn > null) + statements;

            caseList.Production = caseListElement["ele"] + ((@punctuationComma > null) + caseListElement).ZeroOrMore().SetLabel("moreele")
                                  > AstConditional("CaseList", AstChildFrom("ele"), AstChildrenFrom("moreele", 1));

            caseListElement.Production = constantExpression | nonAssignmentExpression;

The issue is that when the parser gets into "caseElement.Production", it matches the caseList and the punctuation and then starts to match "statements", but it goes TOO FAR and actually matches "DefWord2:" as a statement instead of a "caseListElement".

CASE A OF
DefWord1:
var1 := 1;
DefWord2:
var2 := 2;
END_CASE;

This is why I wanted to use a CanMatch on the "AssignmentExpression". But even with your suggestion of a "nonAssignmentExpression", "DefWord2" still gets matched as a statement. I would like it to fail to match, so that the parser exits the "statements" part of the "caseElement.Production = caseList + (@punctuationColumn > null) + statements;" production rule and returns to "caseElement.OneOrMore().SetLabel("cases")" to try matching a new "caseElement" starting with a new caseList.

I almost succeeded using the CanMatch method, but alas I was blocked by the issue mentioned first in this thread.

I hope this summarizes my problem well and thank you for your reply, Actipro support team is awesome!

Nassim F.

Posted 6 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Nassim,

I see what you mean. Your language grammar is very ambiguous here.  Even Visual Basic at least has a "Case" keyword that starts off each option, which gives the statement block a clear end.  Your language is tricky since there is no such keyword, and the starting tokens of the next line could either continue the statement block or begin a case list element.

My only thought here is to do a complex can-match on the statement production for the statement block of each case element.  As you said, though, that isn't easy and could get pretty difficult to code manually.

Due to the ambiguity, this might be one of the rare languages that an LL parser might not handle well, and you may be better served with a bottom-up parser.


Actipro Software Support

Answer - Posted 6 years ago by Nassim Farhat

Hi,

That's really too bad...

Well, to tell you the truth, right after sending you the message I did find a way to fix my issue using a simpler CanMatch on the AssignmentExpression.

Since my CaseListElement can only start with a simpleExpression name (i.e. no array variable notation such as defWord[3]), it can only accept a plain symbolicIdentifier, and the next token after that must be either a "," or a ":", as in:

CASE A OF
DefWord1, DefWord2:
var1 := 1;
DefWord3:
var2 := 2;
END_CASE;

The following CanMatch did the job. I was so happy when I made it work, and I knew I was on the right path from the beginning! I just needed to scratch the old brain a little more.

        private bool CanMatchAssignmentStatement(IParserState state)
        {
            state.TokenReader.Push();
            try
            {
                if (variableExpression.CanMatch(state))
                {
                    state.TokenReader.Advance();
                    if (state.TokenReader.LookAheadToken.Id == StTokenId.PunctuationComma ||
                        state.TokenReader.LookAheadToken.Id == StTokenId.PunctuationColumn)
                    {
                        return false;
                    }
                }
                else
                {
                    return false;
                }

                return true;
            }
            finally
            {
                state.TokenReader.Pop();
            }
        }
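
To make the decision in the can-match above easier to follow, here is a small self-contained sketch of the same rule over a plain token list. ToyTokenId and CaseLabelCheck are hypothetical names for this example only, and the "is the current token an identifier" test is an assumed simplification standing in for variableExpression.CanMatch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token ids for this sketch; not the StTokenId values.
public enum ToyTokenId { Identifier, Comma, Colon, Assign, Number, Semicolon, EndCase }

public static class CaseLabelCheck
{
    // An identifier followed by ',' or ':' is treated as a case label,
    // so it must not be allowed to start an assignment statement.
    public static bool CanStartAssignment(IReadOnlyList<ToyTokenId> tokens, int index)
    {
        // Assumed stand-in for variableExpression.CanMatch(state) failing.
        if (index >= tokens.Count || tokens[index] != ToyTokenId.Identifier)
            return false;

        int next = index + 1;
        if (next < tokens.Count &&
            (tokens[next] == ToyTokenId.Comma || tokens[next] == ToyTokenId.Colon))
            return false;

        return true;
    }
}
```

For the token sequence of "DefWord1: var1 := 1; DefWord2: var2 := 2; END_CASE", the check rejects the two DefWord identifiers and accepts var1 and var2, which is what lets the statement list stop before the next case label.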

With that said... you should consider adding a production rule traversal functionality that could return the token at the end of a non-terminal in an LL(*) without impact on the reader location. This time I succeeded, but another might not in another slightly more complex scenario. For example if my CaseListElement condition was allowed to be complex! 
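
For illustration only, the kind of traversal being asked for could look roughly like the sketch below. This is not an Actipro API. SpeculativeMatch, its methods, the string token names and the tiny grammar (identifier followed by any number of "[num]" or ".id" suffixes) are all assumptions made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class SpeculativeMatch
{
    // Tries to match: "id" ( "[" "num" "]" | "." "id" )*
    // Returns the index just past the match, or -1 when nothing matches.
    // Only a local index moves, so the caller's position is untouched.
    public static int MatchVariableExpression(IReadOnlyList<string> tokens, int start)
    {
        int i = start;
        if (i >= tokens.Count || tokens[i] != "id") return -1;
        i++;

        while (i < tokens.Count)
        {
            if (tokens[i] == "[" && i + 2 < tokens.Count
                && tokens[i + 1] == "num" && tokens[i + 2] == "]")
            {
                i += 3; // consume an index suffix such as [3]
            }
            else if (tokens[i] == "." && i + 1 < tokens.Count && tokens[i + 1] == "id")
            {
                i += 2; // consume a member suffix such as .field
            }
            else
            {
                break;
            }
        }

        return i;
    }

    // An assignment is a variable expression followed directly by ":=".
    public static bool IsAssignment(IReadOnlyList<string> tokens, int start)
    {
        int end = MatchVariableExpression(tokens, start);
        return end >= 0 && end < tokens.Count && tokens[end] == ":=";
    }
}
```

Because the matcher reports where the non-terminal ends, the check on the following token works even when the left-hand side is complex, which is the case the simpler can-match above cannot cover.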

Best Regards and thanks again.

Nassim F.
