Posted 14 years ago by Hugh - Unknown
Version: 10.2.0531
I've implemented my language using the LLParser and it's working very well so far. However, this language is a little ambiguous.
For example, the string 'ABCD' would be interpreted as a hexadecimal number, but if a variable has been declared with the name 'ABCD', it would be interpreted as a variable instead.

Now, I can determine whether it's a hexadecimal number or a variable quite easily in my grammar. However, I don't see any way to change the token produced by the lexer. Is this even possible, or is there another way to do this?

The real problem here lies in the fact that the language I'm trying to parse has been poorly designed, but unfortunately that's the way it is for now.

Hugh

Comments (3)

Posted 14 years ago by Actipro Software Support - Cleveland, OH, USA
Hi Hugh,

Unfortunately, that scenario is probably too complex for a lexer to handle.

You can probably handle it in the grammar though. There is a CustomData property on the IParserState object that is passed to all your callbacks. What we suggest for complex scenarios is that you store a custom object there and update it in your callbacks.

In your case, you'd perhaps want to have some sort of variable table available from that property. Then any time you find a variable declaration, add it to that table.

Add two non-terminals that start with the same token, one for variables and one for hex numbers. In a parent non-terminal, call both of them in an alternation. This of course would be an ambiguous call, since they both would start with the same token type. However, per the docs, you can resolve the ambiguity by putting a can-match callback on whichever of the two non-terminals comes first in the alternation. If you apply the can-match callback to the variable one, for instance, that code looks in your table to see whether the variable name is defined. If so, return true to indicate that it can match. Otherwise return false, and the parser will fall through to the hex number non-terminal later in the alternation.
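
The lookup described above can be sketched in plain C#. This is only an illustration under stated assumptions: VariableTable, CanMatchVariable, and Classify are hypothetical names, not part of the Actipro API, and the way the table is attached to the parser state's custom data and wired to a can-match callback is left out. Case-sensitive names are also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of the Actipro API: the custom object
// that would be stored in the parser state's custom data.
public sealed class VariableTable
{
    // Assumption: variable names are case-sensitive.
    private readonly HashSet<string> names = new HashSet<string>(StringComparer.Ordinal);

    // Call this from the callback that runs when a variable declaration is parsed.
    public void Declare(string name) => names.Add(name);

    // Body of the can-match check for the variable non-terminal:
    // true only when the token text names a declared variable.
    public bool CanMatchVariable(string tokenText) => names.Contains(tokenText);
}

public static class Demo
{
    // Classifies an ambiguous token the way the alternation would:
    // variable first (guarded by the table), hex number as the fallback.
    public static string Classify(VariableTable table, string tokenText)
    {
        if (table.CanMatchVariable(tokenText))
            return "variable";

        bool isHex = long.TryParse(tokenText, NumberStyles.AllowHexSpecifier,
                                   CultureInfo.InvariantCulture, out _);
        return isHex ? "hex" : "unknown";
    }

    public static void Main()
    {
        var table = new VariableTable();
        Console.WriteLine(Classify(table, "ABCD")); // not declared yet
        table.Declare("ABCD");
        Console.WriteLine(Classify(table, "ABCD")); // declared, so the variable branch wins
    }
}
```

Keeping the table in one object means the declaration callback and the can-match check share the same state for the duration of a parse.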

That's the great thing about our design: you can easily inject custom code anywhere.


Actipro Software Support

Posted 14 years ago by Hugh - Unknown
Thank you so much, that worked perfectly. And using the CustomData property to access my list of variables is also an excellent idea.

Just one other thing if you don't mind. The language I am parsing uses a line terminator to end a command. So I have a grammar which contains something like this (simplified):

Root.Production = scriptLine.ZeroOrMore();
scriptLine.Production = command.Optional() + @lineTerminator;

However the last line of the language is usually terminated by a DocumentEnd token, so I receive a LineTerminator expected error. I tried adding a @documentEnd terminal in an alternation with the @lineTerminator terminal, but it didn't seem to work. Although I didn't receive any errors, no AST was generated.

To solve this problem I have added an OnError callback to the @lineTerminator terminal, in which I ignore the error if the token reader is at the end of the document. Like so:

private IParserErrorResult IsDocumentEnd(IParserState state)
{
    return state.TokenReader.IsAtEnd ? ParserErrorResults.Ignore : ParserErrorResults.Default;
}

Is this the recommended way to solve this problem? In Irony you can set the NewLineBeforeEOF flag in the LanguageFlags, which prevents this situation. Perhaps you could implement a similar feature.

Thanks again.

Hugh

Posted 14 years ago by Actipro Software Support - Cleveland, OH, USA
Hi Hugh,

Actually we ran into the document end issue as well and have added the ability to match against DocumentEnd tokens with the standard -1 token ID. Note that they are zero-width matches though.

This means that if you had (@lineTerminator | @documentEnd) in your scriptLine production, you would go into an infinite loop at the end of the document, since the parser wouldn't advance past document end and your command is optional. If your command were non-optional, it wouldn't be a problem. A workaround for this issue is to make your code more like this:

Root.Production = scriptLine.ZeroOrMore();
scriptLine.Production = emptyLine | commandLine;
emptyLine.Production = @lineTerminator.ToTerm().ToProduction();
commandLine.Production = command + (@lineTerminator | @documentEnd);

These features will be in the next build. In the meantime, what you have is probably fine.
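
To see why the zero-width match matters, here is a toy model of the two grammar shapes in plain C#. It is not the Actipro parser: the Tok enum, the two matcher functions, and the iteration cap are all hypothetical, and the cap exists only so that the looping variant terminates in the demonstration.

```csharp
using System;
using System.Collections.Generic;

public static class ZeroWidthDemo
{
    // Hypothetical token kinds for a toy token stream; not Actipro types.
    public enum Tok { Command, LineTerminator, DocumentEnd }

    // scriptLine = command? (lineTerminator | documentEnd)
    // Returns the new position, or -1 if the line does not match.
    public static int MatchOptionalCommand(IReadOnlyList<Tok> tokens, int pos)
    {
        if (tokens[pos] == Tok.Command) pos++;                 // optional command
        if (tokens[pos] == Tok.LineTerminator) return pos + 1;
        if (tokens[pos] == Tok.DocumentEnd) return pos;        // zero-width: nothing consumed
        return -1;
    }

    // scriptLine = emptyLine | commandLine, where commandLine requires a command.
    public static int MatchRestructured(IReadOnlyList<Tok> tokens, int pos)
    {
        if (tokens[pos] == Tok.LineTerminator) return pos + 1; // emptyLine
        if (tokens[pos] != Tok.Command) return -1;             // commandLine needs a command
        pos++;
        if (tokens[pos] == Tok.LineTerminator) return pos + 1;
        if (tokens[pos] == Tok.DocumentEnd) return pos;        // still zero-width, but a command was consumed
        return -1;
    }

    // Repeats the matcher like a zero-or-more loop would, giving up after
    // maxIterations. Returns the number of iterations that matched.
    public static int CountIterations(IReadOnlyList<Tok> tokens,
                                      Func<IReadOnlyList<Tok>, int, int> matcher,
                                      int maxIterations)
    {
        int pos = 0, iterations = 0;
        while (iterations < maxIterations)
        {
            int next = matcher(tokens, pos);
            if (next < 0) break;
            pos = next;
            iterations++;
        }
        return iterations;
    }

    public static void Main()
    {
        // The stream must end with DocumentEnd, as a real token reader's would.
        var tokens = new[] { Tok.Command, Tok.LineTerminator, Tok.Command, Tok.DocumentEnd };
        Console.WriteLine(CountIterations(tokens, MatchOptionalCommand, 100)); // hits the cap
        Console.WriteLine(CountIterations(tokens, MatchRestructured, 100));    // stops after the two lines
    }
}
```

In the first shape, an empty match at document end succeeds over and over without moving forward. In the second, every successful match consumes at least one token, so the repetition stops once only the document end remains.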


Actipro Software Support

The latest build of this product (v24.1.2) was released 14 days ago, which was after the last post in this thread.
