Parsing Strings

SyntaxEditor for WPF Forum

Posted 14 years ago by Gary Ranson
Version: 11.1.0542
I am now at a point where my grammar is creating an AST, but I have one small problem.

The parser is recognising a string (PrimaryString). It is defined in the same way as in the Simple language demos.

The problem I have is that the parser is not recognising the token.

I'm guessing that I've done something wrong!

This is a snippet from my Grammar. All other literals work, it's just the stringLiteral which is not recognised.

            var @stringLiteral = new Terminal(DinqTokenId.PrimaryStringText, "PrimaryStringText");
            
            literal.Production
                = @stringLiteral
                | @booleanTrueLiteral
                | @booleanFalseLiteral
                | @integerNumber
                | @realNumber
                ;

What am I doing wrong?

When I specify a string in the editor, the message that the parser returns is:

"Parsing complete before reaching document end"

I would test it using the parser debugger, but I can't seem to get it working. It complains that it can't find a parameterless constructor for the ILLParser.

Regards,

Gary Ranson.

Comments (7)

Posted 14 years ago by Actipro Software Support - Cleveland, OH, USA
Hi Gary,

Perhaps your lexer isn't actually outputting a token whose ID is the DinqTokenId.PrimaryStringText value?

If you don't yet have error handling in your grammar, it will quit with the error you mentioned ("parsing complete before...").

As for the debugger, it will help you sort out issues like this. The error message you see there is probably:
"No non-abstract classes that implement ILLParser and have a parameterless constructor were found in the assembly."

Check your assembly to make sure that it has a class that implements ILLParser that is public, not abstract, and has a public parameterless constructor. If there is one, it should be picked up fine.
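The thread does not show how the debugger actually discovers parser classes, so the following is only a sketch of an equivalent check you could run against your own assembly. `IParserStandIn` is a hypothetical stand-in for `ILLParser` (so the sketch compiles without the Actipro assemblies), and the three sample classes and the `ParserDiscovery` helper are likewise made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for ILLParser; swap in the real interface in your project.
public interface IParserStandIn { }

// Meets all three conditions: public, non-abstract, public parameterless constructor.
public class GoodParser : IParserStandIn { public GoodParser() { } }

// Rejected: abstract.
public abstract class AbstractParser : IParserStandIn { }

// Rejected: the only constructor takes a parameter.
public class NoDefaultCtorParser : IParserStandIn { public NoDefaultCtorParser(int unused) { } }

public static class ParserDiscovery
{
    // Returns the types in the assembly that are publicly visible, non-abstract,
    // implement the given interface, and expose a public parameterless constructor.
    public static List<Type> FindUsableParsers(Assembly assembly, Type parserInterface)
    {
        return assembly.GetTypes()
            .Where(t => t.IsClass && !t.IsAbstract && t.IsVisible)
            .Where(t => parserInterface.IsAssignableFrom(t))
            .Where(t => t.GetConstructor(Type.EmptyTypes) != null)
            .ToList();
    }
}
```

Calling `ParserDiscovery.FindUsableParsers(typeof(GoodParser).Assembly, typeof(IParserStandIn))` lists the classes that satisfy the conditions described above; if your parser class is missing from the result, one of those conditions is not met.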


Actipro Software Support

Posted 13 years ago by Gary Ranson
I'm still a little confused - and I thought I was doing well!!!

I've checked that the token IDs are correct for the Terminal.

In my literal production, should I be consuming the Start and End delimiters?

I've gone over and over my code and compared it to the samples. The samples don't describe how to handle strings. I know that it's going to be something obvious and simple to resolve.

            var @stringLiteral = new Terminal(DinqTokenId.PrimaryStringText, "PrimaryStringText");
            var @stringLiteralOpen = new Terminal(DinqTokenId.PrimaryStringStartDelimiter, "PrimaryStringText");

            literal.Production
                = @stringLiteral
                | @booleanTrueLiteral
                | @booleanFalseLiteral
                | @integerNumber
                | @realNumber
                | @stringLiteralOpen.OnErrorContinue()
                ;

The code above goes some way towards removing the error condition, but then it doesn't handle the EndDelimiter token. Perhaps I need a production to resolve a string ...

literalString.Production = @startDelimiter + <anychar>* + @endDelimiter.OnErrorContinue();

But then how do you refer to the <anychar> terminal?

Please put me out of my misery ... I'm so close!

Regards,

Gary Ranson.

[Modified at 05/01/2011 03:53 AM]

Posted 13 years ago by Gary Ranson
It's amazing what sleep can do!

I think I may have this sorted now. I think I also need to use the TokenReader to ignore certain Tokens.

My LLParser is now doing what I expect.

I think that I may need to add some error handling to evaluate an open quote with no closing quote - but now I'm a step further.

Onwards and upwards!

Regards,

Gary.

Posted 13 years ago by Actipro Software Support - Cleveland, OH, USA
Hi Gary,

In our .NET Languages Add-on languages we just have a single token for strings so our setup works like what you originally had.

If you have multiple possible tokens that comprise a string, you would need a non-terminal just for strings, that includes start/end delimiters, etc. You should know what token types can be inside of the delimiters so just put those in a ZeroOrMore alternation block. You could make your end delimiter in that non-terminal have OnErrorContinue() so that it's required but will allow parsing to continue if it's not there.


Actipro Software Support

Posted 7 years ago by Daisuke Nakada

Hi, I am now implementing the grammar and I have a question about PrimaryString.

Strings in my language cannot contain a dollar sign ($) or a single quote ('), so I wrote the following regular expression for the PrimaryStringText in the lexer class:

lexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern("[^\\$']+"));

Then, when I input 'abcdef$ghijk$lmn123' in the LL Parser Debugger's Input box,
I get the following parse result:

DocumentRoot[
    constant[
        character_string[
            single_byte_character_string[
                "'"
                [
                    "abcdef"
                    "$"
                    "ghijk"
                    "$"
                    "lmn123"
                ]
                "'"
            ]
        ]
    ]
]

I expected an error, because strings in my language cannot contain a dollar sign ($), but the parse succeeded.

Please tell me what I am doing wrong.
Is there something wrong with the regular expression ([^\\$']+)?


Lexer:

// Initialize the PrimaryString lexical state
lexicalState = lexicalStates["PrimaryString"];
lexicalState.DefaultClassificationType = classificationTypeProvider.String;
lexicalState.DefaultTokenId = MyLanguageTokenId.PrimaryStringText;
lexicalState.DefaultTokenKey = "PrimaryStringText";
lexicalScope = new DynamicLexicalScope();
lexicalState.LexicalScopes.Add(lexicalScope);
lexicalPatternGroup = new DynamicLexicalPatternGroup(DynamicLexicalPatternType.Explicit, "PrimaryStringStartDelimiter", null);
lexicalPatternGroup.TokenId = MyLanguageTokenId.PrimaryStringStartDelimiter;
lexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern("\'"));
lexicalScope.StartLexicalPatternGroup = lexicalPatternGroup;
lexicalPatternGroup = new DynamicLexicalPatternGroup(DynamicLexicalPatternType.Regex, "PrimaryStringEndDelimiter", null);
lexicalPatternGroup.TokenId = MyLanguageTokenId.PrimaryStringEndDelimiter;
lexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern("[\'\\n]"));
lexicalScope.EndLexicalPatternGroup = lexicalPatternGroup;
lexicalPatternGroup = new DynamicLexicalPatternGroup(DynamicLexicalPatternType.Regex, "PrimaryStringText", null);
lexicalPatternGroup.TokenId = MyLanguageTokenId.PrimaryStringText;
lexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern("[^\\$']+"));
lexicalState.LexicalPatternGroups.Add(lexicalPatternGroup);

Grammar:

var PrimaryStringStartDelimiter = new Terminal(MyLanguageTokenId.PrimaryStringStartDelimiter, "PrimaryStringStartDelimiter");
var PrimaryStringEndDelimiter = new Terminal(MyLanguageTokenId.PrimaryStringEndDelimiter, "PrimaryStringEndDelimiter");
var PrimaryStringText = new Terminal(MyLanguageTokenId.PrimaryStringText, "PrimaryStringText");

single_byte_character_string.Production = PrimaryStringStartDelimiter + PrimaryStringText.ZeroOrMore() + PrimaryStringEndDelimiter;

Posted 7 years ago by Actipro Software Support - Cleveland, OH, USA

Hello,

I think it's because your lexical state has DefaultTokenId set to PrimaryStringText. That means that any character that doesn't match one of the patterns in the lexical state falls back to a single-character token that is assigned the DefaultTokenId (currently PrimaryStringText). That is consistent with what you are seeing in the parse result, where each "$" appears as its own token. You would want to change the DefaultTokenId to something like PrimaryStringInvalid instead.
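The fallback behaviour described above can be imitated in a few lines of plain C#. This is a simplified simulation rather than Actipro's lexer: `StringStateLexerSketch` and `Tokenize` are hypothetical names, token IDs are represented as strings, and the input is assumed to be only the text between the string delimiters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StringStateLexerSketch
{
    // Same character class as the PrimaryStringText pattern in the thread:
    // one or more characters that are neither '$' nor a single quote.
    // \G anchors the match at the position passed to Match().
    private static readonly Regex TextPattern = new Regex(@"\G[^\$']+");

    // Tokenizes the text between the delimiters. A character that no pattern
    // matches becomes a single-character token tagged with defaultTokenId.
    public static List<KeyValuePair<string, string>> Tokenize(string input, string defaultTokenId)
    {
        var tokens = new List<KeyValuePair<string, string>>();
        int pos = 0;
        while (pos < input.Length)
        {
            Match m = TextPattern.Match(input, pos);
            if (m.Success)
            {
                // The explicit pattern matched: emit one PrimaryStringText token.
                tokens.Add(new KeyValuePair<string, string>("PrimaryStringText", m.Value));
                pos += m.Length;
            }
            else
            {
                // Nothing matched: fall back to the state's default token ID.
                tokens.Add(new KeyValuePair<string, string>(defaultTokenId, input[pos].ToString()));
                pos++;
            }
        }
        return tokens;
    }
}
```

With `Tokenize("abcdef$ghijk$lmn123", "PrimaryStringText")`, all five tokens carry the PrimaryStringText ID, which mirrors the parse tree posted earlier in the thread. Passing `"PrimaryStringInvalid"` as the default instead tags the two "$" tokens differently, so a production that accepts only PrimaryStringText terminals has something to reject.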


Actipro Software Support

Posted 7 years ago by Daisuke Nakada

Thank you very much for helping me.
After I commented out the following line, an error occurred as I expected.

//lexicalState.DefaultTokenId = MyLanguageTokenId.PrimaryStringText;