Parsing Strings

Comments (7)

Posted 15 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Gary,

Perhaps your lexer isn't actually outputting a token whose ID is the DinqTokenId.PrimaryStringText value?

If you don't yet have error handling in your grammar, it will quit with the error you mentioned ("parsing complete before...").

As for the debugger, it will help you sort out issues like this. The error message you see there is probably:
No non-abstract classes that implement ILLParser and have a parameterless constructor were found in the assembly.

Check your assembly to make sure that it has a class that implements ILLParser that is public, not abstract, and has a public parameterless constructor. If there is one, it should be picked up fine.

Actipro Software Support

Posted 15 years ago by Gary Ranson

I'm still a little confused - and I thought I was doing well!!!

I've checked that the token IDs are correct for the Terminal.

In my literal production, should I be consuming the Start and End delimiters?

I've gone over and over my code and compared it to the samples. The samples don't describe how to handle strings, but I know that it's going to be something obvious and simple to resolve.


var @stringLiteral = new Terminal(DinqTokenId.PrimaryStringText, "PrimaryStringText");
var @stringLiteralOpen = new Terminal(DinqTokenId.PrimaryStringStartDelimiter, "PrimaryStringText");

literal.Production
    = @stringLiteral
    | @booleanTrueLiteral
    | @booleanFalseLiteral
    | @integerNumber
    | @realNumber
    | @stringLiteralOpen.OnErrorContinue()
    ;

The code above goes some way toward removing the error condition, but then it's not handling the EndDelimiter token. Perhaps I need a Production to resolve a string ...


literalString.Production = @startDelimiter + <anychar>* + @endDelimiter.OnErrorContinue();

But then how do you refer to the <anychar> terminal?

Please put me out of my misery ... I'm so close!

Regards,

Gary Ranson.

[Modified at 05/01/2011 03:53 AM]

Posted 15 years ago by Gary Ranson

It's amazing what sleep can do!

I think I may have this sorted now. I think I also need to use the TokenReader to ignore certain Tokens.

My LLParser is now doing what I expect.

I think that I may need to add some error handling to evaluate an open quote with no closing quote - but now I'm a step further.

Onwards and upwards!

Regards,

Gary.

Posted 15 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Gary,

In the languages in our .NET Languages Add-on, we just have a single token for strings, so our setup works like what you originally had.

If you have multiple possible tokens that comprise a string, you would need a non-terminal just for strings, that includes start/end delimiters, etc. You should know what token types can be inside of the delimiters so just put those in a ZeroOrMore alternation block. You could make your end delimiter in that non-terminal have OnErrorContinue() so that it's required but will allow parsing to continue if it's not there.

Actipro Software Support
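
As a rough sketch of the non-terminal described above, in the style of the grammar snippets earlier in this thread: it assumes a literalString non-terminal declared elsewhere in the grammar, an end-delimiter token ID named DinqTokenId.PrimaryStringEndDelimiter, and a purely hypothetical second in-string token, PrimaryStringEscapedCharacter, included only to show the alternation. Applying ZeroOrMore() to a parenthesized alternation is also an assumption about how the operators compose.

```csharp
// Terminals for each token the lexer can emit for a string.
var @stringStart = new Terminal(DinqTokenId.PrimaryStringStartDelimiter, "PrimaryStringStartDelimiter");
var @stringText = new Terminal(DinqTokenId.PrimaryStringText, "PrimaryStringText");
// Hypothetical token ID, not from the thread; stands in for any other in-string token.
var @stringEscape = new Terminal(DinqTokenId.PrimaryStringEscapedCharacter, "PrimaryStringEscapedCharacter");
// Assumed name for the end-delimiter token ID.
var @stringEnd = new Terminal(DinqTokenId.PrimaryStringEndDelimiter, "PrimaryStringEndDelimiter");

// Start delimiter, then any number of in-string tokens, then a required end
// delimiter that lets parsing continue if it is missing.
literalString.Production
    = @stringStart
    + (@stringText | @stringEscape).ZeroOrMore()
    + @stringEnd.OnErrorContinue();
```

The literal production would then presumably list literalString in place of the single @stringLiteral terminal.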

Posted 9 years ago by Daisuke Nakada

Hi, I am now implementing the grammar and I have a question about PrimaryString.

Strings in my language cannot include a dollar sign ($) or a single quote ('), so I wrote the following regular expression for the PrimaryStringText in the lexer class:

lexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern("[^\\$']+"));

Then, when I input 'abcdef$ghijk$lmn123' in the LL Parser Debugger's Input box,
I get the following parse result:

DocumentRoot[
    constant[
        character_string[
            single_byte_character_string[
                "'"
                [
                    "abcdef"
                    "$"
                    "ghijk"
                    "$"
                    "lmn123"
                ]
                "'"
            ]
        ]
    ]
]

I expected an error because strings in my language cannot include a dollar sign ($),
but the parse succeeded.

Please tell me what I am doing wrong.
Is there something wrong with the regular expression? ([^\\$']+)

lexer

// Initialize the PrimaryString lexical state
lexicalState = lexicalStates["PrimaryString"];
lexicalState.DefaultClassificationType = classificationTypeProvider.String;
lexicalState.DefaultTokenId = MyLanguageTokenId.PrimaryStringText;
lexicalState.DefaultTokenKey = "PrimaryStringText";
lexicalScope = new DynamicLexicalScope();
lexicalState.LexicalScopes.Add(lexicalScope);

// Start delimiter: a single quote
lexicalPatternGroup = new DynamicLexicalPatternGroup(DynamicLexicalPatternType.Explicit, "PrimaryStringStartDelimiter", null);
lexicalPatternGroup.TokenId = MyLanguageTokenId.PrimaryStringStartDelimiter;
lexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern("\'"));
lexicalScope.StartLexicalPatternGroup = lexicalPatternGroup;

// End delimiter: a single quote or a newline
lexicalPatternGroup = new DynamicLexicalPatternGroup(DynamicLexicalPatternType.Regex, "PrimaryStringEndDelimiter", null);
lexicalPatternGroup.TokenId = MyLanguageTokenId.PrimaryStringEndDelimiter;
lexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern("[\'\\n]"));
lexicalScope.EndLexicalPatternGroup = lexicalPatternGroup;

// String text: one or more characters other than '$' and a single quote
lexicalPatternGroup = new DynamicLexicalPatternGroup(DynamicLexicalPatternType.Regex, "PrimaryStringText", null);
lexicalPatternGroup.TokenId = MyLanguageTokenId.PrimaryStringText;
lexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern("[^\\$']+"));
lexicalState.LexicalPatternGroups.Add(lexicalPatternGroup);

grammar

var PrimaryStringStartDelimiter = new Terminal(MyLanguageTokenId.PrimaryStringStartDelimiter, "PrimaryStringStartDelimiter");
var PrimaryStringEndDelimiter = new Terminal(MyLanguageTokenId.PrimaryStringEndDelimiter, "PrimaryStringEndDelimiter");
var PrimaryStringText = new Terminal(MyLanguageTokenId.PrimaryStringText, "PrimaryStringText");

single_byte_character_string.Production = PrimaryStringStartDelimiter + PrimaryStringText.ZeroOrMore() + PrimaryStringEndDelimiter;

Posted 9 years ago by Actipro Software Support - Cleveland, OH, USA

Hello,

I think it's because your lexical state has DefaultTokenId set to PrimaryStringText. Any character that doesn't match one of the patterns in the lexical state falls back to a single-character token that is assigned the DefaultTokenId (currently PrimaryStringText), so each "$" becomes its own PrimaryStringText token. That is consistent with what you are seeing in the parse result too. You would want to change the DefaultTokenId to something like PrimaryStringInvalid instead.

Actipro Software Support
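
To illustrate the fallback, here is a toy model in plain C# with no Actipro dependency. The DefaultTokenDemo class and its Lex method are made up for this sketch and only mimic the behavior described above: text matching the pattern becomes a PrimaryStringText token, and any other character becomes a one-character token with the default ID.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Toy model of a lexical state: one regex pattern plus a fallback token ID.
// This is NOT the Actipro API; every name here is hypothetical.
public static class DefaultTokenDemo
{
    public static List<(string Id, string Text)> Lex(string input, string defaultTokenId)
    {
        // Same character class as in the thread: anything except '$' and '\''.
        // \G anchors each match attempt at the current position.
        var textPattern = new Regex(@"\G[^\$']+");
        var tokens = new List<(string Id, string Text)>();
        int pos = 0;
        while (pos < input.Length)
        {
            Match m = textPattern.Match(input, pos);
            if (m.Success)
            {
                tokens.Add(("PrimaryStringText", m.Value));
                pos += m.Length;
            }
            else
            {
                // No pattern matched: emit ONE character with the default ID.
                tokens.Add((defaultTokenId, input[pos].ToString()));
                pos++;
            }
        }
        return tokens;
    }

    public static void Main()
    {
        foreach (var id in new[] { "PrimaryStringText", "PrimaryStringInvalid" })
        {
            Console.WriteLine($"DefaultTokenId = {id}");
            foreach (var (tokenId, text) in Lex("abcdef$ghijk$lmn123", id))
                Console.WriteLine($"  {tokenId}: \"{text}\"");
        }
    }
}
```

With the default set to PrimaryStringText, all five tokens carry the same ID, which matches the parse tree in the question. With a separate PrimaryStringInvalid ID, the two "$" tokens become distinguishable, so a grammar that only accepts PrimaryStringText inside the delimiters can reject them.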

Posted 9 years ago by Daisuke Nakada

Thank you very much for helping me.
After I commented out the following line, an error occurred as I expected.

//lexicalState.DefaultTokenId = MyLanguageTokenId.PrimaryStringText;

The latest build of this product (v25.1.4) was released 4 months ago, which was after the last post in this thread.
