How do you correctly handle strings with embedded escaped quotes?

SyntaxEditor for WPF Forum

Posted 9 years ago by Scott Haney
Version: 14.2.0611

Hi,

I am trying to parse a language that allows strings with embedded quotes represented by C-style escapes:

"Embedded \"quote\""

Largely by pattern matching, I have created a programmable lexer with the following code:

            // Initialize the PrimaryString lexical state.

            lexicalState = lexicalStates["PrimaryString"];
            lexicalState.DefaultClassificationType = classificationTypeProvider.String;
            lexicalState.DefaultTokenId = NXSLTokenId.StringText;
            lexicalState.DefaultTokenKey = "StringText";
            DynamicLexicalScope lexicalScope = new DynamicLexicalScope();
            lexicalState.LexicalScopes.Add(lexicalScope);

            lexicalPatternGroup = new DynamicLexicalPatternGroup(DynamicLexicalPatternType.Explicit, "StringStartDelimiter", null)
            {
                TokenId = NXSLTokenId.StringStartDelimiter
            };
            lexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern("\""));
            lexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern("#\""));
            lexicalScope.StartLexicalPatternGroup = lexicalPatternGroup;

            lexicalPatternGroup = new DynamicLexicalPatternGroup(DynamicLexicalPatternType.Explicit, "StringEndDelimiter", null)
            {
                TokenId = NXSLTokenId.StringEndDelimiter
            };
            lexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern("\""));
            lexicalScope.EndLexicalPatternGroup = lexicalPatternGroup;

            lexicalPatternGroup = new DynamicLexicalPatternGroup(DynamicLexicalPatternType.Regex, "StringLineTerminator", null)
            {
                TokenId = NXSLTokenId.StringLineTerminator
            };
            lexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern("\\n"));
            lexicalState.LexicalPatternGroups.Add(lexicalPatternGroup);

            lexicalPatternGroup = new DynamicLexicalPatternGroup(DynamicLexicalPatternType.Explicit, "StringEscapedDelimiter", null)
            {
                TokenId = NXSLTokenId.StringEscapedDelimiter
            };
            lexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern("\\\""));
            lexicalState.LexicalPatternGroups.Add(lexicalPatternGroup);

            lexicalPatternGroup = new DynamicLexicalPatternGroup(DynamicLexicalPatternType.Regex, "StringText", null)
            {
                TokenId = NXSLTokenId.StringText
            };
            lexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern("[^\\\"\\\\\\n]+"));
            lexicalState.LexicalPatternGroups.Add(lexicalPatternGroup);

It is quite possible that my mistake is in the grammar, which looks like this:

            var @openQuote  = new Terminal(NXSLTokenId.StringStartDelimiter, "OpenQuote")   { ErrorAlias = "'\"'" };
            var @string     = new Terminal(NXSLTokenId.StringText,           "StringText");
            var @closeQuote = new Terminal(NXSLTokenId.StringEndDelimiter,   "CloseQuote")  { ErrorAlias = "'\"'" };

            primaryExpression.Production = idExpression
                | builtinExpression
                | parenExpression
                | curlyExpression
                | @openQuote + @string.Optional() + @closeQuote
                | @real
                | @integer
                | @false
                | @true;

Unfortunately, I get an error every time I use a \" in a string. The error is '"': expected.

I would greatly appreciate any suggestions regarding possible misuses of the grammar and/or lexer.

Thanks in advance,

Scott Haney

Comments (3)

Posted 9 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Scott,

It looks like your lexer creates other tokens here, like StringEscapedDelimiter, but your grammar doesn't allow for them. Your grammar just has @openQuote + @string.Optional() + @closeQuote. It probably needs to be something more like:

@openQuote + (@string | @stringLineTerminator | @stringEscapedDelimiter).ZeroOrMore() + @closeQuote

Either that or modify your lexer to produce a single token for the entire string.


Actipro Software Support
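
To see why the original production rejects an escaped quote, it can help to model the token stream outside SyntaxEditor. The following is only a sketch in plain C#, not the Actipro grammar API: the `Tok` enum and the `StringGrammarSketch` class are hypothetical names that mirror the token IDs from the lexer in the question, and the two methods hand-code the two production shapes discussed above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the NXSLTokenId values used in the thread.
enum Tok { Open, Text, Escaped, LineTerminator, Close }

static class StringGrammarSketch
{
    // Shape of the original rule: Open Text? Close
    public static bool MatchesOriginal(IReadOnlyList<Tok> tokens) =>
        tokens.Count >= 2
        && tokens[0] == Tok.Open
        && tokens[tokens.Count - 1] == Tok.Close
        && (tokens.Count == 2 || (tokens.Count == 3 && tokens[1] == Tok.Text));

    // Shape of the suggested rule: Open (Text | LineTerminator | Escaped)* Close
    public static bool MatchesRevised(IReadOnlyList<Tok> tokens) =>
        tokens.Count >= 2
        && tokens[0] == Tok.Open
        && tokens[tokens.Count - 1] == Tok.Close
        && tokens.Skip(1).Take(tokens.Count - 2)
                 .All(t => t == Tok.Text || t == Tok.LineTerminator || t == Tok.Escaped);
}
```

Assuming the lexer in the question splits "Embedded \"quote\"" into Open, Text, Escaped, Text, Escaped, Close, `MatchesOriginal` rejects that sequence because an Escaped token appears where only Close is allowed, which is consistent with the "expected" error reported above. `MatchesRevised` accepts it.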

Posted 9 years ago by Scott Haney

Thank you so much. That fixed it.

If it is not too much trouble, can you point me to some example lexer code that sets up the string as a single token? (Actually, just the single regex string would be fantastic. I am getting hung up on the backslashes required by both C# and the regex language.)

Again, I really appreciate the help.

Scott

Posted 9 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Scott,

Are you writing the dynamic lexer code by hand? I wouldn't recommend that since, as you said, C# and regex escapes can get really confusing. You can use our Language Designer app to build a dynamic lexer definition and have it output the code for you. With that, you wouldn't need to think about any escapes except those in the regex pattern itself.

If you search on the web, you can find single token regex patterns like the one I found here:

http://stackoverflow.com/questions/4953737/regex-for-matching-c-sharp-string-literals


Actipro Software Support
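
For readers stuck on the two layers of escaping, here is a minimal sketch of a single-token pattern for a double-quoted string with C-style backslash escapes. It uses the standard .NET `System.Text.RegularExpressions` engine. The SyntaxEditor dynamic lexer has its own pattern syntax, so treat this as an illustration of the idea rather than a pattern to paste into a `DynamicLexicalPattern`. The class and member names are hypothetical, and the `#"` start delimiter from the question is not handled.

```csharp
using System.Text.RegularExpressions;

static class StringLiteralSketch
{
    // Regex text after C# unescaping:  "(?:[^"\\\n]|\\.)*"
    // Verbatim literal: backslashes are literal, and "" stands for one quote.
    public const string Pattern = @"""(?:[^""\\\n]|\\.)*""";

    // The same regex text as a regular literal: every backslash and quote
    // must be escaped again for C#, which is where the confusion comes from.
    public const string PatternRegularLiteral = "\"(?:[^\"\\\\\\n]|\\\\.)*\"";

    private static readonly Regex Matcher = new Regex(Pattern, RegexOptions.CultureInvariant);

    // Returns the first complete string literal found in the input, or null.
    public static string FindFirstLiteral(string input)
    {
        Match match = Matcher.Match(input);
        return match.Success ? match.Value : null;
    }
}
```

The pattern reads as: an opening quote, then any number of either an ordinary character (not a quote, backslash, or newline) or a backslash followed by any one character, then a closing quote. Both constants hold the same regex text, so the verbatim form is usually the easier one to maintain.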

