Need help with a lexical transition problem

SyntaxEditor Web Languages Add-on for WPF Forum

Posted 12 years ago by Craig - Varigence, Inc.
Version: 12.1.0562

Say I have a file like this:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
<# foreach(var fact in RootNode.Facts.Where(item => !string.IsNullOrEmpty(item.GetTag("AutogenerateEtl")))) { #>
        <Package Name="LoadFact<#=fact.SsisSafeName#>" ConstraintMode="Linear">
            <Variables>
                <Variable Name="LoggingJobId" DataType="String">"Temp"</Variable>
                <Variable Name="StartDate" DataType="DateTime">01/01/2011</Variable>
            </Variables>
            <Tasks>
                <StartLoggingJob Name="Start Logging" JobName="LoadFact<#=fact.SsisSafeName#>" LoggingConnectionName="Logging" LoggingJobVariableName="LoggingJobId"></StartLoggingJob>

                <CloneTableSchema Name="Ensure Process Table Exists" LoggingConnectionName="Logging" LoggingJobVariableName="LoggingJobId" SourceConnectionName="DataWarehouse" TargetConnectionName="DataWarehouse">
                    <SourceTable TableName="<#=fact.ScopedName#>"></SourceTable>
                    <ExternalTargetTable Table="<#=fact.GetTag("ProcessTable")#>"></ExternalTargetTable>
                </CloneTableSchema>

                <ExecuteSQL Name="Truncate Process Table" ConnectionName="DataWarehouse">
                    <DirectInput>TRUNCATE TABLE <#=fact.GetTag("ProcessTable")#></DirectInput>
                </ExecuteSQL>
            </Tasks>
        </Package>
<# } #>
    </Packages>
</Biml>

The <# and <#= tags are language transitions, using SyntaxEditor's language transition logic. We have intellisense and quick info working really well for this, but we've hit a snag.

When I type a < in the line directly above the ExecuteSQL tag, I should see an intellisense pop-up, but I don't. I know this is because my XmlContext has an incorrect hierarchy of Biml / Packages / Package / Tasks / CloneTableSchema / ExternalTargetTable.

I also know that when I go into the ExternalTargetTable element and remove the double quotes around the ProcessTable string, intellisense starts working again. Furthermore, if I add at least a single letter after the end parenthesis, intellisense works and the element hierarchy is correct. 

My suspicion is that the closing quote inside the <#=fact.GetTag("ProcessTable")#> nugget is being treated as the end of the attribute value, or at least isn't being processed correctly.

After reviewing the lexical transition sample in the Sample Browser, I believe the issue has to do with how I'm creating the lexical transition. Our methods to set up lexical transitions are as follows:

        private void SetupLexicalTransitions()
        {
            var bimlScriptClassificationType = CreateBimlScriptClassificationType();
            var dotNetLexer = _dotNetSyntaxLanguageBase.GetLexer() as IMergableLexer;
            var bimlLexer = _xmlLexer as DynamicLexer;
            var directiveLexer = _bimlScriptDirectiveSyntaxLanguage.GetLexer() as DynamicLexer;

            if (bimlLexer != null && dotNetLexer != null && directiveLexer != null)
            {
                directiveLexer.Key = "BimlScriptDirective";
                using (IDisposable batch = bimlLexer.CreateChangeBatch())
                {
                    var codeNuggetLexicalState = new DynamicLexicalState(0, "BimlScriptCodeNugget") { DefaultTokenKey = "BimlScriptCodeNuggetText" };
                    bimlLexer.LexicalStates.Add(codeNuggetLexicalState);

                    // Insert the transition lexical state at the beginning of the parent language's default state's child states list so that it has top matching priority
                    bimlLexer.DefaultLexicalState.ChildLexicalStates.Insert(0, codeNuggetLexicalState);

                    InsertBimlScriptLexicalState(bimlLexer, codeNuggetLexicalState, true);

                    // Create the lexical scope for the transition lexical state
                    var codeNuggetLexicalScope = new DynamicLexicalScope();
                    codeNuggetLexicalScope.IsAncestorEndScopeCheckEnabled = false;
                    codeNuggetLexicalState.LexicalScopes.Add(codeNuggetLexicalScope);
                    codeNuggetLexicalScope.StartLexicalPatternGroup = new DynamicLexicalPatternGroup(DynamicLexicalPatternType.Explicit, "BimlScriptStartDelimiter", bimlScriptClassificationType);
                    codeNuggetLexicalScope.StartLexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern(@"<#="));
                    codeNuggetLexicalScope.StartLexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern(@"<#+"));
                    codeNuggetLexicalScope.StartLexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern(@"<#"));
                    codeNuggetLexicalScope.EndLexicalPatternGroup = new DynamicLexicalPatternGroup(DynamicLexicalPatternType.Explicit, "BimlScriptEndDelimiter", bimlScriptClassificationType);
                    codeNuggetLexicalScope.EndLexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern(@"#>"));

                    // Set up a direct transition on the lexical state so that when it is entered, it will transition directly to the child language's default lexical state
                    codeNuggetLexicalState.Transition = new LexicalStateTransition(_dotNetSyntaxLanguageBase, dotNetLexer.DefaultLexicalState, null);

                    var directiveLexicalState = new DynamicLexicalState(0, "BimlScriptDirective") { DefaultTokenKey = "BimlScriptDirectiveText" };
                    bimlLexer.LexicalStates.Add(directiveLexicalState);

                    // Insert the transition lexical state at the beginning of the parent language's default state's child states list so that it has top matching priority
                    bimlLexer.DefaultLexicalState.ChildLexicalStates.Insert(0, directiveLexicalState);

                    InsertBimlScriptLexicalState(bimlLexer, directiveLexicalState, false);

                    // Create the lexical scope for the transition lexical state
                    var directiveLexicalScope = new DynamicLexicalScope();
                    directiveLexicalScope.IsAncestorEndScopeCheckEnabled = false;
                    directiveLexicalState.LexicalScopes.Add(directiveLexicalScope);
                    directiveLexicalScope.StartLexicalPatternGroup = new DynamicLexicalPatternGroup(DynamicLexicalPatternType.Explicit, "BimlScriptDirectiveStartDelimiter", bimlScriptClassificationType);
                    directiveLexicalScope.StartLexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern(@"<#@"));
                    directiveLexicalScope.EndLexicalPatternGroup = new DynamicLexicalPatternGroup(DynamicLexicalPatternType.Explicit, "BimlScriptDirectiveEndDelimiter", bimlScriptClassificationType);
                    directiveLexicalScope.EndLexicalPatternGroup.Patterns.Add(new DynamicLexicalPattern(@"#>"));

                    // Set up a direct transition on the lexical state so that when it is entered, it will transition directly to the child language's default lexical state
                    directiveLexicalState.Transition = new LexicalStateTransition(_bimlScriptDirectiveSyntaxLanguage, directiveLexer.DefaultLexicalState, null);
                }
            }
        }

        private static void InsertBimlScriptLexicalState(DynamicLexer bimlLexer, DynamicLexicalState dynamicLexicalState, bool isCodeNugget)
        {
            foreach (var lexicalState in bimlLexer.LexicalStates)
            {
                var isAttributeValue = lexicalState.DefaultClassificationType == new XmlClassificationTypeProvider().XmlAttributeValue;
                var isCodeNuggetInStartTag = isCodeNugget && lexicalState.Key == "StartTag";

                if (isAttributeValue || isCodeNuggetInStartTag)
                {
                    lexicalState.ChildLexicalStates.Insert(0, dynamicLexicalState);

                    if (isAttributeValue)
                    {
                        // Add < to the lexicalState's lexicalPattern so that, within attribute values, we recognize < as a delimiter.
                        if (lexicalState.LexicalPatternGroups != null && lexicalState.LexicalPatternGroups.Count > 0)
                        {
                            var lexicalPatternGroup = lexicalState.LexicalPatternGroups[0];
                            if (lexicalPatternGroup.Patterns != null && lexicalPatternGroup.Patterns.Count > 0)
                            {
                                var lexicalPattern = lexicalPatternGroup.Patterns[0];
                                if (lexicalPattern != null)
                                {
                                    lexicalPattern.Pattern = "[^\\\"\\>\\<]+";
                                }
                            }
                        }
                    }
                }
            }
        }

I'm somewhat convinced our lexical state logic within the CreateChangeBatch block is where the problem lies. Assuming you agree with my diagnosis, do you see a bug in the lexical transition logic that's causing this issue? Or could a different problem cause these symptoms?

Note that I tried this with the latest WPF Studio build and the issue still reproduces.

Any help or thoughts would be greatly appreciated.

Thanks,

-Craig


Comments (8)

Posted 12 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Craig,

It's hard to say, but it sounds like something is messing up the way the document is being parsed. It could be the quotes, as you suspect. One way to look further into it is to open a document using your language in our SDI Editor demo and turn on the token information display (I think it's an option on the View menu). That will tell you the token at the caret. Move the caret around near your embedded <#= portion and see whether the quotes within it are matching to the correct tokens or not.
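
If you'd rather check the same thing in code, something roughly like this should dump the token at a given offset (just a sketch, not from a sample; "document" and "offset" are whatever ICodeDocument and caret/nugget offset you have handy in your app):

        // Sketch: inspect the token at an offset inside the <#= ... #> nugget
        ITextSnapshotReader reader = document.CurrentSnapshot.GetReader(offset);
        IToken token = reader.ReadToken();
        // The token's key and id tell you which lexical pattern produced it
        System.Diagnostics.Debug.WriteLine("Token key=" + token.Key + ", id=" + token.Id);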

One thing I could see happening is that, since parent scope end patterns outside the current lexer have top priority, the quote may be causing the ancestor state to exit. See the "Dynamic Lexers" documentation topic, and within that the "Lexing Sequence" section, for more info. It also talks about a property you can set to false to prevent those ancestor lookups at a certain level. Perhaps that would help you if this is the case.


Actipro Software Support

Posted 12 years ago by Craig - Varigence, Inc.

I tried the token technique you suggested but all the tokens appear correct. Also, I already have IsAncestorEndScopeCheckEnabled set to false. 

When I reviewed the AST, however, I was able to confirm that the AST is indeed wrong in this scenario.

The AST around the CloneTableSchema should look like:

Element[
    "CloneTableSchema"
    Attributes[
        "Name"
        "LoggingConnectionName"
        "LoggingJobVariableName"
        "SourceConnectionName"
        "TargetConnectionName"
    ]
    Nodes[
        Element[
            "SourceTable"
            Attributes[
                "TableName"
            ]
            "Nodes"
            "EndTag"
        ]
        Element[
            "ExternalTargetTable"
            Attributes[
                "Table"
            ]
            "Nodes"
            "EndTag"
        ]
    ]
    "EndTag"
]
Element[
    "ExecuteSQL"
    Attributes[
        "Name"
        "ConnectionName"
    ]
    Nodes[
        Element[
            "DirectInput"
            "Nodes"
            "EndTag"
        ]
    ]
    "EndTag"
]

 However, it's actually:

Element[
    "CloneTableSchema"
    Attributes[
        "Name"
        "LoggingConnectionName"
        "LoggingJobVariableName"
        "SourceConnectionName"
        "TargetConnectionName"
    ]
    Nodes[
        Element[
            "SourceTable"
            Attributes[
                "TableName"
            ]
            "Nodes"
            "EndTag"
        ]
        Element[
            "ExternalTargetTable"
            Attributes[
                "Table"
            ]
            Nodes[
                Element[
                    "DirectInput"
                    "Nodes"
                    "EndTag"
                ]
            ]
            ""
        ]
    ]
    ""
]

Notice that, after the ExternalTargetTable's Table attribute, the AST treats what should be the ExecuteSQL node's child (the DirectInput element) as a child of the ExternalTargetTable node, and the ExecuteSQL element itself never appears.

1. Given that the tokens are correct but the AST is wrong, am I correct in assuming my problem would actually be in the parser, as opposed to in the language transition logic?

2. I've been trying to turn on intellisense in the language transitions sample in the Sample Browser so I can experiment with a simpler repro case. However, despite adding this code in its MainControl code-behind, intellisense never activates:

            XmlSchemaResolver schemaResolver = new XmlSchemaResolver();
            using (Stream stream = Assembly.GetExecutingAssembly().GetManifestResourceStream(SyntaxEditorHelper.XmlSchemasPath + "Mammals.xsd"))
            {
                schemaResolver.LoadSchemaFromStream(stream);
            }

            directiveEditor.Document.Language.RegisterXmlSchemaResolver(schemaResolver);

What can I do to activate some simple XML intellisense in that sample?

Thanks again, 

-Craig


Posted 12 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Craig,

1) Yes it's probably the parser then.  I think you have the source for the add-on, don't you?  Did you use the default parser grammar or have you modified it to accommodate your additional syntax?  You probably do need to modify it since right now the parser is coded to expect start tag attribute tokens in a certain sequence.  When it encounters your other ones, it most likely falls into an error state.  You could either modify the parser grammar to allow your server tags, or you could use a modified XmlTokenReader that filters all those other server tags from ever reaching the parser.

2) I'd think what you have would work, as long as you have our Web Languages Add-on's XML language loaded. That sample doesn't load it by default; it uses the free XML language instead. Our WebAddonXmlEditor demo has it all working, though.
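
For example, something along these lines in the sample's code-behind should switch the editor over to the add-on's XML language before you register the resolver (a sketch only; the namespace is from memory, and this would give you plain XML intellisense without the sample's transition setup):

            // Use the Web Languages Add-on XML language rather than the free one,
            // then register the schema resolver on it (as you're already doing)
            var xmlLanguage = new ActiproSoftware.Text.Languages.Xml.Implementation.XmlSyntaxLanguage();
            xmlLanguage.RegisterXmlSchemaResolver(schemaResolver);
            directiveEditor.Document.Language = xmlLanguage;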


Actipro Software Support

Posted 12 years ago by Craig - Varigence, Inc.

I do have the source for an older version of the add-on. I gave it a shot and I see what you're talking about in the token reader. When an opening server tag (such as <#) is encountered, the token reader recognizes the < symbol as its own start tag. 

In terms of modifying the XmlTokenReader to filter those server tags, I tried creating my own token reader that extends XmlTokenReader and overriding GetNextToken. I thought I'd be able to see <# as its own token and recognize it as a script tag. Then, I'd keep ignoring tokens until its end token. That way, the next token that'd be returned would be the first token following the closing script tag.

However, it seems that <# never appears as its own token; I just get its < symbol as a token. Further, it seems that by the time I'm inside GetNextToken, the token list has already been generated and thus already has errors. So, even if I can filter the server tags (and the C# content between them), it seems like the following tokens will still be wrong.

1. Is that correct, or am I misunderstanding your suggestion for modifying the token reader?

2. Regarding editing the grammar, we're using the regular XML grammar. Is there an Actipro tool and/or instructions for modifying the XML grammar to support our script tags? Also, what would the script tags do in the grammar? Would they simply be treated as their own delimiters, with everything between them ignored?

Thanks,

-Craig


Posted 12 years ago by Actipro Software Support - Cleveland, OH, USA
Hi Craig,

So what you did is correct so far. You do want to use the token reader to prevent tokens the XML parser doesn't know about from reaching the parser. However, as you saw, the default lexer only handles XML tags. So you would need to modify our lexers so that you can have the token tagger skip over your custom tokens.

We don't have any samples of doing this, but the Web Languages Add-on actually uses the same XML language definition as the one that comes free in the Sample Browser. You may be able to modify that language project to add your own tokens, then regenerate a new programmatic lexer and register it on the language in place of our default lexer.
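
In other words, once you've regenerated a lexer from the modified language project, you'd register it on your language in place of the default one, roughly like this (the type and variable names here are hypothetical placeholders for your generated lexer class and your language instance):

            // Swap the regenerated lexer in on your language so the token tagger
            // sees your custom server-tag tokens
            ILexer customXmlLexer = new CustomBimlXmlLexer();   // hypothetical generated lexer class
            bimlLanguage.RegisterLexer(customXmlLexer);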


Actipro Software Support

Answer - Posted 12 years ago by Craig - Varigence, Inc.

Thanks again for your continued replies. Fortunately, I think I found a fix to this issue.

For our custom language, we implemented a custom parser that implements IParser. Within its Parse method, we actually perform two parses: one for the XML and one for the C#. To perform the XML parse, we were invoking new XmlParser().Parse(), and the XmlParser would create its own token reader. However, it uses an XmlTokenReader with an XmlLexer, which produces two problems.

First, the built-in XmlLexer doesn't contain the lexical transitions that I added to my custom syntax language's lexer. Second, when its GetNextToken() method is invoked, the XmlTokenReader returns the C# tokens and my custom tokens to the parser instead of filtering them out.

My solution was to subclass XmlTokenReader and override its GetNextToken method so it skips non-XML tokens. Then, I subclassed XmlParser and overrode its CreateTokenReader method so that it passes my custom language's lexer to my custom token reader. Finally, my custom parser now invokes the subclassed parser's Parse method.
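
For reference, the parser-subclass half of that looks roughly like the sketch below. The type names (BimlXmlParser, BimlXmlTokenReader) are just mine, and the exact CreateTokenReader signature and token reader constructor are approximations, so treat it as an outline rather than the literal code:

        // Sketch only: wires my merged lexer and my filtering token reader into the XML parse
        public class BimlXmlParser : XmlParser
        {
            private readonly ILexer customLexer;

            public BimlXmlParser(ILexer customLanguageLexer)
            {
                customLexer = customLanguageLexer;
            }

            // Note: the override's return type and parameter are approximated from memory
            protected override ITokenReader CreateTokenReader(ITextBufferReader reader)
            {
                // Hand my custom language's lexer (which has the BimlScript transitions)
                // to the filtering token reader instead of the stock XmlLexer
                return new BimlXmlTokenReader(reader, customLexer);
            }
        }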

The custom token reader's overridden GetNextToken method looks like:

        protected override IToken GetNextToken()
        {
            IToken token = base.GetNextToken();

            // Loop to skip over tokens that are insignificant to the parser
            while (!this.IsAtEnd)
            {
                if (new XmlTokenId().ContainsId(token.Id))
                {
                    switch (token.Id)
                    {
                        case XmlTokenId.Default:
                        case XmlTokenId.Entity:
                        case XmlTokenId.Identifier:
                        case XmlTokenId.StartTagText:
                        case XmlTokenId.Whitespace:
                            // Skip
                            token = base.GetNextToken();
                            break;
                        default:
                            return token;
                    }
                }
                else if (token.Id < 0)
                {
                    // Apparently, tokens with a -1 Id are possible; return them.
                    return token;
                }
                else
                {
                    // Skip C# and language transition delimiters
                    // since the XmlParser won't understand them.
                    token = base.GetNextToken();
                }
            }

            return token;
        }

I realize this doesn't incorporate your previous suggestion of modifying the lexer to impact the token tagger, but my approach does seem to solve the problem. Based on this description, does this seem like a correct solution or might I be missing something?

Posted 12 years ago by Actipro Software Support - Cleveland, OH, USA
Hi Craig,

Yes that sounds like it will probably work out fine.


Actipro Software Support

Posted 12 years ago by Craig - Varigence, Inc.

Great. Thanks once more for your help.

