Changing/resetting state

SyntaxEditor for Windows Forms Forum

Posted 18 years ago by Russell Mason
Hi

Is it possible to end a state when another state is started?

E.g. SELECT[Id]FROM[MyTable]SELECT[Id]from[MyTable]

This is valid SQL, but because there is no whitespace, there are no convenient start and end markers to delimit a state's scope.

Ideally I would like to:
1) Start in DefaultState
2) Enter a SelectState after SELECT
3) Enter a SquareStringState after [ (Child state of SelectState)
4) Re-enter SelectState after ]
5) Enter FromState after FROM (Child state of SelectState)
6) Enter a SquareStringState after [ (Child state of SelectState)
7) Enter DefaultState after ]
8) Repeat for second statement

The problem is I end up in SelectState after the ] and have no way of getting back to DefaultState because there are no characters to use as the EndScope before starting SELECT again.

Any ideas, or am I taking the wrong approach? I could manually maintain state in code but it seems a shame to write my own state mechanism when you have one built in.

Thanks
Russell Mason

Comments (5)

Posted 18 years ago by Actipro Software Support - Cleveland, OH, USA
Yeah, I see the problem. The way we currently have the lexical parser set up, this isn't easy to do. You can do everything up to the point where the SELECT statement ends, but as you said, there is no specific end token to close it. We'll try to keep this example for when we take a hard look at improving the parsing engine for 4.0.


Actipro Software Support

Posted 18 years ago by Russell Mason
Hi

Does your new 4.0 parser have a way to cope with this situation now?

Thanks
Russell Mason

Posted 18 years ago by Actipro Software Support - Cleveland, OH, USA
While v4.0 does have a lot more flexibility in terms of lexical parsing, because you can build programmatic lexical parsers instead of using "dynamic" languages (v3.1 style languages), I'm still not sure you can do what you want.

We've worked on an advanced SQL language a little bit, and SQL is one of the most ambiguous languages out there. The reason, of course, is that there is never any end delimiter for a statement. The lexical parser should simply be highlighting keywords, strings, etc. The semantic parser (in v4.0) is where you can examine the tokens to build an AST of the SQL code. That is where you would start an AST node for the SELECT statement and then use grammar rules to determine where it ends.

However, in a lexical parser, I'm not so sure you want to use states the way you intended. I would not transition to a SELECT state when you see the SELECT keyword; I would just stay in the default state. In v4.0, if you build an AST of the document, you can use it to aid in populating IntelliPrompt. It's complicated stuff, but it's how all major IDEs do it. We are working on an advanced SQL language implementation, but it may be a little ways off yet since we want to get out a couple of other advanced languages first.
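
To make the split between the two passes concrete, here is a minimal sketch in plain Python (not SyntaxEditor code). The token patterns, `tokenize`, and `split_statements` are hypothetical names invented for this illustration, and the rule "a SELECT keyword starts a new statement" is a deliberate oversimplification that ignores subqueries. It only shows that the lexer can stay in one state while a later pass decides where each statement ends.

```python
import re

# Hypothetical token patterns for the example input; not Actipro's lexer.
TOKEN_RE = re.compile(r"""
    (?P<keyword>\b(?:SELECT|FROM)\b)   # keywords, matched case-insensitively
  | (?P<bracketed>\[[^\]]*\])          # [Id], [MyTable]
  | (?P<identifier>\w+)
  | (?P<whitespace>\s+)
  | (?P<other>.)
""", re.IGNORECASE | re.VERBOSE)


def tokenize(text):
    """Flat lexing: every token is produced from a single default state."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind != "whitespace":
            tokens.append((kind, match.group()))
    return tokens


def split_statements(tokens):
    """Toy 'semantic' pass: each SELECT keyword opens a new statement node."""
    statements = []
    for kind, value in tokens:
        starts_statement = kind == "keyword" and value.upper() == "SELECT"
        if starts_statement or not statements:
            statements.append([])
        statements[-1].append(value)
    return statements


if __name__ == "__main__":
    sql = "SELECT[Id]FROM[MyTable]SELECT[Id]from[MyTable]"
    for statement in split_statements(tokenize(sql)):
        print(statement)
```

The lexer here never needs an end token for the SELECT statement; the statement boundary is decided only after the tokens exist.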


Actipro Software Support

Posted 16 years ago by David Chang - Software Engineer, a.i. solutions, Inc.
I have run into the same problem.

In our scripting language, we can create macros:

Define Macro mac1 ( id, id, ... );

// Macro content

EndMacro;

In the dynamic language definition I used 'Macro' to move from Default state to Macro state and used ';' to move from Macro state to MacroBody state. To get out of them I used 'EndMacro' to go from MacroBody to Macro and used ';' to go from Macro to Default.
   <States>
      <State                                             Key="CommentState"                  TokenKey="CommentDefaultToken"         Style="CommentDefaultStyle">
         <Scopes>
            <Scope  BracketHighlight="True">
               <RegexPatternGroup      Type="StartScope"                                     TokenKey="CommentStartToken"           Style="CommentDelimiterStyle" PatternValue="//(?!((Block)|(EndBlock)))" />
               <RegexPatternGroup      Type="EndScope"                                       TokenKey="CommentEndToken"             Style="CommentDelimiterStyle" PatternValue="{LineTerminatorMacro}"                                                                                                                     IsWhitespace="True" />
            </Scope>
         </Scopes>
         <PatternGroups>
            <ExplicitPatternGroup                                                            TokenKey="CommentDelimiterToken"       Style="CommentDelimiterStyle" PatternValue="//" />
            <RegexPatternGroup                                                               TokenKey="CommentWhitespaceToken"                                    PatternValue="{WhitespaceMacro}+"                                                                                                                        IsWhitespace="True" />
            <RegexPatternGroup                                                               TokenKey="CommentLineTerminatorToken"                                PatternValue="{LineTerminatorMacro}"                                    LookAhead="{LineTerminatorWhitespaceMacro}* //(?!((Block)|(EndBlock))) [^/]"     IsWhitespace="True" />
            <RegexPatternGroup                                                               TokenKey="CommentWordToken"                                          PatternValue="\w+" />
            <RegexPatternGroup                                                               TokenKey="CommentDefaultToken"                                       PatternValue="{NonLineTerminatorMacro}" />
         </PatternGroups>
      </State>
      <State                                             Key="MacroState">
         <Scopes>
            <Scope  BracketHighlight="True">
               <ExplicitPatternGroup   Type="StartScope"                                     TokenKey="MacroStartToken"             Style="CommandKeyWord"        PatternValue="Macro"          LookBehind="^|[^\.]"                      LookAhead="{NonWordMacro}"                         CaseSensitivity="Sensitive" />
               <ExplicitPatternGroup   Type="EndScope"                                       TokenKey="MacroEndToken"               Style="OperatorStyle"         PatternValue=";" />
            </Scope>
         </Scopes>
         <PatternGroups>
            <ExplicitPatternGroup                        Key="Macro"                         TokenKey="Macro"                       Style="CommandKeyWord"        PatternValue="Macro"          LookBehind="^|[^\.]"                      LookAhead="{NonWordMacro}"                         CaseSensitivity="Sensitive" />
            <ExplicitPatternGroup                        Key="EndMacro"                      TokenKey="EndMacro"                    Style="CommandKeyWord"        PatternValue="EndMacro"       LookBehind="^|[^\.]"                      LookAhead="{NonWordMacro}"                         CaseSensitivity="Sensitive" />
            <RegexPatternGroup                                                               TokenKey="MacroLineTerminatorToken"                                  PatternValue="{LineTerminatorMacro}"                                    LookAhead="{LineTerminatorWhitespaceMacro}*"                                     IsWhitespace="True" />
            <RegexPatternGroup                                                               TokenKey="MacroWhitespaceToken"                                      PatternValue="{WhitespaceMacro}+"                                                                                                                        IsWhitespace="True" />
            <RegexPatternGroup                                                               TokenKey="MacroIdentifierToken"                                      PatternValue="(_ | {AlphaMacro})({WordMacro})*" />
            <ExplicitPatternGroup                                                            TokenKey="MacroOpenParenthesisToken"                                 PatternValue="("                                                                                                                                         EndBracket="CloseParenthesisPatternGroup" />
            <ExplicitPatternGroup                                                            TokenKey="MacroCloseParenthesisToken"                                PatternValue=")"                                                                                                                                         StartBracket="OpenParenthesisPatternGroup" />
         </PatternGroups>
         <ChildStates>
            <ChildState                                  Key="CommentState" />
            <ChildState                                  Key="MacroBodyState" />
         </ChildStates>
      </State>
      <State                                             Key="MacroBodyState">
         <Scopes>
            <Scope  BracketHighlight="True">
               <ExplicitPatternGroup   Type="StartScope"                                     TokenKey="MacroBodyStartToken"          Style="OperatorStyle"         PatternValue=";" />
               <ExplicitPatternGroup   Type="EndScope"                                       TokenKey="MacroBodyEndToken"            Style="CommandKeyWord"        PatternValue="EndMacro"      LookBehind="^|[^\.]"                      LookAhead="{NonWordMacro}"                         CaseSensitivity="Sensitive" />
            </Scope>
         </Scopes>
         <PatternGroups>
            <ExplicitPatternGroup                        Key="EndMacro"                      TokenKey="EndMacro"                    Style="CommandKeyWord"        PatternValue="EndMacro"       LookBehind="^|[^\.]"                      LookAhead="{NonWordMacro}"                         CaseSensitivity="Sensitive" />
            <RegexPatternGroup                                                               TokenKey="MacroBodyWhitespaceToken"    Style="StringDefaultStyle"    PatternValue="{WhitespaceMacro}+"                                                                                                                        IsWhitespace="True" />
            <RegexPatternGroup                                                               TokenKey="MacroBodyLineTerminatorToken" Style="StringDefaultStyle"   PatternValue="{LineTerminatorMacro}"                                    LookAhead="{LineTerminatorWhitespaceMacro}* //(?!((Block)|(EndBlock))) [^/]"     IsWhitespace="True" />
            <RegexPatternGroup                                                               TokenKey="MacroBodyWordToken"          Style="StringDefaultStyle"    PatternValue="\w+" />
            <RegexPatternGroup                                                               TokenKey="MacroBodyDefaultToken"       Style="StringDefaultStyle"    PatternValue="{NonLineTerminatorMacro}" />
         </PatternGroups>
         <ChildStates>
            <ChildState                                  Key="CommentState" />
         </ChildStates>
      </State>
   </States>

I cannot get out of the MacroBody state because I want to match any token with the pattern "[.\w\s]+" inside the macro body, and 'EndMacro' itself matches that pattern. So I think I would need to transition out of this state by examining the tokens during semantic parsing.
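
The conflict can be reproduced outside SyntaxEditor with a small plain-Python sketch. The function `lex_macro_body` and the pattern names are hypothetical, and this uses Python's `re` module rather than the dynamic language engine, so it only illustrates the idea: a greedy catch-all swallows 'EndMacro', while a catch-all guarded by a negative lookahead stops in front of it. The definition above already uses `(?!...)` in a PatternValue, so something similar may be expressible there, but that is an assumption.

```python
import re

# End-scope keyword, tried before any body pattern.
END_MACRO = re.compile(r"\bEndMacro\b")
# Catch-all from the post: greedy, so it also consumes 'EndMacro'.
GREEDY_BODY = re.compile(r"[.\w\s]+")
# Guarded variant: take one character at a time unless 'EndMacro' starts here.
GUARDED_BODY = re.compile(r"(?:(?!\bEndMacro\b)[.\w\s])+")


def lex_macro_body(text, body_pattern):
    """Return (body_tokens, saw_end) for the text inside a macro body."""
    pos, body, saw_end = 0, [], False
    while pos < len(text):
        if END_MACRO.match(text, pos):      # end-scope pattern is tried first
            saw_end = True
            break
        match = body_pattern.match(text, pos)
        if match is None:                   # character outside the body pattern
            pos += 1
            continue
        body.append(match.group())
        pos = match.end()
    return body, saw_end


if __name__ == "__main__":
    source = "x y.z\nEndMacro;"
    print(lex_macro_body(source, GREEDY_BODY))   # end keyword is swallowed
    print(lex_macro_body(source, GUARDED_BODY))  # end keyword is seen
```

Even though the end pattern is checked first at each position, the greedy version fails because the body match that begins earlier runs straight through the keyword.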

So my question is: how can I tell the dynamic language to change state from within semantic parser code?

Posted 16 years ago by Actipro Software Support - Cleveland, OH, USA
David,

Can you perhaps email over your dynamic language XML definition and include at the bottom some example text that shows the situation and how it is not parsing the way you want? Then we can look at that. Thanks!


Actipro Software Support

The latest build of this product (v24.1.0) was released 2 months ago, which was after the last post in this thread.
