DefaultAstNode StartOffset

SyntaxEditor for WPF Forum

Posted 2 years ago by Daisuke Nakada
Version: 17.2.0660

Hi, I have a question about StartOffset of DefaultAstNode.

I implemented an if-statement grammar using the LL(*) Parser Framework, and the output from the LL Parser Debugger looks fine, but the StartOffset values for Expression, TrueStatement, and FalseStatement seem to be wrong.

Below are my grammar definition, the output from the LL Parser Debugger, and the offset values:

 

Grammar:

Root = new NonTerminal("DocumentRoot");

var statement = new NonTerminal("statement");
statement.Production = @identifier + @assignInput + @integer + @semicolon;

var expression = new NonTerminal("expression");
expression.Production = @identifier.ToTerm().ToProduction();

Root.Production = @if
	+ @openParen + expression["expression"] + @closeParen
	+ statement["true_stm"]
	+ (@else + statement["false_stm"]).Optional()
	> Ast("TestIfStatement",
		Ast("Expression", AstFrom("expression")),
		Ast("TrueStatement", AstFrom("true_stm")),
		Ast("FalseStatement", AstFrom("false_stm")));

 


Input text:

if (isTrue) x := 1; else x := 2;

Output from LL Parser Debugger:

TestIfStatement[
    Expression[
        expression[
            "isTrue"
        ]
    ]
    TrueStatement[
        statement[
            "x"
            ":="
            "1"
            ";"
        ]
    ]
    FalseStatement[
        statement[
            "x"
            ":="
            "2"
            ";"
        ]
    ]
]

AST node offsets (from debugging with Visual Studio):

Expression StartOffset: 0 EndOffset: 32
expression StartOffset: 4 EndOffset: 10
isTrue StartOffset: 4 EndOffset: 10
TrueStatement StartOffset: 0 EndOffset: 32
statement StartOffset: 12 EndOffset: 19
x StartOffset: 12 EndOffset: 13
:= StartOffset: 14 EndOffset: 16
1 StartOffset: 17 EndOffset: 18
; StartOffset: 18 EndOffset: 19
FalseStatement StartOffset: 0 EndOffset: 32
statement StartOffset: 25 EndOffset: 32
x StartOffset: 25 EndOffset: 26
:= StartOffset: 27 EndOffset: 29
2 StartOffset: 30 EndOffset: 31
; StartOffset: 31 EndOffset: 32

Question:

Why are the StartOffset values for Expression, TrueStatement, and FalseStatement zero, even though those nodes are not at the beginning of the document?

Comments (4)

Posted 2 years ago by Actipro Software Support - Cleveland, OH, USA

Hello,

It's a bit complex, but let me explain. When you make an "Ast" tree constructor call, the constructed DefaultAstNode gets its StartOffset from the first match's start offset in the production and its EndOffset from the last match's end offset in the production, up to that point.

When the "Expression" node is created, at that point the current production has matched "if" through ")" and that's why the offsets are set that way there.  The same concept applies to the others.  
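The rule described above can be modeled without the parser framework. The following is only a minimal sketch under stated assumptions, not Actipro's implementation: `Match`, `OffsetModel`, and `RangeOf` are hypothetical names, and the offsets for "if", "(", ")", and "else" are derived from the input text rather than taken from the debugger output.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model type for illustration; not part of the Actipro API.
public sealed class Match
{
    public Match(string text, int startOffset, int endOffset)
    {
        Text = text;
        StartOffset = startOffset;
        EndOffset = endOffset;
    }

    public string Text { get; }
    public int StartOffset { get; }
    public int EndOffset { get; }
}

public static class OffsetModel
{
    // Tokens of "if (isTrue) x := 1; else x := 2;" in document order.
    public static List<Match> IfStatementMatches()
    {
        return new List<Match>
        {
            new Match("if", 0, 2), new Match("(", 3, 4), new Match("isTrue", 4, 10),
            new Match(")", 10, 11), new Match("x", 12, 13), new Match(":=", 14, 16),
            new Match("1", 17, 18), new Match(";", 18, 19), new Match("else", 20, 24),
            new Match("x", 25, 26), new Match(":=", 27, 29), new Match("2", 30, 31),
            new Match(";", 31, 32)
        };
    }

    // Assumed rule: a node built in a production spans from the first
    // match's start offset to the last match's end offset.
    public static (int Start, int End) RangeOf(IReadOnlyList<Match> matches)
    {
        if (matches.Count == 0)
            throw new ArgumentException("A production needs at least one match.");
        return (matches[0].StartOffset, matches[matches.Count - 1].EndOffset);
    }
}
```

Under this model, calling `RangeOf` on all matches of the root production yields (0, 32), the range reported for "Expression" in the question, while calling it on a dedicated production that matched only `isTrue` yields (4, 10).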

If offsets are meaningful to you, you really only want to call the "Ast" tree constructor once per production.  So with the "Expression" example above, you could split it out into another separate production like:

var rootExpression = new NonTerminal("Expression");
rootExpression.Production = expression["expression"] > Ast("Expression", AstFrom("expression"));

...

// See "Updated" lines below
Root.Production = @if
	+ @openParen + rootExpression["expression"] + @closeParen  // Updated
	+ statement["true_stm"]
	+ (@else + statement["false_stm"]).Optional()
	> Ast("TestIfStatement",
		AstFrom("expression"),  // Updated
		Ast("TrueStatement", AstFrom("true_stm")),
		Ast("FalseStatement", AstFrom("false_stm")));

Applying the same kind of change to the others will fix the issue, since in that scenario only one Ast tree constructor is called per production.


Actipro Software Support

Posted 2 years ago by Daisuke Nakada

Thank you very much for your help.

I modified my code to call "Ast" once per production and I got correct offsets for "Expression".

 

I have 2 more questions.

Question 1:

According to what you told me, do I need to define "TrueStatement" and "FalseStatement" separately like below?

var statement = new NonTerminal("statement");
statement.Production = @identifier + @assignInput + @integer + @semicolon;

var trueStatement = new NonTerminal("TrueStatement");
trueStatement.Production = statement["stm"] > Ast("TrueStatement", AstFrom("stm"));

var falseStatement = new NonTerminal("FalseStatement");
falseStatement.Production = statement["stm"] > Ast("FalseStatement", AstFrom("stm"));

Root.Production = @if
	+ @openParen + rootExpression["expression"] + @closeParen
	+ trueStatement["true_stm"]
	+ (@else + falseStatement["false_stm"]).Optional()
	> Ast("TestIfStatement",
		AstFrom("expression"),
		AstFrom("true_stm"),
		AstFrom("false_stm"));

In fact, in my actual grammar, the "statement" non-terminal appears in many places, not only twice.

 

Question 2:

Please tell me how to handle the "AdditiveExpression" production.

I implemented it by following "Walkthrough: Tree Constructors" in your documentation. To create the abstract syntax tree, it uses "AstValueOfConditional" instead of an "Ast" constructor call.

I wrote the grammar below but got wrong offsets for the "+" operator.

Root = new NonTerminal("DocumentRoot");

var variable = new NonTerminal("Variable");
variable.Production = @identifier["identifier"] > Ast("Variable", AstFrom("identifier"));

var additiveExpression = new NonTerminal("AdditiveExpression");
additiveExpression.Production = variable["leftexp"] + ((@plus | @minus).SetLabel("op") + additiveExpression["rightexp"]).Optional()
	> AstValueOfConditional(AstChildFrom("op"), AstFrom("leftexp"), AstFrom("rightexp"));

var statement = new NonTerminal("Statement");
statement.Production = variable["var"] + @assignInput + additiveExpression["exp"] + @semicolon
	> Ast("TestStatement", AstFrom("var"), AstFrom("exp"));
 
Root.Production = statement["stm"] > Ast("Root", AstFrom("stm"));

 

Input text:

x:=a+b;

  

Output from LL Parser Debugger:

Root[
  TestStatement[
    Variable[
      "x"
    ]
    +[
      Variable[
        "a"
      ]
      Variable[
        "b"
      ]
    ]
  ]
]

 

AST node offsets (from debugging with Visual Studio):

TestStatement StartOffset: 0 EndOffset: 7
Variable StartOffset: 0 EndOffset: 1
x StartOffset: 0 EndOffset: 1
+ StartOffset: 3 EndOffset: 6
Variable StartOffset: 3 EndOffset: 4
a StartOffset: 3 EndOffset: 4
Variable StartOffset: 5 EndOffset: 6
b StartOffset: 5 EndOffset: 6

 

The StartOffset and EndOffset of the "+" operator are 3 and 6, respectively, and they are wrong.

[Modified 2 years ago]

Posted 2 years ago by Actipro Software Support - Cleveland, OH, USA

Hello,

1) If you wish to wrap with "TrueStatement", then yes, you would need to separate those too. The extra "statement" node comes from the "statement" production, since it has to make a node to contain all the child nodes matched there. If you wish to eliminate that node, you might be able to do something like this:

trueStatement.Production = statement["stm"] > Ast("TrueStatement", AstChildrenFrom("stm"));

2) In that case, it's by design. The range is the offset range of everything matched within the additive expression, so it includes the offset ranges of "a" and "b". An AST wouldn't work correctly if a parent node had a smaller offset range than its children, because recursive logic that looks for the node containing an offset (say, for "a") would skip over the entire "+" node and its descendants.
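To illustrate the recursive lookup mentioned above, here is a minimal sketch with a hypothetical `Node` type (not DefaultAstNode). It assumes end offsets are exclusive and that a search only descends into a child whose range contains the target offset.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an AST node; not the Actipro DefaultAstNode class.
public sealed class Node
{
    public Node(string name, int startOffset, int endOffset, params Node[] children)
    {
        Name = name;
        StartOffset = startOffset;
        EndOffset = endOffset;
        Children = children;
    }

    public string Name { get; }
    public int StartOffset { get; }
    public int EndOffset { get; }   // assumed exclusive
    public IReadOnlyList<Node> Children { get; }
}

public static class AstSearch
{
    // Returns the deepest node whose range contains the offset, or null.
    // A subtree is skipped entirely when its root does not contain the offset.
    public static Node FindDeepest(Node node, int offset)
    {
        if (offset < node.StartOffset || offset >= node.EndOffset)
            return null;

        foreach (var child in node.Children)
        {
            var hit = FindDeepest(child, offset);
            if (hit != null)
                return hit;
        }
        return node;
    }

    // Builds the tree for "x:=a+b;" with a configurable range for the "+" node.
    public static Node BuildStatement(int plusStart, int plusEnd)
    {
        return new Node("TestStatement", 0, 7,
            new Node("Variable x", 0, 1),
            new Node("+", plusStart, plusEnd,
                new Node("Variable a", 3, 4),
                new Node("Variable b", 5, 6)));
    }
}
```

With the "+" node spanning 3 to 6, `FindDeepest(BuildStatement(3, 6), 3)` reaches "Variable a". If the "+" node covered only the operator character (4 to 5), the same search would stop at "TestStatement", because the "+" subtree is never entered.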


Actipro Software Support

Posted 2 years ago by Daisuke Nakada

Thank you for your explanation. I understand now.

The latest build of this product (v2019.1 build 0683) was released 1 month ago, which was after the last post in this thread.
