Token.Modified Property

SyntaxEditor for Windows Forms Forum

Posted 18 years ago by Marianne
Presumably, this property provides a boolean that indicates whether the token has been modified since the last lexical parse. I'm having some difficulty understanding how this works. If I have the following code in my semantic parser:
            if (modification.HasFlag(DocumentModificationFlags.ProgrammaticTextParse))
            { ...
            }
            else
            {
                if (modification.Type != DocumentModificationType.Typing) return;
                TokenStream ts = document.GetTokenStream(modification.StartOffset);
                Token t = ts.Read();
                while (!ts.IsPastEnd)
                {
                    if(t.Modified)
                        Console.WriteLine(document.GetTokenText(t));
                    t = ts.Read();
                }
            }
And I modify a single token via a keypress, I would assume only that Token would be processed with this code. Based on my test, not only does it include unchanged tokens in the code, but it does NOT contain the token that I modified via a keypress. Here is my sample script:
 Dim test, test2, test3

Set oFirstNode = GetObject(RootNodePath)

If Err <> 0 Then
    Display "Couldn't get the first node!"
    WScript.Quit (1)
End If
 
' Begin displaying tree
Call DisplayTree(oFirstNode, 0)
The only change I make is to type a '1' at the end of the Dim line so that it reads:
Dim test, test2, test31

I would expect that I would only see that the test31 token would be returned but the result I get is:
    WScript.Quit (1)
End If
So I guess I don't understand two things. First, why is it returning tokens that have NOT changed since the last lexical parse? Second, why is it NOT returning the one token that I have in fact modified? I'm sure I just don't understand how the semantic parsing works, and after seeing these results I'm guessing that it doesn't work at all the way I thought it did.

------------------------------- Marianne

Comments (6)

Posted 18 years ago by Actipro Software Support - Cleveland, OH, USA
Hi Marianne,

If you look at DocumentModification.LexicalParseOffsetRange, that will indicate the range of offsets that were changed via a lexical parse. So restrict your token search to only tokens in that offset range. I think in your scenario above, the lexical parse range started before your modification's start offset, which is why you didn't catch the changed token. Also, for speed optimization reasons, we only update the tokens that fall in the LexicalParseOffsetRange. So tokens outside of that range may say they were modified on the last parse even though they weren't. If we updated every token on every minor document change, that could kill performance on large documents, which is why we do things that way.
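
As a rough illustration of restricting the search to the parse range, here is a minimal self-contained sketch. It does not use the SyntaxEditor API: SketchToken and ParseRangeFilter are hypothetical stand-ins, and the range is assumed to be half-open (min inclusive, max exclusive). The point is only that the Modified flag is trusted for tokens overlapping the range and ignored everywhere else.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an editor token; NOT the SyntaxEditor Token class.
public sealed record SketchToken(int StartOffset, int EndOffset, bool Modified, string Text);

public static class ParseRangeFilter
{
    // Returns the tokens that overlap [rangeMin, rangeMax) and are flagged as modified.
    // Tokens outside the range are skipped even if their (possibly stale) flag is set.
    public static List<SketchToken> ModifiedInRange(
        IEnumerable<SketchToken> tokens, int rangeMin, int rangeMax)
    {
        return tokens
            .Where(t => t.EndOffset > rangeMin && t.StartOffset < rangeMax)
            .Where(t => t.Modified)
            .ToList();
    }
}
```

With a list in which a token at offsets 91 to 95 still carries a stale Modified flag, calling ParseRangeFilter.ModifiedInRange(tokens, 24, 51) would skip it, because it lies outside the range.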

Just FYI, with the cool work we're doing on SyntaxEditor 4.0, you now will have a better ability for doing advanced semantic parsing. If you study how major editors do it, they generally build an abstract syntax tree of the document. So for instance, in C#, you'd have one root CompilationUnit node, with Namespace nodes under it, and Class nodes under those, etc. That is the proper way to do semantic parsing and that sort of thing will be one of the major new features in 4.0.
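
To make the tree shape concrete, here is a minimal sketch of such a hierarchy. AstNode and its members are hypothetical names chosen for illustration; this is not the SyntaxEditor 4.0 design, just a generic tree with a depth-first dump.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical generic AST node; not an Actipro type.
public sealed class AstNode
{
    public string Kind { get; }
    public string Name { get; }
    public List<AstNode> Children { get; } = new List<AstNode>();

    public AstNode(string kind, string name = "")
    {
        Kind = kind;
        Name = name;
    }

    // Appends a child and returns this node so calls can be chained.
    public AstNode Add(AstNode child)
    {
        Children.Add(child);
        return this;
    }

    // Depth-first dump: one node per line, indented two spaces per level.
    public string Dump(int depth = 0)
    {
        var sb = new StringBuilder();
        sb.Append(new string(' ', depth * 2)).Append(Kind);
        if (Name.Length > 0) sb.Append(' ').Append(Name);
        sb.Append('\n');
        foreach (var child in Children) sb.Append(child.Dump(depth + 1));
        return sb.ToString();
    }
}
```

For example, new AstNode("CompilationUnit").Add(new AstNode("Namespace", "Demo").Add(new AstNode("Class", "Widget"))) builds the three-level nesting described above, and Dump() lists it top-down.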


Actipro Software Support

Posted 18 years ago by Marianne
I've modified my code as follows:
                if (modification.Type != DocumentModificationType.Typing) return;
                TokenStream ts = document.GetTokenStream(modification.LexicalParseOffsetRange.Min);
                Token t = ts.Read();
                while (t.EndOffset < modification.LexicalParseOffsetRange.Max)
                {
                    if(t.Modified)
                        Console.Write(document.GetTokenText(t));
                    t = ts.Read();
                }
The results I get are puzzling. If I set a breakpoint at the While loop, these are the results I get:
LexicalParseOffsetRange.Min = 24
LexicalParseOffsetRange.Max = 51
Token.StartOffset = 91

The result of those numbers is that the code NEVER enters the while loop. It's as if the token stream magically jumps past the entire LexicalParseOffsetRange.

Calling GetTokenStream with an offset of 24 should return the first token at offset 24 when Read() is called. The results I'm getting are that the Read() method is actually getting the token at offset 91. Each change makes me more confused than before. All I'm trying to do is get a list of tokens that have changed since the last lexical parse. This is getting frustrating.

[Modified at 03/06/2006 12:53 PM]

------------------------------- Marianne

Posted 18 years ago by Actipro Software Support - Cleveland, OH, USA
No. Creating a TextStream takes an offset parameter, but creating a TokenStream takes a token index parameter. You are passing an offset where a token index should be passed. That's your problem there.
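
To show the difference between the two kinds of position, here is a minimal sketch that converts a character offset into a token index by binary search over sorted token start offsets. TokenIndexLookup and IndexAtOffset are hypothetical names, not SyntaxEditor members; check the product documentation for the lookup the library actually provides.

```csharp
using System.Collections.Generic;

public static class TokenIndexLookup
{
    // Given the start offsets of all tokens in ascending order, returns the index
    // of the last token whose start offset is <= offset, or -1 if the offset lies
    // before the first token.
    public static int IndexAtOffset(IReadOnlyList<int> tokenStartOffsets, int offset)
    {
        int lo = 0, hi = tokenStartOffsets.Count - 1, result = -1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (tokenStartOffsets[mid] <= offset)
            {
                result = mid;   // candidate; keep looking to the right
                lo = mid + 1;
            }
            else
            {
                hi = mid - 1;
            }
        }
        return result;
    }
}
```

Using an offset such as 24 directly as an index selects whichever token happens to sit at index 24, which would typically start much later in the document. That is consistent with the Read() call above returning a token at offset 91.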


Actipro Software Support

Posted 18 years ago by Marianne
Thanks! That did it. Also, can you provide some more information on this abstract syntax tree? I'm assuming it will replace or at least modify the existing language definition so that you can define the conditions under which a token is granted a specific ID. In other words, it will allow for a means of linking language elements together in a hierarchical linked list, i.e., a tree.

First of all, is my assumption correct? Second, how difficult and time consuming will it be to generate these trees for each language construct? You're working on C# now, but is that to generate the most stream-lined method of doing so, or is it so complex that it will take a large amount of time to generate a 'SemanticParsingTree' for each language that is to be supported?

------------------------------- Marianne

Posted 18 years ago by Actipro Software Support - Cleveland, OH, USA
Hi Marianne,

I believe we posted some information about what we've done so far with the AST in some other threads in this forum. Check that out for some general info.

What the new stuff does is allow you to replace SyntaxEditor's default "dynamic" lexical parser with one of your own. It also allows you to make languages more of a "plug-in" where you can now create a class that inherits SyntaxLanguage and has all the code in it to do lexical parsing, semantic parsing, IntelliSense, etc. This will keep your forms' code cleaner since most editor-related code will now be moved to the language.

I don't want to give too much away in public (we can email it to you though) but we have facilities now for helping you make an AST. You still need to code the grammar for the tree yourself though. This can be time-consuming. We are currently doing the C# spec. As you can see there are a ton of constructs to handle and we make an AST node for almost each one.

This sort of thing is going to be intended for advanced programmers only because it can get a little complex. However, once you do it and do it properly, it will allow you to truly make a very flexible editor with pretty much all the features found in a major IDE.

We are doing C# right now to prototype how this all should work. C# is a very complex language, so starting with something like JavaScript probably would have been easier. However, with C# we do have to focus on scoping and type resolution, and we can work on reflection repositories. So although it's more painful right now, C# is probably a good starting language to prototype this system with. The next one we do should probably be a tag-based language like XML or HTML.


Actipro Software Support

Posted 18 years ago by Joris Koster - Paragon Decision Technology
For the more advanced programmers who would like to build these ASTs themselves, or just as a reference, the link below points to a very nice site with complete semantic and syntactic specifications of various languages (C/C++, Java, Cobol, Algol, Perl, PHP, etc.; unfortunately not C# :( ).

Samples of MATHS

Also possibly of interest is another AST generator, ANTLR, which generates source code that, once compiled, builds the AST of the specified language from your input stream.

cheers,
Joris
