SyntaxEditor 2.0 Design Feedback

SyntaxEditor for Windows Forms Forum

Posted 16 years ago by Actipro Software Support - Cleveland, OH, USA
We are looking for some creative feedback on how to complete the remaining features for 2.0.

Right now 2.0 is mostly finished, apart from finalizing semantic parsing and outlining. It has a ton of redesigned features and improvements. A beta will be made available soon; however, we want to get some of your input before we release it.

Here are some of the major changes in 2.0, although the full list of changes is much longer:
    - Redesigned XML language definition format
    - Instant-loading XML language definitions
    - New NFA regex engine supporting more constructs
    - Ability to completely modify languages at run-time
    - Lexical parser that parses all text into tokens
    - Ability to easily integrate semantic parsing
    - Line modification markers like in Whidbey
    - Outlining and auto case correction
    - New IntelliPrompt member list helper methods
    - Improved word wrap and bracket matching
    - New documentation and much, much more

As you can see, the new lexical parser now creates a stream of Token objects. These Token objects enable the run-time parsing information to be much more object oriented. Whereas in 1.0, you had Character objects with properties like LanguageIndex, StateIndex, etc., in 2.0 you now have Token objects that just point right to the appropriate SyntaxLanguage or LexicalState object.

Our idea for 2.0 is a two-phase approach. The first phase is lexical parsing, which is already complete. The second is semantic parsing, and we need your input on how to design it.

The goal of the semantic parsing phase is to allow you to iterate through the Token objects and use them to gather and assign semantic parse data. Semantic parsing means to take a series of tokens and assign them meaningful values. You could place this semantic parse data on the Tokens themselves, on document lines, etc. For instance, maybe a C# line with a "using" statement was just entered. You'd scan the "using" Token, identify it as a "using" keyword and load the name after it into some sort of code model. Then when you required IntelliPrompting later on, you would know that a namespace was referenced.
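To make the "using" example concrete, here is a minimal sketch of a single pass over a token stream. It is written in Python purely for illustration; the Token and CodeModel shapes and the collect_usings function are hypothetical and are not the SyntaxEditor API.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    kind: str  # hypothetical categories: "keyword", "identifier", "punctuation"
    text: str


@dataclass
class CodeModel:
    namespaces: list = field(default_factory=list)


def collect_usings(tokens, model):
    """Scan the token stream once and record each namespace named after 'using'."""
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.kind == "keyword" and tok.text == "using":
            parts = []
            i += 1
            # Gather the dotted name up to the terminating ';'.
            while i < len(tokens) and tokens[i].text != ";":
                parts.append(tokens[i].text)
                i += 1
            if parts:
                model.namespaces.append("".join(parts))
        i += 1
    return model


tokens = [
    Token("keyword", "using"), Token("identifier", "System"),
    Token("punctuation", "."), Token("identifier", "IO"),
    Token("punctuation", ";"),
]
model = collect_usings(tokens, CodeModel())
print(model.namespaces)  # ['System.IO']
```

A member list could later consult model.namespaces instead of rescanning the document.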

The trick with semantic parsing is that it has to work for several completely different styles of languages. Languages like C# can flow across document lines but multiple statements can also be placed on one line. Languages like XML somehow need to track what tags are in the document, what attributes the tags might have, etc. Languages like CSS are probably about the easiest to parse and provide IntelliPrompt for.

Then there is outlining. We have an implementation of it semi-working but aren't sure we like where it is done. Should outlining be part of the lexical parsing phase, or do you want more control over it, which would put it in the semantic parsing phase? If it goes there, then the semantic parsing phase must have two purposes: (1) scan tokens and gather semantic data, and (2) modify the outlining tree as needed.
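If outlining did live in the semantic phase, the second purpose might look roughly like the sketch below, which builds an outlining tree from brace tokens. This is an assumption-laden illustration in Python: it only handles brace-delimited languages, and Token, OutlineNode, and build_outline are made-up names, not part of SyntaxEditor.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    text: str
    line: int


@dataclass
class OutlineNode:
    start_line: int
    end_line: int = -1  # -1 while the region is still open
    children: list = field(default_factory=list)


def build_outline(tokens):
    """Return a root node whose children are the brace-delimited regions."""
    root = OutlineNode(start_line=0)
    stack = [root]
    for tok in tokens:
        if tok.text == "{":
            node = OutlineNode(start_line=tok.line)
            stack[-1].children.append(node)
            stack.append(node)
        elif tok.text == "}" and len(stack) > 1:
            # Close the innermost open region; stray '}' tokens are ignored.
            stack.pop().end_line = tok.line
    return root


tokens = [Token("class", 1), Token("{", 1), Token("void", 2), Token("{", 2),
          Token("}", 4), Token("}", 5)]
root = build_outline(tokens)
outer = root.children[0]
print(outer.start_line, outer.end_line, len(outer.children))  # 1 5 1
```

The same loop could also gather semantic data, which is the trade-off in question: one pass with two responsibilities versus outlining handled by the lexer.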

Based on the above, please post your initial ideas without seeing the actual implementation of 2.0 yet.

Let's get some discussions going so we can get this version finished and out the door.


Actipro Software Support

Comments (3)

Posted 16 years ago by Marianne
Semantic parsing is an extremely powerful tool that will make so many things much easier to do! A couple of questions, though. First, it's one thing to parse the semantics and put everything into a proper category (let's use your 'using' example) but when that line changes or a new 'using' line is entered, does the document need to be reparsed? Or is it updating our semantics by definition?

In other words, if I define a using directive as any word appearing after the using keyword, and I assume that will be done via regular expressions, then when a new using directive is added, does SyntaxEditor add that to my 'using' semantic array or do I need to reparse the line/document and programmatically add/remove those items?

Example:
Just one place where this will really shine is providing COM autocompletion. Take the following code (let's use VBScript):

dim fso, f, ts
set fso = Server.CreateObject("Scripting.FileSystemObject")
set f = fso.GetFile("C:\temp\myfile.txt")
set ts = f.

As of now, this requires that I walk the object stack twice. First, when they "dot the f", I need to find out what 'f' refers to, so I find the previous line, but then I need to find out what fso refers to. Once I have the object, I need to traverse down the same code lines to find out what object 'f' actually is. Once this is complete, I can show an autocomplete window with the relevant information.

This has to be done every time an object is dotted. Ideally, the semantic parser will allow me to assign values to these variables so that walking the object hierarchy does not need to be done for every item.
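A rough sketch of what "assigning values to these variables" could mean: resolve each assignment once, as it is parsed, and cache the result in a symbol table. This is Python for illustration only; the PROGIDS and RETURNS tables and the tuple encoding of expressions are invented for this example and stand in for real type-library lookups.

```python
# Hypothetical type information: ProgID -> type, and (type, method) -> return type.
PROGIDS = {"Scripting.FileSystemObject": "FileSystemObject"}
RETURNS = {("FileSystemObject", "GetFile"): "File"}


def record_assignment(symbols, target, expr):
    """Resolve the type of expr once and cache it under the variable name."""
    kind = expr[0]
    if kind == "create":      # ("create", progid)
        symbols[target] = PROGIDS.get(expr[1])
    elif kind == "call":      # ("call", receiver, method)
        receiver_type = symbols.get(expr[1])
        symbols[target] = RETURNS.get((receiver_type, expr[2]))
    return symbols


symbols = {}
record_assignment(symbols, "fso", ("create", "Scripting.FileSystemObject"))
record_assignment(symbols, "f", ("call", "fso", "GetFile"))
print(symbols["f"])  # File
```

When the user types "f.", the member list only needs symbols["f"]; no walk back through the earlier lines is required.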

The big question is how are changes handled? If the fso line is changed, then the chain is broken. Although programmatically we'll have to handle that type of change, how can the semantic parser be implemented to make this as easy as possible to execute?

------------------------------- Marianne

Posted 16 years ago by Actipro Software Support - Cleveland, OH, USA
Right now (and this is fully subject to change based on your feedback) we allow you to define a semantic parser class and assign it to the root SyntaxLanguage. Then we call a PreParse method before any lexical parsing takes place and a PostParse method after all lexical parsing is complete.

You have the right idea with how we want things to work. We'd like you to be able to do one scan and assign meaningful data like reflection data to each Token after it's been created. Then when you show a member list, you can just look at the Token and decide what to do.

The problem with our current design is that although we provide a PreParse method, at that point you don't know what range of characters will be affected. You do know that in PostParse.
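The asymmetry between the two hooks can be sketched as follows. This is not the actual 2.0 interface; the method names, signatures, and the line-widening stand-in for lexical parsing are all assumptions made for illustration, in Python.

```python
class RecordingSemanticParser:
    """Hypothetical semantic parser that logs what each hook can see."""

    def __init__(self):
        self.log = []

    def pre_parse(self, document):
        # Called before lexing: the affected range is not known yet.
        self.log.append(("pre", None))

    def post_parse(self, document, start, end):
        # Called after lexing: the re-lexed range is now known.
        self.log.append(("post", (start, end)))


def reparse(document, parser, change_start, change_end):
    parser.pre_parse(document)
    # Stand-in for lexical parsing: widen the edit to the whole lines it touches.
    start = document.rfind("\n", 0, change_start) + 1
    end = document.find("\n", change_end)
    end = len(document) if end == -1 else end
    parser.post_parse(document, start, end)
    return start, end


doc = "using System;\nusing System.IO;\n"
parser = RecordingSemanticParser()
start, end = reparse(doc, parser, 16, 18)
print(doc[start:end])  # using System.IO;
```

Only post_parse receives a range, so any semantic data that must be discarded before re-lexing cannot be scoped at pre_parse time.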

So with all that in mind, can you suggest a better way to implement this to get the results you need?


Actipro Software Support

Posted 16 years ago by Marianne
The biggest issue I see is that the data/value attributed to each token is dependent upon the data/value of other tokens. In my example, modifying the 'set fso = ...' line would necessarily change the semantic-parsing information (or meta-data, if you prefer) for several other tokens.

One idea might be to use a linked list or better yet a tree structure to keep the relationships between tokens.

In any situation where objects are nested, which is all the time, changing an upstream object definition would necessarily alter or break the things below it. Does a tree structure make sense to maintain relationships between tokens?
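One way to picture the relationship structure is a small dependency graph (a tree is the special case where each item has one parent). The sketch below, in Python for illustration, tracks which names were derived from which and reports everything that goes stale when an upstream definition changes. DependencyGraph and its methods are hypothetical names.

```python
from collections import defaultdict


class DependencyGraph:
    def __init__(self):
        self.dependents = defaultdict(set)  # name -> names derived from it

    def add_dependency(self, name, depends_on):
        self.dependents[depends_on].add(name)

    def invalidate(self, name):
        """Return name plus everything downstream of it, sorted."""
        stale, pending = set(), [name]
        while pending:
            current = pending.pop()
            if current not in stale:
                stale.add(current)
                pending.extend(self.dependents.get(current, ()))
        return sorted(stale)


graph = DependencyGraph()
graph.add_dependency("f", depends_on="fso")
graph.add_dependency("ts", depends_on="f")
print(graph.invalidate("fso"))  # ['f', 'fso', 'ts']
print(graph.invalidate("ts"))   # ['ts']
```

Editing the 'set fso = ...' line would then mark fso, f, and ts for re-resolution, while an edit to the last line touches only ts.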

As far as the PreParse method goes, in particular, I'm not sure what that buys you in the long run. Parsing would only seem to be useful after the lexical parsing is complete, but I certainly don't understand enough to say that categorically.

------------------------------- Marianne

The latest build of this product (v2020.1 build 0400) was released 1 month ago, which was after the last post in this thread.
