Semantic parsing after a change

SyntaxEditor for Windows Forms Forum

Posted 18 years ago by Jared Phelps
I was wondering if you guys could point me to some resources on this subject. As I've mentioned before, I have custom-written a semantic parser and lexer and integrated them into the SyntaxEditor object model (the lexer creates ITokens and the semantic parser creates IAstNode trees).

Where I'm getting stuck is figuring out the most efficient way to recreate my AST after a change to the document. Do you guys typically run a full semantic parse every time, run it only for everything after the change, or have some fancy algorithms in place to parse only the parts that might be affected, depending on the change the user made? I was trying for the third option and got it working for most cases, but things like adding or removing quotation marks, brackets, and other lexically small but semantically huge changes got cumbersome. It seems like a huge waste of resources to semantically parse the whole document when the user might have just added some whitespace or renamed an identifier somewhere. If I were to generate my semantic parser using your parser generator, how would it behave?

If it makes a difference, for me an "average" semantic parse takes between 100 and 500 milliseconds. A longer one may take around 2 seconds. That's not really noticeable since I'm using the semantic parser service, but it still feels like a waste.

I know this isn't exactly a syntax editor issue, but I figured if anybody knows, you would.


Comments (2)

Posted 18 years ago by Actipro Software Support - Cleveland, OH, USA
Hi Jared,

We are doing the full document parse. As you said, it's a waste in most situations. However, because the end user can simply type /* to start a comment or } to close out a scope, it's almost impossible to handle every kind of minor change in an incremental fashion. Our parser generator will by default make parsers that parse the entire document. However, you can easily extend them to parse fragments too.

With the semantic parser service, the work is done in a worker thread, so the end user doesn't notice the work happening behind the scenes even if it takes longer than a second. Also, if there is already a queued request in the service and you make another modification to the document, it will cancel the queued request and queue up the new one, although it will not interrupt the currently executing parse operation, if any.

Where this is handy: say you type 123 with brief pauses between the characters in a fairly large document. The 1 will kick off a parse. Assuming it is still parsing when 2 is pressed, it will queue up a new parse to start when the first ends. Assuming the first parse is still running when 3 is pressed, the second parse request will be removed and a new parse request will be made, which should combine the lexical parse ranges indicated in both requests.

So as you can see there was a lot of optimization work built in to support sequential lengthy parse operations if needed.
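The queueing behavior described above can be sketched roughly as follows. This is only a single-threaded model of the bookkeeping, not the actual semantic parser service: the names ParseRequest, ParseQueue, submit, and finish_running are hypothetical, and a real service would run the parse on a worker thread with locking around this state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParseRequest:
    start: int  # first changed offset (hypothetical field)
    end: int    # one past the last changed offset (hypothetical field)


class ParseQueue:
    """One running parse plus at most one pending request."""

    def __init__(self) -> None:
        self.running: Optional[ParseRequest] = None
        self.pending: Optional[ParseRequest] = None

    def submit(self, request: ParseRequest) -> None:
        if self.running is None:
            # Nothing in flight: this request starts parsing right away.
            self.running = request
        elif self.pending is None:
            # A parse is in flight: wait for it, never interrupt it.
            self.pending = request
        else:
            # Drop the old pending request and replace it with one
            # whose range covers both the old and the new change.
            self.pending = ParseRequest(
                min(self.pending.start, request.start),
                max(self.pending.end, request.end),
            )

    def finish_running(self) -> Optional[ParseRequest]:
        # Called when the in-flight parse completes; the pending
        # request (if any) becomes the next running parse.
        self.running, self.pending = self.pending, None
        return self.running


# Typing "1", "2", "3" at offsets 10, 11, 12 while the first parse runs.
queue = ParseQueue()
queue.submit(ParseRequest(10, 11))
queue.submit(ParseRequest(11, 12))
queue.submit(ParseRequest(12, 13))
next_request = queue.finish_running()
```

In this model the three keystrokes cost two parses rather than three, and the second parse covers the combined range of the second and third edits.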

Actipro Software Support

Posted 18 years ago by Jared Phelps
OK, thanks for the info. That makes me feel a lot better about doing a full semantic parse even when it's not strictly necessary. After a few hours of work, I reached a compromise between what I was trying to do and doing a reparse every time. I detect whether the user changed at most one token and whether the changed token was an identifier, number/string literal, whitespace, or comment. If so, I just update the appropriate value on my AST tree and adjust offsets accordingly. I'm still doing a full parse for tokens that can cause non-trivial changes. The deleted tokens property was pretty critical, and I couldn't figure out why it was always empty until I discovered the DeletedTokensTrackingEnabled property. Once I set that to true I was in business.
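A minimal sketch of that compromise might look like the following. It is an illustration under assumptions, not the code from this post: it compares the old and new token lists directly instead of using the deleted tokens property, it pretends the AST is a flat list with one leaf per token, and every name in it (Token, Leaf, TRIVIAL_KINDS, patchable_index, update) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# Token kinds whose text can change without altering the tree shape.
TRIVIAL_KINDS = {"identifier", "number", "string", "whitespace", "comment"}


@dataclass
class Token:
    kind: str
    text: str
    start: int


@dataclass
class Leaf:
    kind: str
    text: str
    start: int


def patchable_index(old: List[Token], new: List[Token]) -> int:
    """Return the index of the single trivially changed token, or -1."""
    if len(old) != len(new):
        return -1
    changed = [
        i for i, (a, b) in enumerate(zip(old, new))
        if (a.kind, a.text) != (b.kind, b.text)
    ]
    if len(changed) != 1:
        return -1
    i = changed[0]
    if old[i].kind != new[i].kind or new[i].kind not in TRIVIAL_KINDS:
        return -1
    return i


def update(
    leaves: List[Leaf],
    old: List[Token],
    new: List[Token],
    full_parse: Callable[[List[Token]], List[Leaf]],
) -> List[Leaf]:
    i = patchable_index(old, new)
    if i < 0:
        # Anything non-trivial falls back to a full reparse.
        return full_parse(new)
    # Patch the one affected leaf and shift everything after it.
    delta = len(new[i].text) - len(old[i].text)
    leaves[i].text = new[i].text
    for leaf in leaves[i + 1:]:
        leaf.start += delta
    return leaves


def leaves_from(tokens: List[Token]) -> List[Leaf]:
    """Stand-in for a full parse: one leaf per token."""
    return [Leaf(t.kind, t.text, t.start) for t in tokens]
```

Adding or removing a quotation mark or bracket changes the number or kinds of tokens, so those edits fail the check and take the full-parse path.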

I'm going to have to refactor a bit when it comes time to inherit from MergableLanguage instead of SyntaxLanguage, but for my first go-live SyntaxLanguage will do me fine.


[Modified at 10/30/2006 11:48 AM]