Semantic parsing: loading and deletions

SyntaxEditor for Windows Forms Forum

Posted 20 years ago by Marianne
I'm able to use the semantic parser to apply SemanticParseData to tokens during any typing that the user does. However, when a script is loaded, it doesn't seem that each token is routed through the semantic parser on loading. Is this correct? And if so, do I need to iterate over every token myself to do the semantic parsing?

Second issue: when a block of text containing multiple tokens is deleted, and one of those tokens holds SemanticParseData, how can I get that data? All I can do is get modification.DeletedText, which is just a string that doesn't contain any lexical/semantic information. If I'm not able to get the semantic data, then I don't know when to update my own internal information that is based on that semantic data.

Thanks.

------------------------------- Marianne

Comments (11)

Posted 20 years ago by Actipro Software Support - Cleveland, OH, USA
Hi Marianne,

I believe the first issue is causing another issue (another post) where the CustomHighlightingStyle is not set until you type. I'm looking into this for the next release.

You can use the PreParse method to view that information; PreParse is called before any changes are applied. Also, if you have any detailed ideas for enhancements, please post them and I'll be glad to implement them if they are feasible.


Actipro Software Support

Posted 20 years ago by Marianne
So using the PreParse method, would I look at the selection range of the text about to be deleted and then use the token stream to move through all of the tokens in that range?
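
Something like this is what I'm picturing (rough pseudocode only; I'm guessing at member names such as Document.Tokens, Token.StartOffset/EndOffset, and Token.SemanticParseData, and OnTokenAboutToBeDeleted is just my own bookkeeping method):

    // Rough sketch only; the member names used here are guesses at the API,
    // and OnTokenAboutToBeDeleted is my own method.
    private void CollectSemanticDataInRange(Document document, int deleteStart, int deleteEnd)
    {
        for (int index = 0; index < document.Tokens.Count; index++)
        {
            Token token = document.Tokens[index];

            // Skip tokens that end before the deletion range starts
            if (token.EndOffset <= deleteStart)
                continue;

            // Stop once we are past the end of the deletion range
            if (token.StartOffset >= deleteEnd)
                break;

            // This token overlaps the text about to be deleted; if it carries
            // semantic data, update my own internal structures now
            if (token.SemanticParseData != null)
                this.OnTokenAboutToBeDeleted(token);
        }
    }

(In practice I'd jump straight to the first token at the deletion offset rather than scanning from the start, but you get the idea.)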

Or would there be an easier way? Thanks.

------------------------------- Marianne

Posted 20 years ago by Actipro Software Support - Cleveland, OH, USA
You could do that... The problem is that we don't know the actual range of tokens to be changed until after the lexical parser runs. The modified tokens could extend before and after the deleted text. If you have any ideas to improve this, feel free to post them.


Actipro Software Support

Posted 20 years ago by Marianne
But isn't that only an issue when a file is first opened? Other than that, the lexical data is already applied. If a user makes a change and then deletes a bunch of text, the lexical parser has already tokenized all the text that is about to be deleted, and as such, you should be able to determine whether a token extends past the end of the deleted range.

------------------------------- Marianne

Posted 20 years ago by Actipro Software Support - Cleveland, OH, USA
No, you have to think about the case where you have something like this in C#:
//*

Then delete the middle /. That only changes one character, but the remaining text now begins a multiline comment, so it's possible that the entire document after that point will need to be reparsed.

Make sense?


Actipro Software Support

Posted 20 years ago by Marianne
Right, in your example, the state change affected everything else. So PostParse is really where we want to grab the data.

How about this? Instead of just providing string values for deleted text, would it be possible to persist the lexical and semantic data along with the text? So that one could iterate through the tokens of the recently deleted (or even added) text? The only real question here is how to handle the partial deletion of a token like in your example.

Even though just one slash is being removed, you would not want to do another lexical parse of the newly deleted data.

OK, final thought: persist the lexical/semantic data after deletion so that lexical parsing can take place on the rest of the text, but we can still dig through the deleted text and really know what was deleted. Second, include a boolean with these tokens indicating whether the token in question is a "partial token". If so, then we know to go back and look at the rest of the token if need be. Or maybe just include another string value for the entire token. In other words, in your example, if the second slash was removed, then the deleted token key is "CommentStartToken" but the 'FullTokenText' value, or some such, is "//".
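
To make that concrete, the kind of record I'm imagining for each deleted token is something like this (all of these names are made up, of course):

    // Purely hypothetical; none of these names exist in the product today.
    // It just describes the information I'd like to receive per deleted token.
    public class DeletedTokenInfo
    {
        // The token's language key, e.g. "CommentStartToken"
        public string TokenKey;

        // The portion of the token's text that was actually removed, e.g. "/"
        public string DeletedText;

        // The full text of the token before the deletion, e.g. "//"
        public string FullTokenText;

        // True when only part of the token was removed (a "partial token")
        public bool IsPartialToken;

        // The semantic data that was attached to the token, preserved unmodified
        public object SemanticParseData;
    }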

Does any of this make sense? Thanks.

------------------------------- Marianne

Posted 20 years ago by Actipro Software Support - Cleveland, OH, USA
Hi Marianne,

After looking into things for another issue (the Token.CustomHighlightingStyle one), it seems like the semantic parser is running OK; this is in regard to the first paragraph of your original post. You can use the TokenStream class to iterate through all the Tokens in the document. One thing that we maybe should do is clear the semantic parse data out of Tokens when their lexical parse data changes. Is that the problem from your original post, and what do you think about that idea and any side effects it might cause?
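
For example, something along these lines will walk every token after a file is opened (member names here are approximate, so adjust to the current build, and ApplySemanticParseData stands in for whatever you already do in response to typing):

    // Walks every token in the document so you can apply semantic data yourself.
    // Member names are approximate; ApplySemanticParseData is your own method.
    TokenStream stream = editor.Document.GetTokenStream(0);
    while (!stream.IsDocumentEnd)
    {
        Token token = stream.Read();

        // Apply or refresh your SemanticParseData for this token,
        // just as you would while the user types
        this.ApplySemanticParseData(token);
    }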

I think the best overall solution for parsing is to preserve a copy of all Tokens that are modified during lexical parsing and then provide that collection to SemanticParser.PostParse. That way you can iterate through them and update your semantic data. This could be an expensive operation, so maybe it should be made optional via a flag on the editor.


Actipro Software Support

Posted 20 years ago by Marianne
Clearing the semantic parse data isn't even important, and in fact not desirable, provided that the new token is sent back to the semantic parser to be (re)processed. In other words, if a token is "cut in half" by a delete operation:

- The remaining piece of the token should be reprocessed by the lexical parser.
- The semantic data should not be changed, but the token should be sent back to the semantic parser for processing, if necessary.
- The deleted token section should be included in the token list that is created by the lexical parser and passed as an event arg to PostParse (per your suggestion). Again, the semantic data should not be modified.

And including this as a flag is a good idea, as many will not need to use such a function.

Does it sound like we're on the same page?

Thanks.

------------------------------- Marianne

Posted 20 years ago by Actipro Software Support - Cleveland, OH, USA
Yes I think we are on the same page.


Actipro Software Support

Posted 20 years ago by Marianne
Actually, having spent the better part of a day thinking about this, I've changed my mind.

The modified token SHOULD have its semantic parse data removed and then be sent back through the semantic parser.

The deleted tokens (or pieces of the deleted tokens) should NOT have their semantic parse data removed.

This is much better because when a removed token is handed to PostParse, you know that that token's semantic data no longer exists in the document and can deal with it accordingly in code. If you keep both copies of the semantic parse data, it would be tougher to know where you are.

Basically, I think this is what you said initially. The more I think about it the more sense it makes to me. Is it possible to implement it this way?
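
To be clear about what I'd do with it, my side of PostParse would end up looking roughly like this (the deletedTokens collection is the hypothetical part, i.e. what I'm asking for; the symbol table is just my own bookkeeping):

    // Hypothetical sketch; assumes PostParse will eventually hand me the deleted
    // tokens with their original SemanticParseData still attached.
    // (Hashtable and ICollection come from System.Collections.)
    private Hashtable symbolTable = new Hashtable();

    private void HandleDeletedTokens(ICollection deletedTokens)
    {
        foreach (Token token in deletedTokens)
        {
            if (token.SemanticParseData == null)
                continue;

            // This token's semantic data is no longer in the document,
            // so remove my corresponding internal entry
            symbolTable.Remove(token.SemanticParseData);
        }
    }

    // The surviving (modified) tokens, by contrast, would have their semantic data
    // cleared by the editor and be routed back through the semantic parser.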

------------------------------- Marianne

Posted 20 years ago by Actipro Software Support - Cleveland, OH, USA
We'll have a better picture once we try to implement this feature. It's hard to say until then.


Actipro Software Support
