
I have implemented a programmatic mergable lexer and a token tagger for an EBNF grammar editor. I chose the ActiproSoftware.Text.Tagging.Implementation.MergableLexerBase and ActiproSoftware.Text.Tagging.Implementation.TokenTagger base classes to avoid writing the plumbing myself, so that I only have to implement GetNextToken on the lexer and ClassifyToken on the token tagger. The problem is that I cannot control the order in which these methods are called. Sometimes the tagger calls ClassifyToken before the lexer has finished parsing the text, which (because my implementation relies on a sequential parse) produces some wrong classification types. I have also noticed that when I paste some text into the editor, lexing occurs many times instead of only once as I would expect. Even more surprising, I then only have to click anywhere inside the editor to trigger a new parsing chain that produces the correct highlighting.
Let us look at an example. Say I want to highlight this simple EBNF grammar fragment:
a = b;
b = c;
c = d;
Say I want everything in bold. If I paste the text, I get this:
a = b;
b = c;
c = d;
which is not what I expected. If I click near the second line, I get this:
a = b;
b = c;
c = d;
which is what I expected. Strange!
Also, if I type the text into the editor, it works as expected. But if I paste the whole text into the editor at once, the following chain occurs:
The lexer processes lines 1 and 2; then the token tagger starts classifying the tokens parsed on line 1; then the lexer continues with line 3; then the token tagger classifies the tokens of line 2, but without knowledge of the tokens the lexer produced for line 2, because it passes a different Id than the one the lexer provided. Then the lexer restarts processing the text at the point where the token tagger was (?????), and then... many more calls.
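To make the suspected cause concrete, here is a toy model in Python. It is not the Actipro API: `spans`, `SequentialLexer` and `stateless_lex` are hypothetical names invented for this illustration, and the Id values are arbitrary. It only sketches, under the assumption that the editor may re-lex an arbitrary range, why Ids that depend on how much has already been parsed change when line 2 is lexed a second time, while Ids derived from the token text alone do not.

```python
def spans(text, start, stop):
    """Yield (start, end) offsets of identifiers and single punctuation chars."""
    pos = start
    while pos < stop:
        if text[pos].isspace():
            pos += 1
            continue
        end = pos + 1
        if text[pos].isalnum():
            while end < stop and text[end].isalnum():
                end += 1
        yield pos, end
        pos = end


class SequentialLexer:
    """Hypothetical lexer whose Ids depend on how many tokens it has seen so far."""

    def __init__(self):
        self.count = 0  # hidden state carried across calls

    def lex(self, text, start, stop):
        ids = {}
        for s, _ in spans(text, start, stop):
            self.count += 1
            ids[s] = self.count
        return ids


def stateless_lex(text, start, stop):
    """Hypothetical lexer whose Ids depend only on the token text (1 = name, 2 = punctuation)."""
    return {s: (1 if text[s:e].isalnum() else 2) for s, e in spans(text, start, stop)}


TEXT = "a = b;\nb = c;\nc = d;"
LINE2 = (7, 13)  # offsets of the second line, newline excluded

# Sequential lexer: one full pass, then a restart on line 2 only.
lexer = SequentialLexer()
full = lexer.lex(TEXT, 0, len(TEXT))
again = lexer.lex(TEXT, *LINE2)
first_pass_line2 = [full[o] for o in sorted(again)]
second_pass_line2 = [again[o] for o in sorted(again)]

# Stateless lexer: the same two passes.
full_s = stateless_lex(TEXT, 0, len(TEXT))
again_s = stateless_lex(TEXT, *LINE2)
stateless_first = [full_s[o] for o in sorted(again_s)]
stateless_second = [again_s[o] for o in sorted(again_s)]

print(first_pass_line2, second_pass_line2)  # the two lists differ
print(stateless_first, stateless_second)    # the two lists are equal
```

In this model the sequential lexer gives line 2 one set of Ids on the full pass and a different set on the restart, which resembles the mismatch I observe between the lexer and the tagger. Whether the real base classes behave this way is exactly what I am asking.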
I made the following assumptions when implementing the methods, and some of them must be wrong, otherwise I would not see this strange behaviour.
=> 1) GetNextToken on the lexer is invoked as soon as the text is modified in the editor
=> 2) GetNextToken on the lexer keeps being invoked as long as reader.IsAtEnd is false
=> 3) ClassifyToken is not invoked before the lexing process has ended
=> 4) it is possible to advance reader.Offset programmatically inside GetNextToken without affecting the process; for example, if one assigns the offset so that the reader reaches the end, GetNextToken will stop being invoked
=> 5) the MinId and MaxId of the TokenIdProviderBase should be, for example, 0 and 2 if you have only one Id of value 1 (this is taken from your samples)
EDIT: since posting, I have found that one cannot rely on MergableLexerBase to detect the very first and the very last time GetNextToken is called, nor is it possible to know when the TokenTagger will start to provide information. One also cannot rely on the lexer always starting at offset 0 and ending at the end of the document: sometimes it goes backward. The implementation of MergableLexerBase is somewhat obscure in this regard. So, using the base implementations, how can I know:
=> when the lexer starts parsing a changed document for the very first time?
=> when the lexer/token tagger chain ends, so that the highlighting appears in the editor and control is given back to the user, and another change will restart the process?
[Modified at 04/19/2010 07:07 AM]