
I am working on a non-mergeable lexer by implementing the ILexer interface.
I've run into a few problems, the first of which is that when I call OnPreParse, the offset I pass in is always reset to zero.
    member this.Parse (s:TextSnapshotRange, lt:ILexerTarget) : TextRange =
        // Let's try starting from the beginning of the line
        let mutable startOffset = s.StartLine.StartOffset
        let lexCxt = lt.OnPreParse(&startOffset)
        let lexState =
            if lt.HasInitialContext && lexCxt.ScopeState.LexicalState.Tag <> null then
                lexCxt.ScopeState.LexicalState.Tag :?> LexState
            else
                0L
In the code fragment above, the argument 's' (a TextSnapshotRange) always seems to cover a single line of the file. So when I call OnPreParse and suggest starting from the beginning of that line, I am simply agreeing with the requested start position. However, the value of startOffset after OnPreParse returns is always zero. How does the ILexerTarget decide where parsing should actually start? Does it depend on whether it has lexical state information available at that position?
Always starting from the beginning of the file is obviously bad for performance, so I would like to avoid it. I have implemented an ILexicalScopeStateNode. For each parsed token I create a new instance of it, store the current lexer state on its Tag property, and pass it to OnTokenParsed along with the token itself (which inherits from TokenBase). I believe this stores the lexical state data needed to resume incremental parsing later, which is why I am surprised that OnPreParse always sets startOffset to zero.
The colouration is almost entirely correct, so the lexing itself seems to be working. However, to debug both the lexer and the issue above, I would like to see exactly which portions of the file have been tagged with which token IDs, and which lexical states have been stored at which points. What is the best way to do this? I tried calling GetNextToken at the beginning of the Parse method, but I think this triggered another Parse. I'd just like to display the current token and lexical state tagging.