When implementing ILexer, OnPreParse always sets offset to zero - SyntaxEditor for WPF Forum

Posted 12 years ago by Robin Neatherway

Version: 13.1.0581

I am working on a non-mergeable lexer by implementing the ILexer interface.

I've run into a few problems, the first of which is that when I call OnPreParse, the offset I pass in is always reset to zero.

member this.Parse (s:TextSnapshotRange, lt:ILexerTarget) : TextRange =

    // Let's try starting from the beginning of the line
    let mutable startOffset = s.StartLine.StartOffset
    // startOffset is passed by reference, so OnPreParse can change it
    let lexCxt = lt.OnPreParse(&startOffset)

    // Recover the stored lexer state if there is one, otherwise use the initial state
    let lexState =
        if lt.HasInitialContext && lexCxt.ScopeState.LexicalState.Tag <> null then
            lexCxt.ScopeState.LexicalState.Tag :?> LexState
        else
            0L

In the code fragment above, the argument 's' (a TextSnapshotRange) always seems to refer to a single line of the file. So when I call OnPreParse with the start offset of the range's first line, I am just agreeing with the requested start position. However, the value of startOffset after OnPreParse is always zero. How does the ILexerTarget determine where to actually start? Does it depend on whether it has lexical state information available at that position?

Always starting from the beginning of the file is obviously bad for performance, so I would like to avoid this. I have implemented an ILexicalScopeStateNode. For each parsed token I create a new instance of it, store the current lexer state on its Tag property, and pass it into OnTokenParsed along with the actual token (which inherits from TokenBase). In this way I believe I am storing the lexical state data needed to resume incremental parsing later, which is why I am surprised that OnPreParse always sets the startOffset to zero.

The colouration is almost entirely correct, so it seems that the lexing is working correctly. However, to debug it, as well as the above issue, I would like to be able to see exactly what portions of the file have ended up being tagged with which token ids and what lexical states have been stored at which points. What is the best way to go about doing this? I tried using GetNextToken at the beginning of the Parse method, but I think this triggered another Parse. I'd just like to display the current token and lexical state tagging.
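One lexer-side way to get such a view without querying the editor might be to record each token in a side log just before it is handed to OnTokenParsed. The following is only a minimal sketch: the LexLogEntry record, its field names, and the helper functions are hypothetical and not part of any Actipro API, and the state is assumed to be an int64 to match the 0L used above.

```fsharp
// Hypothetical record for debugging; not part of any Actipro API.
type LexLogEntry =
    { StartOffset: int   // inclusive start offset of the token
      EndOffset: int     // exclusive end offset of the token
      TokenId: int       // id assigned to the token by the lexer
      State: int64 }     // lexer state stored alongside the token (assumed int64)

// Render one entry as a single line of text.
let formatEntry (e: LexLogEntry) : string =
    sprintf "[%d,%d) token=%d state=%d" e.StartOffset e.EndOffset e.TokenId e.State

// Render a whole log, one entry per line.
let dumpLog (entries: seq<LexLogEntry>) : string =
    entries |> Seq.map formatEntry |> String.concat "\n"
```

The lexer could append an entry to a ResizeArray<LexLogEntry> for every token it reports and write dumpLog's result to the debug output at the end of each Parse call. Because the log is filled by the lexer itself, reading it does not trigger another parse.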

Comments (4)

Answer - Posted 12 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Robin,

You are correct that the TextSnapshotRange generally is a single line of the file. And OnPreParse should be passed the start offset of that range. In our implementation of ILexerTarget.OnPreParse, we look at the offset to see what the closest offset (at or before) is that has contextual info for resuming incremental lexing. Then we return that context offset. This way, the lexer picks up at the last spot it knows how to start incremental lexing from. So in our implementation, the startOffset that is returned will either be the same or smaller than what was passed in.
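The lookup described above can be modelled roughly as follows. This is only an illustrative sketch and not the actual implementation: the resumeOffset function and the contextOffsets array are hypothetical names, standing in for a sorted table of offsets at which resumable context has been stored.

```fsharp
// Hypothetical model of the "closest context at or before" lookup.
// contextOffsets is assumed to be sorted in ascending order.
let resumeOffset (contextOffsets: int[]) (requested: int) : int =
    let mutable lo = 0
    let mutable hi = contextOffsets.Length - 1
    // With no usable context, fall back to the start of the document.
    let mutable best = 0
    // Binary search for the greatest stored offset that is <= requested.
    while lo <= hi do
        let mid = lo + (hi - lo) / 2
        if contextOffsets.[mid] <= requested then
            best <- contextOffsets.[mid]
            lo <- mid + 1
        else
            hi <- mid - 1
    best
```

In this model, a table that is empty, or whose entries all lie after the requested offset, always yields zero, which is one way the symptom you describe could arise.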

It definitely should not always be starting at zero. Perhaps there is some issue with storing the context data in your setup. If you'd like us to help debug it, please make as small and simple a project as you can that reproduces the issue, then e-mail it to our support address. Reference this post and rename the .zip file extension so it doesn't get spam blocked.

Actipro Software Support

Posted 12 years ago by Robin Neatherway

Thanks for the reply. Focussing on the contextual data storage as you suggested, I was able to find a bug in the offsets being passed to OnTokenParsed. The startOffset now looks as I would expect.

I have another question about how the return value of OnTokenParsed is determined. This boolean tells us whether to continue lexing even after we have passed the end of the current snapshot. In my lexer I am providing highlighting of preprocessor blocks (similar to C#) like:

#if SYMBOL
some code
#else
some more code
#endif

Depending on whether SYMBOL is defined, one of the blocks will be inactive and should be coloured grey. Of course, when the line containing SYMBOL is modified, this status is likely to flip between the two blocks at least once. In this case the lexer should continue right down to the matching #endif. However, OnTokenParsed will return false (i.e. stop lexing) before that point. How should I approach this problem? I can see two options:

1. Continue lexing, ignoring the return value of OnTokenParsed, until I see the matching #endif token.

2. Somehow dirty the token status down to the matching #endif such that the tokens are re-lexed in the natural order of things.

Posted 12 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Robin,

If that method returns true, our lexical context provider thinks that the context is now in sync with the document contents. But as you mentioned, in your case it might not be (later on at least) because the active block can toggle. I believe you'd go with option #1 and continue on until you reach the first opportunity to quit after OnTokenParsed is true.
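As a rough sketch of option #1, the loop below keeps lexing until the target has reported sync and no preprocessor block opened during this pass is still open. Everything here is hypothetical: the Token type, lexUntilSafe, and the reportToken callback are stand-ins, and reportToken is assumed to return true once the context is considered in sync, so check that against the real OnTokenParsed before relying on the polarity.

```fsharp
// Simplified token kinds for illustration only.
type Token =
    | If
    | Else
    | EndIf
    | Other

// reportToken is a hypothetical stand-in for the target callback. It receives
// the token index and the token, and is ASSUMED to return true once the
// target considers its stored context back in sync.
// Returns the number of tokens consumed before it was safe to stop.
let lexUntilSafe (tokens: Token list) (reportToken: int -> Token -> bool) : int =
    let mutable depth = 0       // #if blocks opened during this pass and not yet closed
    let mutable inSync = false  // latched once the target has reported sync
    let mutable consumed = 0
    let mutable rest = tokens
    let mutable stop = false
    while not stop && not rest.IsEmpty do
        let tok = rest.Head
        rest <- rest.Tail
        match tok with
        | If -> depth <- depth + 1
        | EndIf -> depth <- max 0 (depth - 1)
        | Else | Other -> ()
        // Every token is still reported, even after sync has been signalled.
        if reportToken consumed tok then inSync <- true
        consumed <- consumed + 1
        // First opportunity to quit: target in sync and no block left open.
        if inSync && depth = 0 then stop <- true
    consumed
```

With this shape, an edit on the #if line forces the pass to run through the matching #endif, while an edit outside any preprocessor block stops as soon as sync is reported.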

Actipro Software Support

Posted 12 years ago by Robin Neatherway

Thanks, everything seems to be working well now.
