Programmatic lexer calls with performance issues

SyntaxEditor for WPF Forum

Posted 13 years ago by Tino Schnerwitzki
Version: 11.1.0543

I've created a programmatic lexer by implementing the ILexer interface and have been running into some serious performance issues. At the moment the implementation always analyzes the entire text and uses the OnTokenParsed method of the ILexerTarget to register the tokens.

The problem now is that the Parse method of the lexer is often called hundreds of times. For example, if I paste a text with 160 lines into an empty editor, the lexer is called more than 300 times. At the beginning it starts by lexing single lines (lines 1 through 20, so 20 calls). Then it suddenly jumps to lines 159 and 160, only to start again right afterwards at lines 1 through 18. There are also calls that ask the lexer to lex more than one line. After pasting the 160 lines the editor is unusable. Typing a single character takes about half a second.

The same thing happens as soon as I scroll the text as well. One single scroll with the mouse wheel results in more than 130 calls of the lexer (this time only single lines but each line more than once).

Why is the lexer called so often and how can this be avoided?
Are there any things of special note that need to be considered when implementing a custom lexer?

Thanks in advance.


Comments (1)

Posted 13 years ago by Actipro Software Support - Cleveland, OH, USA
Hi Tino,

Unfortunately we don't have a sample yet for building a non-mergable programmatic lexer. I'll add this thread to the TODO item for that so we can let you know when we add one in the future.

Lexers need to be incremental because, as you saw, they are called for each line of text when the line layout logic determines which classification types are present on it. Any parser you build for your language will also call the lexer indirectly as it runs through the tokens.

The ILexer.Parse method is defined as:
TextRange Parse(TextSnapshotRange snapshotRange, ILexerTarget parseTarget);

The snapshotRange parameter indicates the offset range that needs to be lexed. So if its StartOffset is 1000, you should start lexing at offset 1000 or slightly before it. You should not be starting at offset 0.
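
As a rough illustration of that idea, here is a minimal, self-contained sketch. It does not use the SyntaxEditor API: Token, FindRestartOffset, and Lex are hypothetical stand-ins, and the sketch assumes a language in which no token spans a line break, so the start of the line containing the requested offset is always a safe place to resume. A real ILexer.Parse implementation would report tokens to the ILexerTarget instead of returning a list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in type for illustration only; NOT an Actipro type.
public sealed class Token
{
    public Token(int startOffset, int length, string kind)
    {
        StartOffset = startOffset;
        Length = length;
        Kind = kind;
    }

    public int StartOffset { get; }
    public int Length { get; }
    public string Kind { get; }
}

public static class IncrementalLexerSketch
{
    private static bool IsBlank(char c) => c == ' ' || c == '\t' || c == '\r';
    private static bool IsWordChar(char c) => char.IsLetterOrDigit(c) || c == '_';

    // Backs up from the requested offset to the start of its line.
    // Assumption: no token spans a line break, so a line start is a safe restart point.
    public static int FindRestartOffset(string text, int requestedStart)
    {
        int offset = Math.Min(Math.Max(requestedStart, 0), text.Length);
        while (offset > 0 && text[offset - 1] != '\n')
            offset--;
        return offset;
    }

    // Lexes from the restart point up to requestedEnd instead of from offset 0.
    // A token that begins before requestedEnd is always scanned to completion.
    public static List<Token> Lex(string text, int requestedStart, int requestedEnd)
    {
        var tokens = new List<Token>();
        int end = Math.Min(requestedEnd, text.Length);
        int pos = FindRestartOffset(text, requestedStart);

        while (pos < end)
        {
            int start = pos;
            char c = text[pos];
            string kind;

            if (c == '\n')
            {
                pos++;
                kind = "LineTerminator";
            }
            else if (IsBlank(c))
            {
                while (pos < text.Length && IsBlank(text[pos])) pos++;
                kind = "Whitespace";
            }
            else if (IsWordChar(c))
            {
                while (pos < text.Length && IsWordChar(text[pos])) pos++;
                kind = "Word";
            }
            else
            {
                pos++;
                kind = "Symbol";
            }

            tokens.Add(new Token(start, pos - start, kind));
        }

        return tokens;
    }
}
```

With this shape, a request for a range in the middle of line 100 costs roughly one line of scanning rather than a rescan of the whole document, which matters when the lexer is called once per visible line. If your language has multi-line constructs such as block comments, the line start is no longer a safe restart point on its own and you would need some additional way to know the lexical state at that position.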

If you run into trouble writing a non-mergable programmatic lexer, you might want to fall back to a mergable programmatic one, since those are easier to implement. The Getting Started 3c QuickStart shows exactly how to create one of those. They will be slightly slower than the non-mergable variant (assuming you implement that variant to be properly incremental), but will still be plenty fast for the end user. Our C# and VB languages in the .NET Languages Add-on do it this way and have very good performance.

Actipro Software Support

The latest build of this product (v24.1.2) was released 2 months ago, which was after the last post in this thread.
