Returning tokens with non-mergable lexer

Comments (4)

Posted 16 years ago by Martin - Blaise - Statistics Netherlands

I am playing with the same thing. I think you add a TokenTaggerService/provider in the constructor of the SyntaxLanguage class. This class is based on TaggerProviderBase<>, which should provide the IDs/tags, I guess.

Maybe TokenTagger.GetTokens is used?

[Modified at 01/05/2010 06:38 AM]

Posted 16 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Bernie,

Make sure you read through the "Programmatic Lexers" topic in the documentation. It has an "Implementing a Non-Mergable Programmatic Lexer" section that walks through everything you need to do to make a non-mergable programmatic lexer. You'd need to register the lexer as a service with your language.

Then, as Martin indicated, you also need to register a token tagger provider service with your language. The "Taggers and Tagger Providers" topic under the "Token Taggers" section has some notes on what you need to do when building a token tagger for a language with a non-mergable lexer.

To sum up, the lexer is what can parse text and create tokens. The token tagger is something that can cache lexer states and allow for incremental lexing, and it also is what other language services and/or editor features can use to get access to tokens.

For your purposes, though, if you wish to read through tokens, you'd use an ITextSnapshotReader (via document.CurrentSnapshot.GetReader(offset)), because it provides a handy way to navigate through a text snapshot by character, by token, and in other ways.

Actipro Software Support

Posted 16 years ago by Bernie Schoch

Ok thanks (still trying to get my head around this all). I think I now understand the lexer, that tokens are sent to ILexerTarget.OnTokenParsed, and that a token can be any IToken-derived class.

Now I have questions on the Parser. From what I can see it 'reads' the tokens using
ITextSnapshotReader reader = request.Snapshot.GetReader(0);

Also I think I understand the concept of dispatchers and threads. But now my question is: how do the two parts connect? For example, I assume that the lexer gets called all the time, either for segments of text or for all of it, in order to highlight/mark the source. Then periodically the parser gets called (on a thread if you have set that up). Has the lexer been called with all the source at the time the parser gets called? Or do I have to call the lexer first before parsing?

What I'm missing is the timing and the sequence of when each part gets called. Perhaps a hypothetical sequence walkthrough would be helpful, e.g. the sequence to initialize (pretty well documented), and then two different paths: (1) a source file is preloaded, then what happens (in terms of lexer/parser), and (2) we start with a blank source file and the user starts typing, and again what happens.

A generalized strategy for taking an existing lexer and parser and integrating them into the syntax editor (which is what I'm trying to do) would also help.

Thanks,
Bernie

Posted 16 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Bernie,

Actually, an IParser can do anything it wants. The only requirement of a parser is that it returns some sort of parse data, which gets set to the document's ParseData property. Sometimes parsers do want to reuse the ILexer set up for the language. In these scenarios, yes, you can use an ITextSnapshotReader. Alternatively, the request passes you an ITextBufferReader instance in its TextBufferReader property. That is a low-level text scanner that can be used if you have a custom lexer that should feed tokens to your parser instead.

The ILexer and related token tagger provider services registered with a language are really only used to drive classification (and via that, syntax highlighting), along with feeding tokens to ITextSnapshotReader instances. Token taggers store data to enable incremental lexing, meaning being able to pick up lexing at a certain point in the document instead of always at the start. By using that sort of feature along with the various virtualization techniques we use, our text renders super-fast in editor views, and huge documents (10MB, for example) open almost instantly.
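To make the idea of incremental lexing concrete, here is a minimal, language-neutral sketch in Python. It is not the SyntaxEditor API or our actual implementation: the toy comment syntax and the names lex_line, IncrementalLexer, start_states, and replace_line are all made up for illustration. It assumes the lexer state can be captured as a single flag ("inside a block comment") cached at the start of each line, so that after an edit, lexing resumes from the changed line and stops once the state lines up with the cache again.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text); hypothetical token shape


def lex_line(line: str, in_comment: bool) -> Tuple[List[Token], bool]:
    """Lex one line of a toy language with /* ... */ comments.

    Returns the tokens plus the lexer state at the end of the line.
    """
    tokens: List[Token] = []
    i = 0
    while i < len(line):
        if in_comment or line.startswith("/*", i):
            # Skip past the opener if this comment starts here.
            search_from = i if in_comment else i + 2
            end = line.find("*/", search_from)
            if end < 0:
                tokens.append(("comment", line[i:]))
                return tokens, True  # comment continues on the next line
            tokens.append(("comment", line[i:end + 2]))
            i, in_comment = end + 2, False
        elif line[i].isspace():
            i += 1
        else:
            j = i
            while (j < len(line) and not line[j].isspace()
                   and not line.startswith("/*", j)):
                j += 1
            tokens.append(("word", line[i:j]))
            i = j
    return tokens, in_comment


class IncrementalLexer:
    """Caches the lexer state at the start of every line (illustrative only)."""

    def __init__(self, lines: List[str]) -> None:
        self.lines = list(lines)
        # start_states[n] is the state entering line n; the last entry is the
        # state at the end of the document.
        self.start_states = [False] * (len(self.lines) + 1)
        self.tokens: List[List[Token]] = [[] for _ in self.lines]
        self._relex_from(0, stop_early=False)

    def _relex_from(self, line_index: int, stop_early: bool = True) -> int:
        """Re-lex from line_index; return how many lines were lexed."""
        state = self.start_states[line_index]
        lexed = 0
        for n in range(line_index, len(self.lines)):
            self.tokens[n], state = lex_line(self.lines[n], state)
            lexed += 1
            unchanged = self.start_states[n + 1] == state
            self.start_states[n + 1] = state
            if stop_early and unchanged:
                break  # later lines would lex exactly as before
        return lexed

    def replace_line(self, line_index: int, text: str) -> int:
        self.lines[line_index] = text
        return self._relex_from(line_index)
```

For example, with the lines "a b", "c /* d", "e */ f", and "g", changing the last line re-lexes only that one line, while opening a comment on the first line re-lexes two lines and then stops, because the state entering the third line is the same as before.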

As mentioned above, IParser instances can choose to use an ITextSnapshotReader, which indirectly has them using the ILexer and token tagger. Other times they may use their own lexer, generated by some third-party parser generator. That is why we supply request.TextBufferReader, so that a low-level text stream can be passed to third-party lexers as needed.
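The second option, a parser that runs its own lexer over a raw text stream, has roughly the following shape. This is a conceptual sketch in Python, not SyntaxEditor code: ParseRequest, ParseData, custom_lexer, and parse are hypothetical stand-ins, with text_reader playing the role that request.TextBufferReader plays above, and the "function name" grammar is invented for the example.

```python
import io
import re
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class ParseRequest:
    """Hypothetical stand-in for the request handed to a parser."""
    text_reader: TextIO  # low-level character stream over the document text


@dataclass
class ParseData:
    """Hypothetical parse result; a real parser might return an AST instead."""
    function_names: List[str]


WORD = re.compile(r"[A-Za-z_]\w*")


def custom_lexer(stream: TextIO) -> Iterator[str]:
    """Stand-in for a third-party lexer: yields identifier-like words."""
    for line in stream:
        yield from WORD.findall(line)


def parse(request: ParseRequest) -> ParseData:
    """Pull tokens from the custom lexer and collect names after 'function'."""
    names: List[str] = []
    previous = None
    for word in custom_lexer(request.text_reader):
        if previous == "function":
            names.append(word)
        previous = word
    return ParseData(names)


if __name__ == "__main__":
    source = "function foo() {}\nfunction bar() {}"
    print(parse(ParseRequest(io.StringIO(source))).function_names)
```

The point of the sketch is that the parser never touches the language's registered lexer; it only needs the text stream and returns whatever parse data it likes.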

As far as sequencing, when text changes occur, views are notified of the change. They will request that dirty regions have their tokens rescanned with the ILexer so that classifications and syntax highlighting can be updated. This occurs right in the main UI thread and goes fast since we only lex through what is visible in the view.

If you are using a parse request dispatcher (to enable worker threads for parsing), the parser is called in a worker thread following a short delay after any text change occurs. The text change could be a document load, typing, or anything else. The parser will always be called after a delay, and when many text changes occur in quick succession (like while typing), subsequent parser calls are held off until a brief idle period has been reached. When the parser completes, its result is set to the document.ParseData property and the ParseDataChanged event fires.
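The "wait for a brief idle period" behavior is essentially a debounce. A minimal Python sketch of that pattern follows; it is not how the dispatcher is actually implemented, and DebouncedParser, notify_text_changed, and the callback are hypothetical names. It assumes each text change simply cancels any pending parse and schedules a new one on a timer thread.

```python
import threading
from typing import Any, Callable, Optional


class DebouncedParser:
    """Runs `parse` on a worker thread once text changes have gone quiet."""

    def __init__(self, parse: Callable[[str], Any], delay_seconds: float,
                 on_parse_data_changed: Callable[[Any], None]) -> None:
        self._parse = parse
        self._delay = delay_seconds
        self._on_changed = on_parse_data_changed
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None
        self.parse_data: Any = None  # latest result, like a ParseData property

    def notify_text_changed(self, text: str) -> None:
        """Call on every text change (load, typing, paste, ...)."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # drop the parse that has not started yet
            self._timer = threading.Timer(self._delay, self._run, args=(text,))
            self._timer.daemon = True
            self._timer.start()

    def _run(self, text: str) -> None:
        # Executes on the timer's worker thread, not on the caller's thread.
        result = self._parse(text)
        self.parse_data = result
        self._on_changed(result)  # analogous to raising a "data changed" event


if __name__ == "__main__":
    done = threading.Event()
    parser = DebouncedParser(lambda t: len(t.split()), 0.2,
                             lambda data: done.set())
    for text in ["a", "a b", "a b c"]:  # three rapid "keystrokes"
        parser.notify_text_changed(text)
    done.wait(5)
    print(parser.parse_data)
```

In the usage at the bottom, three changes arrive back to back, so under normal timing only the last one is parsed, after the 0.2-second quiet period.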

You shouldn't need to worry about having to call one phase before another. Just let the text/parsing framework do its thing and you should get the right results regardless of how things get called.

Hope that helps.

Actipro Software Support

