Implementing ITokenReader for non-mergable lexer

SyntaxEditor for WPF Forum

Posted 12 years ago by Nick Beer - National Instruments
Version: 11.2.0550

Hello -


I've created a programmatic, non-mergable lexer (implemented ILexer), and I'm now trying to add parsing capabilities with the LLParser library.  What's unclear to me is how to correctly implement the ITokenReader interface LLParserBase requires.


To begin, I've created a class that inherits TokenReaderBase, but implementing the GetNextToken method is the hangup.  Because I'm only given a TextBufferReader, it's not clear how I can leverage the code I've written to implement ILexer (I have no lexical states, etc) to correctly get the next token.


Do you have any examples of creating an ITokenReader with a non-mergable lexer, or any pointers?


Thanks -


Nick

Comments (7)

Answer - Posted 12 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Nick,

Sorry, but we don't have an example of a non-mergable lexer driving a TokenReaderBase-based class.

In our MergableTokenReader class, we pass the ITextBufferReader and a lexer instance (you can create a new one for the token reader purposes) into the constructor.  The lexer gets initialized based on the offset from the reader.

Then in a token reader GetNextToken override, we return the lexer's next token based on where the reader currently is.

One important thing to do though is to also override the token reader's Push/Pop methods.  Basically as the parser is running, it may go down a certain path to look and see if a certain token sequence can match.  But if that fails (or if it was just a peek ahead), it's going to need to pop the lexer state back to its starting point, so it can continue on from back there again.

We do this by maintaining a stack of states.  When Push gets called, we flag that we are in a look-ahead scenario and start maintaining a list of tokens we've encountered that are past our "real" base offset.  We continue adding to that cached list of found tokens as we progress more into the look-ahead.  Then once we Pop back to the real base offset again, as we move forward, if there are any look-ahead tokens at the start of the stack we pop those and use them directly instead of re-lexing the same text again.  We found that to be the most efficient way to implement things.
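A minimal sketch of the look-ahead cache described above, written in Python purely for illustration. It is not Actipro's implementation: ToyLexer, LookAheadTokenReader, and the method names are hypothetical stand-ins rather than SyntaxEditor types, and it assumes Pop always rewinds to the matching Push.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str
    start: int
    end: int


class ToyLexer:
    """Hypothetical stand-in lexer: splits text on spaces."""

    def __init__(self, text):
        self.text = text
        self.calls = 0  # how many times text was actually lexed

    def read(self, offset):
        self.calls += 1
        text, n = self.text, len(self.text)
        while offset < n and text[offset] == " ":
            offset += 1
        if offset >= n:
            return Token("DocumentEnd", n, n)
        end = offset
        while end < n and text[end] != " ":
            end += 1
        return Token(text[offset:end], offset, end)


class LookAheadTokenReader:
    """Hypothetical token reader that caches tokens seen during look-ahead."""

    def __init__(self, lexer):
        self._lexer = lexer
        self._offset = 0         # where the lexer reads next
        self._pending = deque()  # tokens already lexed, waiting to be replayed
        self._marks = []         # one list of consumed tokens per open push()

    def push(self):
        # Start recording the tokens consumed from this point on.
        self._marks.append([])

    def pop(self):
        # Rewind: put the recorded tokens back at the front, in original order.
        consumed = self._marks.pop()
        self._pending.extendleft(reversed(consumed))

    def get_next_token(self):
        if self._pending:
            token = self._pending.popleft()  # replay instead of re-lexing
        else:
            token = self._lexer.read(self._offset)
            self._offset = token.end
        if self._marks:
            self._marks[-1].append(token)
        return token
```

In this sketch, tokens read between push() and pop() are handed out again after the pop without another call into the lexer, which is the saving the reply describes.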

Alternatively, you could probably maintain a stack of lexer states and just push/pop those on the appropriate calls instead.
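The simpler state-stack variant could look roughly like the following Python sketch. All names here are hypothetical, and the toy lexer's entire state is a single offset; a real lexer with lexical states would need to save and restore those as well. Text inside a look-ahead is lexed again after each pop.

```python
class ToyLexer:
    """Hypothetical stand-in lexer whose whole state is one offset."""

    def __init__(self, text):
        self.text = text
        self.offset = 0

    def save_state(self):
        return self.offset

    def restore_state(self, state):
        self.offset = state

    def read_token(self):
        text, n = self.text, len(self.text)
        while self.offset < n and text[self.offset] == " ":
            self.offset += 1
        start = self.offset
        while self.offset < n and text[self.offset] != " ":
            self.offset += 1
        # An empty slice means the end of the text was reached.
        return text[start:self.offset] or "DocumentEnd"


class StateStackTokenReader:
    """Hypothetical token reader that saves/restores lexer state on push/pop."""

    def __init__(self, lexer):
        self._lexer = lexer
        self._saved = []

    def push(self):
        self._saved.append(self._lexer.save_state())

    def pop(self):
        self._lexer.restore_state(self._saved.pop())

    def get_next_token(self):
        return self._lexer.read_token()
```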

Hope that helps.


Actipro Software Support

Posted 12 years ago by Nick Beer - National Instruments

Hello -

Thanks for the information.  Here's what I ended up doing - I would be interested in any comments you have.

Unfortunately, the situation I'm in makes it impossible to create a new instance of the lexer from the parser.  What I've done instead is to override the Parse method to obtain a reference to the snapshot to be parsed.  I then use the ITextSnapshotReader associated with the snapshot to implement GetNextToken in my token reader.

One thing that I've noticed is that when the lexer associated with the snapshot returns the document end token, the ReadToken method on the snapshot reader returns null.  I handle this by creating another document end token and returning it, but it makes me wary of my approach.

Do you have any thoughts?

- Nick

Posted 12 years ago by Actipro Software Support - Cleveland, OH, USA

It's hard to say without debugging it.  If you'd like us to look into that, please email our support address a new, very simple sample project that shows what you are seeing, and we can debug it to see what's going on and whether there might be a problem.  Make sure you rename the .zip file extension so it doesn't get spam blocked.


Actipro Software Support

Posted 12 years ago by Nick Beer - National Instruments

As an update, one problem with using the SnapshotReader approach is that the snapshot reader cannot parse the header and footer text associated with a snapshot.  Is there any chance this might be added sometime in the future?

Answer - Posted 12 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Nick,

The ITextSnapshot.GetMergedBufferReader method will give you an ITextBufferReader that includes the header and footer.


Actipro Software Support

Posted 12 years ago by Nick Beer - National Instruments

Sorry - I should have given my message a little more context.

I'm trying to implement ITokenReader for my implementation of LLParserBase.  I have a non-mergable lexer, and I am not able to create a new instance of it from my parser, for reasons I can explain but don't think are pertinent to this discussion.  Because I'm not able to create a new instance of my lexer, I'm having a hard time implementing the GetNextToken method of ITokenReader.

To solve this problem, I overrode Parse in my implementation of LLParserBase.  This gave me access to the snapshot that is being parsed.  From the snapshot, I was able to obtain an ITextSnapshotReader, which I passed to the constructor of my TokenReaderBase subclass.  The ITextSnapshotReader's underlying ITextBufferReader was passed to my base (TokenReaderBase) constructor.  I then used the ITextSnapshotReader to implement GetNextToken, by calling ITextSnapshotReader.ReadToken().

As can be seen, ITextBufferReader is not much help to me if I don't want to re-implement my lexer - or at least the lexical state management portion of it.  Because ITextSnapshotReader cannot access the header and footer text, it's not a great solution either.

Fortunately, the thing I'm working on now is only a small example and not required as part of our larger project.  However, if it were required, I would be unsure how to best proceed at this point.

- Nick

Posted 12 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Nick,

What I described in my first reply is still accurate.  And I should also note that the ITextBufferReader passed to the ILLParser.CreateTokenReader method is the one returned from ITextSnapshot.GetMergedBufferReader, so it will contain the full text of header + document + footer.

When you are making your IParser and are overriding your CreateTokenReader method, you want to use the reader we pass with your lexer.  In your case you said you can't create a new lexer instance, so perhaps when you create your parser class you could pass the lexer instance into it and store it as a field.  Then in CreateTokenReader, pass the ITextBufferReader parameter we give and your lexer instance to your custom token reader.  Have your lexer operate on that reader.

You said the problem is with implementing the token reader's GetNextToken method.  So assuming your token reader had the ITextBufferReader (our instance we pass to CreateTokenReader with merged header/footer) and your lexer instance, any time the GetNextToken method was called, you'd need to somehow tell your lexer to switch to that ITextBufferReader and read the next token from it.  And you'd have to maintain its state, etc. there as described in the original reply.
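The wiring suggested in this reply can be sketched as follows, in Python purely for illustration. Every class and method name here is a hypothetical stand-in that only mirrors the roles mentioned in the thread (parser, CreateTokenReader, token reader, buffer reader, lexer); none of it is the actual SyntaxEditor API, and push/pop state handling is left out for brevity.

```python
class BufferReader:
    """Stand-in for the merged header + document + footer text reader."""

    def __init__(self, text):
        self.text = text
        self.offset = 0


class WordLexer:
    """Stand-in lexer: reads from whichever buffer reader it is handed."""

    def read_token(self, reader):
        text, n = reader.text, len(reader.text)
        while reader.offset < n and text[reader.offset].isspace():
            reader.offset += 1
        start = reader.offset
        if start >= n:
            return ("DocumentEnd", start, start)
        while reader.offset < n and not text[reader.offset].isspace():
            reader.offset += 1
        return (text[start:reader.offset], start, reader.offset)


class ToyTokenReader:
    """Holds the buffer reader it was given plus the existing lexer instance."""

    def __init__(self, buffer_reader, lexer):
        self.buffer_reader = buffer_reader
        self.lexer = lexer

    def get_next_token(self):
        # Point the shared lexer at this reader and take the next token.
        return self.lexer.read_token(self.buffer_reader)


class ToyParser:
    """Receives the lexer instance at construction and keeps it as a field."""

    def __init__(self, lexer):
        self._lexer = lexer

    def create_token_reader(self, buffer_reader):
        # Reuse the stored lexer instead of creating a new one.
        return ToyTokenReader(buffer_reader, self._lexer)
```

The point of the sketch is only that the parser never constructs a lexer: it reuses the one it was given and hands it, together with the supplied reader, to the token reader.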


Actipro Software Support

The latest build of this product (v24.1.1) was released 1 month ago, which was after the last post in this thread.
