Context-dependent lexical parser

SyntaxEditor for Windows Forms Forum

Posted 6 years ago by M.Stange
Version: 12.1.0311

(The following question applies to SyntaxEditor WinForms, but I would assume it applies to the WPF edition as well)

 

I need to provide editing support with syntax highlighting and IntelliPrompt for a small domain-specific language. I started to implement classes based on MergableSyntaxLanguage, IMergableLexicalParser and RecursiveDescentSemanticParser, loosely following the Simple language example.

This all works fine, except for one language "feature". Unfortunately, the language seems to require a context-dependent lexical parser, since the same syntax can mean one of two things depending on where it appears. Basically, the language is a little like Pascal/Delphi and supports the [...] subscript for list/array access, as in

something[idx1,idx2] := value

Here, the tokens should be ident, square bracket, ident, ident, assignment, ident.

However, the square brackets can also be used to represent a certain type of value as in

something := [special-value]

Here, the tokens should be ident, assignment, special.

The actual parser executing the language has no issue in performing the lexical and semantic parsing. There is no actual ambiguity for any expression. However, the semantic and lexical parsers collaborate in analyzing the language: the semantic parser gives the lexical parser a "hint" as to whether the next token may be a terminal/literal. If so, a square bracket starts a special value; if not, the square bracket is a simple token of its own.

How can this be achieved with the SyntaxEditor infrastructure?

Any help would be appreciated.

Comments (3)

Answer - Posted 6 years ago by Actipro Software Support - Cleveland, OH, USA

Hello,

Yes, this is the one area where our setup can have trouble with certain languages, since the lexer and parser are decoupled.  In scenarios like this, what we generally try to do is figure out what the '[' indicates by scanning additional text around it.

For instance, if you know that the former case only occurs when an identifier comes before the '[', then look for that to know it's a list/array access.  The latter case seems to be the more general one in the grammar (it is expression-like), so it could appear in lots of scenarios.  Thus the former case is probably the one to narrow down text-search criteria for, and you would base the decision about what '[' means on that.
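As a rough illustration of that heuristic, here is a minimal sketch that works on a plain string rather than on the SyntaxEditor reader. The class name, the method name and the keyword list are all assumptions made for the example; the real language may need different rules (for instance, other keywords after which a value is expected).

```csharp
using System;
using System.Collections.Generic;

public static class BracketClassifier {
    // Hypothetical keyword list: words after which a value (not a subscript) is expected
    private static readonly HashSet<string> ValueKeywords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "then", "else", "do", "of" };

    // Returns true if the '[' at bracketOffset looks like a list/array subscript
    public static bool IsSubscriptBracket(string text, int bracketOffset) {
        int i = bracketOffset - 1;

        // Skip spaces and tabs between the previous text and the '['
        while (i >= 0 && (text[i] == ' ' || text[i] == '\t'))
            i--;
        if (i < 0)
            return false;

        // A closing bracket allows chained subscripts such as a[1][2]
        if (text[i] == ']')
            return true;
        if (!IsIdentifierChar(text[i]))
            return false;

        // Collect the preceding word and make sure it is not a keyword
        int end = i + 1;
        while (i >= 0 && IsIdentifierChar(text[i]))
            i--;
        string word = text.Substring(i + 1, end - (i + 1));
        return !ValueKeywords.Contains(word);
    }

    private static bool IsIdentifierChar(char ch) {
        return Char.IsLetterOrDigit(ch) || ch == '_';
    }
}
```

With the two statements from the question, the '[' in "something[idx1,idx2] := value" is classified as a subscript because an identifier precedes it, while the '[' in "something := [special-value]" is not, because it follows the assignment operator.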


Actipro Software Support

Posted 6 years ago by M.Stange

Thank you for the feedback. So since I cannot access the semantic context (since it is decoupled and constructed after the lexical parse), what is the proper approach to getting information about the previous token? 

I have implemented IMergableLexicalParser.GetNextTokenLexicalParseData to perform the lexical parsing and it returns a new LexicalStateAndIDTokenLexicalParseData instance with the correct token ID.

(By the way: the lexical state is always 0, and I did not find much information about its proper use or intention.)

If the next reader.Read() call returns my '[', then I would like to look at the previous token to determine which token to return. Is there a code snippet that shows how to do that? Or would I need to approach this differently?

Posted 6 years ago by Actipro Software Support - Cleveland, OH, USA

Hello,

You have it right: you would use the various reader methods to examine the text.  Note that you can only examine text, not tokens (since you are generating the tokens here), and you should make sure the reader's offset is restored to where it was once you are done scanning backward.

For an example, here's a method we use in our C# programmatic lexical parser:

/// <summary>
/// Looks backward (skipping whitespace) to get the previous word.
/// </summary>
/// <param name="reader">An <see cref="ITextBufferReader"/> that is reading a text source.</param>
/// <returns>The previous word.</returns>
/// <remarks>
/// This method only looks at letters for sequential runs and will not scan words longer than 10 characters.
/// </remarks>
private string GetPreviousText(ITextBufferReader reader) {
	int offset = reader.Offset;
			
	char ch;
	string result = String.Empty;
	while ((!reader.IsAtStart) && (result.Length < 10)) {
		ch = reader.ReadReverse();
		if ((Char.IsWhiteSpace(ch)) && (ch != '\n')) {
			// Break out of the loop if we already have a result
			if (result.Length > 0)
				break;
		}
		else if (Char.IsLetter(ch)) {
			// Append the letter
			result = ch + result;
		}
		else {
			// Set the result to the character if there is no result yet
			if (result.Length == 0)
				result = ch.ToString();
			break;
		}
	}

	// Return to the original offset
	while (reader.Offset < offset)
		reader.Read();

	return result;
}

Note how we make sure it returns to the original offset when it's done.  That ensures the lexer scan position remains intact.  You could do something like the above but modify the logic for your needs.
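For the '[' case from the question, an adaptation might look like the sketch below. It is only an illustration under assumptions: IBackwardReader is a hypothetical stand-in that exposes just the four reader members used in the method above (it is not the real ITextBufferReader interface), StringBackwardReader is a test stub, and the token IDs are made up. The sketch also assumes the reader is positioned on the '[' before it has been read.

```csharp
using System;

// Hypothetical stand-in with only the members used in the snippet above
public interface IBackwardReader {
    int Offset { get; }
    bool IsAtStart { get; }
    char Read();
    char ReadReverse();
}

// Minimal string-backed reader for trying out the logic outside the editor
public sealed class StringBackwardReader : IBackwardReader {
    private readonly string text;
    public StringBackwardReader(string text, int offset) { this.text = text; Offset = offset; }
    public int Offset { get; private set; }
    public bool IsAtStart { get { return Offset == 0; } }
    public char Read() { return text[Offset++]; }
    public char ReadReverse() { return text[--Offset]; }
}

public static class BracketLexer {
    // Hypothetical token IDs
    public const int OpenSquareBracket = 1;
    public const int SpecialValue = 2;

    // Decides what a '[' at the current offset means by peeking backward
    public static int ClassifyOpenBracket(IBackwardReader reader) {
        int offset = reader.Offset;
        char previous = '\0';
        try {
            // Find the nearest character before the '[' that is not a space or tab
            while (!reader.IsAtStart) {
                char ch = reader.ReadReverse();
                if (ch != ' ' && ch != '\t') {
                    previous = ch;
                    break;
                }
            }
        }
        finally {
            // Return to the original offset, even if scanning throws
            while (reader.Offset < offset)
                reader.Read();
        }

        // An identifier character or a closing bracket suggests list/array access
        bool afterIdentifier =
            Char.IsLetterOrDigit(previous) || previous == '_' || previous == ']';
        return afterIdentifier ? OpenSquareBracket : SpecialValue;
    }
}
```

Putting the restore loop in a finally block is a design choice of this sketch: the scan position is put back on every exit path, so later token scanning is not affected by the backward peek.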


Actipro Software Support

The latest build of this product (v2018.1 build 0341) was released 7 months ago, which was after the last post in this thread.
