Issues getting started with a simple language using programmatic lexer/parser

SyntaxEditor for WPF Forum

Posted 10 years ago by M.Stange
Version: 13.2.0592

(I am trying to transition from SyntaxEditor/WinForms to WPF and am confused by many things on the syntax language implementation looking so different...)

I have tried to set up a simple project to get started with a programmatic lexer. However, the lexer does not seem to be called, and I do not see where the issue is.

I have a main window with the SyntaxEditor control, which initializes as follows:

public MainWindow()
{
    InitializeComponent();
    ActiproSoftware.Text.Parsing.AmbientParseRequestDispatcherProvider.Dispatcher =
        new ActiproSoftware.Text.Parsing.Implementation.ThreadedParseRequestDispatcher();
    SyntaxEditor.Document = new EditorDocument() { Language = new ScriptSyntaxLanguage() };
}

 The syntax language I am trying to implement has just a lexer for now:

public class ScriptSyntaxLanguage : SyntaxLanguage
{
    public ScriptSyntaxLanguage()
        : base("script")
    {
        this.RegisterLexer(new ScriptSyntaxLexer());
    }
}

The lexer is just a skeleton for now with no actual logic yet:

public class ScriptSyntaxLexer : MergableLexerBase
{
	private LexicalStateCollection lexicalStates;

	public ScriptSyntaxLexer()
	{
		// Create ID providers
		this.LexicalStateIdProviderCore = new ScriptLexicalStateId();
		this.TokenIdProviderCore = new ScriptTokenId();

		// Create the default lexical state
		ProgrammaticLexicalState lexicalState = new ProgrammaticLexicalState(ScriptLexicalStateId.Default, "Default");
		lexicalStates = new LexicalStateCollection(this);
		lexicalStates.Add(lexicalState);
		this.DefaultLexicalStateCore = lexicalState;
		
	}

	public override IEnumerable<ILexicalStateTransition> GetAllLexicalStateTransitions()
	{
		return lexicalStates.GetAllLexicalStateTransitions();
	}

	public override MergableLexerResult GetNextToken(ITextBufferReader reader, ILexicalState lexicalState)
	{
		throw new System.NotImplementedException();
	}
}

 The actual GetNextToken method is not yet implemented. 

However, running the application indicates that the method is never even called. What am I doing wrong for a very simple start into the language?

Comments (3)

Answer - Posted 10 years ago by Actipro Software Support - Cleveland, OH, USA

Hello,

You are missing the token tagger provider language service.  Basically that is a language service that other things can call to get tokens (via the lexer) and do syntax highlighting.  I would highly recommend that you walk through our Getting Started series of QuickStarts since they show you step-by-step how to build a language (and some options of things along the way).  Getting Started 3b is a basic programmatic lexer so that's a perfect example of what you want to do.
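For readers following along, here is a minimal sketch of what registering such a service might look like, modeled on the pattern the Getting Started 3b QuickStart is understood to use. ScriptTokenTagger is a hypothetical name, ScriptTokenId.Keyword is an assumed constant on the poster's ScriptTokenId class, and the exact signatures of TokenTagger, TokenTaggerProvider<T> and ClassificationTypes should be checked against the QuickStart source before relying on this.

```csharp
using ActiproSoftware.Text;
using ActiproSoftware.Text.Implementation;
using ActiproSoftware.Text.Lexing;
using ActiproSoftware.Text.Tagging.Implementation;

// Hypothetical tagger: maps token IDs produced by the lexer to classification
// types so that the editor can request tokens and highlight them.
public class ScriptTokenTagger : TokenTagger
{
    public ScriptTokenTagger(ICodeDocument document) : base(document) { }

    public override IClassificationType ClassifyToken(IToken token)
    {
        switch (token.Id)
        {
            // Assumption: ScriptTokenId exposes its IDs as int constants.
            case ScriptTokenId.Keyword:
                return ClassificationTypes.Keyword;
            default:
                return null; // no highlighting for other tokens
        }
    }
}

public class ScriptSyntaxLanguage : SyntaxLanguage
{
    public ScriptSyntaxLanguage() : base("script")
    {
        this.RegisterLexer(new ScriptSyntaxLexer());

        // The missing piece: a tagger provider service that creates a token
        // tagger for each document using this language.
        this.RegisterService(new TokenTaggerProvider<ScriptTokenTagger>());
    }
}
```

Once a tagger service is registered, the editor has a reason to ask for tokens, which is what should cause GetNextToken to be called.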


Actipro Software Support

Posted 10 years ago by M.Stange

Thank you - that hint was helpful. I was actually trying to follow the QuickStart series, but have issues sorting out the important aspects in the examples.

I now have a custom lexer with syntax highlighting working in my project. 

The next hurdle appears to be the semantic parser. In WinForms, I have implemented a RecursiveDescentSemanticParser providing a Parse() method that returns a CompilationUnit implementing ICompilationUnit and ISemanticParseData. 

Following GettingStarted02, I would need to implement a class derived from ParserBase and provide Parse(IParseRequest request), returning IParseData. My IParseData implementation implements IParseData, IParseErrorProvider and provides Errors and the Ast. 

public class MyParseData : IParseData, IParseErrorProvider
{
	public IAstNode Ast { get; set; }
	public IEnumerable<IParseError> Errors { get; private set; }
	public ITextSnapshot Snapshot { get; private set; }

	public MyParseData(ITextSnapshot snapshot)
	{
		this.Snapshot = snapshot;
		this.Errors = new List<IParseError>();
	}
}

 Now the parser looks like this:

public class MyParser : ParserBase
{
	public MyParser() : base("my")
	{
	}

	public override IParseData Parse(IParseRequest request)
	{
		var parseData = new MyParseData(request.Snapshot);

		// TODO

		return parseData;
	}
}

 The TODO part is where the parsing should occur. 

In WinForms, the semantic parser derived from RecursiveDescentSemanticParser used to have access to 

  • the LookAheadToken (including its Text and StartOffset)
  • the (current) Token (including its Text and StartOffset)

Where do I find that in the WPF implementation? I have seen GettingStarted02 do ITextSnapshotReader reader = request.Snapshot.GetReader(0), but ITextSnapshotReader only seems to have ReadToken().

Also, WinForms was using a class derived from MergableRecursiveDescentLexicalParser to filter out ignored tokens (whitespace, comments, ...). Where do I find that in the WPF implementation?

Overall, the transition from WinForms to WPF for syntax parsing appears to be non-trivial. The examples mainly revolve around building a script language with the Language Designer tool and the LL Parser framework. The information on building that 'directly' is pretty scarce. 

Posted 10 years ago by Actipro Software Support - Cleveland, OH, USA

Hello, the ITextSnapshotReader has a ton of methods for navigating the snapshot.  The Token property is effectively the token you are currently at (via the current offset).  The TokenText property gives you its text.  You can use the ReadToken method to advance a single token or ReadTokenReverse to go the other way.  If you call reader.BufferReader.GetSubstring, you can get any segment of text from the snapshot.

All the MergableRecursiveDescentLexicalParser was in WinForms was a buffer layer between the lexer and parser.  We have a similar concept (called token reader) in our LL(*) Parser Framework.  If you are hand-rolling a parser, you could make something similar too.  Just make a class that looks at the snapshot reader and returns the filtered tokens to your IParser.
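As a rough illustration of such a buffer layer, the sketch below keeps a current Token and a LookAheadToken while skipping ignorable tokens. It deliberately avoids Actipro types so that it stands alone: SimpleToken and FilteredTokenReader are hypothetical names, and in a real parser the token sequence would presumably be produced by repeatedly calling ReadToken on the snapshot reader.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a lexer token; not an Actipro type.
public sealed class SimpleToken
{
    public SimpleToken(string kind, string text, int startOffset)
    {
        Kind = kind;
        Text = text;
        StartOffset = startOffset;
    }

    public string Kind { get; }
    public string Text { get; }
    public int StartOffset { get; }
}

// Hypothetical buffer layer between a token source and a hand-written parser.
// It drops ignored tokens and exposes Token / LookAheadToken.
public sealed class FilteredTokenReader
{
    private readonly IEnumerator<SimpleToken> source;
    private readonly Predicate<SimpleToken> isIgnored;

    public FilteredTokenReader(IEnumerable<SimpleToken> tokens, Predicate<SimpleToken> isIgnored)
    {
        this.source = tokens.GetEnumerator();
        this.isIgnored = isIgnored;
        // Prime the look-ahead with the first significant token.
        this.LookAheadToken = ReadNextSignificant();
    }

    // The most recently consumed token (null before the first Advance call).
    public SimpleToken Token { get; private set; }

    // The next significant token (null once the source is exhausted).
    public SimpleToken LookAheadToken { get; private set; }

    public bool IsAtEnd => LookAheadToken == null;

    // Consumes the look-ahead token and reads a new look-ahead.
    public SimpleToken Advance()
    {
        Token = LookAheadToken;
        LookAheadToken = ReadNextSignificant();
        return Token;
    }

    // Pulls tokens from the source until one is not ignored.
    private SimpleToken ReadNextSignificant()
    {
        while (source.MoveNext())
        {
            if (!isIgnored(source.Current))
                return source.Current;
        }
        return null;
    }
}
```

A parser would construct the reader with a predicate such as t => t.Kind == "Whitespace" || t.Kind == "Comment", then branch on LookAheadToken and call Advance to consume, much like the WinForms RecursiveDescentSemanticParser pattern described in the question.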

Another option is to skip the ITextSnapshotReader and work directly with a mergable lexer via something like this:

var coordinator = MergableLexerCoordinator.Create(reader, rootLexer);

If you pass it the core ITextBufferReader and an instance of the IMergableLexer class used by your language, it will give you a way to read the next token in a forward-only fashion.  It also supports look-aheads via Push/Pop methods.  And via the buffer reader, you can get any text substring.


Actipro Software Support
