Custom Language

Comments (11)

Posted 18 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Craig,

Yes, I believe that is correct. If you are using the parser generator, you would still need to inherit RecursiveDescentSemanticParser as in the Simple sample. If you are making your own custom semantic parser without using our framework, then you can kick it off in PerformSemanticParse of your language.

The lexical parser you mentioned is an IRecursiveDescentLexicalParser, so doing something similar to the Simple language works: you make a class that inherits RecursiveDescentLexicalParser. This acts as the bridge between the regular lexical parser and the semantic parsers, allowing you to filter out meaningless tokens like whitespace and comments as needed.

Actipro Software Support

Posted 18 years ago by Craig Neblett - Elysian Software

So, in order to take full advantage of the available base classes, without going through the generator, I'd have classes that inherited the following:

SyntaxLanguage
RecursiveDescentLexicalParser
RecursiveDescentSemanticParser

And implement code similar to the simple language sample, even though my language will not be mergable?

What I'm getting at here is what is the minimum amount of code necessary to get up and running with a non-dynamic, non-mergable custom language?

[Modified at 04/23/2007 07:36 PM]

Craig Neblett craig@codersglow.com

Posted 18 years ago by Actipro Software Support - Cleveland, OH, USA

To do the minimum, you would not need the recursive descent classes at all. Those are used with the semantic parser generated by the parser generator.

The minimum would be a SyntaxLanguage, which had some programmatic lexical parsing ability implemented in PerformLexicalParse, and probably a custom IToken implementation.
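As a rough illustration of what "programmatic lexical parsing" with a custom token class could look like, here is a minimal, library-independent sketch. Everything in it is an assumption made for illustration: `SketchToken` only stands in for a custom IToken implementation, `SketchLexer.Tokenize` is a hypothetical helper rather than an Actipro API, and the token kinds and the single `FUNCTION` keyword are invented. In a real language this kind of loop would be driven from PerformLexicalParse.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a custom IToken implementation; not an Actipro type.
public sealed class SketchToken
{
    public SketchToken(string kind, int startOffset, int length)
    {
        Kind = kind;
        StartOffset = startOffset;
        Length = length;
    }

    public string Kind { get; }
    public int StartOffset { get; }
    public int Length { get; }
    public int EndOffset => StartOffset + Length; // exclusive
}

public static class SketchLexer
{
    // Scans text from startOffset to the end and returns one token per
    // whitespace run, word, or single punctuation character.
    public static List<SketchToken> Tokenize(string text, int startOffset = 0)
    {
        var tokens = new List<SketchToken>();
        int i = startOffset;
        while (i < text.Length)
        {
            int start = i;
            string kind;
            if (char.IsWhiteSpace(text[i]))
            {
                while (i < text.Length && char.IsWhiteSpace(text[i])) i++;
                kind = "Whitespace";
            }
            else if (char.IsLetter(text[i]) || text[i] == '_')
            {
                while (i < text.Length && (char.IsLetterOrDigit(text[i]) || text[i] == '_')) i++;
                string word = text.Substring(start, i - start);
                // Assumed keyword set: only FUNCTION, matched case-insensitively.
                kind = string.Equals(word, "FUNCTION", StringComparison.OrdinalIgnoreCase)
                    ? "Keyword"
                    : "Identifier";
            }
            else
            {
                i++;
                kind = "Punctuation";
            }
            tokens.Add(new SketchToken(kind, start, i - start));
        }
        return tokens;
    }
}
```

With this sketch, `SketchLexer.Tokenize("FUNCTION test()")` returns five tokens: a keyword, a whitespace run, an identifier, and two punctuation tokens.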

Actipro Software Support

Posted 18 years ago by Werdna

I also need to implement a custom language.
I started with the sample provided, but that sample uses the Mergable interfaces from the parser generated from the grammar.xml file.
I have my own custom parser and want to use the IToken interface to generate the stream of tokens (easy part), but I'm just not sure how to plug the parser into this.
I want to offer IntelliSense, parameter info, etc.
Would it be easiest to inherit from MergableRecursiveDescentLexicalParser, or do it some other way?
I do want to use AstNodeBase as the basis of my AST.
Thanks.

Posted 18 years ago by Actipro Software Support - Cleveland, OH, USA

Can you provide more information on how your parser works and what it does/returns?

Actipro Software Support

Posted 18 years ago by Werdna

My parser is similar to VB, but only one language at a time. I do need to parse multiple files to determine what types/functions are available.

I've inherited from AstNode and created a lexer from IMergableLexicalParser. I probably don't need to inherit from the mergable lexer, but I'm not sure how to go about doing this.

I wish there was a simple example available that does not use a mergable language.

Posted 18 years ago by Actipro Software Support - Cleveland, OH, USA

Thanks for the suggestion. This has been requested by some other customers lately, so we'll try to bump it up on the TODO list. Perhaps we'll create a "template" non-mergable language that doesn't work yet but is stubbed out with comments for what you need to do.

In the meantime, there is some documentation on this sort of thing. For example, in the Lexical Parsing topic, look under "A PerformLexicalParse Method Implementation Example".

As for custom token classes, we still need to add this to the documentation, but there is a NonMergableTokenBase that is the ideal base class for your non-mergable language tokens.

Other than the lexical parsing and token areas, non-mergable languages share most of the same features as the rest of the language types. Of course your non-mergable language should inherit SyntaxLanguage directly too.

I hope this helps get you started.

Actipro Software Support

Posted 18 years ago by Kelly Leahy - Software Architect, Milliman

I'd say the most important thing for implementing the lexer is knowing that you need to override ParseDataEquals in the NonMergableTokenBase. I would have never figured that out without the help of the support forum.

Other than that, implementing the lexer was pretty straightforward.

Kelly Leahy Software Architect Milliman, USA

Posted 18 years ago by Werdna

Did you mean TokenBase? I couldn't find any references to NonMergableTokenBase.

I started off with mergable and I'm trying to convert it to a non-mergable language.
I used the example PerformLexicalParse as a starting point.
When I paste into the document:
FUNCTION test()
it highlights correctly, but when I type FUNCTION I receive a PerformLexicalParse call after each character, so FUNCTION becomes 8 tokens of type Identifier, because the start and end offsets of parseTextRange always cover a single character.
Is it up to me now to know that I need to backtrack?

Posted 18 years ago by Kelly Leahy - Software Architect, Milliman

Yes. You want to backtrack to at least the previously valid token (prior to the one that contains the current offset); otherwise, you get f u n c t i o n as separate identifiers as you are typing.

I think this is done in the example, if not, it's done in this code:

        public override TextRange PerformLexicalParse(Document document, TextRange parseTextRange, ILexicalParseTarget parseTarget)
        {
            // Update the parse text range
            int startDocumentLineIndex;
            int endDocumentLineIndex = document.Lines.IndexOf(parseTextRange.EndOffset);
            if (document.Lines[endDocumentLineIndex].Contains(parseTextRange.StartOffset))
                startDocumentLineIndex = Math.Max(0, endDocumentLineIndex - this.LexicalParserLineLookBehind);
            else
                startDocumentLineIndex = Math.Max(0, document.Lines.IndexOf(parseTextRange.StartOffset) - this.LexicalParserLineLookBehind);
            int parseStartOffset = document.Lines[startDocumentLineIndex].StartOffset;
            int parseThroughOffset = document.Lines[Math.Min(document.Lines.Count - 1, endDocumentLineIndex + this.LexicalParserLineLookAhead)].EndOffset;

            // Move back to the start of the token 
            int lexicalStateID = 0;
            if (parseStartOffset > 0)
            {
                IToken initialToken = document.Tokens.GetTokenAtOffset(parseStartOffset);
                lexicalStateID = initialToken.LexicalStateID;
                if (initialToken.StartOffset < parseStartOffset)
                    parseStartOffset = initialToken.StartOffset;
            }

            // Initialize the modified start/end offsets
            int modifiedStartOffset = parseTextRange.StartOffset;
            int modifiedEndOffset = parseTextRange.EndOffset;

            // NOTE: LineIndex parameter here is wrong (0) but we don't care for highlighting parsing
            ITextBufferReader reader = new StringBuilderTextBufferReader(document.GetCoreTextBuffer(), 0, parseStartOffset);

            IToken token;

            // Notify the parse target that parsing is starting
            parseTarget.OnPreParse(parseStartOffset);

            // Loop and generate all the tokens... this is optimized for incremental parsing
            while (!reader.IsAtEnd)
            {
                // Get the next token
                token = lexicalParser.GetNextToken(reader, lexicalStateID);
                lexicalStateID = token.LexicalStateID;

                // Update the parse target
                if (parseTarget.OnTokenParsed(token, reader.Offset - token.StartOffset))
                {
                    if (token.StartOffset < modifiedStartOffset)
                        modifiedStartOffset = token.StartOffset;
                    if (reader.Offset > modifiedEndOffset)
                        modifiedEndOffset = reader.Offset;
                }
                else
                {
                    // Quit the loop if nothing was changed
                    if (reader.Offset >= parseThroughOffset)
                        break;
                }
            }

            // Notify the parse target that parsing is complete
            parseTarget.OnPostParse();

            return new TextRange(modifiedStartOffset, modifiedEndOffset);
        }

This code is from Actipro's example code that they sent me.
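To isolate just the backtracking idea from the code above, here is a small standalone sketch. It is not Actipro code: `BacktrackSketch.FindRelexStart` is a hypothetical helper, and the previous parse is modeled as a plain list of (Start, End) offset pairs rather than the document's token collection. It picks the start of the last token that begins before the edit offset, so a character typed at the end of a word causes the whole word to be re-lexed.

```csharp
using System;
using System.Collections.Generic;

public static class BacktrackSketch
{
    // previousTokens: (Start, End) offsets from the previous parse, sorted by
    // Start, with End exclusive. This shape is an assumption for illustration.
    // Returns the offset at which lexing should restart for an edit at editOffset.
    public static int FindRelexStart(
        IReadOnlyList<(int Start, int End)> previousTokens, int editOffset)
    {
        int relexStart = 0;
        foreach (var token in previousTokens)
        {
            // Tokens starting at or after the edit are not candidates to back up to.
            if (token.Start >= editOffset)
                break;

            // Remember the latest token that starts before the edit.
            relexStart = token.Start;
        }
        return relexStart;
    }
}
```

For example, if the previous parse of "FUNC" produced a single token (0, 4) and the user types "T" at offset 4, the helper returns 0, so the lexer restarts at the beginning of the word and produces one token for "FUNCT" instead of a new one-character token.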

As for the NonMergableTokenBase, it's in ActiproSoftware.SyntaxEditor namespace after around build 240 or so, I think. They added it pretty recently (last several months or so I think).

Kelly Leahy Software Architect Milliman, USA

Posted 18 years ago by Werdna

Thanks Kelly. This sample does the trick.
I think I have an older version (4.0.0236, and 1.0.0084 for shared), and its example of PerformLexicalParse does not have the backtracking.
I also don't have NonMergableToken in my version.

The latest build of this product (v25.1.0) was released 2 months ago, which was after the last post in this thread.
