Semantic parsing in dynamic language

SyntaxEditor for Windows Forms Forum

Posted 15 years ago by Civa
Version: 4.0.0281
Hi again,

I have an unusual question; evidently I didn't understand how semantic and lexical parsing work for a dynamic language. Thanks for yesterday's post, I'll try to solve that later :). Let's consider the following situation:

1. I have built a dynamic, VB-like language (using an XML document).
2. I want to do lexical parsing similar to this example code:

base.PerformLexicalParse(doc, new TextRange(0, doc.Length), target);
just to get my tokens back from the method call (is this right? Do I get tokens by calling this method?)

3. Now, if I get tokens back when I call PerformLexicalParse, I can call PerformSemanticParse like this:

public override void PerformSemanticParse(Document document, TextRange parseTextRange, SemanticParseFlags flags)
{
    rq = new SemanticParserServiceRequest(SemanticParserServiceRequest.HighPriority,
        document, parseTextRange, flags, this, document);

    SemanticParserService.Parse(rq);
}
4. I have already implemented the ISemanticParseDataTarget interface in my form (this is for test purposes only; I'll migrate it to a class library later :) ) and I expect a breakpoint in this method:

public void NotifySemanticParseComplete(SemanticParserServiceRequest request)
{
    ICompilationUnit compilationUnit = (ICompilationUnit)request.SemanticParseData;
}
to be hit (and it is), but there is nothing useful for me in the method body because compilationUnit is always null.

I'm doing this because I want to map this VB-like syntax to C# code in order to build an assembly at run time, but the point is that I can't get a non-null compilation unit, let alone an AST from it.

Please tell me if I am on the right track with this simple scenario. Should I change something in my code to make it work?

By the way, I want this lexical + semantic parsing to start when I click a button on the form, but that is not so important right now.


I have already read the post located here:
http://www.actiprosoftware.com/Support/Forums/ViewForumTopic.aspx?ForumTopicID=3649 but I don't understand how to actually call a custom semantic parser (in that post it is XmlSemanticParser, used in the method

protected override object PerformSemanticParse(MergableLexicalParserManager manager)
).

Otherwise, if the steps above are not applicable in my case, how can I get a simple result from lexical + semantic parsing at run time, at a moment of my choosing?

It would be really nice if you could provide some short code snippets showing, step by step, how I can achieve this. Once more: THIS IS FOR A DYNAMIC LANGUAGE!

Thanks again in advance for your help.

Best regards,

Civa

Comments (4)

Posted 15 years ago by Actipro Software Support - Cleveland, OH, USA
Hi Civa,

I see that you have built a lexical parser via the dynamic XML definition and have that loaded, which tokenizes the text. The tokens are available via Document.Tokens.

However what I'm not sure about is if you have defined a semantic parser or not. Meaning, did you use our Grammar Designer to build a grammar (like what is demoed in the "Simple" language sample) and make a semantic parser class? Based on your post, it seems like you are doing all the wiring up to call a semantic parser, but your language may not have one to call, thus the compilation unit result is null.

By the way, if you are using a dynamic language, you shouldn't need to override PerformLexicalParse at all. The semantic parser classes that are generated by our Grammar Designer get to the tokens internally.

The key thing to remember is that a dynamic language basically only means that its lexical parser uses our pattern-based engine. Almost everything else about a dynamic language is the same as any other lower-level language. Semantic parsers can be added to a dynamic language just like they can to other languages. Our advanced XML language in the Web add-on is a dynamic language that has a semantic parser attached that we built using the Grammar Designer. It is attached via the info described in that other post you read.

Hope this helps a bit.


Actipro Software Support

Posted 15 years ago by Civa
Yes, I can see that from the thread linked in my question, which I have already examined. So that custom XML language is defined as dynamic, right? And it can have a custom semantic parser, which means I should code several classes that do all the work and display syntax errors in my editor?

Is my understanding right? If so, and I have defined, let's say, a VB-like dynamic language, what is the minimum code I should write to define a custom semantic parser for it?

On the other hand, when I try to make a non-dynamic language, I can't work out how to add even a simple keyword such as "As", which is used in a variable declaration and must be followed by a data type like Integer or String.

Consider the following:

Dim a As String;
I want to reach the state where the syntax editor (SemanticParserService) tells me that this statement is valid and has no syntax errors.

What did I have when I started building the non-dynamic language?
I had only a very, very reduced VB-like language definition built from the SimpleSyntaxLanguage classes provided in the solution that ships with the component, intending to add my own keywords while keeping some elements of VB. So I did some transformations and replacements, and now I expect to see my little VB-like language in action.

I'm missing something, because I can't manage to modify SimpleLexicalParser.cs so that it works for me. In every case the error is the same, "Statement expected.", when I type a statement like the one above in my code editor.

I think I need some serious code snippet support, since I don't understand how all of the provided pieces work. Please, could you give me a short explanation of how to make the statement above parse with no errors? Here are the code snippets that matter for extending the simple language:

1.

protected virtual bool MatchStatement(out Statement statement)
{
    statement = null;

    int startOffset = this.LookAheadToken.StartOffset;
    if (this.TokenIs(this.LookAheadToken, BRScriptTokenID.OpenCurlyBrace))
    {
        // Block: { ... }
        if (!this.MatchBlock(out statement))
            return false;
    }
    else if (this.TokenIs(this.LookAheadToken, BRScriptTokenID.SemiColon))
    {
        // Empty statement: ;
        if (!this.Match(BRScriptTokenID.SemiColon))
            return false;
        statement = new EmptyStatement(this.Token.TextRange);
    }
    else if (this.TokenIs(this.LookAheadToken, BRScriptTokenID.Dim))
    {
        // Variable declaration: Dim <identifier> As <data type>;
        Identifier variableName = null;
        DataType dType = null;

        if (!this.Match(BRScriptTokenID.Dim))
            return false;
        if (!this.MatchIdentifier(out variableName))
            return false;
        if (!this.Match(BRScriptTokenID.As))
            return false;
        if (!this.MatchDataType(out dType))
            return false;

        // The result of matching the trailing semicolon is not checked here.
        this.Match(BRScriptTokenID.SemiColon);

        statement = new VariableDeclarationStatement(variableName, dType, new TextRange(startOffset, this.Token.EndOffset));
    }
    else if (this.TokenIs(this.LookAheadToken, BRScriptTokenID.Identifier))
    {
        // Assignment: <identifier> = <expression>;
        Identifier variableName = null;
        Expression expression = null;
        if (!this.MatchIdentifier(out variableName))
            return false;
        if (!this.Match(BRScriptTokenID.Assignment))
            return false;
        if (!this.MatchExpression(out expression))
            return false;
        this.Match(BRScriptTokenID.SemiColon);
        statement = new AssignmentStatement(variableName, expression, new TextRange(startOffset, this.Token.EndOffset));
    }
    else if (this.TokenIs(this.LookAheadToken, BRScriptTokenID.Return))
    {
        // Return: Return <expression>;
        Expression expression = null;
        if (!this.Match(BRScriptTokenID.Return))
            return false;
        if (!this.MatchExpression(out expression))
            return false;
        this.Match(BRScriptTokenID.SemiColon);
        statement = new ReturnStatement(expression, new TextRange(startOffset, this.Token.EndOffset));
    }
    else
        return false;
    return true;
}
This is the place where I tried to inject my own AstNode, named DataType, in order to avoid syntax errors in a variable declaration like the one above.

Should I focus on this method or on some other one? What should I do to make the VariableDeclarationStatement class accept the additional keyword "As" followed by a data type, instead of just "Dim a;" or "Dim b;"?

I also added some tokens to the SimpleTokenID class in order to make it highlight correctly.



So let's summarize. I tried both approaches (dynamic and non-dynamic) for defining a simple language of three or more keywords. In the first case I succeeded in creating nice highlighting but nothing else; in the other case I have nice semantic parsing but no way of adding my own keywords, and sometimes no highlighting at all (even if I define the target word as a keyword).

I'm really confused and really stuck with this, and I would be very grateful for your help.

I apologize for asking so many questions about what may be a not-so-hard implementation of a custom language, but it is very important to me and I couldn't get much information from the documentation and examples provided.

Thank you once more; I hope I'll get this working and have no more questions :)

Best regards,

Civa

Posted 15 years ago by Actipro Software Support - Cleveland, OH, USA
Civa,

First, for syntax error highlighting, that requires the use of a semantic parser. A dynamic language just means that the language uses our custom pattern-based format for defining a lexical parser. The lexical parser tokenizes text and provides for syntax highlighting. But it doesn't do anything related to finding errors.

A semantic parser examines the tokens and text and can do things like build an AST and find syntax errors. But it is important to know that you cannot accurately detect syntax errors without a full-blown semantic parser implementation. So a dynamic language XML definition on its own won't get you syntax errors.

Our "Simple" language sample is a great reference to look at because it shows a programmatic lexical parser, a semantic parser, AST building via the semantic parser, outlining based on the AST, automated IntelliPrompt, and syntax error functionality is built-in. So it covers the entire spectrum of things you'd like to do.

Now one issue customers have had with that sample is that we don't have a good step by step walkthrough on these features. This is something we are working on for future versions, coming up with more of a step by step guide of what you do in the order you should do it to achieve that end result.

Anyhow that being said, while the Simple language uses a programmatic lexical parser instead of a dynamic one, you can still do all the same features with a dynamic language. The only thing is that your dynamic language XML definition will get a lot more complex because you'll need each keyword, etc. to have its own unique ID, meaning you can't have groups of keywords. This is because semantic parsers need to recognize individual keywords, operators, etc. and they do that via IDs.
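
To illustrate the point about unique IDs, here is a minimal, self-contained sketch. It does not use the SyntaxEditor API at all; the enum, class, and method names are hypothetical and exist only for this example. It contrasts one shared "Keyword" ID (enough for highlighting) with one ID per keyword (what a parser needs in order to ask "is the next token As?").

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token IDs for illustration only; not part of SyntaxEditor.
public enum SketchTokenId { Identifier, Keyword, Dim, As, Return }

public static class KeywordIds
{
    // Grouped: every keyword shares a single ID. Fine for coloring text.
    private static readonly HashSet<string> Grouped =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Dim", "As", "Return" };

    // Unique: one ID per keyword, so a parser can tell them apart.
    private static readonly Dictionary<string, SketchTokenId> Unique =
        new Dictionary<string, SketchTokenId>(StringComparer.OrdinalIgnoreCase)
        {
            { "Dim", SketchTokenId.Dim },
            { "As", SketchTokenId.As },
            { "Return", SketchTokenId.Return },
        };

    // Returns Keyword for any known keyword, Identifier otherwise.
    public static SketchTokenId ClassifyGrouped(string word)
    {
        return Grouped.Contains(word) ? SketchTokenId.Keyword : SketchTokenId.Identifier;
    }

    // Returns the keyword's own ID, or Identifier for any other word.
    public static SketchTokenId ClassifyUnique(string word)
    {
        SketchTokenId id;
        return Unique.TryGetValue(word, out id) ? id : SketchTokenId.Identifier;
    }
}
```

With the grouped classifier, "Dim" and "As" come back with the same ID, so a parser cannot check that "As" follows the variable name; with the unique classifier it can.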

Hope this helps. Just recognize though that making a complete grammar for complex languages can take a decent amount of time.

From your post, I believe you were under the impression that dynamic languages generate the lexical and semantic code, which at this time is not true. In SyntaxEditor they are separate processes. The dynamic XML definition just defines patterns for syntax highlighting. Something like ANTLR is probably closer to what you mean; however, ANTLR doesn't support incremental lexical parsing, which is essential for a code editor control.

That all being said, in our WPF version that we are using to prototype out a next generation framework, we have some exciting optional integration with Microsoft MGrammar, which is a parser generator similar to ANTLR. And our integration does allow incremental parsing as well as the entire language defined in a single grammar file. But that won't really help you for a while since you need this for WinForms now.

So to sum up, programmatic lexical parsers and dynamic XML definitions are just two ways to define the lexical-only part of a language. The semantic parser part can be built using our Grammar Designer and reads the tokens your language's lexical parser feeds it. The semantic parser is what helps generate syntax errors, advanced outlining, AST, and more.
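
The split described above can be sketched in plain C#, independent of SyntaxEditor. Everything here (TokId, Tok, MiniLexer, MiniParser, VarDecl) is a hypothetical name invented for this example, and the data type is simply treated as an identifier. The lexer only produces tokens with IDs; the parser consumes those tokens, builds a tiny AST node for "Dim name As Type;", and records a syntax error when an expected token is missing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; this is not the SyntaxEditor API.
public enum TokId { Dim, As, Identifier, SemiColon, Unknown, EndOfInput }

public sealed class Tok
{
    public TokId Id { get; private set; }
    public string Text { get; private set; }
    public int Offset { get; private set; }
    public Tok(TokId id, string text, int offset) { Id = id; Text = text; Offset = offset; }
}

// Lexical step: text in, tokens out. No error detection happens here.
public static class MiniLexer
{
    public static List<Tok> Tokenize(string text)
    {
        var tokens = new List<Tok>();
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }
            int start = i;
            if (char.IsLetter(c) || c == '_')
            {
                while (i < text.Length && (char.IsLetterOrDigit(text[i]) || text[i] == '_')) i++;
                string word = text.Substring(start, i - start);
                // Each keyword gets its own ID; anything else is an identifier.
                TokId id = word.Equals("Dim", StringComparison.OrdinalIgnoreCase) ? TokId.Dim
                         : word.Equals("As", StringComparison.OrdinalIgnoreCase) ? TokId.As
                         : TokId.Identifier;
                tokens.Add(new Tok(id, word, start));
            }
            else
            {
                i++;
                tokens.Add(new Tok(c == ';' ? TokId.SemiColon : TokId.Unknown, c.ToString(), start));
            }
        }
        tokens.Add(new Tok(TokId.EndOfInput, "", text.Length));
        return tokens;
    }
}

// AST node for a variable declaration.
public sealed class VarDecl
{
    public string Name;
    public string TypeName;
}

// Semantic step: tokens in, AST node and syntax errors out.
public sealed class MiniParser
{
    private readonly List<Tok> tokens;
    private int pos;
    public List<string> Errors { get; private set; }

    public MiniParser(List<Tok> tokens) { this.tokens = tokens; Errors = new List<string>(); }

    // Consumes the current token if it has the expected ID; otherwise records an error.
    private bool Match(TokId id, out Tok matched)
    {
        matched = tokens[pos];
        if (matched.Id != id)
        {
            Errors.Add(id + " expected at offset " + matched.Offset + ".");
            return false;
        }
        pos++;
        return true;
    }

    // Parses: Dim <identifier> As <identifier> ;
    public VarDecl ParseVarDecl()
    {
        Tok ignored, name, type;
        if (!Match(TokId.Dim, out ignored)) return null;
        if (!Match(TokId.Identifier, out name)) return null;
        if (!Match(TokId.As, out ignored)) return null;
        if (!Match(TokId.Identifier, out type)) return null;
        if (!Match(TokId.SemiColon, out ignored)) return null;
        return new VarDecl { Name = name.Text, TypeName = type.Text };
    }
}
```

For example, new MiniParser(MiniLexer.Tokenize("Dim a As String;")).ParseVarDecl() returns a node with Name "a" and TypeName "String" and leaves Errors empty, while "Dim a String;" returns null and records one error for the missing As keyword.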


Actipro Software Support

Posted 15 years ago by Civa
OK, thank you very much for your answer. I'll consider a couple of aspects of building my language and making it work.

It was just a matter of understanding lexical and semantic parsers in both dynamic and non-dynamic languages, and I think I understand it pretty clearly now.

Thank you once again and I hope I'll have no more questions related to my little project :)

Best regards,


Civa
The latest build of this product (v24.1.0) was released 2 months ago, which was after the last post in this thread.