Grammar ambiguity

Comments (17)

Posted 18 years ago by Actipro Software Support - Cleveland, OH, USA

Not sure if this helps, but one thing I see in the code you pasted is that it can end up matching nothing at all, which is bad: it has three optional blocks ([...]). I would split them into alternatives like this so that whichever one is taken must match something:

(
    'OpenParenthesis'
    [ "ArgumentList<@ out argumentList @>" ] 
    'CloseParenthesis'
)
| (
    "ArgumentList<@ out argumentList @>"
)

Actipro Software Support

Posted 18 years ago by Vincent Parrett

That makes sense, but unfortunately it doesn't solve the problem: the parser generator now reports that multiple non-terminals match on OpenParenthesis. The InvocationExpression that I posted before is referenced from the PrimaryExpression non-terminal, which also has a case for ParenthesizedExpression. This is where the language ambiguity makes life difficult.


                | (
                    <!-- ParenthesizedExpression -->
          
                    'OpenParenthesis'
                    "Expression<@ out expression @>"
                    'CloseParenthesis'
                    <%
                        expression = new ParenthesizedExpression(expression, new TextRange(startOffset, this.Token.EndOffset));
                    %>                    
                )
....
| "InvocationExpression<@ ref expression @>"

I guess if the parser knew that the previous identifier token was a method reference then it could make a decision, but the problem is the referenced method may not yet have been parsed. I never really liked VBScript as a language... but I'm beginning to dislike it a lot!

Regards

Vincent

[Modified at 07/26/2007 05:23 PM]


Posted 18 years ago by Actipro Software Support - Cleveland, OH, USA

Yes, languages like this can be very challenging in places. When we run into scenarios like this, we often use a peek operation. We would write a method named something like IsInvocationExpression that starts a peek operation, scans forward through the tokens with repeated calls to Peek, determines whether the tokens needed for an invocation expression are there, and then ends the peek operation. The IsInvocationExpression call then becomes the condition for the InvocationExpression non-terminal. If you put the reference to the InvocationExpression non-terminal before the reference to ParenthesizedExpression, the parser runs your custom method first and takes that branch when appropriate. If IsInvocationExpression returns false, it falls through to your ParenthesizedExpression non-terminal.
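
A minimal sketch of such a peek-based check, written in Python rather than the parser's own language so it can be run standalone. The TokenReader class, its start_peek/peek/stop_peek methods, and the token kind names are all hypothetical stand-ins, not the actual parser API. It assumes the reader is positioned at an identifier and treats "identifier, then a balanced pair of parentheses on the same line" as an invocation:

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str
    text: str = ""


class TokenReader:
    """Hypothetical token reader with a rewindable peek mode."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._pos = 0          # position of the next token to consume
        self._peek_pos = None  # lookahead cursor, only set while peeking

    def start_peek(self):
        self._peek_pos = self._pos

    def peek(self):
        if self._peek_pos is None:
            raise RuntimeError("peek() called outside a peek operation")
        if self._peek_pos < len(self._tokens):
            token = self._tokens[self._peek_pos]
        else:
            token = Token("EndOfInput")
        self._peek_pos += 1
        return token

    def stop_peek(self):
        # Discard the lookahead cursor; nothing was consumed.
        self._peek_pos = None


def is_invocation_expression(reader):
    """Look ahead for Identifier '(' ... ')' without consuming any token."""
    reader.start_peek()
    try:
        if reader.peek().kind != "Identifier":
            return False
        if reader.peek().kind != "OpenParenthesis":
            return False
        depth = 1
        while depth > 0:
            kind = reader.peek().kind
            if kind == "OpenParenthesis":
                depth += 1
            elif kind == "CloseParenthesis":
                depth -= 1
            elif kind in ("LineTerminator", "EndOfInput"):
                return False  # parentheses never closed on this line
        return True
    finally:
        reader.stop_peek()
```

Because the peek cursor is thrown away in stop_peek, the check can be used as a guard on one alternative and the other alternative still sees the token stream untouched.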

Actipro Software Support

Posted 18 years ago by Vincent Parrett

Thanks, and sorry for the delay in replying. I've been sidetracked getting ready for TechEd Australia next week (we're exhibiting), so I haven't had a chance to try this yet. I'll post again when I have.

Regards

Vincent.

Posted 18 years ago by Vincent Parrett

Hi

I tried the custom method; however, I hit another snag:


Production 'InvocationExpression' contains an alternation that has multiple references to the terminal 'OpenParenthesis', either directly or via a non-terminal first set.

The alternation node's stack within the production was:
Concatenation [3 children] - First child: UserCode [// NOTE: The real invocation e])
Alternation [3 children] - First child: Concatenation [3 children] - First child: 'OpenParenthesis' (Terminal)))

The first alternation option node with the terminal was:
Concatenation [3 children] - First child: 'OpenParenthesis' (Terminal))

The second alternation option node with the terminal was:
Concatenation [1 children] - First child: "ArgumentList" (NonTerminal))

The problem is this bit :



    (
       'OpenParenthesis'
       [ "ArgumentList<@ out argumentList @>" ] 
       'CloseParenthesis'
    )
    | (
       "ArgumentList<@ out argumentList @>"
    )
    | "StatementTerminator<- ->"

ArgumentList will eventually reach the PrimaryExpression non-terminal, which in turn references OpenParenthesis, so the second alternative can also start with OpenParenthesis.
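
To illustrate why the generator complains, here is a rough Python sketch that computes first sets for a toy grammar and reports alternatives that can start with the same terminal. The grammar is a heavily simplified assumption (it is not the real VBScript grammar, and names such as InvocationTail, ArgumentListOpt and MoreArguments are made up for the example), and the real generator's algorithm may well differ:

```python
EPSILON = "<empty>"

# Toy grammar (assumption): each non-terminal maps to a list of alternatives,
# each alternative is a list of symbols. Symbols that are not keys are terminals.
GRAMMAR = {
    "InvocationTail": [
        ["OpenParenthesis", "ArgumentListOpt", "CloseParenthesis"],
        ["ArgumentList"],
        ["StatementTerminator"],
    ],
    "ArgumentListOpt": [["ArgumentList"], []],
    "ArgumentList": [["Expression", "MoreArguments"]],
    "MoreArguments": [["Comma", "Expression", "MoreArguments"], []],
    "Expression": [["PrimaryExpression"]],
    "PrimaryExpression": [
        ["Identifier"],
        ["Number"],
        ["OpenParenthesis", "Expression", "CloseParenthesis"],
    ],
}


def first_of_sequence(symbols, grammar, first):
    """Terminals that can begin the given symbol sequence."""
    result = set()
    for symbol in symbols:
        if symbol not in grammar:  # terminal
            result.add(symbol)
            return result
        result |= first[symbol] - {EPSILON}
        if EPSILON not in first[symbol]:
            return result
    result.add(EPSILON)  # every symbol could be empty
    return result


def first_sets(grammar):
    """Fixed-point computation of the first set of every non-terminal."""
    first = {name: set() for name in grammar}
    changed = True
    while changed:
        changed = False
        for name, alternatives in grammar.items():
            for alternative in alternatives:
                new = first_of_sequence(alternative, grammar, first)
                if not new <= first[name]:
                    first[name] |= new
                    changed = True
    return first


def conflicts(nonterminal, grammar):
    """Return (terminal, first_alt_index, second_alt_index) clashes."""
    first = first_sets(grammar)
    seen = {}
    clashes = set()
    for index, alternative in enumerate(grammar[nonterminal]):
        starts = first_of_sequence(alternative, grammar, first) - {EPSILON}
        for terminal in starts:
            if terminal in seen:
                clashes.add((terminal, seen[terminal], index))
            else:
                seen[terminal] = index
    return clashes
```

For this toy grammar, conflicts("InvocationTail", GRAMMAR) reports OpenParenthesis as shared by alternatives 0 and 1, which mirrors the "directly or via a non-terminal first set" wording in the error.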

Regards

Vincent.

Posted 18 years ago by Vincent Parrett

More on this... I found a possible way around :


    <NonTerminal Key="InvocationExpression" Parameters="bool callFound, ref Expression expression">
      <AdditionalConditions>
        <ExpressionCondition>this.IsInvocationExpression(callFound)</ExpressionCondition>
      </AdditionalConditions>
      <Production>
        <![CDATA[
                <%
                    // NOTE: The real invocation expression parses an Expression first but it is up to the caller to supply that
                    int startOffset = expression.StartOffset;
                    expression = new InvocationExpression(expression);
                    expression.StartOffset = startOffset;
                    AstNodeList argumentList = null;
                    bool hasParenthisis = false;
                %>
        
                (
                    ['OpenParenthesis<+ hasParenthisis = true; +>']
                     [ "ArgumentList<@ out argumentList @>" ]
                    [<? hasParenthisis ?>'CloseParenthesis']
                )
             | "StatementTerminator<- ->"
        
                <%
                    if (argumentList != null)
                        ((InvocationExpression)expression).Arguments.AddRange(argumentList.ToArray());
                    expression.EndOffset = this.Token.EndOffset;
                %>
            ]]>
      </Production>
    </NonTerminal>

Unfortunately it cannot resolve this test case properly :


Function Add(x, y)
Add = x + y
End Function

Sub Test
dim a,b

a = Add(b,2)
End Sub

b always gets treated as an InvocationExpression, which in this case it is not. It seems to me to be impossible to know whether an identifier is just a simple name or an invocation of a method that has no parameters.

Regards

Vincent

Posted 18 years ago by Actipro Software Support - Cleveland, OH, USA

I haven't looked at the VBS grammar but I believe an identifier would only be a method invocation start if it is followed by optional whitespace and then a '(' or by an identifier or maybe several keywords. If it is followed by a ',' or a ')', etc. then you can assume it's a simple name. Right?

Actipro Software Support

Posted 18 years ago by Vincent Parrett

That's how any sane person would specify it; unfortunately, that's not the case :(

Some examples



Function ANumber
  ANumber = 2
End Function

Sub ShowMessage(line1,line2)
  msgbox line1 & vbCRLF & line2
End Sub 
 

Function Add(a,b)
 Add = a + b
End Function

Sub Main
  Dim a
  a = ANumber + 1 
  msgbox "The result is" & a  
  a = Add(1,2) 
  a = Add(ANumber,1)
  Add 1,2  
  Call Add(2,2) 'Call required if parenthesis used
  ShowMessage "Line1","Line2"
  Call ShowMessage("Line1","Line2")

End Sub

That should give you the idea. The cases where parentheses are used are relatively easy to detect; the cases where parentheses are not used are damned near impossible at parse time.

I think I am just going to have to not worry about resolving method invocations where parentheses are not used, at least not in the parser. Hopefully I can resolve them when producing the Intellisense info. I haven't looked, but perhaps there is a way to iterate the AST (or a portion of it, i.e. the block statement of a method) and replace any SimpleNames with InvocationExpressions if they resolve to a method name (just thinking out loud here...).

I found this interesting article on the vagaries of VBScript :
http://adamv.com/dev/articles/hatevbs/vbscript

Regards

Vincent.

Posted 18 years ago by Actipro Software Support - Cleveland, OH, USA

I still think what I said was accurate though. Perhaps you can show me two statements where that sort of logic doesn't work and is ambiguous.

Let me show examples...

a = ANumber + 1

Here 'ANumber' is not an invocation because it is followed by an operator. So it is a simple name.

Add 1,2

'Add' is followed by a number so that would make it an invocation.

ShowMessage "Line1","Line2"

'ShowMessage' is followed by a string so that would make it an invocation.

There is a certain set of tokens that can follow the identifier for an invocation and a different set that can follow a simple name. Make sense?
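
A small Python sketch of that follow-token rule, for illustration only. The token kinds and the contents of ARGUMENT_START are assumptions (a real VBScript lexer would have its own kinds, and "Identifier" followed by "(" could also be an array access). The rule looks only at the next token and never consults declarations, so it is a heuristic:

```python
# Token kinds that, in this sketch, may begin an argument (assumption).
ARGUMENT_START = {"Number", "String", "Identifier", "OpenParenthesis"}


def classify_identifiers(tokens):
    """Classify each identifier in one statement by the token that follows it.

    tokens is a list of (kind, text) pairs; returns (text, classification) pairs.
    """
    result = []
    for index, (kind, text) in enumerate(tokens):
        if kind != "Identifier":
            continue
        if index + 1 < len(tokens):
            next_kind = tokens[index + 1][0]
        else:
            next_kind = "EndOfStatement"
        label = "Invocation" if next_kind in ARGUMENT_START else "SimpleName"
        result.append((text, label))
    return result


# a = Add(b,2)
statement = [
    ("Identifier", "a"), ("Operator", "="), ("Identifier", "Add"),
    ("OpenParenthesis", "("), ("Identifier", "b"), ("Comma", ","),
    ("Number", "2"), ("CloseParenthesis", ")"),
]
print(classify_identifiers(statement))
```

Here Add is followed by "(" and so is classified as an invocation, while a and b are followed by an operator and a comma and stay simple names.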

Actipro Software Support

Posted 18 years ago by Vincent Parrett

In my example ANumber is a function with no parameters, so I would have thought it should be an Invocation.. but what you say also makes sense. I now have it working as you suggested, this should hopefully be ok when I get to the intellisense stuff.

Thanks for the help!

Regards

Vincent.

[Modified at 07/31/2007 07:39 PM]

Posted 18 years ago by Kelly Leahy - Software Architect, Milliman

Sorry to burst your bubble guys, but the following is perfectly valid code:


function a()
  a = 3
end function

msgbox a + 3

Here 'a' is an invocation in its usage in the msgbox call, yet it is followed by an operator.

Kelly Leahy Software Architect Milliman, USA

Posted 18 years ago by Vincent Parrett

Hi Kelly

You are of course correct; unfortunately, I cannot think of any way to detect this using the grammar. The problem is that there is no way to guarantee that the parser will have parsed the function a by the time it parses the msgbox expression (the function may be declared later, or it could be a built-in function). I decided to ignore those sorts of invocations at parse time and deal with them while working out the context for intellisense. I don't have that all working yet; I haven't had much time to work on it lately.

I am open to suggestions if you know of a way to detect these sorts of invocations.

Regards

Vincent

Posted 18 years ago by Kelly Leahy - Software Architect, Milliman

Vincent,

You're correct - there isn't, in fact, any general way in a CFG to distinguish between these invocations and normal variable references.

As a matter of fact, I treat all variable references as invocations in the grammar for my language for exactly this reason, since functions with no arguments look the same as variable references.

My question is: why do you need to know the difference between a variable reference and an invocation? Why not treat them as one and the same?

What I do in my case is have an "InvocationExpression" expression node in my AST that has an optional argument list and an "Identifier" as one of its children. The "Identifier" node has an IdentifierType enum field that tells me which type of identifier it is. Initially, all new Identifier nodes receive an identifier type of IdentifierType.UnknownId. Once my semantic parse is complete, I walk the AST and change the IdentifierType.UnknownId values to the other IdentifierType values based on the context (i.e. what functions are declared, what variables are declared, etc.).
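
A rough Python sketch of that two-phase idea, under stated assumptions: the node classes below are simplified stand-ins (not the actual AST types of either parser), only the IdentifierType and UnknownId names come from the description above, and the sets of declared functions and variables are assumed to have been collected elsewhere. Names are compared in lower case because VBScript identifiers are case-insensitive:

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class IdentifierType(Enum):
    UNKNOWN_ID = auto()
    VARIABLE = auto()
    FUNCTION = auto()


@dataclass
class Identifier:
    name: str
    identifier_type: IdentifierType = IdentifierType.UNKNOWN_ID


@dataclass
class InvocationExpression:
    identifier: Identifier
    arguments: list = field(default_factory=list)  # empty for a bare name


def resolve_identifiers(node, functions, variables):
    """Post-parse walk: replace UNKNOWN_ID using the declared names.

    functions and variables are sets of lower-cased declared names.
    """
    name = node.identifier.name.lower()
    if name in functions:
        node.identifier.identifier_type = IdentifierType.FUNCTION
    elif name in variables:
        node.identifier.identifier_type = IdentifierType.VARIABLE
    for argument in node.arguments:
        if isinstance(argument, InvocationExpression):
            resolve_identifiers(argument, functions, variables)


# a = Add(b, 2): every name starts out as an invocation with unknown type.
b_ref = InvocationExpression(Identifier("b"))
call = InvocationExpression(Identifier("Add"), [b_ref, 2])
resolve_identifiers(call, functions={"add"}, variables={"a", "b"})
```

After the walk, Add is tagged as a function and b as a variable; a name that matches neither set keeps UNKNOWN_ID, which leaves room for built-in functions to be resolved by a later step.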

Feel free to follow up if you want to use this sort of technique and have other questions.

Kelly

Kelly Leahy Software Architect Milliman, USA

Posted 18 years ago by Vincent Parrett

The main reason I want to detect these types of invocations is for the intellisense parameter information. VBScript allows/requires you to specify the parameter list without parentheses:

Add 1,2

or

Call Add(1,2)

or

x = Add(1,2)

It's the first case I'm trying to detect. I had considered walking the AST and modifying it after parsing completed... what sort of performance impact does this have?

Regards

Vincent.

Posted 18 years ago by Kelly Leahy - Software Architect, Milliman

So when you say the 'intellisense parameter information', you mean when the caret is on the end of the space after 'Add', you want to show the quickinfo that identifies the signature of the function and show the parameter description?

In this case, you need to know that 'add' is a function, right? You don't really care what '1' is, right?

Does VBScript require that functions be declared prior to their invocations? I can't remember the scoping rules of VBScript.

If it requires that declarations must precede invocations, you can handle this in the lexer, rather than the parser, by using a symbol table modified by the parser. However, I'm not sure it'll work.

Since you're working in VBScript and VBScript doesn't support overloading of function names, you could keep a symbol table of functions that are declared and treat them as function invocations whenever you see them in the lexer.

I'm not sure of the best approach to do what I'm referring to. One way (quick and dirty) would be to scan the lines looking for ones that look like 'sub ...' and grab the ID to the right of the 'sub' keyword. If you then put those IDs in a list and make a second pass that sets some property on each matching token to mark it as a function ID, it's then easy to do the intelliprompt stuff by just checking whether the token to the left has this property set.
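
A quick-and-dirty Python sketch of that two-pass scan, as an illustration only. The regular expressions are assumptions: they also pick up 'function' declarations, ignore modifiers such as Public/Private, and the word scan does not skip strings or comments. A real implementation would work on the lexer's tokens rather than raw text:

```python
import re

# Pass 1 pattern: a line starting with 'sub' or 'function', then a name.
DECLARATION = re.compile(r"^\s*(?:sub|function)\s+([A-Za-z]\w*)", re.IGNORECASE)
# Pass 2 pattern: anything that looks like an identifier.
WORD = re.compile(r"[A-Za-z]\w*")


def declared_routines(source):
    """Pass 1: collect lower-cased names of declared subs and functions."""
    names = set()
    for line in source.splitlines():
        match = DECLARATION.match(line)
        if match:
            names.add(match.group(1).lower())
    return names


def mark_routine_tokens(source):
    """Pass 2: return (line_number, word, is_routine) for every word."""
    routines = declared_routines(source)
    marked = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in WORD.finditer(line):
            word = match.group(0)
            marked.append((line_number, word, word.lower() in routines))
    return marked


script = 'DoSomething "hello world"\n\nsub Dosomething(value)\nmsgbox value\nend sub\n'
for entry in mark_routine_tokens(script):
    print(entry)
```

Because the declarations are gathered in a separate first pass, a call that appears before its declaration is still marked, and the lower-casing makes the match case-insensitive.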

Notice that the 'goofy' case (the "Add 1,2" case) is the only difficult one. The others are easy. Also, I don't think a 'function' can be called with this syntax. You should verify, but I'm pretty sure I'm right on this.

Kelly Leahy Software Architect Milliman, USA

Posted 18 years ago by Vincent Parrett

Quote:
So when you say the 'intellisense parameter information', you mean when the caret is on the end of the space after 'Add', you want to show the quickinfo that identifies the signature of the function and show the parameter description?

In this case, you need to know that 'add' is a function, right? You don't really care what '1' is, right?

Yes, that's correct.

Quote:
Does VBScript require that functions be declared prior to their invocations? I can't remember the scoping rules of VBScript.

No, it doesn't require they be declared before they are referenced. I just did a simple test to confirm :

DoSomething("hello world")

sub Dosomething(value)
msgbox value
end sub

The script works fine, even though the sub is declared after the reference.

I am making some headway with my parser. I even got the lexer to know when an identifier is a built-in VBScript function, constant or instance (like Debug and Err), so it can highlight them differently. I will try the post-parse processing idea when I get a chance.

Thanks

Vincent.

Posted 18 years ago by Kelly Leahy - Software Architect, Milliman

There should be some way to trigger an action once a lexical parse is complete. I would recommend you look there first - if what you need can be done with a simple pass through the tokens (or more than one pass) rather than using the semantic parse data, you'll be better off. Ultimately, however, you'll need to use the semantic parse results in order to get arguments and stuff, unless you do some really heavy processing of the lexical parse results.

Kelly Leahy Software Architect Milliman, USA

