outlining for non-AST nodes

SyntaxEditor for Windows Forms Forum

Posted 17 years ago by Kelly Leahy - Software Architect, Milliman
Version: 4.0.0246
I'm using ICollapsibleNode in my AST for several node types that I want collapsed, but I also would like to collapse multiline comments. Is there a way to do this? Can you give me a hint?

I'm using a mergeable syntax language (non-dynamic).

Thanks,

Kelly Leahy Software Architect Milliman, USA

Comments (16)

Posted 17 years ago by Actipro Software Support - Cleveland, OH, USA
Yes, as you know, the comments are probably being filtered out in your MergableRecursiveDescentLexicalParser implementation. What we do in our .NET Languages Add-on is track the TextRange of each multi-line comment when we encounter one, adding it to a collection defined on our MergableRecursiveDescentLexicalParser implementation.
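The tracking step can be sketched roughly like this. It is a minimal, self-contained illustration using stand-in types (Token, TextRange, and CommentTrackingLexicalParser are invented names here); the real Actipro classes have different shapes:

```csharp
using System;
using System.Collections.Generic;

// Stand-in types for illustration only; the real Actipro classes
// (IToken, TextRange, MergableRecursiveDescentLexicalParser) differ.
struct TextRange {
    public int StartOffset, EndOffset;
    public TextRange(int start, int end) { StartOffset = start; EndOffset = end; }
}

class Token {
    public string Key;            // e.g. "MultiLineCommentToken"
    public TextRange TextRange;
}

class CommentTrackingLexicalParser {
    // Collected while filtering; handed to the CompilationUnit later
    public List<TextRange> MultiLineCommentTextRanges { get; } = new List<TextRange>();

    private readonly IEnumerator<Token> source;

    public CommentTrackingLexicalParser(IEnumerable<Token> tokens) {
        source = tokens.GetEnumerator();
    }

    // Returns the next token the parser should see, skipping comments
    // but remembering their text ranges for outlining.
    public Token GetNextToken() {
        while (source.MoveNext()) {
            Token token = source.Current;
            if (token.Key == "MultiLineCommentToken") {
                MultiLineCommentTextRanges.Add(token.TextRange);
                continue;  // filtered out: the parser never sees it
            }
            return token;
        }
        return null;  // end of document
    }
}
```

The point is simply that the filter is the one place that still sees comment tokens, so it is the natural place to remember where they were.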

Then at the end of the semantic parser's CompilationUnit production, we run some code like this (in the C# language):
// Get the comment and region text ranges
CSharpRecursiveDescentLexicalParser lexicalParser = this.LexicalParser as CSharpRecursiveDescentLexicalParser;
if (lexicalParser != null) {
    compilationUnit.DocumentationCommentTextRanges = lexicalParser.DocumentationCommentTextRanges;
    compilationUnit.MultiLineCommentTextRanges = lexicalParser.MultiLineCommentTextRanges;
    compilationUnit.RegionTextRanges = lexicalParser.RegionTextRanges;
}
This moves the text ranges over to the CompilationUnit.

Then in our CompilationUnit, we implement this method:

/// <summary>
/// Adds any extra <see cref="CollapsibleNodeOutliningParserData"/> nodes to the <see cref="CollapsibleNodeOutliningParser"/>,
/// such as for comments that should be marked as collapsible.
/// </summary>
/// <param name="outliningParser">The <see cref="CollapsibleNodeOutliningParser"/> to update.</param>
void ICompilationUnit.UpdateOutliningParser(CollapsibleNodeOutliningParser outliningParser) {
    if (documentationCommentTextRanges != null) {
        foreach (TextRange textRange in documentationCommentTextRanges) {
            Comment collapsibleNode = new Comment(CommentType.Documentation, textRange, null);
            outliningParser.Add(new CollapsibleNodeOutliningParserData(textRange.StartOffset, OutliningNodeAction.Start, collapsibleNode));
            outliningParser.Add(new CollapsibleNodeOutliningParserData(textRange.EndOffset - 1, OutliningNodeAction.End, collapsibleNode));
        }
    }
    if (multiLineCommentTextRanges != null) {
        foreach (TextRange textRange in multiLineCommentTextRanges) {
            Comment collapsibleNode = new Comment(CommentType.MultiLine, textRange, null);
            outliningParser.Add(new CollapsibleNodeOutliningParserData(textRange.StartOffset, OutliningNodeAction.Start, collapsibleNode));
            outliningParser.Add(new CollapsibleNodeOutliningParserData(textRange.EndOffset - 1, OutliningNodeAction.End, collapsibleNode));
        }
    }
    if (regionTextRanges != null) {
        foreach (TextRange textRange in regionTextRanges) {
            RegionPreProcessorDirective collapsibleNode = new RegionPreProcessorDirective(textRange);
            outliningParser.Add(new CollapsibleNodeOutliningParserData(textRange.StartOffset, OutliningNodeAction.Start, collapsibleNode));
            outliningParser.Add(new CollapsibleNodeOutliningParserData(textRange.EndOffset - 1, OutliningNodeAction.End, collapsibleNode));
        }
    }
}
So doing those three steps will get you outlining. Hope that helps!


Actipro Software Support

Posted 17 years ago by Kelly Leahy - Software Architect, Milliman
Very elegant...

Thanks.

Kelly Leahy Software Architect Milliman, USA

Posted 17 years ago by pranay
Hi Kelly,
Can you give me some pointers regarding how to get outlining? Your reply to the post prompted me to ask you for some help. :)
Kind regards
Pranay
Posted 17 years ago by Kelly Leahy - Software Architect, Milliman
Pranay,

Can you tell me what you're having problems with?

The easiest way to get outlining is to use automatic outlining and the ICollapsibleNode interface on the AST nodes that need outlining. However, in cases (like comments) where you CAN'T have an AST node (because comments are filtered out by the lexer filter, the layer between the lexer and the parser that Actipro refers to as the recursive descent lexical parser, I think), you must use the technique described above: override the ICompilationUnit.UpdateOutliningParser method on your root AST node.
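To illustrate the automatic-outlining path, here is a minimal sketch. The ICollapsibleNode shown is a stand-in with invented members (the real SyntaxEditor interface differs), and MethodDeclarationNode is a hypothetical AST node:

```csharp
using System;

// Stand-in for Actipro's interface; the real ICollapsibleNode
// in SyntaxEditor has different/additional members.
interface ICollapsibleNode {
    bool IsCollapsible { get; }
    string CollapsedText { get; }
}

// A block-level AST node (e.g. a method declaration) that opts into
// automatic outlining by implementing the interface.
class MethodDeclarationNode : ICollapsibleNode {
    public string Name;
    public int StartOffset, EndOffset;

    public bool IsCollapsible {
        get { return true; }
    }

    // Text shown in the editor in place of the collapsed region.
    public string CollapsedText {
        get { return "..."; }
    }
}
```

The key design decision is simply which block-level nodes implement the interface; leaf nodes like identifiers never need it.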

I would recommend you start with the help document's information on outlining and if you still have problems, ask here.

I'm assuming you're using a non-dynamic language. If you're planning to use a dynamic language (XML language definition), I'll be of little help - I don't use those for performance reasons.

Kelly Leahy Software Architect Milliman, USA

Posted 17 years ago by pranay
Hi Kelly,
Thanks for the reply. I'm using a non-dynamic language. I made an AST node for the comments as well :(, so now I know why the outlining event wasn't firing for the comments. I will try it tomorrow; if that fails, I'll be back :).
Kind regards
Pranay.
Posted 17 years ago by Kelly Leahy - Software Architect, Milliman
Pranay,

When you say you made an AST node for the comments, how were you adding it to the AST? Are you receiving comments as "tokens" in your parser? If so, your language definition must either be very restrictive about where comments may appear, or you must have a VERY complicated language definition in order to support comments scattered anywhere. I'm not even sure it would be possible to construct an LL(k) grammar that supports comment tokens in any position. It must be ambiguous, I'd think, though I've never tried it.

Have you managed to get any outlining working? I would recommend you first concentrate on "true" AST nodes getting outlining working, and then try to concentrate on comments later, as they are the exceptional case in most languages.

Kelly Leahy Software Architect Milliman, USA

Posted 17 years ago by Actipro Software Support - Cleveland, OH, USA
Hi Pranay,

Yes, Kelly is right: focus on regular AST node outlining first, and once you have that working, move on to things like comment outlining, since that is more complex. Did you get basic AST node outlining for classes, etc. working?


Actipro Software Support

Posted 17 years ago by pranay
Hi,
Thanks for making me realize that I was doing something terrible. I naively assumed that making an AST node for the comments would be easier, which is why I made this horrible choice. Now I have started making AST nodes for other expressions and declarations, so outlining will come in a few days. I think I should enable error highlighting before moving on to outlining.
Thanks once again for saving me :).
Kind regards
Pranay.
Posted 17 years ago by Kelly Leahy - Software Architect, Milliman
Pranay,

No problem - glad we could help. Let us know when you've gotten over that hurdle and want to get back to outlining. As you're looking at doing your AST node design, try to think about which nodes you want to use for outlining (i.e. which nodes represent blocks of code that should be collapsible). It'll help you later if you've figured out that list and have nodes at those levels so that it's easier to implement ICollapsibleNode on them.

Also, don't get too bogged down creating a boatload of AST nodes just because you can - for instance, you may not need to create nodes for expressions and the stuff under them if you don't need them later. As an example, I started out creating a "full" AST for my language but then realized I only needed nodes for the very highest level of structure and for identifier references and definitions. I got much better parsing performance by not building the full AST (since I didn't need most of it anyway).

Of course, all this is relative to what you want to do with the AST - but don't think just because you can build a full AST that you need to. You probably aren't building a compiler, right?

Kelly Leahy Software Architect Milliman, USA

Posted 17 years ago by Actipro Software Support - Cleveland, OH, USA
Kelly that is an excellent point. When starting out with the AST, for a C-like language it's best to just stick to type/member declarations. If you can avoid going down to statement/expression levels then do so since those can become a real bear. By keeping the code simpler, you end up parsing faster and saving on memory too. Then as you require more AST nodes for your IntelliPrompt or contextual needs, add them in as appropriate. But you can do a heck of a lot of neat stuff with just knowing type/member declarations.

For our .NET Languages Add-on, we implemented the entire grammar of C# and VB, which is why it took us weeks of work to do it. :)


Actipro Software Support

Posted 17 years ago by pranay
Hi Kelly,
Thanks for the suggestion, as I was on a spree of making loads of AST nodes :(. So again I need to think about which ones to discard. Say a variable ch is declared as unsigned char ch; and the user assigns it a value like ch = 2.3; I would like to show a warning for that kind of assignment. That's right, I'm not building a compiler :).

Kind regards
Pranay.
Posted 17 years ago by Kelly Leahy - Software Architect, Milliman
Pranay,

I'm going to assume you're trying to use a mostly LL(1) grammar with a RD parser, maybe even using the parser generator supplied by Actipro.

In this case, if you want to do type checking (which is what you're referring to when you say you want a warning for assignments / initializations like ch = 2.3), you'll need some sort of symbol table maintenance so that you know when ch is declared and what it's declared as. If you're doing a .NET language, it could be difficult to do this without a full AST (and even difficult WITH a full AST), since fields and members don't have to be declared before they are used - at least not "before" in a line-number sense. Locals, obviously, DO need to be declared before use.

Are you writing a parser for C#, C, or some other existing language, or are you making your own new language?

If your language requires declaration before use (in a line-number / token ordering sense), then you can do a single-pass type checking analysis for locals and build your symbol table when you see declarations, and use it to look up the types when parsing expressions.

You can then, in RD, return the "inferred type" from each of your expression productions (expression, term, factor, primary, etc.) that is computed in the manner appropriate for your language (x / y is real, int + real is real, int + int is int, etc.).
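The single-pass scheme described above can be sketched as follows. All names here (TypeChecker, Combine, IsNarrowingAssignment) are hypothetical illustrations of the idea, not Actipro API; a real RD parser would apply Combine inside its expression/term/factor productions:

```csharp
using System;
using System.Collections.Generic;

class TypeChecker {
    // Symbol table built as declarations are parsed (declaration before use).
    private readonly Dictionary<string, string> symbols = new Dictionary<string, string>();

    public void Declare(string name, string type) {
        symbols[name] = type;
    }

    public string LookUp(string name) {
        string type;
        return symbols.TryGetValue(name, out type) ? type : "unknown";
    }

    // The "inferred type" rule each binary expression production would apply:
    // x / y is real; anything involving real is real; int op int is int.
    public static string Combine(string left, string op, string right) {
        if (op == "/") return "real";
        if (left == "real" || right == "real") return "real";
        return "int";
    }

    // A warning check like the `unsigned char ch; ch = 2.3;` case:
    // assigning a real-typed expression to an integral variable.
    public bool IsNarrowingAssignment(string variable, string valueType) {
        string declared = LookUp(variable);
        return declared == "int" && valueType == "real";
    }
}
```

Each expression production returns its inferred type upward, so by the time the assignment production fires, both sides' types are known and the warning check is a single comparison.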

It is much less messy, however, to perform type-checking after the syntax analysis phase. In order to do this, though, your AST needs to have all the necessary information in order to compute what you need. This probably means you need to "keep" all the expression stuff in the AST.

Let me know if you need more elaboration on any of this.

Kelly Leahy Software Architect Milliman, USA

Posted 17 years ago by pranay
Hi Kelly,
I'm sorry I wasn't clear in my last post. I'm designing an IDE that will use gcc for the C/C++ compilation part in the background. It's true, I'm using the parser generator supplied by Actipro. For now, I'm dropping the idea of type checking after realizing the amount of work it needs. Even VS doesn't do it :) for MFC.
Kind regards
Pranay.
Posted 17 years ago by Kelly Leahy - Software Architect, Milliman
Wow... That's an ambitious goal.

I think it would be best if you did drop type checking for a while, since type checking C++ is very painful, even more so than many other languages. One benefit, however, of C/C++ is that it is top down - everything MUST be declared before it is used (either in an #include or in the main file itself).

That said, I'm not sure how the syntax editor's lexer framework is built to work with things like #include. My guess is that you'd need to spin off a separate lexer when you encounter something like #include, one that doesn't try to add tokens to the Tokens collection of the document but still somehow passes them to the semantic parser. This might be something you can do in the lexer filter (the phase between lexing and parsing that I think the documentation calls syntax parsing; it's the same place Bill referred to above for storing the comment text ranges intermediately). For instance, when you encounter a #include directive (which I'd return as a single token for the entire directive, if I were you) in the lexer filter, you could tell the filter to tokenize the #include file and return those tokens to the semantic parser until it runs out, then switch back to the stuff after the #include in the original document. Of course, this is very complex too, since you'll need a stack of #includes in your "other" lexer so that it can recurse.
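The #include stack idea might look something like this minimal sketch, with file contents simulated as pre-split token arrays (a real implementation would read from disk, lex the text properly, and guard against circular includes):

```csharp
using System;
using System.Collections.Generic;

// Sketch of the "stack of #includes" idea: a token source that, when it
// sees an #include directive token, pushes the included file's tokens and
// keeps emitting them until exhausted, then resumes the including file.
class IncludeAwareTokenizer {
    private readonly Dictionary<string, string[]> files;  // file name -> tokens
    private readonly Stack<IEnumerator<string>> sources = new Stack<IEnumerator<string>>();

    public IncludeAwareTokenizer(Dictionary<string, string[]> files, string mainFile) {
        this.files = files;
        sources.Push(((IEnumerable<string>)files[mainFile]).GetEnumerator());
    }

    public string GetNextToken() {
        while (sources.Count > 0) {
            IEnumerator<string> current = sources.Peek();
            if (!current.MoveNext()) {
                sources.Pop();  // finished this file; resume the includer
                continue;
            }
            string token = current.Current;
            if (token.StartsWith("#include ")) {
                string name = token.Substring("#include ".Length);
                sources.Push(((IEnumerable<string>)files[name]).GetEnumerator());
                continue;  // descend into the included file
            }
            return token;
        }
        return null;  // all sources exhausted
    }
}
```

Because the stack pushes a new enumerator per directive, nested includes (a header including another header) fall out of the same loop for free.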

Kelly Leahy Software Architect Milliman, USA

Posted 17 years ago by pranay
Hi Kelly,
It's not that ambitious :). This IDE is for embedded systems, where I will support IntelliPrompt/class view for a few files provided by GCC, files provided by us, and user-defined files, not for all the stuff that GCC has to offer :). There are IDEs which do this by hardcoding the features: whether you include the header or not makes no difference, they show the IntelliPrompt anyway. I just want to make mine better than those :).
Kind regards
Pranay.
Posted 17 years ago by Kelly Leahy - Software Architect, Milliman
Well... if you aren't going to support arbitrary #include files, and you have control (in your situation) over which release of the files the user is using, then I would recommend shipping some sort of precompiled header data with your product (hard-coded or in some sort of data files), rather than trying to parse the headers as you go. Another alternative is a precompile caching step, like the metadata (reflection) caching that the SyntaxEditor languages pack does. You could "preparse" the includes ahead of time, and then just "merge" the results of the parse into your semantic analysis phase, as if it were in the same source file (just without any real "tokens").
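The preparse-and-cache idea can be sketched like this (hypothetical names; the "parse" is a stand-in delegate, and a real cache would also invalidate entries when a header file changes):

```csharp
using System;
using System.Collections.Generic;

// Sketch of the precompile-caching idea: parse each header once, keep the
// resulting declaration summary keyed by header name, and return the cached
// result on later requests instead of re-parsing.
class HeaderPreparseCache {
    private readonly Dictionary<string, List<string>> cache = new Dictionary<string, List<string>>();
    private readonly Func<string, List<string>> parseHeader;
    public int ParseCount;  // how many real parses happened (for illustration)

    public HeaderPreparseCache(Func<string, List<string>> parseHeader) {
        this.parseHeader = parseHeader;
    }

    public List<string> GetDeclarations(string headerName) {
        List<string> declarations;
        if (!cache.TryGetValue(headerName, out declarations)) {
            declarations = parseHeader(headerName);  // expensive, done once
            ParseCount++;
            cache[headerName] = declarations;
        }
        return declarations;
    }
}
```

The semantic analysis phase would then merge the cached declaration lists into its symbol table as if those declarations had appeared in the current source file.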

In any case, you get to roll-yer-own on this one, I think.

Kelly Leahy Software Architect Milliman, USA

The latest build of this product (v24.1.0) was released 2 months ago, which was after the last post in this thread.
