Nested multiline comments in SQL?

Comments (6)

Posted 18 years ago by Actipro Software Support - Cleveland, OH, USA

You might be able to do that by giving the comment state a child state of itself, so that recursion occurs.
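To see why a self-referencing child state can handle nesting, here is a minimal C# sketch of the equivalent recursive scanner. It works on a plain string rather than inside SyntaxEditor, and the names RecursiveCommentScanner and ScanComment are hypothetical. Each nested /* re-enters the same routine, much as the comment state would re-enter itself, and each */ returns one level.

```csharp
// Hypothetical helper, not part of SyntaxEditor: a recursive scanner that
// mimics a comment state which lists itself as a child state.
public static class RecursiveCommentScanner
{
    // 'start' is the index of the opening "/*". Returns the index just past
    // the matching "*/", or text.Length if the comment is unterminated.
    public static int ScanComment(string text, int start)
    {
        int i = start + 2; // skip the opening "/*"
        while (i < text.Length)
        {
            if (text[i] == '/' && i + 1 < text.Length && text[i + 1] == '*')
            {
                // A nested "/*" re-enters the same routine (the "child state").
                i = ScanComment(text, i);
            }
            else if (text[i] == '*' && i + 1 < text.Length && text[i + 1] == '/')
            {
                // "*/" closes this level and returns to the enclosing one.
                return i + 2;
            }
            else
            {
                i++;
            }
        }
        return text.Length; // unterminated: the comment runs to the end of the text
    }
}
```

For example, ScanComment("/* a /* b */ c */", 0) returns 17, the full length of the string, because the inner */ only closes the nested level. Very deep nesting grows the call stack, which is one reason a hand-written lexer might prefer an explicit depth counter instead.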

Actipro Software Support

Posted 18 years ago by Kelly Leahy - Software Architect, Milliman

Karl,

Are you sure you want to do that? I ask because I've never seen another language that supports nested comments, and I don't think most SQL implementations do either.

Are you using a dynamic (XML-defined) language? If not, it's easy. If you are, I can't help you (but somebody else will).

To handle this in a non-dynamic language, give the multiline comment lexer routine a stack or "counter" that tracks how deeply the comments are nested.

For instance, say your lexer looks like this (from the SimpleLanguage example):


public MatchType GetNextTokenLexicalParseData(ITextBufferReader reader, ILexicalState lexicalState, ref ITokenLexicalParseData lexicalParseData) {
    // Initialize
    int tokenID = SimpleTokenID.Invalid;

    // Get the next character
    char ch = reader.Read();

    // If the character is a letter or underscore, it starts an identifier
    if ((Char.IsLetter(ch) || (ch == '_'))) {
        // Parse the identifier
        tokenID = this.ParseIdentifier(reader, ch);
    }
    else if ((ch != '\n') && (Char.IsWhiteSpace(ch))) {
        while ((reader.Peek() != '\n') && (Char.IsWhiteSpace(reader.Peek()))) 
            reader.Read();
        tokenID = SimpleTokenID.Whitespace;
    }
    else {
        tokenID = SimpleTokenID.Invalid;
        switch (ch) {
            case ',':
                tokenID = SimpleTokenID.Comma;
                break;
            case '(':
                tokenID = SimpleTokenID.OpenParenthesis;
                break;
            case ')':
                tokenID = SimpleTokenID.CloseParenthesis;
                break;
            case ';':
                tokenID = SimpleTokenID.SemiColon;
                break;
            case '\n':
                // Line terminator
                tokenID = SimpleTokenID.LineTerminator;
                break;
            case '{':
                tokenID = SimpleTokenID.OpenCurlyBrace;
                break;
            case '}':
                tokenID = SimpleTokenID.CloseCurlyBrace;
                break;
            case '/':                        
                tokenID = SimpleTokenID.Division;
                switch (reader.Peek()) {
                    case '/':
                        // Parse a single-line comment
                        tokenID = this.ParseSingleLineComment(reader);
                        break;
                    case '*':
                        // Parse a multi-line comment
                        tokenID = this.ParseMultiLineComment(reader);
                        break;
                }
                break;
            case '=':
                if (reader.Peek() == '=') {
                    reader.Read();
                    tokenID = SimpleTokenID.Equality;
                }
                else
                    tokenID = SimpleTokenID.Assignment;
                break;
            case '!':
                if (reader.Peek() == '=') {
                    reader.Read();
                    tokenID = SimpleTokenID.Inequality;
                }
                break;
            case '+':
                tokenID = SimpleTokenID.Addition;
                break;
            case '-':
                tokenID = SimpleTokenID.Subtraction;
                break;
            case '*':
                tokenID = SimpleTokenID.Multiplication;
                break;
            default:
                if ((ch >= '0') && (ch <= '9')) {
                    // Parse the number
                    tokenID = this.ParseNumber(reader, ch);
                }
                break;
        }
    }

    if (tokenID != SimpleTokenID.Invalid) {
        lexicalParseData = new LexicalStateAndIDTokenLexicalParseData(lexicalState, (byte)tokenID);
        return MatchType.ExactMatch;
    }
    else {
        reader.ReadReverse();
        return MatchType.NoMatch;
    }
}

with ParseMultiLineComment(ITextBufferReader reader) implemented as:


protected virtual int ParseMultiLineComment(ITextBufferReader reader) {
    reader.Read();
    while (reader.Offset < reader.Length) {
        if (reader.Peek() == '*') {
            if (reader.Offset + 1 < reader.Length) {
                if (reader.Peek(2) == '/') {
                    reader.Read();
                    reader.Read();
                    break;
                }
            }
            else {
                reader.Read();
                break;
            }
        }
        reader.Read();
    }
    return SimpleTokenID.MultiLineComment;
}

You can make the lexer handle nesting by making ParseMultiLineComment a bit smarter, so that it keeps track of the nesting depth:


protected virtual int ParseMultiLineComment(ITextBufferReader reader) {
    // keep track of depth...
    int depth = 1;

    // consume the opening *
    reader.Read();
    while (!reader.IsAtEnd)
    {
        char ch = reader.Peek();
        if (ch == '/')
        {
            // always consume the char (we need progress in any case)
            reader.Read();
            // don't read past EOF (assume they haven't finished the comment yet)
            if (reader.IsAtEnd)
                return SimpleTokenID.MultiLineComment;
            // look for another nested comment
            if (reader.Peek() == '*')
            {
                // consume the *
                reader.Read();
                // we're one deeper now.
                depth++;
            }
        }
        else if (ch == '*')
        {
            // always consume the char (we need progress in any case)
            reader.Read();
            // don't read past EOF (assume they haven't finished the comment yet)
            if (reader.IsAtEnd)
                return SimpleTokenID.MultiLineComment;
            // look for a close comment
            if (reader.Peek() == '/')
            {
                // consume the '/'
                reader.Read();
                // we're one shallower now.
                depth--;
                // if we are back to zero, we've read the entire multiline nested comment.
                if (depth == 0)
                    return SimpleTokenID.MultiLineComment;
            }
        }
        else
            reader.Read();
    }
    return SimpleTokenID.MultiLineComment;
}

This code is untested, but I'm pretty sure it'll work properly. It may need a few simple changes, though; feel free to ask if you run into trouble with it.
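The same depth-counter logic can be exercised outside SyntaxEditor. The sketch below is a hypothetical stand-in for ParseMultiLineComment that scans a plain string instead of an ITextBufferReader (NestedCommentCounter and its ScanComment method are illustrative names, not Actipro API). It takes the index of the opening /* and returns the index just past the matching */, or the end of the text if the comment never closes.

```csharp
// Hypothetical stand-in for ParseMultiLineComment: the same depth-counter
// logic, but over a plain string instead of SyntaxEditor's ITextBufferReader.
public static class NestedCommentCounter
{
    // 'start' is the index of the opening "/*". Returns the index just past
    // the matching "*/", or text.Length if the comment never closes.
    public static int ScanComment(string text, int start)
    {
        int depth = 1;
        int i = start + 2; // consume the opening "/*"
        while (i < text.Length)
        {
            char ch = text[i++]; // always consume one char so the loop makes progress
            if (i >= text.Length)
                break; // EOF: treat the comment as unfinished
            if (ch == '/' && text[i] == '*')
            {
                i++;     // consume the '*'
                depth++; // one level deeper
            }
            else if (ch == '*' && text[i] == '/')
            {
                i++;     // consume the '/'
                depth--; // one level shallower
                if (depth == 0)
                    return i; // matched the outermost "*/"
            }
        }
        return text.Length;
    }
}
```

For instance, in "SELECT 1 /* x /* y */ z */ FROM t" the comment starts at index 9 and ScanComment returns 26, so the whole nested comment "/* x /* y */ z */" is treated as a single token and " FROM t" is lexed normally.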

[Modified at 03/09/2007 03:36 PM]

Kelly Leahy Software Architect Milliman, USA

Posted 18 years ago by Kelly Leahy - Software Architect, Milliman

OK, I just tested that code in the SDI editor demo application.

It works.


Posted 18 years ago by Karl Grambow

Hi Kelly,

Thank you so much for the effort you spent providing the example and replying to my post. Unfortunately, I'm not using a lexer; I'm just loading the language from its XML definition, and that's about it.

You're right, it's an unusual feature, and the only place I've seen it is SQL Server Management Studio for SQL Server 2005. It's not essential, but I was curious whether it was (easily) possible in SyntaxEditor.

Regarding Actipro Support's initial response: I think the comment state already has a child state of itself, since that is how it comes in the supplied SQL language definition file.

Here's part of the definition file.




<!-- Code -->
<State Key="DefaultState">

............

    <ChildStates>
        <ChildState Key="MultiLineCommentState" />
    </ChildStates>
</State>

<!-- MultiLine Comments -->
<State Key="MultiLineCommentState" TokenKey="MultiLineCommentDefaultToken" Style="CommentDefaultStyle">
    <!-- Scopes -->
    <Scopes>
        <Scope BracketHighlight="True">
            <ExplicitPatternGroup Type="StartScope" TokenKey="MultiLineCommentStartToken" Style="CommentDelimiterStyle" PatternValue="/*" />
            <ExplicitPatternGroup Type="EndScope" TokenKey="MultiLineCommentEndToken" Style="CommentDelimiterStyle" PatternValue="*/" />
        </Scope>
    </Scopes>
</State>

Posted 18 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Karl,

I meant adding MultiLineCommentState as a child state of MultiLineCommentState itself. You also need to configure MultiLineCommentDefaultToken to break on /. With both changes in place, it works:


<State Key="MultiLineCommentState" TokenKey="MultiLineCommentDefaultToken" Style="CommentDefaultStyle">
    <!-- Scopes -->
    <Scopes>
        <Scope BracketHighlight="True">
            <ExplicitPatternGroup Type="StartScope" TokenKey="MultiLineCommentStartToken" Style="CommentDelimiterStyle" PatternValue="/*" />
            <ExplicitPatternGroup Type="EndScope" TokenKey="MultiLineCommentEndToken" Style="CommentDelimiterStyle" PatternValue="*/" />    
        </Scope>
    </Scopes>
    <!-- Patterns Groups -->
    <PatternGroups>
        <RegexPatternGroup TokenKey="MultiLineCommentDefaultToken" PatternValue="[^\*\/]+" />
    </PatternGroups>
    <ChildStates>
        <ChildState Key="MultiLineCommentState" />
    </ChildStates>
</State>
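Why the default token has to break on / can be illustrated with .NET's Regex class, assuming SyntaxEditor's pattern syntax treats these character classes the same way. The pattern [^\*\/]+ is the one from the post; [^\*]+ is a hypothetical pattern that stops only at *. DefaultTokenPatternDemo and FirstToken are likewise hypothetical names used only for this illustration.

```csharp
using System.Text.RegularExpressions;

// Illustration only: compares the pattern from the post with a hypothetical
// pattern that excludes '*' but not '/'.
public static class DefaultTokenPatternDemo
{
    // Pattern from the post: the default token stops at both '*' and '/'.
    public const string BreaksOnSlash = @"[^\*\/]+";

    // Hypothetical pattern that only stops at '*'.
    public const string BreaksOnStarOnly = @"[^\*]+";

    // Returns the text a default-token pattern would consume when matching
    // at position 0 of 'input' (empty string if it cannot match there).
    public static string FirstToken(string pattern, string input)
    {
        // \G anchors the match at the starting position passed to Match.
        Match m = new Regex(@"\G(?:" + pattern + ")").Match(input, 0);
        return m.Success ? m.Value : string.Empty;
    }
}
```

On the text "outer /* inner */ outer */", the first pattern stops at "outer ", leaving /* intact for the child state's StartScope pattern, while the hypothetical one consumes "outer /" and splits the nested opener, so the start scope can never match at that point.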


Posted 18 years ago by Karl Grambow

That's perfect!

I had tried adding the child state to MultiLineCommentState, but it didn't work until I configured MultiLineCommentDefaultToken to break on /, as you suggested.

Thanks a lot,

Karl
