advice needed on parsing

SyntaxEditor for Windows Forms Forum

Posted 13 years ago by Karl Grambow
Hi,

I was wondering if you could advise me as to whether I can get some better performance out of SyntaxEditor 4.0 with respect to something I'm trying to do.

Essentially, I'm parsing a document and looking for "GO" keywords on new lines. Initially I did this using regular expressions, and that worked really well and was fast. This is what my regular expression looked like:

        'look for any newline beginning with "GO" (allowing for white space beforehand and white space after)
        Dim GOKeyWord As Regex = New Regex("(?<=\r\n\s*)(?=\b)(GO)(?<=\b\s*)", RegexOptions.IgnoreCase)
        Dim m As Match = GOKeyWord.Match(Script)
        While m.Success
            'some code
        End While
The problem is that I now need to find "GO" keywords that are of a particular token type. Specifically, I want to ignore any occurrence of the word "GO" if it occurs inside comments or quoted strings. So what I've done is put the Script string into a SyntaxDocument and look for a specific token called "GOToken", which is defined in the language.xml file that I loaded into the SyntaxDocument.


'SyntaxDocument.Text = ObjectScript

        Dim oStream As TokenStream = SyntaxDocument.GetTokenStream(SyntaxDocument.Tokens.IndexOf(0))
        Dim oToken As IToken

        'let's get the next GOToken
        oToken = GetNextToken("GOToken", oStream) 'this method reads through the stream until it finds the supplied token

        While Not oToken Is Nothing
            'some code
        End While

This works great and does exactly what I want it to do. But it's about 40 times slower than using the regular expression (400 milliseconds as opposed to 10 milliseconds).

What I'd like to know is if there's a better way (performance-wise) to achieve what I'm trying to do using SyntaxEditor.

Thanks,

Karl

Comments (3)

Posted 13 years ago by Actipro Software Support - Cleveland, OH, USA
Using regular expressions on a string is always going to be much faster than token parsing, because regular expressions are optimized for fast string-based parsing. You might be able to first replace out the ranges in the string that are comments, in their own regular expression replace operation, and then do the GO search, thereby searching for GOs in the code with all the comments removed.
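As a rough illustration of that two-step idea, here is a minimal sketch using only System.Text.RegularExpressions. It is not SyntaxEditor code, and the names (BatchSplitterSketch, SplitBatches, BlankOut, AddBatch) are hypothetical. It assumes T-SQL syntax: "--" line comments, non-nested /* */ block comments, and single-quoted strings with '' as the escaped quote. Instead of deleting the comments and strings it overwrites them with spaces, which is a small variation that keeps line breaks and character offsets intact, and it only treats GO as a separator when it sits alone on a line.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module BatchSplitterSketch

    ' Assumed T-SQL syntax: "--" line comments, non-nested /* */ block
    ' comments, and single-quoted strings with '' as the escaped quote.
    Private ReadOnly CommentOrString As New Regex("--[^\r\n]*|/\*[\s\S]*?\*/|'(?:[^']|'')*'")

    ' A line that contains only GO (any case), optionally surrounded by blanks.
    Private ReadOnly GoLine As New Regex("^[ \t]*GO[ \t]*\r?$", RegexOptions.IgnoreCase Or RegexOptions.Multiline)

    ' Overwrites a comment or string with spaces, keeping its line breaks,
    ' so that offsets in the masked text still line up with the original.
    Private Function BlankOut(ByVal m As Match) As String
        Return Regex.Replace(m.Value, "[^\r\n]", " ")
    End Function

    ' Adds a batch to the list unless it is empty or whitespace only.
    Private Sub AddBatch(ByVal batches As List(Of String), ByVal text As String)
        Dim trimmed As String = text.Trim()
        If trimmed.Length > 0 Then batches.Add(trimmed)
    End Sub

    ' Splits a script into batches at GO lines that are not inside
    ' comments or quoted strings.
    Public Function SplitBatches(ByVal script As String) As List(Of String)
        ' Step 1: mask comments and strings.
        Dim masked As String = CommentOrString.Replace(script, AddressOf BlankOut)

        ' Step 2: find GO lines in the masked text, cut the original text.
        Dim batches As New List(Of String)
        Dim start As Integer = 0
        For Each m As Match In GoLine.Matches(masked)
            AddBatch(batches, script.Substring(start, m.Index - start))
            start = m.Index + m.Length
        Next
        AddBatch(batches, script.Substring(start))
        Return batches
    End Function

    Sub Main()
        ' GO appears in a string, in a block comment and in a line comment;
        ' only the "go" on its own line should act as a separator.
        Dim script As String = "SELECT 'GO'" & vbCrLf & "/*" & vbCrLf & "GO" & vbCrLf & "*/" & vbCrLf & "go" & vbCrLf & "SELECT 2 -- GO"
        For Each batch As String In SplitBatches(script)
            Console.WriteLine("[" & batch & "]")
        Next
    End Sub

End Module
```

Because the masked text has the same length as the original, the match positions can be used directly to cut the original script, so each batch keeps its comments and string contents. Nested block comments, quoted identifiers and a repeat count after GO (as in "GO 5") are not handled by this sketch.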

What are you doing with the result of finding the GO statements, if I might ask?


Actipro Software Support

Posted 13 years ago by Karl Grambow
Thanks for the reply.

Quote:

Using regular expressions on a string is always going to be much faster than token parsing...

I thought as much.

I'm parsing T-SQL scripts and identifying individual batch statements (these lie between GO statements) so that I can run each T-SQL batch independently of the others.

I see what you mean about replacing out the ranges in the string that are comments first and then parsing for GO statements. That might work, even if it's a little more complicated than I'd hoped. Too long staring at regular expressions and my mind goes a bit fuzzy :).

Thanks again for the suggestion.

Karl

Posted 13 years ago by Actipro Software Support - Cleveland, OH, USA
Yes, for what you are doing, that is probably your fastest option. Regexes are fun! :)


Actipro Software Support

