Dynamic language performance issues

SyntaxEditor for Windows Forms Forum

Posted 10 years ago by Michael Dempsey - Sr. Developer, Teradata Corp
Version: 14.1.0321

Now that my app is rolling out to customers I am starting to get reports of rather odd performance issues that were never seen with the old editor we used.

What makes them odd is that a very small change to the language definition file [these are XML-defined languages] will make a huge difference to performance in one case but not in the others.

The 3 issues are:

#1 - Very large files take a long time to load/parse. (e.g. 500KB)

#2 - A relatively small file [20K] that contains no carriage returns (i.e. all on one line) takes a long time to load/parse, and then typing is extremely slow.
       (A 20K file with CRs every 20 to 100 characters - as is more normal - has no issues at all.)

#3 - Typing into a 'block' selection of zero width spanning 50 lines is OK for the first couple of characters but then gets slower and slower with every extra character typed.

I accept that large files may take a long time to load/parse since the language is complex.
(5 child states, 74 Pattern Groups and over 1000 Patterns. Almost all Explicit though.)
However, when I replaced just 3 simple RegEx patterns with their equivalent 22 Explicit patterns, issue #1 improved by 300%.
(All of the patterns were defining the same TokenID.) Yet there was no noticeable improvement in issue #2 or #3.
Changing such a small number of patterns should not have such a huge impact on performance - and definitely not an impact on only one of the issues.

An even bigger surprise was that when I removed 3 tokens (out of over 1000) from one of the Explicit pattern groups, issue #2 improved by 800%!
Again, this change had no noticeable effect on issue #1 or #3.

There seems to be no logic to what causes performance issues.

Note that BOTH performance improvements were made to the same TokenID, and this is the only set of tokens that uses a LookBehind pattern that starts with '^' - to match the start of the line. (This is the only thing I can see that makes these tokens different from all the others.)
The 800% improvement occurred when I removed 3 tokens which each started with a period from a set of 6 tokens - the others did not start with a period.
The set was: GOTO EXIT QUIT .GOTO .EXIT .QUIT

I suspect that the issue with the very long lines is that you reparse the entire current line +/- 1 or 2 lines every time the text changes. So in this situation you reparse the entire text after every keystroke.
I think the old editor I used only parsed 2 or 3 tokens +/- the current position on keystrokes. It then parsed the current line +/- 1 or 2 when the cursor moved to a new line. (Things like AutoCasing were only done after the line number changed.)
The result was that long lines made no difference at all to performance.
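
To put rough numbers on that hunch, here is a purely illustrative Python sketch (not SyntaxEditor's actual code - the file size, average token length and window size are made-up figures) comparing how much text gets rescanned per keystroke under the two approaches:

```python
# Purely illustrative: compare how much text a lexer rescans per keystroke when
# it rescans the whole current line versus only a small window of tokens around
# the caret.  All numbers below are assumptions for the sake of the comparison.

FILE_SIZE = 20_000     # characters, all on one line (the problem case above)
AVG_TOKEN_LEN = 8      # assumed average token length
TOKEN_WINDOW = 3       # tokens rescanned on either side of the caret

# Approach A: rescan the entire current line on every keystroke.
# With no line breaks, the "current line" is the whole document.
chars_line_scoped = FILE_SIZE

# Approach B: rescan only a few tokens around the caret on every keystroke.
chars_token_window = (2 * TOKEN_WINDOW + 1) * AVG_TOKEN_LEN

print(f"line-scoped rescan : ~{chars_line_scoped:,} chars per keystroke")
print(f"token-window rescan: ~{chars_token_window:,} chars per keystroke")
```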

I have not found anything that helps issue #3, but it must be related to something in the language definition, as I don't see the issue when I use the much simpler SQL language from your sample app. (It does still apply to the 'limited' language I switch to when the text exceeds 200KB. This has far fewer tokens defined than the full language, but still more than your demo file.)

Do you have a set of recommended DOs and DON'Ts for language definitions?

Comments (6)

Posted 10 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Mike,

1) Yes, the WinForms SyntaxEditor lexes the entire document initially and after lexer definition changes, which can take time when working on a large document.  Using a programmatic lexical parser can help increase speed there but is more work for you to write.  The newer WPF version handles this better in general because it only lexes through what it needs to in order to display text and won't blindly lex the entire document.
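
As a rough sketch of that general idea only (this is not the actual WPF implementation - the stand-in tokenizer and sample document are invented for the demo), on-demand lexing stops as soon as the visible range is covered:

```python
# General idea of on-demand lexing: tokenize lazily and stop once enough of the
# document is covered to display the visible text; the rest can be lexed later.
import re

WORD = re.compile(r"\S+|\s+")   # trivial stand-in tokenizer for the demo

def lex_on_demand(text, visible_end):
    """Yield (start, end, token_text) tuples until the visible range is covered."""
    pos = 0
    while pos < len(text):
        m = WORD.match(text, pos)
        yield m.start(), m.end(), m.group()
        pos = m.end()
        if pos >= visible_end:   # everything past here can wait
            break

doc = "SELECT col FROM tab WHERE col = 1  " * 10_000   # made-up large document
visible = list(lex_on_demand(doc, visible_end=200))    # only ~200 chars lexed
print(f"{len(visible)} tokens lexed instead of the whole {len(doc):,}-char document")
```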

You mentioned that 3 regex patterns seemed to be causing major performance issues there.  What were the regex patterns?  Keep in mind that if you use regex code like "foo .* bar" then it has to constantly do a ton of searching ahead (possibly to the end of the document) to see if there are any "bar" occurrences after a "foo".  You can reduce that by doing something like "foo [^\n]* bar".  That's one example that might help if that's a scenario you are running into.  But without knowing the patterns you are using, it's hard to say.  I do suspect that some refactoring of the regex patterns in some form would help though. 
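
For illustration, here is a rough, self-contained Python timing sketch of that searching-ahead cost (this is not our lexer - it assumes '.' can scan past line breaks, which re.DOTALL emulates here, and the sample text is made up):

```python
import re
import time

# Many "foo"s but no matching " bar", spread over many short lines.
text = ("foo " + "x" * 120 + "\n") * 2000

greedy  = re.compile(r"foo .* bar", re.DOTALL)   # '.' may scan to the end of the text
bounded = re.compile(r"foo [^\n]* bar")          # the search stops at the end of the line

for name, pattern in (("foo .* bar   ", greedy), ("foo [^\\n]* bar", bounded)):
    start = time.perf_counter()
    pattern.findall(text)
    print(f"{name}: {time.perf_counter() - start:.3f}s")
```

The greedy version has to walk ahead (and backtrack) over the rest of the text for every "foo" it finds, while the bounded version gives up at the end of each line.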

Also keep in mind that the dynamic lexer checks patterns in the order they are defined.  So put the most common (and perhaps least expensive) ones first in your lexical state.
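
Purely as an illustration of why that ordering matters (this toy first-match matcher, its patterns, and its sample text are all invented - it is not how the dynamic lexer is implemented internally):

```python
import re

# Toy first-match matcher: for each token it tries the patterns in order and
# counts every attempt, so a rarely-matching pattern listed first adds work.
patterns_rare_first = [
    ("rare",   re.compile(r"\.EXIT\b")),
    ("ident",  re.compile(r"[A-Za-z_]\w*")),
    ("number", re.compile(r"\d+")),
]
patterns_common_first = [
    ("ident",  re.compile(r"[A-Za-z_]\w*")),
    ("number", re.compile(r"\d+")),
    ("rare",   re.compile(r"\.EXIT\b")),
]

text = "alpha beta 42 gamma delta 7 epsilon " * 1000

def count_attempts(patterns, text):
    """Scan text token by token, counting every pattern attempt made."""
    attempts, pos = 0, 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for _, regex in patterns:
            attempts += 1
            match = regex.match(text, pos)
            if match:
                pos = match.end()
                break
        else:
            pos += 1   # no pattern matched; skip one character
    return attempts

print("rare pattern first  :", count_attempts(patterns_rare_first, text))
print("common pattern first:", count_attempts(patterns_common_first, text))
```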

2) We do lex by document line, so extremely long lines can run into performance issues.  If that set (.GOTO .EXIT .QUIT) is a regex pattern set, it could be refactored to be improved.  I'm not sure if it is an explicit or regex pattern group.  It might help if you can email us a language definition XML file and a sample file we can load up into our SDI Editor sample, so we can see what you see and get a better idea of your pattern setup - perhaps once you have looked it over again with the suggestions above.  You can email it to our support address and reference this thread.

3) I'm not seeing much slowdown when I do this in our samples.  But perhaps the lexer performance issues with your particular pattern definitions are causing this as well.


Actipro Software Support

Posted 10 years ago by Michael Dempsey - Sr. Developer, Teradata Corp

OK, I'll send the info to support.

However, the problem patterns were Explicit patterns - not RegEx - which is why I can't explain how 3 out of 1000 patterns could possibly cause such a huge performance difference.

Also, the RegEx patterns that caused problems were simple patterns like '\.?(aaa|bbb|ccc)', so there would be no long searches.
(An optional period followed by one of 3 words.)
Again, there is no obvious reason why that would have a major impact.
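
For reference, the RegEx-to-Explicit conversion I mentioned earlier is just a one-to-many expansion like this (a Python sketch with placeholder words - not the real token list):

```python
# Expand a "\.?(aaa|bbb|ccc)"-style regex into the equivalent explicit literals.
# The prefix and words below are placeholders, not the actual language definition.
def expand_optional_prefix(prefix, words):
    """Return every literal string matched by  prefix?(w1|w2|...)."""
    return list(words) + [prefix + w for w in words]

print(expand_optional_prefix(".", ["aaa", "bbb", "ccc"]))
# -> ['aaa', 'bbb', 'ccc', '.aaa', '.bbb', '.ccc']
```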

Posted 10 years ago by Michael Dempsey - Sr. Developer, Teradata Corp

Any chance that you plan to implement the 'limited range' parsing in WinForms in the near future?

Also, any changes to make parsing more 'local', especially for very long lines, would be appreciated.

Posted 10 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Mike,

It would require rewriting our entire infrastructure.  This probably won't happen purely in WinForms but we do hope to eventually backport our newer API to WinForms.  It would be nice to have all the platforms (WPF, WinRT, Silverlight, and WinForms) on the same SyntaxEditor API so that languages could seamlessly work between the platforms.  Right now WinForms is the odd man out there.


Actipro Software Support

Posted 10 years ago by Michael Dempsey - Sr. Developer, Teradata Corp

OK. I hadn't realized the different versions used different APIs.

I had just assumed that you wrote one and then ported to the others.
(I know that's how the other vendors I use handle it... but I guess WinForms is older and you did a rewrite for WPF.)

Posted 10 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Mike,

Yes, we had the WinForms version for years before WPF was created.  Then when WPF came out, we rewrote the product based on our past experience, trying to improve everything along the way.  The Silverlight and WinRT versions that were created later share a codebase with our WPF version.


Actipro Software Support
