Now that my app is rolling out to customers, I am starting to get reports of rather odd performance issues that were never seen with the old editor we used.
What makes them odd is that a very small change to the language definition file [these are XML-defined languages] will make a huge difference to performance in one case but not in the others.
The 3 issues are:
#1 - Very large files (e.g. 500KB) take a long time to load/parse.
#2 - A relatively small file [20KB] that contains no carriage returns (i.e. everything is on one line) takes a long time to load/parse, and then typing is extremely slow.
(A 20K file with CRs every 20 to 100 characters - as is more normal - has no issues at all.)
#3 - Typing into a zero-width 'block' selection spanning 50 lines is OK for the first couple of characters but then gets slower and slower with every extra character typed.
I accept that large files may take a long time to load/parse, since the language is complex (5 child states, 74 Pattern Groups and over 1000 Patterns, almost all of them Explicit).
But when I replaced just 3 simple RegEx patterns with their 22 equivalent Explicit patterns (all defining the same TokenID), issue #1 improved by 300%, yet there was no noticeable improvement in issue #2 or #3.
Changing such a small number of patterns should not have such a huge impact on performance, and it should definitely not affect only one of the issues.
An even bigger surprise came when I removed 3 tokens (out of over 1000) from one of the Explicit pattern groups: issue #2 improved by 800%!
Again, this change had no noticeable effect on issue #1 or #3.
There seems to be no logic to what causes performance issues.
Note that BOTH performance improvements involved the same TokenID, and this is the only set of tokens that uses a LookBehind pattern starting with '^' to match the start of the line. (This is the only thing I can see that makes these tokens different from all the others.)
The 800% improvement occurred when I removed the 3 tokens that start with a period from a set of 6 tokens; the other 3 do not start with a period.
The set was: GOTO EXIT QUIT .GOTO .EXIT .QUIT
I suspect that the issue with very long lines is that you reparse the entire current line +/- 1 or 2 lines every time the text changes, so in this situation (a file that is one single line) you reparse the entire text after every keystroke.
I think the old editor I used only parsed 2 or 3 tokens either side of the current position on each keystroke, and then parsed the current line +/- 1 or 2 lines when the cursor moved to a new line. (Things like AutoCasing were only done after the line number changed.)
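To put rough numbers on that suspicion, here is a small sketch of how much text a "current line +/- N lines" reparse would cover. It is only my guess at the behaviour, not your editor's actual code, and `line_window` is a hypothetical name. With regular line breaks the window is a few dozen characters; with no line breaks at all it is the whole file.

```python
import bisect


def line_window(text: str, offset: int, context_lines: int = 1) -> tuple[int, int]:
    """Return the (begin, end) character range that would be reparsed if the
    editor reparsed the edited line plus `context_lines` lines above and below.
    This models the suspected behaviour only; it is not a real editor API."""
    # Offsets at which each line starts. Recomputed on every call purely to
    # keep the sketch short; a real editor would keep these cached.
    starts = [0] + [i + 1 for i, ch in enumerate(text) if ch == "\n"]
    line = bisect.bisect_right(starts, offset) - 1  # line containing the edit
    first = max(0, line - context_lines)
    last = min(len(starts) - 1, line + context_lines)
    begin = starts[first]
    end = starts[last + 1] if last + 1 < len(starts) else len(text)
    return begin, end


# A 22,000-character file on a single line vs. the same content split into lines.
single_line = "GOTO label " * 2000
multi_line = "GOTO label\n" * 2000
print(line_window(single_line, 11000))  # (0, 22000): the entire file
print(line_window(multi_line, 11000))   # (10989, 11022): just 3 lines
```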
The result was that long lines made no difference at all to performance.
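For comparison, here is a rough sketch of the approach I believe the old editor used. Again, this is an assumption about its behaviour rather than documented fact, and all names are hypothetical. Each keystroke reparses only a few tokens around the edit, and line-level work such as AutoCasing is deferred until the caret leaves the edited line, so the cost no longer depends on line length.

```python
import re


def token_window(text: str, offset: int, context_tokens: int = 2) -> tuple[int, int]:
    """Return the (begin, end) range covering the token at `offset` plus
    `context_tokens` tokens either side. Tokens are simplified to runs of
    non-whitespace; a real language definition would tokenise differently."""
    # Scanning the whole text keeps the sketch short; a real editor would
    # keep cached token positions instead of rescanning on every keystroke.
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    if not spans:
        return offset, offset
    # First token ending at or after the edit position (else the last token).
    idx = next((i for i, (_, end) in enumerate(spans) if end >= offset), len(spans) - 1)
    first = max(0, idx - context_tokens)
    last = min(len(spans) - 1, idx + context_tokens)
    return spans[first][0], spans[last][1]


class DeferredLineWork:
    """Runs expensive line-level work (e.g. AutoCasing) once per edited line,
    when the caret leaves it, instead of on every keystroke.
    `on_line_left` is a hypothetical callback, not an editor API."""

    def __init__(self, on_line_left):
        self.on_line_left = on_line_left
        self.dirty_line = None  # line edited since the last flush, if any

    def text_changed(self, line: int) -> None:
        # An edit on a different line flushes the previously edited one.
        if self.dirty_line is not None and self.dirty_line != line:
            self.on_line_left(self.dirty_line)
        self.dirty_line = line

    def caret_moved(self, line: int) -> None:
        if self.dirty_line is not None and line != self.dirty_line:
            self.on_line_left(self.dirty_line)
            self.dirty_line = None


single_line = "GOTO label " * 2000
print(token_window(single_line, 11000))  # (10989, 11015): 26 characters
```

On the same 22,000-character single-line file, the window stays at a couple of dozen characters wherever the caret is, which matches my experience that long lines made no difference with the old editor.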
I have not found anything that helps with issue #3, but it must be related to something in the language definition, as I don't see the issue when I use the much simpler SQL language from your sample app. (It does still occur with the 'limited' language I switch to when the text exceeds 200KB. That language has far fewer tokens defined than the full language, but still more than your demo file.)
Do you have a set of recommended DOs and DON'Ts for language definitions?