Fixing SyntaxEditor for big files

SyntaxEditor for Windows Forms Forum

Posted 10 years ago by Bob Rundle - Director, Dynamic Workflow, JOA Oil & Gas BV
I am fixing the SyntaxEditor so that it can load and edit large files efficiently. I need to be able to view and edit text files well in excess of 1MB (some as large as 100MB). The current design of the SyntaxEditor simply does not support this.

Try creating a simple program that loads a 1MB file and then try to edit it. It will be pitifully slow.

The problem is very basic. When the SyntaxEditor opens a file as a stream, it calls ReadToEnd() to load the entire file into a string, then copies that string into another buffer. If you insert a character at the beginning, all of the roughly one million characters after it must be moved down one place, and this happens again for every character you type.
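
To make the cost concrete: in a flat buffer, inserting at offset 0 shifts every following character on every keystroke. One common alternative is a gap buffer, which keeps a block of free slots at the cursor so that typing fills the gap instead of shifting the tail. The sketch below is in Python for brevity (SyntaxEditor itself is .NET) and is an illustration only, not how SyntaxEditor or Bob's replacement stores text; the class name and its `moves` counter are hypothetical.

```python
class GapBuffer:
    """Minimal gap buffer (hypothetical sketch, not SyntaxEditor code).

    Free slots (the "gap") sit at the cursor, so typing at the cursor fills
    the gap instead of shifting every following character.
    """

    def __init__(self, text="", gap=16):
        self.buf = list(text) + [None] * gap
        self.gap_start = len(text)
        self.gap_end = len(self.buf)
        self.moves = 0  # characters copied while relocating the gap

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def _move_gap(self, pos):
        # Slide characters across the gap until the gap starts at pos.
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
            self.moves += 1
        while self.gap_start < pos:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1
            self.moves += 1

    def _grow(self):
        # Open a new gap at gap_end. This copies the tail once; growing by
        # at least the current size keeps the amortized cost low.
        # Growth copies are not counted in `moves`.
        extra = max(16, len(self.buf))
        self.buf[self.gap_end:self.gap_end] = [None] * extra
        self.gap_end += extra

    def insert(self, pos, ch):
        if not 0 <= pos <= len(self):
            raise IndexError(pos)
        self._move_gap(pos)
        if self.gap_start == self.gap_end:
            self._grow()
        self.buf[self.gap_start] = ch
        self.gap_start += 1

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


gb = GapBuffer("x" * 100_000)
gb.insert(0, "a")  # moving the gap to the start copies 100,000 characters once
gb.insert(1, "b")  # the next keystroke at the cursor copies nothing
print(gb.moves)    # 100000
```

A flat buffer would instead shift all 100,000 characters again for the second keystroke; the gap buffer pays only when the cursor jumps.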

Worse, all the line start character positions are updated for each character inserted.
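
Updating every line start on each keystroke is O(number of lines). One standard way to avoid this (an assumption here, not something the thread describes) is to store line lengths in a Fenwick (binary indexed) tree, so a line's start offset is a prefix sum and an insert touches only O(log n) nodes. The `LineIndex` class below is a hypothetical Python sketch.

```python
class LineIndex:
    """Line-start lookup backed by a Fenwick (binary indexed) tree.

    Hypothetical sketch: the tree stores each line's length, and a line's
    start offset is the prefix sum of the lengths before it. Typing a
    character updates O(log n) nodes instead of every later line start.
    """

    def __init__(self, lengths):
        self.n = len(lengths)
        self.tree = [0] * (self.n + 1)  # 1-based Fenwick array
        for i, length in enumerate(lengths):
            self._add(i, length)

    def _add(self, line, delta):
        i = line + 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def line_start(self, line):
        """Offset of the first character of `line` (0-based)."""
        total, i = 0, line
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

    def insert_chars(self, line, count=1):
        """Record `count` characters typed somewhere inside `line`."""
        self._add(line, count)


# Lines "ab\n", "cde\n", "f\n" have lengths 3, 4 and 2.
idx = LineIndex([3, 4, 2])
print(idx.line_start(2))  # 7
idx.insert_chars(0)       # type one character on line 0
print(idx.line_start(2))  # 8
```

The sketch only covers edits within a line; inserting or deleting whole lines would need a rebuild or a tree structure that supports insertion.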

So I purchased the Blueprint version of the SyntaxEditor and I am fixing it. I am putting in a proper multi-threaded, lazy-loading string buffer that will let you open a large file and see it immediately. More importantly, you will be able to add characters to the string buffer without moving every following character down one place.
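
The thread does not say which structure Bob's buffer uses. One structure that fits both goals (open a file without copying it, insert without shifting) is a piece table: the original text is never modified, so it could be backed by a lazily loaded file, and typed text goes into a separate append-only buffer. The Python sketch below is hypothetical (class and method names are illustrative) and uses a linear list of pieces for simplicity; production editors often keep pieces in a balanced tree.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    source: str  # "orig" (read-only original text) or "add" (typed text)
    start: int
    length: int


class PieceTable:
    """Minimal piece table (hypothetical sketch, not SyntaxEditor's API).

    The original text is never modified, so it could be backed by a lazily
    loaded file; inserted text is appended to a separate add buffer and the
    document is described by a list of pieces pointing into either buffer.
    """

    def __init__(self, original):
        self.orig = original
        self.add = []  # characters appended by edits
        self.pieces = [Piece("orig", 0, len(original))] if original else []

    def __len__(self):
        return sum(p.length for p in self.pieces)

    def insert(self, pos, text):
        if not 0 <= pos <= len(self):
            raise IndexError(pos)
        if not text:
            return
        new = Piece("add", len(self.add), len(text))
        self.add.extend(text)
        offset = 0
        for i, p in enumerate(self.pieces):
            if pos <= offset + p.length:
                # Split the piece containing pos; only this list entry changes.
                split = pos - offset
                parts = [Piece(p.source, p.start, split), new,
                         Piece(p.source, p.start + split, p.length - split)]
                self.pieces[i:i + 1] = [q for q in parts if q.length > 0]
                return
            offset += p.length
        self.pieces.append(new)  # only reached when the document is empty

    def text(self):
        out = []
        for p in self.pieces:
            if p.source == "orig":
                out.append(self.orig[p.start:p.start + p.length])
            else:
                out.append("".join(self.add[p.start:p.start + p.length]))
        return "".join(out)


pt = PieceTable("hello world")
pt.insert(5, ",")
pt.insert(0, ">> ")
print(pt.text())  # >> hello, world
```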

I am just getting started. If anyone has already done this and can help me out I would be very appreciative.

Regards,
Bob Rundle
rundle@rundle.com

Comments (6)

Posted 10 years ago by Actipro Software Support - Cleveland, OH, USA
Hi Bob,

Well, there are two factors in the slowness with large files. One is the text storage mechanism, although since we use a StringBuilder (which is optimized for many updates), that by itself doesn't have much effect on speed. The bigger impact on typing comes from how tokens are stored. Right now the tokens for the entire document are stored in a single collection, so adding or removing tokens (as a result of lexical parsing) can take time, and every token after the changed tokens needs its start offset updated. For large documents that can become a processing bottleneck.
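
One way to avoid rewriting every later token's start offset (an illustration, not Actipro's design) is to store tokens per line with offsets relative to the line start, and to compute document offsets only when asked. The Python sketch below uses a hypothetical toy lexer that splits lines into word and space runs; `TokenStore` and `lex_line` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str
    start: int   # offset relative to the start of its line, not the document
    length: int


def lex_line(line):
    """Toy lexer for illustration: splits a line into 'word' and 'space' runs."""
    tokens, i = [], 0
    while i < len(line):
        is_space = line[i].isspace()
        j = i
        while j < len(line) and line[j].isspace() == is_space:
            j += 1
        tokens.append(Token("space" if is_space else "word", i, j - i))
        i = j
    return tokens


class TokenStore:
    """Per-line token lists with line-relative offsets (hypothetical sketch)."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.tokens = [lex_line(line) for line in self.lines]

    def replace_line(self, index, new_text):
        # Only the edited line is re-lexed; other lines keep their token lists
        # untouched because their offsets do not depend on earlier lines.
        self.lines[index] = new_text
        self.tokens[index] = lex_line(new_text)

    def absolute_tokens(self, index, line_start):
        # Document offsets are computed on demand from the line's start
        # offset, which the caller supplies (e.g. from a line index).
        return [(t.kind, line_start + t.start, t.length) for t in self.tokens[index]]


store = TokenStore(["int x", "y = 1"])
line1 = store.tokens[1]
store.replace_line(0, "long x")
print(store.tokens[1] is line1)     # True
print(store.absolute_tokens(0, 0))  # [('word', 0, 4), ('space', 4, 1), ('word', 5, 1)]
```

Tokens that span lines, such as block comments, additionally need a lexer state carried from one line to the next.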

The good news is that we are currently developing our next-generation text/parsing framework and are prototyping it with the WPF version of SyntaxEditor that was just released yesterday. We've redesigned the internals from scratch so that it handles large files extremely well. For instance, we can open a 10MB C# document and see syntax highlighting almost instantly. There is no real noticeable slowdown in typing speed either. Improved large-file handling was one of the major goals of the new design.

Our plan is that once our new framework has all the features found in the WinForms version, we will migrate it over to the WinForms product. We wrote our text/parsing library in .NET 2.0 so that it would be compatible with WinForms, ASP.NET, etc. But we still have a lot of work to do first, including adding support for outlining, semantic parsing, etc.


Actipro Software Support

Posted 10 years ago by Bob Rundle - Director, Dynamic Workflow, JOA Oil & Gas BV
Well, it is good news that you are finally going to do something about big files. However, I wrote to you folks on Aug 19, 2006, complaining about the performance of the SyntaxEditor, and you wrote back that "This is all being addressed in v4.0." Well, it wasn't. So I simply have to assume that your claims of performance improvements in the WPF SyntaxEditor are more smoke and mirrors.

As for the StringBuilder...you are dead wrong here. StringBuilder handles character insertions very poorly...for each character inserted, it simply moves every character after the insertion point down in the buffer, one by one. Debug the IL and see for yourself.

Posted 10 years ago by Actipro Software Support - Cleveland, OH, USA
Bob,

v4.0 did make numerous speed improvements over v3.1 and added the ability to turn off various editor features for better performance, although overall v4.0 still stored text and tokens much as v3.1 did. So even with the changes we made there, performance with large files was not where we wanted it and definitely still needed work.

Thus, over the past year or more, we've been thinking of every way we possibly can to design various features to use less overall memory and to make use of virtualization techniques. We've spent many weeks of development time on the next-generation framework specifically on these design goals. Some of the areas that have been touched are: a custom text storage mechanism that is fast and does not use StringBuilder; complete virtualization of tokens (which uses much less memory and returns tokens on demand); lexical parsing of only what is needed for display in the editor (rather than parsing the entire document by default); word wrap that no longer uses extra memory; and a bunch of other smaller enhancements toward these goals. Although we're still tweaking it, you can see the performance difference by downloading the WPF version. Files that would take 10-15 seconds or more to load in SE4 load practically instantly in the WPF version, and there is no bad typing slowdown in large files like there is in SE4. As mentioned in the past, once we get the next-generation framework to a point where it has all the features that WinForms does, we hope to port it back so that the WinForms version can use it as well.
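
"Only lexically parse through what we need for display" can be illustrated with a lexer that caches its state at the start of each line and extends that cache only as far as the viewport asks. The Python sketch below is hypothetical and much simpler than a real language lexer: the only state carried between lines is whether a `/* */` comment is open, and an edit discards the cached states after the edited line.

```python
class LazyHighlighter:
    """Lexes only as far as the viewport requires (hypothetical sketch).

    The only lexer state carried between lines here is "inside a /* */
    comment or not". The state at the start of each line is cached, and an
    edit discards the cached states after the edited line.
    """

    def __init__(self, lines):
        self.lines = lines
        self.start_state = [False]  # start_state[i]: line i begins inside a comment
        self.lexed = 0              # lines scanned so far, to show the savings

    @staticmethod
    def _scan(line, in_comment):
        """Return the comment state at the end of `line`."""
        i = 0
        while i < len(line):
            if in_comment:
                end = line.find("*/", i)
                if end < 0:
                    return True
                in_comment, i = False, end + 2
            else:
                start = line.find("/*", i)
                if start < 0:
                    return False
                in_comment, i = True, start + 2
        return in_comment

    def state_at(self, line_no):
        # Extend the cache only up to the requested line, never further.
        while len(self.start_state) <= line_no:
            prev = len(self.start_state) - 1
            self.start_state.append(self._scan(self.lines[prev], self.start_state[prev]))
            self.lexed += 1
        return self.start_state[line_no]

    def visible(self, first, last):
        """Comment state at the start of each visible line."""
        return [(n, self.state_at(n)) for n in range(first, last + 1)]

    def edit(self, line_no, text):
        self.lines[line_no] = text
        del self.start_state[line_no + 1:]  # later states may now be stale


lines = ["a", "/* start", "still comment", "end */ b"] + ["x"] * 100_000
hl = LazyHighlighter(lines)
print(hl.visible(0, 3))  # [(0, False), (1, False), (2, True), (3, True)]
print(hl.lexed)          # 3 (the other 100,000 lines were never scanned)
```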


Actipro Software Support

Posted 10 years ago by Bob Rundle - Director, Dynamic Workflow, JOA Oil & Gas BV
Indeed, I downloaded the WPF version of the SyntaxEditor, wrote a test program, and was able to load and edit a 100MB plain text file...it opens very fast and editing is very responsive. The WPF version of the SyntaxEditor is indeed a huge improvement. The WinForms version of the SyntaxEditor will not load the 100MB file at all.

However, this is not good enough. Looking at memory usage, the WPF test program is using 690MB. UltraEdit and TextPad (the gold standards of big-text-file editing) use only 11MB for the same file.

So when you say that "we've been thinking of every way we possibly can", I am startled by the poverty of your imagination. How do you suppose UltraEdit and TextPad use only 11MB of memory for a 100MB file? I'd try to explain if I thought it would make a difference.

You must wonder why I don't simply use UltraEdit or TextPad. Well...I would if they were deployed as controls I could integrate into my product...but they are not. Why don't I use some other .NET text editor control designed for big files and stop hassling you guys? Well...there aren't any. Sad to say, the SyntaxEditor seems to be the best one out there.

So I return now to my work of redesigning the SyntaxEditor so that it will work on big files.

Regards,
Bob Rundle

Posted 10 years ago by Thushan Fernando
This is a bottleneck for us too; our users seem to load 10-100MB SQL documents that kill the editor right now. Unfortunately, there's very little we can do but wait: we didn't buy the Blueprint edition (DOH!).

So we're banking heavily on the 5.x upgrade bringing performance gains similar to the WPF edition's, minus the baggage. We're waiting for the results to trickle through, hopefully before the end of our subscription :)

Bob, let us know just how much work was involved (hours-wise) and whether it's worth it. I have a feeling I might have to upgrade to BP and MC Hammer out a way.

Posted 10 years ago by Bob Rundle - Director, Dynamic Workflow, JOA Oil & Gas BV
The 100MB text file is critical for us. We have known for a long time that the SyntaxEditor has problems with large text files. A 10MB text file works to a certain degree but is really unusable. We were hoping that files over 10MB would be rare for our users...but it turns out that most of the text files they use are over 10MB. So we are in a real bind.

Even the WPF control, from what I can see, is not that good. For some reason these guys think that they need to have the entire file in memory to do anything. This is really unnecessary. I know they will say you have to do this for proper language parsing and outlining, etc...but these are excuses. I wish excuses could be my core competency...it would make the job a lot easier.

Here is what I have done: I created a set of lazy stream classes that fault in sections of the text file as you access them. They use an MRU chain that keeps only the most recently accessed blocks of the file in memory.
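
Bob describes these classes only in prose. The Python sketch below guesses at the general shape: fixed-size blocks are read from the file on first access, and an `OrderedDict` keeps them in recency order so the least recently used block is evicted once `max_blocks` is exceeded. The class name, `block_size`, and `max_blocks` are hypothetical, illustrative choices, not his implementation.

```python
import io
from collections import OrderedDict


class LazyBlockReader:
    """Reads a file in fixed-size blocks on demand (hypothetical sketch).

    At most `max_blocks` blocks stay in memory; the OrderedDict keeps them in
    least- to most-recently-used order, so eviction drops the oldest block.
    `block_size` and `max_blocks` are illustrative defaults, not tuned values.
    """

    def __init__(self, f, block_size=64 * 1024, max_blocks=16):
        self.f = f
        self.block_size = block_size
        self.max_blocks = max_blocks
        self.cache = OrderedDict()  # block number -> bytes
        self.reads = 0              # times the underlying file was read
        self.size = f.seek(0, io.SEEK_END)

    def _block(self, n):
        if n in self.cache:
            self.cache.move_to_end(n)       # mark as most recently used
            return self.cache[n]
        self.f.seek(n * self.block_size)
        data = self.f.read(self.block_size)
        self.reads += 1
        self.cache[n] = data
        if len(self.cache) > self.max_blocks:
            self.cache.popitem(last=False)  # evict the least recently used block
        return data

    def read(self, offset, length):
        """Return up to `length` bytes starting at `offset`."""
        end = min(offset + length, self.size)
        out = bytearray()
        while offset < end:
            n, within = divmod(offset, self.block_size)
            chunk = self._block(n)[within:within + (end - offset)]
            if not chunk:                   # file shorter than expected
                break
            out += chunk
            offset += len(chunk)
        return bytes(out)


data = bytes(range(256)) * 1000  # 256,000 bytes of sample data
reader = LazyBlockReader(io.BytesIO(data), block_size=1024, max_blocks=4)
print(reader.read(1000, 100) == data[1000:1100])  # True (spans blocks 0 and 1)
print(reader.reads, len(reader.cache))            # 2 2
```

With a real file, pass `open(path, "rb")`; memory use is then bounded by roughly `block_size * max_blocks` regardless of file size.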

Now I need to figure out how to bolt the lazy stream onto the SyntaxEditor. This might be impossible. The first thing the stupid thing wants to do is replace all the <cr><lf> with <lf> in the entire file! So the entire file is modified from the beginning. The lazy stream cannot work like this...so I will have to teach the SyntaxEditor to edit the file without changing the line terminators...which might not be doable.
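
Editing without normalizing line endings mostly means recording each line's terminator instead of rewriting it. The hypothetical Python helper below treats CR LF, LF, and a lone CR as terminators and keeps them verbatim, so joining the lines reproduces the input exactly. It deliberately avoids `str.splitlines(keepends=True)`, which also breaks on characters such as `\x0b`, `\x0c`, `\x1c`, and `\u2028`. With a lazy stream, a CR at the end of one block followed by an LF at the start of the next must still be treated as a single terminator; this sketch works on an in-memory string and does not handle that case.

```python
def split_lines_keep_terminators(text):
    """Split text into (content, terminator) pairs without normalizing.

    CR LF, LF and a lone CR are each treated as one terminator and recorded
    as-is, so joining the pairs reproduces the original text exactly.
    """
    lines = []
    start = i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch in "\r\n":
            term = "\r\n" if text.startswith("\r\n", i) else ch
            lines.append((text[start:i], term))
            i += len(term)
            start = i
        else:
            i += 1
    lines.append((text[start:], ""))  # final line has no terminator
    return lines


def join_lines(lines):
    return "".join(content + term for content, term in lines)


text = "a\r\nb\nc\rd"
lines = split_lines_keep_terminators(text)
print(lines)                    # [('a', '\r\n'), ('b', '\n'), ('c', '\r'), ('d', '')]
lines[1] = ("B", lines[1][1])   # edit line 1, keeping its original terminator
print(repr(join_lines(lines)))  # 'a\r\nB\nc\rd'
```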

I have even been considering writing a text control from scratch. I have a prototype working that moves through a 100MB file lightning fast...loads in seconds, uses only 20MB of memory. However there is still a huge amount of code to be written...selection, copy/paste, etc, etc. So I am wondering if this is the right approach.
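
The thread does not say how the prototype stays at 20MB. One plausible technique (an assumption, not Bob's code) is a sparse line index: store the byte offset of only every Nth line, and to reach an arbitrary line, seek to the nearest checkpoint and scan forward. The Python functions below are hypothetical; `every` trades index size against scan distance.

```python
import io


def build_sparse_index(f, every=1000, chunk_size=1 << 20):
    """Byte offsets of lines 0, every, 2*every, ... (hypothetical sketch).

    Only one offset per `every` lines is stored, so the index stays tiny even
    for very large files. Treats LF as the line break (CR LF also works, since
    the CR simply precedes the LF; CR-only files would need more care).
    """
    index = [0]
    line = 0
    pos = 0  # file offset of the current chunk
    f.seek(0)
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        nl = chunk.find(b"\n")
        while nl >= 0:
            line += 1
            if line % every == 0:
                index.append(pos + nl + 1)  # line `line` starts after this LF
            nl = chunk.find(b"\n", nl + 1)
        pos += len(chunk)
    return index


def read_line(f, index, every, n):
    """Seek to the nearest checkpoint at or before line n, then scan forward."""
    f.seek(index[n // every])
    for _ in range(n % every):
        f.readline()
    return f.readline().rstrip(b"\r\n")


data = b"".join(b"line %d\n" % i for i in range(10_000))
f = io.BytesIO(data)
idx = build_sparse_index(f, every=1000, chunk_size=4096)
print(len(idx))                       # 11 offsets for 10,000 lines
print(read_line(f, idx, 1000, 4321))  # b'line 4321'
```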

So I really do not know how many hours it will take to fix the SyntaxEditor...it might be a fool's errand to take a Tinker Toy control and make it industrial-strength.

I am still working on it...will report back later.

Regards,
Bob Rundle
