Large documents with semantic parser service

SyntaxEditor for Windows Forms Forum

Posted 11 years ago by Justin Harrell
Version: 13.1.0311
Platform: .NET 4.5
Environment: Windows 8 (64-bit)

One of our uses for SyntaxEditor is viewing XML documents read-only (along with XML editing and code editing). Some of these read-only XML documents are large. To keep the UI responsive while these large non-editable documents load, we created a "LoadAsync" method for SyntaxEditor that reads the file in chunks and calls Document.AppendText in a loop. It also has a cutoff that stops loading once the document exceeds a threshold (currently 10 MB), so you can see the beginning of even a very large file. This works pretty well, but I am running into two related issues.
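
A minimal sketch of the chunked loading described above, not the actual LoadAsync implementation. It assumes a hypothetical appendText callback that wraps Document.AppendText, an arbitrary chunk size of 64K characters, and a cutoff counted in characters rather than bytes.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class ChunkedLoader
{
    // Reads 'path' in chunks and hands each chunk to 'appendText'.
    // Returns true if the whole file was loaded, false if the cutoff stopped the load.
    public static async Task<bool> LoadAsync(
        string path,
        Action<string> appendText,          // hypothetical wrapper around Document.AppendText
        int chunkChars = 64 * 1024,         // assumed chunk size
        long maxChars = 10L * 1024 * 1024)  // assumed cutoff, in characters
    {
        var buffer = new char[chunkChars];
        long total = 0;

        using (var reader = new StreamReader(path))
        {
            while (total < maxChars)
            {
                // Never read past the cutoff.
                int toRead = (int)Math.Min(chunkChars, maxChars - total);
                int read = await reader.ReadAsync(buffer, 0, toRead);
                if (read == 0)
                    return true;            // reached the end of the file

                appendText(new string(buffer, 0, read));
                total += read;
            }
            return reader.EndOfStream;      // false when text remains past the cutoff
        }
    }
}
```

When this is awaited from the UI thread of a WinForms app, the code after each await should resume on the UI thread, so the callback can touch the editor directly while the message loop keeps running between chunks.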

The first issue is that if semantic parsing is enabled on the document while the text is appended, we get inconsistent outlining with XML. Many times nodes at the beginning of the document will not get outlined. The behavior is somewhat repeatable on the same document and changes if the chunk size is modified. I would guess this is a timing issue with the semantic parser service. Highlighting and schema validation seem to function correctly.

The second issue is that if we disable semantic parsing or outlining on the document before we start appending text and then re-enable it after all the text is loaded, we get correct outlining. However, the call that sets outlining or parsing back to true blocks for some time, as though the semantic parser service is not being used and the work is happening on the current thread.

So on the one hand we get a responsive UI with inconsistent outlining, and on the other a UI hang at the end of the file load with correct outlining. We currently just allow the inconsistent outlining. Ideally I would like to disable parsing/outlining for the duration of the load and then let the service do the work in the background afterwards, but the UI hang is unacceptable.

I am not sure how much better the WPF version is with large documents. We would like to move to WPF at some point, but the current app uses multiple WinForms controls from other vendors, and we are unsure whether those controls will play nicely in WPF, quite apart from the development time the conversion would take.

Thanks for any suggestions or help. We have found no better control for viewing XML documents, even with these issues, and we use the editing features extensively for smaller documents.

[Modified 11 years ago]

Comments (6)

Posted 11 years ago by Tobias Lingemann - Software Development Engineer, Vector Informatik GmbH

The WPF version seems to have considerable improvements on that topic.

> The second issue is that if we disable semantic parsing or outlining on the document before we start appending text and then re-enable it after all the text is loaded, we get correct outlining. However, the call that sets outlining or parsing back to true blocks for some time, as though the semantic parser service is not being used and the work is happening on the current thread.

It seems that you just disable the semantic parsing service but not semantic parsing itself. I would recommend leaving the service running and just enabling/disabling semantic parsing for your document.

myEditor.Document.SemanticParsingEnabled = false;

However, simply creating and showing all collapsible nodes can take some time too, and this has to run on the GUI thread. Since the document is read-only, you could try adding the collapsible nodes step by step.

> The first issue is that if semantic parsing is enabled on the document while the text is appended, we get inconsistent outlining with XML. Many times nodes at the beginning of the document will not get outlined.

The outlining nodes are created based on the text currently available. If there is no matching end tag, there won't be an outlining node. On the other hand, you might find the wrong "matching" tag, simply because the right one is not there yet.
I would suggest appending only complete XML sub-trees. This way the outlining for each sub-tree will be complete.
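
One way to follow this suggestion could be sketched as below. XmlChunker and minChunkLength are hypothetical names, the scanner is a simplification (it assumes well-formed XML already held in a string and does not handle a DOCTYPE with an internal subset), and nothing here is part of the SyntaxEditor API. It cuts the text only at points where every element below the root has been closed.

```csharp
using System;
using System.Collections.Generic;

static class XmlChunker
{
    // Splits 'xml' into pieces of at least 'minChunkLength' characters, cutting
    // only where every element opened below the root has also been closed.
    public static IEnumerable<string> SplitAtSubtreeBoundaries(string xml, int minChunkLength)
    {
        int depth = 0, chunkStart = 0, i = 0;
        while (i < xml.Length)
        {
            if (xml[i] != '<') { i++; continue; }

            // Skip constructs that may contain '<' or '>' without being elements.
            if (StartsWithAt(xml, i, "<!--")) { i = SkipPast(xml, i, "-->"); continue; }
            if (StartsWithAt(xml, i, "<![CDATA[")) { i = SkipPast(xml, i, "]]>"); continue; }
            if (StartsWithAt(xml, i, "<?")) { i = SkipPast(xml, i, "?>"); continue; }
            if (StartsWithAt(xml, i, "<!")) { i = SkipPast(xml, i, ">"); continue; }

            int close = FindTagEnd(xml, i);
            if (close < 0) break;                    // truncated tag at the end of the text

            bool isEndTag = xml[i + 1] == '/';
            bool isEmptyElement = xml[close - 1] == '/';
            if (isEndTag) depth--;
            else if (!isEmptyElement) depth++;
            i = close + 1;

            // depth == 1 means the scanner is directly inside the root element again.
            if (depth == 1 && (isEndTag || isEmptyElement) && i - chunkStart >= minChunkLength)
            {
                yield return xml.Substring(chunkStart, i - chunkStart);
                chunkStart = i;
            }
        }
        if (chunkStart < xml.Length)
            yield return xml.Substring(chunkStart);  // remainder, including the root end tag
    }

    static bool StartsWithAt(string s, int index, string token)
    {
        return string.CompareOrdinal(s, index, token, 0, token.Length) == 0;
    }

    static int SkipPast(string s, int start, string terminator)
    {
        int end = s.IndexOf(terminator, start, StringComparison.Ordinal);
        return end < 0 ? s.Length : end + terminator.Length;
    }

    // Finds the '>' that closes the tag starting at 'start', ignoring any '>'
    // inside quoted attribute values.
    static int FindTagEnd(string s, int start)
    {
        char quote = '\0';
        for (int j = start + 1; j < s.Length; j++)
        {
            char c = s[j];
            if (quote != '\0') { if (c == quote) quote = '\0'; }
            else if (c == '"' || c == '\'') quote = c;
            else if (c == '>') return j;
        }
        return -1;
    }
}
```

Each returned piece could then be passed to Document.AppendText in turn. The root element itself stays unclosed until the last piece arrives, so its own outlining node would still depend on the final append.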

[Modified 11 years ago]


Best regards, Tobias Lingemann.

Posted 11 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Justin,

Silly question but did you start the semantic parser service?  If it is off, everything will be done on the UI thread.

Even when it's on though, the parse results (built in the worker thread) call back to the main UI thread to set them on the Document.SemanticParseData property.  That will trigger an automatic outlining run that occurs in the main UI thread.

The first issue is likely that a block element isn't found (since in the first chunk the root element doesn't have an end tag yet) so it doesn't outline the node.  I would think that later incremental changes would get it updated though.  And as Tobias said, loading valid fragments of XML would probably work better overall.

For the second issue, if you change the outlining mode, it effectively calls Reparse() which causes another complete lex, parse, and outlining phase.  That's why you see the slowdown there.

Your only real options with very large files are to turn off outlining before the load and then either turn it back on after the text is fully loaded (which will incur the temporary hang) or leave it off for very large documents.  Sorry, but with the WinForms design, I don't think there is a better way.

As for the WPF version, it is definitely much better at handling large documents in general than the WinForms version.  You can load large documents and see the text almost instantly in WPF.  Your documents are very large but I would expect them to perform much better anyhow in our WPF version.


Actipro Software Support

Posted 11 years ago by Justin Harrell

> It seems that you just disable the semantic parsing service but not semantic parsing itself.

No, the semantic parsing service is always enabled. I set Document.SemanticParsingEnabled = false; before the AppendText loop and Document.SemanticParsingEnabled = true; after the loop completes. The call to SemanticParsingEnabled = true hangs for many seconds on a large document even with the service enabled.

> However, simply creating and showing all collapsible nodes can take some time too, and this has to run on the GUI thread.

This does not seem to be the case when the document is loaded normally or when semantic parsing is left on while appending text. It seems the semantic parsing service does all the parsing and outlining in the background and displays the outlining once it completes, with no UI hang. If you load a large XML document normally, it will hang while displaying the highlighted text (lexing) but then become fully responsive, with no outlining yet. After some time the outlining appears with barely a stutter. This suggests the bulk of the work was done in the background and only the outline drawing was invoked on the main UI thread, which seems to take very little time.

> The outlining nodes are created based on the text currently available. If there is no matching end tag, there won't be an outlining node.

This seems odd, as though you are saying outlining might not work correctly if text is edited or appended quickly enough. Obviously that would not be the intention. I realize you can't have an outlining node until the end tag exists, and AppendText may append an unclosed fragment, but a later call to AppendText will write the end tag. Any change to the document should trigger re-outlining. How is it possible to end up with a complete, valid XML document in which certain tags are not outlined, regardless of whether it was loaded all at once or appended in chunks? This seems like a bug to me.

As I said, I would like to disable outlining while the document is being loaded in chunks, because I imagine the service starts a re-parse/outline on every call to AppendText and then cancels and restarts on the next call, wasting CPU time. But disabling it is as bad as having the semantic parsing service off.

> Silly question but did you start the semantic parser service? If it is off, everything will be done on the UI thread.

It's on, I triple-checked ;)

> Even when it's on though, the parse results (built in the worker thread) call back to the main UI thread to set them on the Document.SemanticParseData property. That will trigger an automatic outlining run that occurs in the main UI thread.

Why would outlining occur on the UI thread? I know the drawing of the outlines must occur on the main thread, but Outline.Add could be invoked in a loop from the background thread, allowing messages to pump while new outlines are being added/drawn. This doesn't seem to happen normally. Loading a large XML file in one shot seems to hang on text layout and lexing but then becomes responsive while parsing and outlining are occurring; once the service completes its background work, the outlines appear with no perceptible UI hang. So somehow you guys are doing this.

> For the second issue, if you change the outlining mode, it effectively calls Reparse() which causes another complete lex, parse, and outlining phase. That's why you see the slowdown there.

I suspect this is the actual issue. Setting Document.SemanticParsingEnabled or Document.Outlining to true triggers a re-lex, which from my understanding always occurs on the main thread. Isn't everything already lexed in AppendText, or on any document modification? Re-lexing the document does not seem necessary in this case. Outlining and parsing seem to happen in the service, so setting them to true should simply wake up the service to do its job, the same as if they were enabled and some text was modified in the document; the lexing is already done.

> Sorry, but with the WinForms design, I don't think there is a better way.

I really think you guys should look at the incorrect outlining issue. If calling AppendText quickly against a document can result in incorrect outlining (and maybe parsing?), this seems like a bug. What if a user simply pastes text too quickly? Also, the setters on Document.Outlining.Mode and Document.SemanticParsingEnabled are doing some extra work on the UI thread that does not occur normally. Again, if a user types a key and modifies the document, the service wakes up and parses without hanging the UI; setting those properties to true should essentially do the same thing.

I can put together a sample that shows the issues if that would help.

Posted 11 years ago by Justin Harrell

I have put together a sample application to reproduce the behavior. There does not seem to be any way to upload attachments to the forum, so if you guys would like to examine it, let me know the best way to send it.

The sample can load files, or there is a button to generate a 5 MB test XML sample. I also have buttons for loading asynchronously and for disabling outlining while loading asynchronously. The background of the editor changes color based on state (loading: green, hung: red).
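
For anyone wanting to reproduce this without the sample application, a test document of roughly that size could be generated along these lines. This is a guess at what such a generator might look like, not the code from the sample; the TestXml class and the element names are made up for illustration.

```csharp
using System.Text;

static class TestXml
{
    // Builds a well-formed XML document of at least 'targetChars' characters.
    public static string Generate(int targetChars = 5 * 1024 * 1024)
    {
        var sb = new StringBuilder(targetChars + 256);
        sb.AppendLine("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
        sb.AppendLine("<items>");

        // Keep adding small, complete <item> sub-trees until the target size is reached.
        for (int id = 0; sb.Length < targetChars; id++)
        {
            sb.Append("  <item id=\"").Append(id).AppendLine("\">");
            sb.AppendLine("    <name>Item " + id + "</name>");
            sb.AppendLine("    <value>" + (id % 97) + "</value>");
            sb.AppendLine("  </item>");
        }

        sb.AppendLine("</items>");
        return sb.ToString();
    }
}
```

The result is a flat list of small elements under one root, which gives the outliner a very large number of nodes to create. Real documents with deeper nesting may behave differently.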

If outlining is disabled while loading, then when it is re-enabled at the end of the 5 MB load the application hangs for about 12 seconds on my dev machine. About 9 seconds after the 12-second hang, the editor hangs again for just a second and the outlining appears. If outlining is enabled during the load, the same 9 seconds go by after loading completes and then the outlines appear. So basically, disabling and then re-enabling outlining adds a 12-second hang while the end result is the same (except for the random incorrect outlining).

One detail I noticed is that the incorrect outlining is much easier to reproduce if the document is scrolled and/or the cursor is moved while it is loading. If you hit load and do nothing else while it loads, more often than not it outlines correctly. If, however, as it's loading you scroll down and click to move the cursor to a random position during the 9 seconds while the service is parsing, it will almost always produce incomplete outlining.

Posted 11 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Justin,

Please send the sample (with renamed .zip file extension) to our support address and reference this thread.  Thanks!


Actipro Software Support

Posted 11 years ago by Justin Harrell

Email sent, thanks for taking a look.

The latest build of this product (v24.1.1) was released 3 months ago, which was after the last post in this thread.
