Web languages add-on: HTML5 compatibility

SyntaxEditor for WPF Forum

Posted 4 years ago by Andrew Levine
Version: 19.1.0686

I'm having trouble finding an XSD for HTML5 on the web. Posts on Stack Overflow say that HTML5 isn't based on SGML, so officially it can't be done. Have you found any way to get correct error lists and similar validation output for HTML5?

Comments (8)

Posted 4 years ago by Andrew Levine

I've found the RELAX NG schemas used in the W3C's Nu validator, and supposedly they can be converted to XSD files. Is there a more straightforward way, though? P.S. Sorry for posting in the wrong forum section!

https://github.com/validator/validator/tree/20.6.30/schema/html5

LATER REVISION: these are for XHTML5, which isn't the same as HTML5.

[Modified 4 years ago]

Answer - Posted 4 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Andrew,

That is correct. XSDs need the content to be XML-based, which means properly nested tags and self-closing tags (e.g. <img/>, not <img>). HTML5 promotes the use of things like <br> (not self-closing), so XSDs won't typically work for HTML5. We don't have experience with the schemas you mentioned, so I'm not sure about those.
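As a rough illustration of the point above, the following sketch feeds the same fragment to an XML parser in both its HTML5 style and its XHTML style. It uses Python's standard library only as a stand-in for any XML-based tooling; the helper name `is_well_formed_xml` is made up for this example and is not part of any product mentioned in this thread.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # An unclosed <br> makes the closing </p> a mismatched tag in XML.
        return False


html5_style = "<p>line one<br>line two</p>"    # fine as HTML5
xhtml_style = "<p>line one<br/>line two</p>"   # fine as XML/XHTML

print(is_well_formed_xml(html5_style))  # False
print(is_well_formed_xml(xhtml_style))  # True
```

Anything that validates against an XSD has to get past this well-formedness step first, which is why ordinary HTML5 markup tends to fail before schema validation even starts.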

Your best bet is to find a validator that you can call into, then build an IParser for your syntax language that runs that validator as part of the parsing process and returns the validation results as parser errors. That way you can still execute the validation asynchronously via our ambient parse request dispatcher, so it doesn't block the UI thread.
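The shape of that approach can be sketched in a language-neutral way. The example below is only an assumption-laden outline, not Actipro's API: `ParseError`, `validate_html`, and `submit_parse` are hypothetical names, the thread pool merely stands in for a background parse dispatcher, and the tag-balance check is a deliberately naive placeholder for a real validator (it only flags end tags that match no open element and does not implement HTML5's rules).

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from html.parser import HTMLParser

# Void elements never take an end tag in HTML5.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


@dataclass(frozen=True)
class ParseError:
    """Hypothetical error record an editor could show in its error list."""
    line: int      # 1-based line number
    column: int    # 0-based column offset
    message: str


class _StrayEndTagChecker(HTMLParser):
    """Placeholder validator: reports end tags with no matching open tag."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: list[str] = []
        self.errors: list[ParseError] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return  # also reached for <br/>; ignored in this sketch
        if tag in self.open_tags:
            # Implicitly close anything opened after the matching start tag.
            while self.open_tags.pop() != tag:
                pass
        else:
            line, column = self.getpos()
            self.errors.append(
                ParseError(line, column, f"Unexpected end tag </{tag}>"))


def validate_html(text: str) -> list[ParseError]:
    """Run the placeholder validator and return its findings."""
    checker = _StrayEndTagChecker()
    checker.feed(text)
    checker.close()
    return checker.errors


def submit_parse(pool: ThreadPoolExecutor, text: str) -> Future:
    """Queue validation on a worker thread so the caller is not blocked."""
    return pool.submit(validate_html, text)


with ThreadPoolExecutor(max_workers=1) as pool:
    future = submit_parse(pool, "<p>ok<br></p>\n</span>")
    for error in future.result():
        print(error.line, error.message)
```

In a real integration, `validate_html` would be replaced by a call into whichever validator is chosen, and the mapping step would translate that validator's line, column, and message fields into the editor's own error type.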


Actipro Software Support

Posted 4 years ago by Andrew Levine

Thanks, I may end up having to try that! Can I make a feature request to put HTML5 in the web languages add-on someday?

Posted 4 years ago by Actipro Software Support - Cleveland, OH, USA

Hi Andrew,

OK, we'll log your request for an HTML5 syntax language.


Actipro Software Support

Posted 4 years ago by Andrew Levine

For now, would the Web add-on blueprint license's XML language definition be a good place to start from in building the parser? I'm familiar with parsing concepts but haven't had to use them for a while, so let me know whether it would be easier to tweak that definition in the language editor until it recognizes HTML5, or to just start from scratch.

Posted 4 years ago by Actipro Software Support - Cleveland, OH, USA

Well, the XML parser we have is entirely based on supporting balanced tag blocks. HTML5 has all kinds of special rules and ignores proper structure, so I'm not sure whether starting with our add-on's codebase would really help. You might be best off finding a third-party parser on the web that you can call from a syntax language IParser implementation to get its results. That way you leave the logic to people who have already written a parser that handles HTML5's quirks.


Actipro Software Support

Posted 4 years ago by Andrew Levine

Thanks again!

Posted 4 years ago by Andrew Levine

Follow-up: It turns out there are some HTML parser libraries you can use in .NET projects. AngleSharp is the easiest, and rehype-parse (though it needs Edge.js to work) gives you an abstract syntax tree. However, there seems to be a difference between an HTML parser and a full HTML validator: these parse-only libraries don't always produce useful diagnostics when errors are introduced into the HTML being edited. I'm getting better results so far with LibTidy (https://www.html-tidy.org/).
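The parser-versus-validator gap can be seen with almost any error-tolerant HTML parser. The sketch below uses Python's standard `html.parser` purely as a stand-in (it is not AngleSharp, rehype-parse, or LibTidy, and `EventRecorder` and `record_events` are names invented for this example): the parser walks through clearly broken markup and reports events, but never raises or records an error.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Collects start/end tag events; performs no validation at all."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def record_events(markup: str) -> list[tuple[str, str]]:
    recorder = EventRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.events


# </span> closes nothing and <p> is never closed, yet parsing succeeds.
broken = "<div><p>text</span></div>"
print(record_events(broken))
# [('start', 'div'), ('start', 'p'), ('end', 'span'), ('end', 'div')]
```

A parser like this recovers from mistakes the way a browser would, which is what you want for rendering but not for an editor's error list. A validator adds the missing step of deciding which of those recoveries should be reported to the user.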
