Yes, that looks like the right sort of code.
Your other questions are where things get tricky, since the tokens don't store that info. You'd have to get an XmlContext at each tag start, which may be expensive time-wise if you do it a lot. You'd have to try it to see how much it impacts performance. Also, you wouldn't want to run that until the semantic parsing has completed. So instead, you'd want to override OnDocumentSemanticParseDataChanged and cache the text range somehow, since it doesn't know the range at that point.