How to switch end-of-line characters to CRLF?

Comments (6)

Posted 4 years ago by Actipro Software Support - Cleveland, OH, USA

Hello,

SyntaxEditor tracks line terminators internally as single LF characters only, for simplicity, consistency, and parsing speed. The main methods that export text then default to returning it with CRLF line terminators.

If you know your source line/char, you can use our PositionToOffset methods to convert that to an offset that is correct for SyntaxEditor.
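
As a rough illustration of what a line/char to offset conversion does against LF-only text, here is a minimal sketch. `LfPosition.ToOffset` is a hypothetical helper written for this example, not the SyntaxEditor PositionToOffset API, and it assumes zero-based line and character indexes.

```csharp
using System;

public static class LfPosition
{
    // Hypothetical helper (not an Actipro API): converts a zero-based line and
    // character index into an offset within text whose lines end in a single LF.
    public static int ToOffset(string lfText, int line, int character)
    {
        int offset = 0;
        for (int i = 0; i < line; i++)
        {
            int next = lfText.IndexOf('\n', offset);
            if (next < 0)
                throw new ArgumentOutOfRangeException(nameof(line));
            offset = next + 1; // Skip past the LF to the start of the next line.
        }
        return offset + character;
    }
}
```

Because the line and character indexes do not depend on how long each terminator is, the same pair identifies the same position in both the CRLF source and the LF-only text.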

Otherwise, you would somehow need to coerce your offsets based on the line they fall on to match our offsets. For instance, "youroffset - yourlineindex = syntaxeditoroffset".

Actipro Software Support

Posted 4 years ago by Sunshine - Appeon

Assume that the data returned by the analyzer is offset-based, and that the analyzer's default end-of-line sequence is CRLF.

The formula "myoffset - mylineindex = syntaxeditoroffset" is incorrect.

Since the end-of-line characters of the analyzer and SyntaxEditor are inconsistent, the line on which a given offset falls may also differ between the analyzer and SyntaxEditor.

Posted 4 years ago by Actipro Software Support - Cleveland, OH, USA

The original comment and sample formula assume that every line in the source document ends in CRLF while SyntaxEditor uses only LF. The idea is that each line break before a given position puts the SyntaxEditor offset one more character behind the original, due to the missing CR. This crude example illustrates the concept, where each digit represents the offset at that position of the document.

Original
01      <-- CRLF (Offsets 2 & 3, Line 0)
45      <-- CRLF (Offsets 6 & 7, Line 1)
89      <--      (               Line 2)

SyntaxEditor
01      <-- LF (Offset 2, Line 0)
34      <-- LF (Offset 5, Line 1)
67      <--    (          Line 2)

YourOffset - YourLineIndex = SyntaxEditorOffset
1          - 0             = 1
4          - 1             = 3
8          - 2             = 6
9          - 2             = 7

I included a few examples where an offset in the original was converted to a SyntaxEditor offset based on the original offset and line index. As mentioned earlier, this assumes every line ends with CRLF. If there is any variation in line endings, you would not be able to use such a simple formula.
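
The formula can be sketched in code as follows. The class and method names are hypothetical, and the sketch is only valid under the stated assumption that every line break in the original text is CRLF.

```csharp
public static class CrlfOffsets
{
    // Assumes every line break in the original is CRLF, so each line before
    // the position contributes exactly one extra character (the CR).
    public static int ToLfOffset(int crlfOffset, int lineIndex)
    {
        return crlfOffset - lineIndex;
    }

    // Inverse direction: add one character back for each preceding line break.
    public static int ToCrlfOffset(int lfOffset, int lineIndex)
    {
        return lfOffset + lineIndex;
    }
}
```

For example, `CrlfOffsets.ToLfOffset(8, 2)` gives 6, matching the third row of the table above.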

If line endings are inconsistent, you'd have to maintain your own mapping of the start offset for each line in the original compared to the start offset of each line reported by SyntaxEditor. Then you'd be able to use the relative character position on a line to translate the document offset between the two document sources.
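
One possible shape for such a mapping is sketched below. `LineOffsetMap` is a hypothetical class for illustration only; it assumes CRLF, lone CR, and lone LF each count as one line terminator, and that the LF-only text is what you get after normalizing each of them to a single LF.

```csharp
using System;
using System.Collections.Generic;

public sealed class LineOffsetMap
{
    // Start offset of each line in the original text and in the LF-only text.
    private readonly List<int> originalStarts = new List<int> { 0 };
    private readonly List<int> lfStarts = new List<int> { 0 };

    public LineOffsetMap(string original)
    {
        int lfOffset = 0;
        for (int i = 0; i < original.Length; i++)
        {
            char c = original[i];
            if (c == '\r' && i + 1 < original.Length && original[i + 1] == '\n')
                i++; // CRLF is one terminator; step onto the LF.
            lfOffset++; // Every character or terminator is one LF-text character.
            if (c == '\r' || c == '\n')
            {
                originalStarts.Add(i + 1);
                lfStarts.Add(lfOffset);
            }
        }
    }

    public int ToLfOffset(int originalOffset)
    {
        int line = FindLine(originalStarts, originalOffset);
        int result = lfStarts[line] + (originalOffset - originalStarts[line]);
        // An offset on the LF half of a CRLF maps onto the single LF.
        if (line + 1 < lfStarts.Count)
            result = Math.Min(result, lfStarts[line + 1] - 1);
        return result;
    }

    public int ToOriginalOffset(int lfOffset)
    {
        int line = FindLine(lfStarts, lfOffset);
        return originalStarts[line] + (lfOffset - lfStarts[line]);
    }

    // Binary search for the last line whose start offset is <= offset.
    private static int FindLine(List<int> starts, int offset)
    {
        int index = starts.BinarySearch(offset);
        return index >= 0 ? index : ~index - 1;
    }
}
```

The map is built in one pass over the text, and each lookup is a binary search over the line starts, so the per-offset cost grows with the logarithm of the line count rather than with the document length.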

Actipro Software Support

Posted 4 years ago by Sunshine - Appeon

Thank you very much for your careful reply.

I have considered this solution before, but for large documents, maintaining the mapping between the original lines and the SyntaxEditor lines will become a performance bottleneck.

In addition, every offset in the requests and responses for the source document would need to be transformed through the mapping. This will make the code complex and error-prone.

However, the end-of-line characters in the documents of each platform are inconsistent. We have to support this feature.

I think this is a very important and common problem in the code editor. Although the solution you provide can ensure the correct behavior of the editor, performance will be greatly affected.

Thanks again for your answers.

Answer - Posted 4 years ago by Actipro Software Support - Cleveland, OH, USA

Unfortunately our lexer and parsing framework and other text scanners are all built around relying on a single LF character for line terminators from our document model, so that isn't likely to change.

Another option, which would be the easiest and still relatively fast, is to do a simple string replace on the text you send to your analyzer, like this:

text = text.Replace("\r\n", "\n").Replace('\r', '\n');

That would normalize to LF and then all the offsets would be in sync with ours.
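
A small sketch that wraps the replace above in a helper, plus a possible inverse for when text needs to go back out with CRLF terminators. The `LineEndings` class and its method names are hypothetical, and `ToCrlf` assumes its input contains LF terminators only.

```csharp
public static class LineEndings
{
    // Normalizes CRLF and lone CR terminators to a single LF, so offsets
    // computed on the result line up with an LF-only document model.
    public static string NormalizeToLf(string text)
    {
        return text.Replace("\r\n", "\n").Replace('\r', '\n');
    }

    // Assumes the input is already LF-only; otherwise existing CRLF pairs
    // would end up with a doubled CR.
    public static string ToCrlf(string lfText)
    {
        return lfText.Replace("\n", "\r\n");
    }
}
```

The order of the two replaces matters: handling "\r\n" first keeps a CRLF pair from turning into two LFs.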

Actipro Software Support

Posted 4 years ago by Sunshine - Appeon

I am currently using this method. If you need to make everything correct, you must define the analyzer's default reading method. Thank you for your reply, thank you!

The latest build of this product (v25.1.0) was released 2 months ago, which was after the last post in this thread.
