How to set encoding of newly created document

Comments (8)

Posted 16 years ago by Actipro Software Support - Cleveland, OH, USA

Hi James,

You are correct: when a file loads, its contents are converted into standard Unicode strings, since that is what .NET uses. When we display text, all we do is take regular .NET System.String objects and render them using this GDI API:

[DllImport("User32.dll", CharSet=CharSet.Auto)]
private static extern void DrawText(IntPtr hdc, string lpStr, int nCount, RECT lpRect, int wFormat);

There isn't any encoding that can be set on System.String.

Actipro Software Support
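
To illustrate the point that a System.String carries no encoding of its own, here is a minimal sketch. The class and method names (StringEncodingDemo, ByteLength, RoundTrips) are made up for illustration and are not part of SyntaxEditor; only the System.Text.Encoding calls are real .NET APIs. An encoding only comes into play when the string is converted to or from bytes.

```csharp
using System;
using System.Text;

// Hypothetical demo class; not part of SyntaxEditor.
public static class StringEncodingDemo
{
    // Number of bytes the same string occupies once a specific encoding is applied.
    public static int ByteLength(string text, Encoding encoding)
    {
        return encoding.GetByteCount(text);
    }

    // Encodes the string to bytes and decodes it again with the same encoding.
    // The string itself is identical before and after; only the bytes differ per encoding.
    public static bool RoundTrips(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes) == text;
    }
}
```

For example, the three-character string "\u65E5\u672C\u8A9E" takes 9 bytes as UTF-8 and 6 bytes as UTF-16 (Encoding.Unicode), yet it is the same System.String in memory either way.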

Posted 16 years ago by James Deadman

Hi,

Thanks very much for your reply and help.

Sorry, I'm not quite understanding what you meant by your answer. When I load a file I can specify the encoding as a parameter, and that file then stays in that encoding, so when the file is saved it will be saved with the correct encoding.

What I'm hoping for is to be able to specify the encoding for a newly created document, just like I can when loading a file.

So if the user creates a new empty document, I would like to set the encoding or code page for it. Is there a way to set the encoding in a SyntaxEditor document manually, without needing to call LoadFile()? (I wouldn't be calling LoadFile(), since it's only a new blank document that I require.)

Thank you.

Regards,
James

Posted 16 years ago by Actipro Software Support - Cleveland, OH, USA

When a document gets loaded, it uses the Encoding you specify and .NET file loading functionality in System.IO converts it to a standard Unicode .NET string. After the initial file load, no encoding information is stored.

As an example, here is our LoadFile code:

StreamReader reader = new StreamReader(stream, encoding);
string text = reader.ReadToEnd();
this.Text = text;

StreamReader converts whatever encoding the file is in, back into standard Unicode, which gets set in our Document.Text property.

I don't believe the document persists the Encoding you used at load time either. You have to save it back out specifying an Encoding or else it will use UTF-8 if you don't.

Actipro Software Support
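
Since the document does not remember an encoding, one possible approach for new documents is for the application to keep the desired encoding next to the text and pass it along at save time. The following is only a sketch under that assumption: EncodedDocument and its members are hypothetical names, and it uses plain StreamReader/StreamWriter rather than any SyntaxEditor API.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: pairs the editor text with the encoding the application wants to save it in.
public sealed class EncodedDocument
{
    public string Text { get; set; } = string.Empty;

    // Default for a brand-new document; the application can change it at any time.
    public Encoding Encoding { get; set; } = new UTF8Encoding(false);

    // Decodes the stream with the given encoding and remembers that encoding for later saves.
    public static EncodedDocument Load(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding, true, 1024, true))
        {
            return new EncodedDocument { Text = reader.ReadToEnd(), Encoding = encoding };
        }
    }

    // Encodes the text with the remembered encoding; the caller keeps ownership of the stream.
    public void Save(Stream stream)
    {
        using (var writer = new StreamWriter(stream, Encoding, 1024, true))
        {
            writer.Write(Text);
        }
    }
}
```

A new blank document would then be created with something like new EncodedDocument { Encoding = Encoding.Unicode }, with the editor's text copied into Text just before calling Save.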

Posted 16 years ago by James Deadman

Thanks for your description, it was very helpful. I understand what you mean about the loading and storing.

Quote:
I don't believe the document persists the Encoding you used at load time either. You have to save it back out specifying an Encoding or else it will use UTF-8 if you don't.

So does this mean that once a file is open (or indeed a new file created), if a user attempts to type characters into the editor which are non-Unicode (e.g. Japanese Shift-JIS, which is ANSI-based), the characters will not appear correctly, since the text stored internally in the SyntaxEditor is Unicode-based? (Sorry, I'm not too familiar with this area.)

Thanks again,
James

Posted 16 years ago by Actipro Software Support - Cleveland, OH, USA

Hi James,

I looked up Shift JIS and it is a double-byte character encoding, meaning many of its characters take two bytes to encode. Input with double-byte characters may not work with the control at this time. So I don't think it has anything to do with it being ANSI-based; it's more due to it being double-byte based.

Actipro Software Support

Posted 16 years ago by James Deadman

Hello,

Sorry for the delay in responding. I asked my Japanese colleagues to double-check the issue. It seems that they can type in double-byte characters (for example Shift-JIS), but they had trouble opening files of that type (only because we have not specified the encoding type in the LoadFile() call). Therefore my apologies, but I think my Japanese colleagues did not quite understand what I was asking them.

May I ask if you know of a good method to detect the encoding of a file when it is opened, please?

Although I can pass the encoding-type parameter to the LoadFile() function, I do not necessarily know the encoding beforehand. There only seem to be a few file encodings that are easy to detect using a BOM (e.g. UTF-7, UTF-8). Other ones seem to be more difficult or don't have a BOM. Any help is appreciated.

Thanks,
James
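
For the BOM-based part of the question, a minimal sketch might look like the following. BomSniffer and DetectFromBom are hypothetical names; the byte signatures are the standard Unicode byte order marks. Encodings without a BOM (Shift-JIS among them) cannot be identified this way, so the method returns null and the caller would have to fall back to a default or a user-selected encoding.

```csharp
using System;
using System.Text;

// Hypothetical helper; not part of SyntaxEditor.
public static class BomSniffer
{
    // Inspects the first bytes of a file and returns the matching Unicode encoding,
    // or null when no byte order mark is present.
    public static Encoding DetectFromBom(byte[] head)
    {
        if (head == null) return null;

        // UTF-32 must be tested before UTF-16, because FF FE 00 00 also starts with FF FE.
        // (A UTF-16 LE file whose first character is U+0000 would be misread as UTF-32 here.)
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return Encoding.UTF32;
        if (head.Length >= 4 && head[0] == 0x00 && head[1] == 0x00 && head[2] == 0xFE && head[3] == 0xFF)
            return new UTF32Encoding(true, true);
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return Encoding.UTF8;
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode;
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode;

        return null; // no BOM: the encoding has to be guessed or supplied by the user
    }
}
```

The result (or a fallback when it is null) could then be passed as the encoding parameter when loading the file.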

Posted 16 years ago by Actipro Software Support - Cleveland, OH, USA

Hi James,

Oh good, I'm glad to hear it was just the encoding issue. I thought for sure we had Japanese customers using the product ok. :)

We don't have any notes here on encoding detection. It would probably be best to google it and see what sample code you can find on the web to do such things.

Actipro Software Support

Posted 16 years ago by James Deadman

No problem, I will look on Google for that.

Thanks again for your help!

Regards,
James

The latest build of this product (v25.1.0) was released 29 days ago, which was after the last post in this thread.
