How to set encoding of newly created document

SyntaxEditor for Windows Forms Forum

Posted 10 years ago by James Deadman
Hello,

I would like to ask how to set the encoding (e.g. UTF-8, Shift-JIS, etc.) of a newly created SyntaxEditor document. How might I do this, please?

I can see how it's done when opening a file, by passing the encoding to the LoadFile() function.

Initially I thought that perhaps a new SyntaxEditor document would just allow any type of character encoding to be entered, but my Japanese colleagues tell me that this doesn't work. For example, they created a new document and tried typing Japanese Shift-JIS (ANSI-based) text, but the characters did not appear correctly. I think it must default to UTF-8.

It's important for us that our Japanese customers are able to type characters of a particular encoding in newly created files.

Any help would be appreciated. Thanks!

Regards,
James

Comments (8)

Posted 10 years ago by Actipro Software Support - Cleveland, OH, USA
Hi James,

You are correct: when a file loads, its contents are read into standard Unicode strings, since that is what .NET uses. When we display text, all we do is take regular .NET System.String objects and render them using this GDI API:
[DllImport("User32.dll", CharSet=CharSet.Auto)]
private static extern void DrawText(IntPtr hdc, string lpStr, int nCount, RECT lpRect, int wFormat);
There isn't any encoding that can be set on System.String.


Actipro Software Support

Posted 10 years ago by James Deadman
Hi,

Thanks very much for your reply and help.

Sorry, I'm not quite understanding what you meant by your answer. When I load a file I can specify the encoding as a parameter, and then that file stays in that encoding, so when the file is saved it will be saved with the correct encoding.

What I'm hoping for is to be able to specify the encoding for a newly created document, just like I can when loading a file.

So if the user creates a new empty document, I would like to set the encoding or code page for it. Is there a way to set the encoding in a SyntaxEditor document manually, without needing to call LoadFile()? (I wouldn't be calling LoadFile(), since it's only a new blank document that I require.)

Thank you.

Regards,
James
Posted 10 years ago by Actipro Software Support - Cleveland, OH, USA
When a document gets loaded, it uses the Encoding you specify and .NET file loading functionality in System.IO converts it to a standard Unicode .NET string. After the initial file load, no encoding information is stored.

As an example, here is our LoadFile code:
StreamReader reader = new StreamReader(stream, encoding);
string text = reader.ReadToEnd();
this.Text = text;
StreamReader converts whatever encoding the file is in, back into standard Unicode, which gets set in our Document.Text property.

I don't believe the document persists the Encoding you used at load time either. You have to save it back out specifying an Encoding or else it will use UTF-8 if you don't.
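
Since the document itself does not remember an encoding, one option is for the application to keep the chosen Encoding next to the text and pass it along at save time. The following is only a minimal sketch of that idea using plain System.IO calls; the EncodedDocument class and its members are hypothetical names, not part of SyntaxEditor, and the editor's text would have to be copied to and from its Text property by the application.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper (not a SyntaxEditor type): pairs the in-memory
// Unicode text with the encoding the application wants to use on save.
public sealed class EncodedDocument
{
    public string Text { get; set; }
    public Encoding Encoding { get; set; }

    private EncodedDocument(string text, Encoding encoding)
    {
        Text = text;
        Encoding = encoding;
    }

    // A brand-new, empty document: the encoding is simply remembered.
    public static EncodedDocument CreateNew(Encoding encoding)
    {
        return new EncodedDocument(string.Empty, encoding);
    }

    // Decode the file's bytes into a .NET string, remembering the encoding used.
    public static EncodedDocument Load(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding))
        {
            return new EncodedDocument(reader.ReadToEnd(), encoding);
        }
    }

    // Encode the string back into bytes with the remembered encoding.
    public void Save(string path)
    {
        File.WriteAllText(path, Text, Encoding);
    }
}
```

With something like this, a new blank document would be created via EncodedDocument.CreateNew(someEncoding), and the same encoding would be used later when Save is called, without LoadFile() ever being involved.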


Actipro Software Support

Posted 10 years ago by James Deadman
Thanks for your description, it was very helpful. I understand what you mean about the loading and storing.

Quote:
I don't believe the document persists the Encoding you used at load time either. You have to save it back out specifying an Encoding or else it will use UTF-8 if you don't.


So does this mean that once a file is open (or indeed a new file is created), if a user attempts to type characters into the editor which are non-Unicode (e.g. Japanese Shift-JIS, which is ANSI-based), the characters will not appear correctly, since the text stored internally in SyntaxEditor is Unicode-based? (Sorry, I'm not too familiar with this area.)

Thanks again,
James
Posted 10 years ago by Actipro Software Support - Cleveland, OH, USA
Hi James,

I looked up Shift JIS and it is a double-byte character encoding, meaning it can take two bytes to represent a single character. Input with double-byte characters may not work with the control at this time. So I don't think it has anything to do with it being ANSI-based; it's more likely due to it being double-byte character based.
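
To illustrate the difference between bytes in a Shift-JIS file and characters in a .NET string, here is a small sketch. It assumes a runtime where CodePagesEncodingProvider is available (for example .NET Core / .NET 5+, or recent .NET Framework versions); on older .NET Framework versions the RegisterProvider line is presumably unnecessary and can be dropped. The ShiftJisDemo class and its method names are made up for illustration.

```csharp
using System.Text;

// Illustrative helper; the class and method names are hypothetical.
public static class ShiftJisDemo
{
    public static Encoding GetShiftJis()
    {
        // Legacy code pages are not registered by default on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("shift_jis");
    }

    // Number of bytes the text occupies when encoded as Shift-JIS.
    public static int ShiftJisByteCount(string text)
    {
        return GetShiftJis().GetByteCount(text);
    }

    // Encode to Shift-JIS bytes and decode back to a .NET string.
    public static string RoundTrip(string text)
    {
        Encoding sjis = GetShiftJis();
        byte[] bytes = sjis.GetBytes(text);
        return sjis.GetString(bytes);
    }
}
```

For the three-character string "日本語", ShiftJisByteCount returns 6, while "abc" needs only 3 bytes. In both cases the in-memory string is ordinary Unicode, and the encoding only matters at the point where text is converted to or from bytes.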


Actipro Software Support

Posted 10 years ago by James Deadman
Hello,

Sorry for the delay in responding. I asked my Japanese colleagues to double-check the issue. It seems that they can type double-byte characters (for example Shift-JIS), but they had trouble opening files of that type (only because we had not specified the encoding in the LoadFile() call). So my apologies, but I think my Japanese colleagues did not quite understand what I was asking them.

May I ask if you know of a good method to detect the encoding of a file when it is opened, please?

Although I can pass the encoding parameter to the LoadFile() function, I do not necessarily know the encoding beforehand. There only seem to be a few file encodings that are easy to detect using a BOM (e.g. UTF-7, UTF-8). Other ones seem to be more difficult or don't have a BOM. Any help is appreciated.

Thanks,
James
Posted 10 years ago by Actipro Software Support - Cleveland, OH, USA
Hi James,

Oh good, I'm glad to hear it was just the encoding issue. I thought for sure we had Japanese customers using the product ok. :)

We don't have any notes here on encoding detection. It would probably be best to google it and see what sample code you can find on the web to do such things.
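
As a starting point, the BOM-based part of detection can be sketched as below. This is only a rough illustration, not code from SyntaxEditor: the BomSniffer class and its methods are hypothetical names, it only recognizes the UTF-8, UTF-16 and UTF-32 byte order marks, and files without a BOM (such as Shift-JIS files) simply fall back to whatever default the caller supplies.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: guesses an encoding from a leading byte order mark.
public static class BomSniffer
{
    public static Encoding DetectFromBom(byte[] b, Encoding fallback)
    {
        if (b == null) return fallback;

        // UTF-32 must be checked before UTF-16: its little-endian BOM
        // (FF FE 00 00) starts with the UTF-16 little-endian BOM (FF FE).
        if (b.Length >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
            return Encoding.UTF32;
        if (b.Length >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
            return new UTF32Encoding(true, true); // big-endian UTF-32
        if (b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
            return Encoding.UTF8;
        if (b.Length >= 2 && b[0] == 0xFF && b[1] == 0xFE)
            return Encoding.Unicode;              // UTF-16 little-endian
        if (b.Length >= 2 && b[0] == 0xFE && b[1] == 0xFF)
            return Encoding.BigEndianUnicode;     // UTF-16 big-endian

        // No recognizable BOM: the caller's default is used as-is.
        return fallback;
    }

    public static Encoding DetectFromFile(string path, Encoding fallback)
    {
        var prefix = new byte[4];
        int read;
        using (var stream = File.OpenRead(path))
        {
            read = stream.Read(prefix, 0, prefix.Length);
        }

        // Only inspect the bytes that were actually read.
        var actual = new byte[read];
        Array.Copy(prefix, actual, read);
        return DetectFromBom(actual, fallback);
    }
}
```

The result of DetectFromFile could then be passed as the encoding argument when loading the file. Anything beyond BOM checks (for instance, telling Shift-JIS apart from other BOM-less encodings) is heuristic and is not covered by this sketch.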


Actipro Software Support

Posted 10 years ago by James Deadman
No problem, I will look on Google for that.

Thanks again for your help!

Regards,
James
The latest build of this product (v2018.1 build 0341) was released 3 months ago, which was after the last post in this thread.
