I have a string that I receive from a third party app and I would like to display it correctly in any language using C# on my Windows Surface.
Due to incorrect encoding, a piece of my string looks like this in Spanish:
Acción
whereas it should look like this:
Acción
According to the answer on this question: How to know string encoding in C#, the encoding I am receiving should be coming on UTF-8 already, but it is read on Encoding.Default (probably ANSI?).
I am trying to transform this string into real UTF-8, but one of the problems is that I can only see a subset of the Encoding class (UTF8 and Unicode properties only), probably because I'm limited to the windows surface API.
I have tried some snippets I've found on the internet, but none of them have proved successful so far for eastern languages (i.e. korean). One example is as follows:
var utf8 = Encoding.UTF8; byte[] utfBytes = utf8.GetBytes(myString); myString= utf8.GetString(utfBytes, 0, utfBytes.Length);
I also tried extracting the string into a byte array and then using UTF8.GetString:
byte[] myByteArray = new byte[myString.Length]; for (int ix = 0; ix < myString.Length; ++ix) { char ch = myString[ix]; myByteArray[ix] = (byte) ch; } myString = Encoding.UTF8.GetString(myByteArray, 0, myString.Length);
Do you guys have any other ideas that I could try?
In order to convert a String into UTF-8, we use the getBytes() method in Java. The getBytes() method encodes a String into a sequence of bytes and returns a byte array. where charsetName is the specific charset by which the String is encoded into an array of bytes.
Strings are immutable in Java, which means we cannot change a String character encoding. To achieve what we want, we need to copy the bytes of the String and then create a new one with the desired encoding.
UTF-8 encodes a character into a binary string of one, two, three, or four bytes. UTF-16 encodes a Unicode character into a string of either two or four bytes. This distinction is evident from their names. In UTF-8, the smallest binary representation of a character is one byte, or eight bits.
UTF-8 actually works quite well in std::string . Most operations work out of the box because the UTF-8 encoding is self-synchronizing and backward compatible with ASCII.
As you know the string is coming in as Encoding.Default
you could simply use:
byte[] bytes = Encoding.Default.GetBytes(myString); myString = Encoding.UTF8.GetString(bytes);
Another thing you may have to remember: If you are using Console.WriteLine to output some strings, then you should also write Console.OutputEncoding = System.Text.Encoding.UTF8;
!!! Or all utf8 strings will be outputed as gbk...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With