I am trying to convert an old application that has some strings stored in the database as ASCII.
For example, the string: ƒ`ƒƒƒlƒ‹ƒp[ƒgƒi[‚Ì‘I‘ð is stored in the database.
If I copy that string into a text editor, save it as ASCII, open the file in a web browser, and let the browser detect the encoding automatically, I get the correct Japanese string: チャネルパートナーの選択. The page reports the detected encoding as Japanese (Shift_JIS).
When I try to do the conversion in C# with something like this:
var asciiBytes = Encoding.ASCII.GetBytes(text);
var japaneseEncoding = Encoding.GetEncoding(932);
var convertedBytes = Encoding.Convert(japaneseEncoding, Encoding.ASCII, asciiBytes);
var japaneseString = japaneseEncoding.GetString(convertedBytes);
I get ?`???l???p?[?g?i?[???I?? as japaneseString, so I cannot show it on the webpage.
Any light would be appreciated.
Thanks
some strings stored in the database as ASCII
It isn't ASCII; most of the characters in ƒ`ƒƒƒlƒ‹ƒp[ƒgƒi[‚Ì‘I‘ð are not ASCII. Encoding.ASCII.GetBytes(text) replaces every character it cannot represent with a question mark, which is why you got all those question marks.
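To see that replacement happen, here is a minimal sketch. The class and method names (AsciiLossDemo, ToAsciiHex) are made up for illustration; only Encoding.ASCII and BitConverter come from the framework.

```csharp
using System;
using System.Text;

public static class AsciiLossDemo
{
    // Hypothetical helper: encodes text as ASCII and returns the bytes as hex.
    public static string ToAsciiHex(string text)
    {
        byte[] asciiBytes = Encoding.ASCII.GetBytes(text);
        return BitConverter.ToString(asciiBytes);
    }

    public static void Main()
    {
        // 'ƒ' (U+0192) is outside the 7-bit ASCII range; '`' (U+0060) is inside it.
        // The non-ASCII character becomes 0x3F ('?') and its original value is lost.
        Console.WriteLine(ToAsciiHex("\u0192`")); // 3F-60
    }
}
```

Once a character has been turned into 0x3F, no later decoding step can tell it apart from a real question mark.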
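The core issue is that the bytes in the database column were read with the wrong encoding: they are code page 932 (Shift_JIS) bytes that were decoded as code page 1252. You can reverse that by encoding the string back to bytes with 1252 and then decoding those bytes with 932: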
var badstringFromDatabase = "ƒ`ƒƒƒlƒ‹ƒp[ƒgƒi[‚Ì‘I‘ð";
var hopefullyRecovered = Encoding.GetEncoding(1252).GetBytes(badstringFromDatabase);
var oughtToBeJapanese = Encoding.GetEncoding(932).GetString(hopefullyRecovered);
Which produces "チャネルパートナーの選択"
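This is not going to be completely reliable: code page 1252 leaves a few byte values unassigned (0x81, 0x8D, 0x8F, 0x90 and 0x9D) that are used in 932. For example, ー is 0x81 0x5B in 932. If one of those bytes was dropped or replaced when the data was read, you end up with a garbled string from which you cannot recover the original byte values anymore. You'll need to focus on getting the data provider to use the correct encoding.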
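One way to notice at least some of that damage is to make both encodings throw instead of quietly substituting characters. The following is a sketch under assumptions, not a definitive fix: it assumes .NET 5 or later (where the code page provider has to be registered), and MojibakeRecovery and TryRecoverShiftJis are hypothetical names.

```csharp
using System;
using System.Text;

public static class MojibakeRecovery
{
    // Hypothetical helper: returns false instead of silently producing '?' characters.
    public static bool TryRecoverShiftJis(string garbled, out string recovered)
    {
        // Assumption: .NET Core / .NET 5+, where code pages 1252 and 932
        // are only available after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding strict1252 = Encoding.GetEncoding(
            1252, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
        Encoding strict932 = Encoding.GetEncoding(
            932, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

        try
        {
            // Undo the wrong decoding, then decode the bytes as Shift_JIS.
            byte[] originalBytes = strict1252.GetBytes(garbled);
            recovered = strict932.GetString(originalBytes);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // The string contains a character that code page 1252 cannot encode.
            recovered = string.Empty;
            return false;
        }
        catch (DecoderFallbackException)
        {
            // The recovered bytes are not a valid code page 932 sequence.
            recovered = string.Empty;
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryRecoverShiftJis("\u0192`", out string ok));  // True, ok is "チ"
        Console.WriteLine(TryRecoverShiftJis("\u30C1", out string bad)); // False
    }
}
```

This does not catch every kind of damage. If a byte was dropped and the remaining bytes still happen to form valid Shift_JIS, the call succeeds with the wrong text, so fixing the encoding at the data provider remains the real solution.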
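As per the other answer, I'm pretty sure the string was decoded with the ANSI/Default encoding, not ASCII.
The following examples seem to get you what you're after. Note that Encoding.Default is the system ANSI code page only on .NET Framework; on .NET Core and .NET 5+ it is always UTF-8, so there you would have to name the code page explicitly, e.g. Encoding.GetEncoding(1252).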
var japaneseEncoding = Encoding.GetEncoding(932);
// From file bytes
var fileBytes = File.ReadAllBytes(@"C:\temp\test.html");
var japaneseTextFromFile = japaneseEncoding.GetString(fileBytes);
japaneseTextFromFile.Dump(); // Dump() is a LINQPad extension method; use Console.WriteLine elsewhere
// From string bytes
var textString = "ƒ`ƒƒƒlƒ‹ƒp[ƒgƒi[‚Ì‘I‘ð";
var textBytes = Encoding.Default.GetBytes(textString);
var japaneseTextFromString = japaneseEncoding.GetString(textBytes);
japaneseTextFromString.Dump(); // Dump() is a LINQPad extension method; use Console.WriteLine elsewhere
Interestingly, I think I need to read up on Encoding.Convert, as it did not produce the behaviour I expected. The GetString calls only seem to work if I pass in bytes obtained with Encoding.Default; if I convert the bytes to the Japanese encoding beforehand, they do not work as expected.
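For what it's worth, Encoding.Convert(srcEncoding, dstEncoding, bytes) decodes the bytes as srcEncoding and re-encodes the resulting text as dstEncoding. It only helps if the bytes really are in the source encoding, and anything the destination cannot represent is replaced. A small sketch, assuming .NET 5 or later and using made-up names (ConvertDemo, ShiftJisToUtf8):

```csharp
using System;
using System.Linq;
using System.Text;

public static class ConvertDemo
{
    // Hypothetical helper: re-encodes Shift_JIS (code page 932) bytes as UTF-8.
    public static byte[] ShiftJisToUtf8(byte[] shiftJisBytes)
    {
        // Assumption: .NET Core / .NET 5+, where code page 932 needs this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding shiftJis = Encoding.GetEncoding(932);
        return Encoding.Convert(shiftJis, Encoding.UTF8, shiftJisBytes);
    }

    public static void Main()
    {
        byte[] sjisBytes = { 0x83, 0x60 }; // チ in Shift_JIS

        byte[] viaConvert = ShiftJisToUtf8(sjisBytes);

        // Convert is equivalent to decoding with the source encoding
        // and then encoding with the destination encoding.
        byte[] byHand = Encoding.UTF8.GetBytes(Encoding.GetEncoding(932).GetString(sjisBytes));

        Console.WriteLine(viaConvert.SequenceEqual(byHand));  // True
        Console.WriteLine(BitConverter.ToString(viaConvert)); // E3-83-81
    }
}
```

In the question's code the destination is Encoding.ASCII, so every Japanese character produced by the decoding step is replaced with '?' when it is re-encoded.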