
Conversion from UTF8 to ASCII

I have text read from an XML file stored in UTF-8 encoding. C# reads it perfectly (I checked with the debugger), but when I try to convert it to ASCII to save it in another file, I get a ? character wherever there was a conflicting character. For instance, this text:

string s = "La introducción masiva de las nuevas tecnologías de la información";

will be saved as:

"La introducci?n masiva de las nuevas tecnolog?as de la informaci?n"

I cannot simply replace them with the plain Latin vowels (a, e, i, o, u), because some Spanish words would lose their meaning. I've already tried the answers to this and this question with no success, so I'm hoping someone can help me. The selected answer in the second one didn't even compile!

In case someone wants to take a look, my code is this one:

private void WriteInput( string input )
{
   byte[] byteArray = Encoding.UTF8.GetBytes(input);
   byte[] asciiArray = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, byteArray);
   string finalString = Encoding.ASCII.GetString(asciiArray);

   string inputFile = _idFile + ".in";
   var batchWriter = new StreamWriter(inputFile, false, Encoding.ASCII);
   batchWriter.Write(finalString);
   batchWriter.Close();
}
David Conde asked Dec 04 '10 06:12



2 Answers

Those characters have no mapping in ASCII, which defines only 128 code points (0 to 127). Review an ASCII table, like Wikipedia's, to verify this. You might be interested in the Windows-1252 encoding, or "extended ASCII", as it's sometimes called, which has code points for many accented characters, Spanish included.

var input = "La introducción masiva de las nuevas tecnologías de la información";
var utf8bytes = Encoding.UTF8.GetBytes(input);
var win1252Bytes = Encoding.Convert(
                Encoding.UTF8, Encoding.GetEncoding("windows-1252"), utf8bytes);
File.WriteAllBytes(@"foo.txt", win1252Bytes);
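The same idea can be folded back into the asker's WriteInput method by handing the target encoding straight to the StreamWriter, which skips the intermediate byte arrays. This is only a minimal sketch: the class name Win1252Writer and the path parameter are made up for illustration, and the RegisterProvider call assumes .NET Core or .NET 5+, where code-page encodings such as windows-1252 are not registered by default (on the classic .NET Framework that line should not be needed).

```csharp
using System;
using System.IO;
using System.Text;

static class Win1252Writer
{
    // Hypothetical helper, not part of the original answer.
    public static void WriteInput(string input, string path)
    {
        // Assumption: running on .NET Core / .NET 5+, where code-page
        // encodings must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding win1252 = Encoding.GetEncoding("windows-1252");

        // The writer encodes the string as it writes, one byte per character
        // for everything Windows-1252 can represent.
        using (var writer = new StreamWriter(path, false, win1252))
        {
            writer.Write(input);
        }
    }
}
```

Whatever reads the file afterwards has to open it as Windows-1252 too; a reader that assumes UTF-8 or ASCII will not show the accented letters correctly.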
Michael Petrotta answered Sep 21 '22 06:09



Can't be done. ASCII does not have those letters, so the best you can do is URL-encode them or write them as Unicode escape sequences.
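If the output really must stay pure ASCII, the escaping could look roughly like the sketch below. The class and method names (AsciiEscaper, EscapeNonAscii) are made up for illustration, and the \uXXXX format is just one possible convention; whatever consumes the file has to know how to undo it.

```csharp
using System;
using System.Text;

static class AsciiEscaper
{
    // Hypothetical helper: replaces every non-ASCII UTF-16 code unit
    // with a \uXXXX escape and leaves ASCII characters untouched.
    public static string EscapeNonAscii(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c <= 0x7F)
            {
                sb.Append(c);
            }
            else
            {
                // Four uppercase hex digits, e.g. 'ó' becomes \u00F3.
                sb.Append("\\u").Append(((int)c).ToString("X4"));
            }
        }
        return sb.ToString();
    }
}
```

With this, "introducción" becomes "introducci\u00F3n". For the URL-encoding route, the built-in Uri.EscapeDataString percent-encodes the UTF-8 bytes instead (ó becomes %C3%B3), but it also escapes spaces and other reserved characters, so it changes more of the text.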

Ignacio Vazquez-Abrams answered Sep 25 '22 06:09
