    [STAThread]
    static void Main(string[] args)
    {
        var fmt_csv = System.Windows.Forms.DataFormats.CommaSeparatedValue;

        // read the CSV
        var dataobject = System.Windows.Forms.Clipboard.GetDataObject();
        var stream = (System.IO.Stream)dataobject.GetData(fmt_csv);
        var enc = new System.Text.UTF8Encoding();
        var reader = new System.IO.StreamReader(stream, enc);
        string data_csv = reader.ReadToEnd();

        // read the unicode string
        string data_string = System.Windows.Forms.Clipboard.GetText();
    }
After looking at the comments, and paying close attention to what Excel was putting on the clipboard for CSV, it seemed reasonable that Excel might be placing the contents using a "legacy" encoding instead of UTF-8. So I tried using the Windows-1252 code page as the encoding, and it worked. See the code below:
    [STAThread]
    static void Main(string[] args)
    {
        var fmt_csv = System.Windows.Forms.DataFormats.CommaSeparatedValue;

        // read the CSV
        var dataobject = System.Windows.Forms.Clipboard.GetDataObject();
        var stream = (System.IO.Stream)dataobject.GetData(fmt_csv);
        var enc = System.Text.Encoding.GetEncoding(1252);
        var reader = new System.IO.StreamReader(stream, enc);
        string data_csv = reader.ReadToEnd();

        // read the Unicode string
        string data_string = System.Windows.Forms.Clipboard.GetText();
    }
Excel stores the string on the clipboard using the Unicode character encoding. The reason you get a square when you try to read the string in ANSI is that there is no representation for that character in your system's ANSI codepage. You should just use Unicode. If you're going to be dealing with localization issues, then ANSI is just more trouble than it's worth.
Edit: Joel Spolsky wrote an excellent introduction to character encodings, which is definitely worth checking out: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Decoding the stream as UTF-8 is not working. The bytes for the umlaut are not a valid UTF-8 sequence, so they are being converted into the Unicode "replacement character" (U+FFFD).
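To make that concrete, here is a minimal sketch that decodes the same three bytes twice, once as UTF-8 and once as Windows-1252. The bytes are an assumed example (the word "für" as a legacy code page would store it), not data captured from Excel, and the class and method names are made up for illustration. The `RegisterProvider` call is only needed on .NET Core / .NET 5+, where code page 1252 is not available by default; on the .NET Framework that line can be removed.

```csharp
using System;
using System.Linq;
using System.Text;

static class UmlautDecodingDemo
{
    // Returns the Windows-1252 encoding.
    public static Encoding Windows1252()
    {
        // Needed on .NET Core / .NET 5+; remove on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // Decodes the given bytes with the given encoding.
    public static string Decode(byte[] bytes, Encoding enc)
    {
        return enc.GetString(bytes);
    }

    // Formats a string as a list of code points so that
    // non-printable characters are visible in the console.
    public static string CodePoints(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    static void Main()
    {
        // Assumed sample: "für" stored as Windows-1252 (ü is the single byte 0xFC).
        byte[] bytes = { 0x66, 0xFC, 0x72 };

        // 0xFC is not valid UTF-8, so the decoder substitutes U+FFFD.
        Console.WriteLine(CodePoints(Decode(bytes, new UTF8Encoding())));

        // The same byte maps to U+00FC (ü) in Windows-1252.
        Console.WriteLine(CodePoints(Decode(bytes, Windows1252())));
    }
}
```

The bytes never change between the two calls; only the interpretation does, which is why switching the `StreamReader` encoding is enough to fix the output.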
Instead, just look at the stream's raw bytes without applying any encoding. The data will be in whatever format Excel uses. You should be able to tell which encoding that is by looking at the byte(s) where the umlaut is. You should then be able to convert it to UTF-8.
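One way to do that inspection is to copy the stream into a byte array and print it as hex. The sketch below uses a `MemoryStream` with assumed sample bytes as a stand-in for the clipboard stream, so it runs without Excel; `RawByteDump` and its methods are hypothetical names. In the real program you would pass the stream returned by `dataobject.GetData(fmt_csv)` instead.

```csharp
using System;
using System.IO;

static class RawByteDump
{
    // Reads every remaining byte from the stream without decoding it.
    public static byte[] ReadAll(Stream stream)
    {
        using (var buffer = new MemoryStream())
        {
            stream.CopyTo(buffer);
            return buffer.ToArray();
        }
    }

    // Formats bytes as hyphen-separated hex pairs.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    static void Main()
    {
        // Stand-in for the clipboard stream (assumed sample data).
        using (var stream = new MemoryStream(new byte[] { 0x66, 0xFC, 0x72 }))
        {
            Console.WriteLine(ToHex(ReadAll(stream)));
        }
    }
}
```

A single byte such as `FC` where the umlaut should be points to a one-byte legacy code page; in UTF-8 the same character would appear as the two bytes `C3 BC`.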
The worst case is if the CSV formatter throws out everything that is not ASCII. In that case, you might be able to write your own data formatter.
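As a rough sketch of what such a fallback could look like: instead of relying on the CSV clipboard format at all, you could take the Unicode text that `Clipboard.GetText()` returns and convert it to CSV yourself. This assumes the Unicode text is tab-separated with one row per line, and it deliberately ignores cells that themselves contain tabs or line breaks. `TsvToCsv` and its methods are hypothetical names, not part of any library.

```csharp
using System;
using System.Linq;

static class TsvToCsv
{
    // Quotes a field only when it contains a comma, a quote, or a line break,
    // doubling any embedded quotes.
    public static string QuoteField(string field)
    {
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Converts tab-separated rows into comma-separated rows.
    // Simplification: does not handle tabs or line breaks inside a cell.
    public static string ToCsv(string tabSeparated)
    {
        var rows = tabSeparated
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
            .Select(row => string.Join(",", row.Split('\t').Select(QuoteField)));
        return string.Join("\r\n", rows);
    }

    static void Main()
    {
        // Assumed sample; in the real program this would be Clipboard.GetText().
        string text = "Name\tCity\r\nJ\u00FCrgen\tK\u00F6ln, DE\r\n";
        Console.Write(ToCsv(text));
    }
}
```

Because the input is already a .NET string, no code page is involved and the umlauts survive unchanged.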
In some cases, the Excel folks seem to have decided that CSV means ASCII only. See http://www.tech-archive.net/Archive/Excel/microsoft.public.excel.misc/2008-07/msg02270.html