Read UNIX encoded file with C#

I have a C# program that we use to replace some values with others so they can be used later as parameters. For example, 'NAME1' is replaced with &1, 'NAME2' with &2, and so on.

The problem is that the data to modify is in a text file created on UNIX, and special characters such as í are read as a square (invalid character), even in memory. Due to specifications that are out of my control, the file can't be changed, so I have no choice but to read it as it is.

I have tried reading it with most of the roughly 130 encodings C# offers, using:

EncodingInfo[] info = System.Text.Encoding.GetEncodings();
string text;
for (int a = 0; a < info.Length; ++a)
{
      text = File.ReadAllText(fn, info[a].GetEncoding());
      File.WriteAllText(fn + a, text, info[a].GetEncoding());
}

fn is the path of the file to read. I have checked all the generated files (about 130) and none of them writes the í properly, so I'm out of ideas and can't find anything on the internet.
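Instead of inspecting every output file by hand, the trial-and-error loop above could be narrowed down programmatically. The following is only a sketch under assumptions: the class and method names (EncodingProbe, Candidates) are hypothetical, and it assumes you already know one character, such as í, that the file should contain. It decodes the raw bytes with every encoding the runtime reports and keeps those whose output contains that character.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class EncodingProbe
{
    // Hypothetical helper: returns the names of all available encodings
    // whose decoding of `data` contains the character `expected`.
    public static List<string> Candidates(byte[] data, char expected)
    {
        var names = new List<string>();
        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            try
            {
                string text = info.GetEncoding().GetString(data);
                if (text.IndexOf(expected) >= 0)
                    names.Add(info.Name);
            }
            catch (Exception)
            {
                // Skip encodings the runtime lists but cannot instantiate.
            }
        }
        return names;
    }
}
```

A call such as EncodingProbe.Candidates(File.ReadAllBytes(fn), 'í') would then list the candidate encoding names. Several encodings may match, so the result is a shortlist rather than a definitive answer.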

SOLUTION:

It looks like this code finally did the job of reading the text properly. I also had to use the same encoding for the writing part:

System.Text.Encoding encoding = System.Text.Encoding.GetEncodings()[41].GetEncoding();

String text = File.ReadAllText(fn, encoding); // get file text 

// DO ALL THE STUFF I HAD TO

File.WriteAllText(fn, text, encoding); // write back with the same encoding

/* All of these encodings apparently worked for me with every unusual character I was able to write :P
    System.Text.Encoding.GetEncodings()[115].GetEncoding(); //Latin 9 (ISO)
    System.Text.Encoding.GetEncodings()[108].GetEncoding(); //Baltic (ISO)
    System.Text.Encoding.GetEncodings()[107].GetEncoding(); //Latin 3 (ISO)
    System.Text.Encoding.GetEncodings()[106].GetEncoding(); //Central European (ISO)
    System.Text.Encoding.GetEncodings()[105].GetEncoding(); //Western European (ISO)
    System.Text.Encoding.GetEncodings()[49].GetEncoding();  //Vietnamese (Windows)
    System.Text.Encoding.GetEncodings()[45].GetEncoding();  //Turkish (Windows)
    System.Text.Encoding.GetEncodings()[41].GetEncoding();  //Central European (Windows)   <-- Used this one
*/
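One caveat about indices such as [41]: the contents and order of the array returned by GetEncodings() are not guaranteed to be the same on every runtime or machine, so the same index may point at a different encoding elsewhere. A possibly more robust variant, shown here only as a sketch, looks the encoding up by name instead. It assumes the file is ISO-8859-1, which is the "Western European (ISO)" entry from the list above and not the Central European (Windows) one actually used. The class and method names (LegacyTextFile, Rewrite) are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

static class LegacyTextFile
{
    // Assumption: the file is ISO-8859-1 ("Western European (ISO)").
    // Substitute the name that matches your file.
    public static readonly Encoding FileEncoding = Encoding.GetEncoding("iso-8859-1");

    // Reads the file, applies `transform` to its text, and writes the
    // result back using the same encoding for both steps.
    public static void Rewrite(string path, Func<string, string> transform)
    {
        string text = File.ReadAllText(path, FileEncoding);
        File.WriteAllText(path, transform(text), FileEncoding);
    }
}
```

Usage would look like LegacyTextFile.Rewrite(fn, t => t.Replace("'NAME1'", "&1")). Other names from the list, such as the Windows code pages, may need an additional encoding provider to be registered on newer .NET runtimes.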

Thank you very much for your help

Noman(1)

Noman_1 asked Nov 05 '22 00:11


1 Answer

You have to find out the proper encoding first. Try file -i: it outputs MIME-type information for the file, which also includes the character-set encoding. I found a man page for it, too :)

Or try enca

It can guess and even convert between encodings. Just look at the man page.

Once you know the proper encoding, apply it when reading your file.
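As a rough sketch of that last step in C#: assuming the output of file -i has the shape "data.txt: text/plain; charset=iso-8859-1", the charset name can be extracted and passed to Encoding.GetEncoding. The class and method names below (CharsetFromFileCommand, ParseCharset, ReadWithDetectedCharset) are hypothetical, and the output line is assumed to have been captured separately.

```csharp
using System;
using System.IO;
using System.Text;

static class CharsetFromFileCommand
{
    // Extracts the charset name from a line such as
    // "data.txt: text/plain; charset=iso-8859-1" (assumed `file -i` shape).
    // Returns null when no "charset=" field is present.
    public static string ParseCharset(string fileOutput)
    {
        const string key = "charset=";
        int start = fileOutput.IndexOf(key, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;
        return fileOutput.Substring(start + key.Length).Trim();
    }

    // Reads `path` using the encoding named in the `file -i` output line.
    public static string ReadWithDetectedCharset(string path, string fileOutput)
    {
        string charset = ParseCharset(fileOutput);
        if (charset == null)
            throw new ArgumentException("no charset found in file output");
        // Encoding.GetEncoding throws ArgumentException for names the
        // runtime does not recognise.
        return File.ReadAllText(path, Encoding.GetEncoding(charset));
    }
}
```

The detected name is only a guess by the tool, and it may be one that .NET does not recognise, so it is worth checking the decoded text for the expected characters before writing anything back.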

Quoted from: How to find encoding of a file in Unix via script(s)

sschrass answered Nov 09 '22 14:11