
Cannot read Turkish characters from a text file into a string array

I am trying to do some sentence processing in Turkish, and I am using a text file as my database. But I cannot read Turkish characters from the text file, so I cannot process the data correctly.

string[] Tempdatabase = File.ReadAllLines(@"C:\Users\dialogs.txt");
textBox1.Text = Tempdatabase[5];

Output: [screenshot not preserved]

Seljuke asked Apr 05 '16 00:04


3 Answers

It's probably an encoding issue. Try using one of the Turkish code page identifiers.

var Tempdatabase =
    File.ReadAllLines(@"C:\Users\dialogs.txt", Encoding.GetEncoding("iso-8859-9"));
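A minimal sketch of this approach, assuming a .NET Core 3.0+ or .NET 5+ project. On those runtimes only the Unicode encodings plus ASCII and Latin-1 are built in, so legacy code pages such as ISO-8859-9 have to be enabled by registering `CodePagesEncodingProvider` before `Encoding.GetEncoding("iso-8859-9")` will succeed. On .NET Framework the code pages are built in and the registration line can be dropped. The class and method names (`TurkishFileReader`, `ReadLatin5Lines`) are hypothetical.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: reads a text file saved in the Turkish ISO-8859-9 (Latin-5) code page.
public static class TurkishFileReader
{
    static TurkishFileReader()
    {
        // .NET Core / .NET 5+ only ship Unicode encodings, ASCII and Latin-1 by default.
        // Registering this provider makes legacy code pages (iso-8859-9, windows-1254, ...) available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string[] ReadLatin5Lines(string path)
    {
        // Decode the file's bytes with the Turkish code page instead of the UTF-8 default.
        Encoding latin5 = Encoding.GetEncoding("iso-8859-9");
        return File.ReadAllLines(path, latin5);
    }
}
```

With this helper, the question's code would become `textBox1.Text = TurkishFileReader.ReadLatin5Lines(@"C:\Users\dialogs.txt")[5];`. Registering the provider in a static constructor ensures it happens once, before the first lookup.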
Grant Winney answered Oct 05 '22 22:10


You can fiddle around with Encoding as much as you like. This might eventually yield the expected result, but bear in mind that it may not work with other files.

C# processes strings and files using Unicode by default: strings are UTF-16 in memory, and File.ReadAllLines assumes UTF-8 when no encoding is given. So unless you really need something else, you should try this instead:

Open your text file in Notepad (or any other editor) and save it as a UTF-8 file. Then you should get the expected results without any changes to your code. This works because File.ReadAllLines reads the file as UTF-8 when you don't pass an encoding, and it also recognises a byte order mark if the file has one. This is the default behavior, and it should be preferred.

When you save your text file as UTF-8, then C# will interpret it as such.

This also applies to .html files inside Visual Studio, if you notice that they are displayed incorrectly (parsed as ASCII).

[Screenshot: Save As dialog]
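If you would rather do the "save as UTF-8" step in code than in an editor, here is a rough sketch. It assumes the original file was written in the Windows-1254 code page (substitute whatever code page your file really uses) and a .NET Core 3.0+ or .NET 5+ project for the provider registration. `Utf8Converter` and `ConvertToUtf8` are hypothetical names.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: re-encodes a Windows-1254 text file as UTF-8,
// the programmatic equivalent of "Save As... UTF-8" in an editor.
public static class Utf8Converter
{
    static Utf8Converter()
    {
        // Needed on .NET Core / .NET 5+ so that code page 1254 can be looked up.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static void ConvertToUtf8(string sourcePath, string targetPath)
    {
        // Decode the bytes with the code page the file was actually written in (assumed: 1254)...
        string text = File.ReadAllText(sourcePath, Encoding.GetEncoding(1254));

        // ...then write the same text back as UTF-8. Passing true emits a byte order mark
        // (EF BB BF), which lets File.ReadAllLines and editors recognise the file as UTF-8.
        File.WriteAllText(targetPath, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
    }
}
```

After the conversion, the question's original `File.ReadAllLines(path)` call without an encoding argument reads the Turkish letters correctly. Writing to a separate target path keeps the original file intact in case the assumed code page turns out to be wrong.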

bytecode77 answered Oct 05 '22 23:10


The file contains the text in a specific Turkish character set, not Unicode. If you don't specify any other behaviour, .NET assumes Unicode (UTF-8) text when reading from a text file. You have two possible solutions:

Either convert the text file to Unicode (for example UTF-8) using an external text editor.

Or specify the character set to read with, for example:

string[] Tempdatabase = File.ReadAllLines(@"C:\Users\dialogs.txt", Encoding.Default);

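On .NET Framework, this will use the local (ANSI) character set of the Windows system. Note that on .NET Core and .NET 5+, Encoding.Default always returns UTF-8, so this variant does not help there.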

string[] Tempdatabase = File.ReadAllLines(@"C:\Users\dialogs.txt", Encoding.GetEncoding("Windows-1254"));

This will use the Turkish character set defined by Microsoft.
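To illustrate why the character set matters, the sketch below decodes the same two bytes, which spell "İş" in Windows-1254, once with UTF-8 (what .NET assumes by default) and once with Windows-1254. It assumes a .NET Core 3.0+ or .NET 5+ project for the provider registration; `DecodingDemo` and its members are hypothetical names.

```csharp
using System.Text;

// Hypothetical demo: the same bytes decoded with two different encodings.
public static class DecodingDemo
{
    static DecodingDemo()
    {
        // Makes windows-1254 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // "İş" as stored by an editor using Windows-1254 (İ = 0xDD, ş = 0xFE).
    public static byte[] Windows1254Bytes() => new byte[] { 0xDD, 0xFE };

    // Wrong: these bytes are not valid UTF-8, so the decoder substitutes U+FFFD.
    public static string DecodeAsUtf8() => Encoding.UTF8.GetString(Windows1254Bytes());

    // Right: decoding with the code page the file was written in restores the letters.
    public static string DecodeAsWindows1254() =>
        Encoding.GetEncoding("Windows-1254").GetString(Windows1254Bytes());
}
```

`DecodeAsUtf8` produces replacement characters instead of the Turkish letters, which matches the symptom in the question, while `DecodeAsWindows1254` returns "İş".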

NineBerry answered Oct 05 '22 23:10