 

Arabic presentation forms B support in c#

I was trying to convert a file from UTF-8 to the Arabic windows-1256 encoding using the Encoding APIs in C#, but I ran into a strange problem: some characters are not converted correctly. For example, "لا" in the string "ﻣﺣﻣد ﺻﻼ ح عادل" comes out as "ﻣﺣﻣد ﺻ? ح عادل". Some of my friends told me this is because these characters come from the Arabic Presentation Forms-B block. I created the file in Notepad++ and saved it as UTF-8.

Here is the code I use:

    StreamReader sr = new StreamReader(@"C:\utf-8.txt", Encoding.UTF8);
    string str = sr.ReadLine();
    StreamWriter sw = new StreamWriter(@"C:\windows-1256.txt", false, Encoding.GetEncoding("windows-1256"));
    sw.Write(str);
    sw.Flush();
    sw.Close();

However, I don't know how to convert the file correctly in C# when it contains these presentation forms.

Maged asked Sep 21 '10 07:09



1 Answer

Yes, your string contains lots of ligatures that cannot be represented in the 1256 code page. You'll have to decompose the string before writing it. Like this:

    // Compatibility decomposition maps presentation forms to their base letters.
    str = str.Normalize(NormalizationForm.FormKD);
    sw.Write(str);
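
A fuller sketch of the same idea, assuming the whole file should be converted rather than only its first line. The names `ArabicConverter`, `Fold`, and `ConvertFile` are made up for illustration. It swaps in `FormKC` for `FormKD`: both fold presentation forms to base letters, but `FormKD` also splits precomposed letters such as أ (U+0623) into alef plus a combining hamza (U+0654), which code page 1256 most likely cannot represent, whereas `FormKC` recomposes them. The exception fallback is there so that any character that still cannot be encoded raises an `EncoderFallbackException` instead of silently turning into `?`. On .NET Core and later, `windows-1256` should only be available after registering `CodePagesEncodingProvider.Instance` from the System.Text.Encoding.CodePages package.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of any library.
public static class ArabicConverter
{
    // Folds Arabic presentation forms (e.g. U+FEFC, lam-alef ligature)
    // to their base letters via compatibility normalization.
    // FormKC is an assumption here; the answer above uses FormKD.
    public static string Fold(string text)
    {
        return text.Normalize(NormalizationForm.FormKC);
    }

    public static void ConvertFile(string sourcePath, string targetPath)
    {
        // On .NET Core / .NET 5+, register the code pages provider first:
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Throw on unencodable characters instead of writing '?'.
        Encoding target = Encoding.GetEncoding(
            "windows-1256",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        // Read the entire file, not just the first line.
        string text = File.ReadAllText(sourcePath, Encoding.UTF8);
        File.WriteAllText(targetPath, Fold(text), target);
    }
}
```

Calling `ArabicConverter.ConvertFile(@"C:\utf-8.txt", @"C:\windows-1256.txt")` would then stand in for the reader/writer code in the question.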
Hans Passant answered Sep 28 '22 01:09
