We have a requirement to transliterate Arabic text into Latin characters (without diacritical marks) and display the result to users.
We are currently using IBM's ICU4J for this. The API does not transliterate Arabic text into properly readable Latin characters. See the example below:
Example
Arabic text: صدام حسين التكريتي
Google's transliteration output: Sadaam Hussein al-tikriti
ICU4J's transliteration output: ṣdạm ḥsyn ạltkryty
How can we improve the transliterated output of the ICU4J library?
ICU4J gives us the option to write our own rules, but we are currently stuck: no one on our team knows Arabic, and we have been unable to find a proper standard to follow.
It took me about four hours of research to look for another way to tackle this problem. I then went back to ICU4J and found a solution. Run the code below to see the step you were missing.
package com.webom.crypt;

import org.apache.commons.lang3.StringEscapeUtils;
import com.ibm.icu.text.Transliterator;

public class Test {
    // Built-in ICU transform ID: Arabic script to Latin script (keeps diacritics).
    public static final String ARABIC_TO_LATIN = "Arabic-Latin";
    // Compound ID: transliterate, decompose (NFD), drop combining marks, recompose (NFC).
    public static final String ARABIC_TO_LATIN_NO_ACCENTS = "Arabic-Latin; nfd; [:nonspacing mark:] remove; nfc";

    public static void main(String[] args) {
        String arabicString = "صدام حسين التكريتي";

        // Print the input as Java Unicode escapes so it can be checked independently of the console encoding.
        String unicodeCodes = StringEscapeUtils.escapeJava(arabicString);
        System.out.println("Unicode codes:" + unicodeCodes);

        // Your way: the plain Arabic-Latin transform, which leaves the diacritical marks in place.
        Transliterator arabicToLatinTrans = Transliterator.getInstance(ARABIC_TO_LATIN);
        String result1 = arabicToLatinTrans.transliterate(arabicString);
        System.out.println("ARABIC to Latin:" + result1);

        // My way: the compound transform, which also strips the diacritical marks.
        Transliterator arabicToLatinNoAccentsTrans = Transliterator.getInstance(ARABIC_TO_LATIN_NO_ACCENTS);
        String result2 = arabicToLatinNoAccentsTrans.transliterate(arabicString);
        System.out.println("ARABIC to Latin (no accents):" + result2);
    }
}
Check the result yourself; the output you receive should be exactly as shown below.
Unicode codes:\u0635\u062F\u0627\u0645 \u062D\u0633\u064A\u0646 \u0627\u0644\u062A\u0643\u0631\u064A\u062A\u064A
ARABIC to Latin:ṣdạm ḥsyn ạltkryty
ARABIC to Latin (no accents):sdam hsyn altkryty
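To make it clearer what the "nfd; [:nonspacing mark:] remove; nfc" part of the ID does, here is a minimal sketch of the same three steps using only the JDK's java.text.Normalizer, with no ICU4J involved. The class name StripMarksDemo and the method stripMarks are made up for this illustration, and the input is ICU4J's accented output from above, written as Unicode escapes so the source file stays plain ASCII. Treat it as an approximation of the ICU chain under those assumptions, not as a replacement for it.

```java
import java.text.Normalizer;

class StripMarksDemo {
    // Hypothetical helper: mirrors "nfd; [:nonspacing mark:] remove; nfc" with JDK classes only.
    static String stripMarks(String text) {
        // Step 1 (nfd): split each accented letter into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Step 2 (remove): drop every nonspacing mark (Unicode general category Mn).
        String withoutMarks = decomposed.replaceAll("\\p{Mn}+", "");
        // Step 3 (nfc): recompose whatever is left into its canonical composed form.
        return Normalizer.normalize(withoutMarks, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // ICU4J's "Arabic-Latin" output for the example name, as Unicode escapes:
        // \u1E63 = s with dot below, \u1EA1 = a with dot below, \u1E25 = h with dot below.
        String accented = "\u1E63d\u1EA1m \u1E25syn \u1EA1ltkryty";
        System.out.println(stripMarks(accented)); // prints "sdam hsyn altkryty"
    }
}
```

Note that this step only removes the dots under the letters. It does not add the short vowels that Arabic script normally leaves unwritten, which is why the result is still "sdam hsyn" rather than something like Google's output.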