How to use ICU4J library

Tags: java, unicode, icu

While searching for how to display RTL (right-to-left) text correctly, I found this post about the ICU library. I have no previous experience with it, and there are almost no clear online resources.

Does anyone here have experience using it? Or could you at least tell me what I should search for to get what I want?

Adham asked Jul 30 '12 21:07


2 Answers

Hi Adham, I have a little experience with ICU4J. I was trying to read LTR Arabic text and convert it to RTL text. I changed the numbers from English digits to Arabic digits and set the alignment to RTL.

This is simple code that does the job; I hope my little experience helps. The ICU4J site also has demos.

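        // Extract the text of page 1 from the PDF.
        PdfReader reader = new PdfReader(INPUTFILE);
        String txt = PdfTextExtractor.getTextFromPage(reader, 1);

        BiDiClass bidiClass = new BiDiClass();

        // Replace English (ASCII) digits with Arabic digits.
        String arabicNumber = bidiClass.englishToArabicNumber(txt);

        // Convert the line to logical order, with RTL as the dominant direction.
        String out = bidiClass.makeLineLogicalOrder(arabicNumber, true);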


        System.out.println(out);
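
The call above hard-codes `true` for `isRtlDominant`. If the input can contain both LTR-only and RTL lines, one option is to classify each line first. The following is a minimal sketch that uses the JDK's own `java.text.Bidi` instead of ICU4J, purely so that it runs without extra dependencies; the class name `BidiProbe` and the method `direction` are made up for this illustration. It only classifies a line and does not reorder anything.

```java
import java.text.Bidi;

class BidiProbe {

    /** Classifies one line of text as "LTR", "RTL", or "MIXED". */
    static String direction(String line) {
        // The base direction is taken from the first strong character,
        // falling back to left-to-right when there is none.
        Bidi bidi = new Bidi(line, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        if (bidi.isLeftToRight()) {
            return "LTR";
        }
        if (bidi.isRightToLeft()) {
            return "RTL";
        }
        return "MIXED";
    }

    public static void main(String[] args) {
        System.out.println(direction("hello"));                              // LTR
        System.out.println(direction("\u0645\u0631\u062D\u0628\u0627"));     // RTL
        System.out.println(direction("abc \u0645\u0631\u062D\u0628\u0627")); // MIXED
    }
}
```

A result of "RTL" or "MIXED" could then decide what is passed as `isRtlDominant`, although the right policy for mixed lines depends on the documents being processed.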

And this is the BiDiClass:

import com.ibm.icu.text.Bidi;
import com.ibm.icu.text.Normalizer;

// Editor: Ibraheem Osama Mohamed

/**
 * This class is an implementation of the ICU4J class. TextNormalize
 * will call this only if the ICU4J library exists in the classpath.
 * @author <a href="mailto:[email protected]">Brian Carrier</a>
 * @version $Revision: 1.0 $
 */
public class BiDiClass {


    private static final String REPLACE_CHARS = "0123456789.";
    private Bidi bidi;




    private StringBuilder sb = new StringBuilder();

    /**
     * Constructor.
     */
    public BiDiClass()
    {
        bidi = new Bidi();

        /* We do not use bidi.setInverse() because that uses
         * Bidi.REORDER_INVERSE_NUMBERS_AS_L, which caused problems
         * in some test files. For example, a file had a line of:
         * 0 1 / ARABIC
         * and the 0 and 1 were reversed in the end result. 
         * REORDER_INVERSE_LIKE_DIRECT is the inverse Bidi mode
         * that more closely reflects the Unicode spec.
         */
        bidi.setReorderingMode(Bidi.REORDER_INVERSE_LIKE_DIRECT);
    }

   /**
     * Takes a line of text in presentation order and converts it to logical order.
     * @see TextNormalize.makeLineLogicalOrder(String, boolean)    
     * 
     * @param str String to convert
     * @param isRtlDominant RTL (right-to-left) will be the dominant text direction
     * @return The converted string
     */
    public String makeLineLogicalOrder(String str, boolean isRtlDominant)
    {   
        bidi.setPara(str, isRtlDominant?Bidi.RTL:Bidi.LTR, null);

        /* Set the mirror flag so that parentheses and other mirror symbols
         * are properly reversed, when needed.  With this removed, lines
         * such as (CBA) in the PDF file will come out like )ABC( in logical
         * order.
         */
        return bidi.writeReordered(Bidi.DO_MIRRORING);
    }

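    // Converts English (ASCII) digits 0-9 to Arabic-Indic digits U+0660..U+0669.
    // Only digits are shifted: applying the same offset to '.' would produce
    // U+065E, a combining mark, rather than a decimal separator.
    public String englishToArabicNumber(String string) {
        // A local builder, so repeated calls do not accumulate earlier output.
        StringBuilder result = new StringBuilder(string.length());

        for (char c : string.toCharArray()) {
            if (c >= '0' && c <= '9') {
                c = (char) ('\u0660' - '0' + c);
            }
            result.append(c);
        }

        return result.toString();
    }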


    /**
     * Normalize presentation forms of characters to the separate parts.
     * @see TextNormalize.normalizePres(String)
     *
     * @param str String to normalize
     * @return Normalized form
     */
    public String normalizePres(String str)
    {
        StringBuilder builder = null;
        int p = 0;
        int q = 0;
        int strLength = str.length();
        for (; q < strLength; q++)
        {
            // We only normalize if the codepoint is in a given range.
            // Otherwise, NFKC converts too many things that would cause
            // confusion. For example, it converts the micro symbol in
            // extended Latin to the value in the Greek script. We normalize
            // the Unicode Alphabetic and Arabic A&B Presentation forms.
            char c = str.charAt(q);
            if ((0xFB00 <= c && c <= 0xFDFF) || (0xFE70 <= c && c <= 0xFEFF))
            {
                if (builder == null) {
                    builder = new StringBuilder(strLength * 2);
                }
                builder.append(str.substring(p, q));
                // Some fonts map U+FDF2 differently than the Unicode spec.
                // They add an extra U+0627 character to compensate.
                // This removes the extra character for those fonts.
                if (c == 0xFDF2 && q > 0 && (str.charAt(q - 1) == 0x0627 || str.charAt(q - 1) == 0xFE8D))
                {
                    builder.append("\u0644\u0644\u0647");
                }
                else
                {
                    // Trim because some decompositions have an extra space,
                    // such as U+FC5E
                    builder.append(
                            Normalizer.normalize(c, Normalizer.NFKC).trim());
                }
                p = q + 1;
            }
        }
        if (builder == null) {
            return str;
        } else {
            builder.append(str.substring(p, q));
            return builder.toString();
        }
    }



    /**
     * Decomposes Diacritic characters to their combining forms.
     *
     * @param str String to be Normalized
     * @return A Normalized String
     */     
    public String normalizeDiac(String str)
    {
        StringBuilder retStr = new StringBuilder();
        int strLength = str.length();
        for (int i = 0; i < strLength; i++)
        {
            char c = str.charAt(i);
            if(Character.getType(c) == Character.NON_SPACING_MARK
                    || Character.getType(c) == Character.MODIFIER_SYMBOL
                    || Character.getType(c) == Character.MODIFIER_LETTER)
            {
                /*
                 * Trim because some decompositions have an extra space, such as
                 * U+00B4
                 */
                retStr.append(Normalizer.normalize(c, Normalizer.NFKC).trim());
            }
            else
            {
                retStr.append(str.charAt(i));
            }
        }
        return retStr.toString();
    }

      }
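
To see what `normalizePres` does to a single presentation form, the same range-restricted NFKC idea can be tried in isolation. This is only a sketch: it assumes the JDK's `java.text.Normalizer` as a stand-in for `com.ibm.icu.text.Normalizer` so that it runs without ICU4J, it leaves out the U+FDF2 special case, and the class and method names are hypothetical.

```java
import java.text.Normalizer;

class PresentationFormDemo {

    /** True for the code unit ranges that normalizePres treats as presentation forms. */
    static boolean isPresentationForm(char c) {
        return (0xFB00 <= c && c <= 0xFDFF) || (0xFE70 <= c && c <= 0xFEFF);
    }

    /** Applies NFKC only to presentation forms and copies everything else unchanged. */
    static String normalizePresentationForms(String str) {
        StringBuilder out = new StringBuilder(str.length() * 2);
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (isPresentationForm(c)) {
                // Trim because some decompositions carry an extra space.
                out.append(Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFKC).trim());
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Formats a string as U+XXXX code units so the difference is visible. */
    static String toCodeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+FEFB is the isolated lam-alef ligature; NFKC splits it into lam + alef.
        System.out.println(toCodeUnits(normalizePresentationForms("\uFEFB"))); // U+0644 U+0627
        // U+00B5 (micro sign) is outside the ranges, so it is not turned into Greek mu.
        System.out.println(toCodeUnits(normalizePresentationForms("\u00B5"))); // U+00B5
    }
}
```

The second call shows why the class restricts the range: unrestricted NFKC would turn the micro sign into the Greek letter mu, which is the confusion the comment in `normalizePres` describes.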
Ibraheem Osama answered Nov 13 '22 05:11


Android N now offers ICU4J Android Framework APIs.

Android N exposes a subset of the ICU4J APIs via the android.icu package, rather than com.ibm.icu. The Android framework may choose not to expose some ICU4J APIs for various reasons.

Here are a few important things to note:

  1. The ICU4J Android framework APIs do not include all the ICU4J APIs.
  2. NDK developers should know that Android ICU4C is not supported.
  3. The APIs in the Android framework do not replace Android’s support for localizing with resources.
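
For code that has to run both on Android and on a plain JVM with the ICU4J jar, the visible difference is the package prefix. Here is a minimal sketch of probing for it at runtime with reflection. The class name `IcuPackageProbe` is hypothetical, and `ULocale` is used only as an illustrative probe class; whether any particular ICU4J class is available under `android.icu` depends on the platform, as point 1 says.

```java
class IcuPackageProbe {

    // Candidate package prefixes, in order of preference.
    static final String[] PREFIXES = {"android.icu", "com.ibm.icu"};

    /** Returns the first prefix under which util.ULocale can be loaded, or null. */
    static String findIcuPackage() {
        ClassLoader loader = IcuPackageProbe.class.getClassLoader();
        for (String prefix : PREFIXES) {
            try {
                // Load without initializing; we only want to know if the class exists.
                Class.forName(prefix + ".util.ULocale", false, loader);
                return prefix;
            } catch (ClassNotFoundException | LinkageError e) {
                // Not available on this runtime; try the next prefix.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String found = findIcuPackage();
        System.out.println(found == null
                ? "No ICU4J classes found"
                : "ICU4J classes available under " + found);
    }
}
```

In most projects a fixed import (`android.icu.…` on Android, `com.ibm.icu.…` elsewhere) is simpler; a probe like this is mainly useful for checking what a given runtime provides.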
Dhaval Jivani answered Nov 13 '22 04:11