I'm trying to generate a PDF that contains Arabic text using PDFBox Apache but the text is generated as separated characters because Apache parses given Arabic string to a sequence of general 'official' Unicode characters that is equivalent to the isolated form of Arabic characters.
Here is an example:
Target text to Write in PDF "Should be expected output in PDF File" -> جملة بالعربي
What I get in PDF File ->
I tried some methods but it's no use here are some of them:
1. Converting String to Stream of bits and trying to extract right values
2. Treating String a sequence of bytes with UTF-8 && UTF-16 and extracting values from them
There is some approach seems very promising to get the value "Unicode" of each character But it generate general "official Unicode" Here is what I mean
System.out.println( Integer.toHexString( (int)(new String("كلمة").charAt(1))) );
output is 644 but fee0 was the expected output because this character is in middle from then I should get the middle Unicode fee0
so what I want is some method that generates the correct Unicode not the just the official one
The very Left column in the first table in the following link represents the general Unicode
Arabic Unicode Tables Wikipedia
Here is a code that works. Download a sample font, e.g. trado.ttf
Make sure the pdfbox-app
and icu4j
jar files are in your classpath.
import java.io.File;
import java.io.IOException;
import com.ibm.icu.text.ArabicShaping;
import com.ibm.icu.text.ArabicShapingException;
import com.ibm.icu.text.Bidi;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.*;
public class Main {
public static void main(String[] args) throws IOException , ArabicShapingException
{
File f = new File("trado.ttf");
PDDocument doc = new PDDocument();
PDPage Page = new PDPage();
doc.addPage(Page);
PDPageContentStream Writer = new PDPageContentStream(doc, Page);
Writer.beginText();
Writer.setFont(PDType0Font.load(doc, f), 20);
Writer.newLineAtOffset(0, 700);
String s ="جملة بالعربي لتجربة الكلاس اللذي يساعد علي وصل الحروف بشكل صحيح";
Writer.showText(bidiReorder(s));
Writer.endText();
Writer.close();
doc.save(new File("File_Test.pdf"));
doc.close();
}
private static String bidiReorder(String text)
{
try {
Bidi bidi = new Bidi((new ArabicShaping(ArabicShaping.LETTERS_SHAPE)).shape(text), 127);
bidi.setReorderingMode(0);
return bidi.writeReordered(2);
}
catch (ArabicShapingException ase3) {
return text;
}
}
}
The sample code in this answer might be outdated please refer to h q's answer for the working sample code
We are going to use ICU Library.
ICU stands for International Components for Unicode and it is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.
To download the Library go to the downloads page from here.
Choose the latest version of ICU4J as shown in the following image.
You will be transferred to another page and you will find a box with direct links of the needed components .Go ahead and download three Files you will find the highlighted in next image.
The following explanation for creating and adding a library in Netbeans IDE
Now you are ready to use the library just import what you want like that
import com.ibm.icu.What_You_Want_To_Import;
With ArabicShaping Class and reversing the String we can write a correct attached Arabic LINE
Here is the Code Notice the comments in the following code
import com.ibm.icu.text.ArabicShaping;
import com.ibm.icu.text.ArabicShapingException;
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.*;
public class Main {
public static void main(String[] args) throws IOException , ArabicShapingException
{
File f = new File("Arabic Font File of format.ttf");
PDDocument doc = new PDDocument();
PDPage Page = new PDPage();
doc.addPage(Page);
PDPageContentStream Writer = new PDPageContentStream(doc, Page);
Writer.beginText();
Writer.setFont(PDType0Font.load(doc, f), 20);
Writer.newLineAtOffset(0, 700);
//The Trick in the next Line of Code But Here is some few Notes first
//We have to reverse the string because PDFBox is Writting from the left but Arabic is RTL Language
//The output will be perfect except every line will be justified to the left "It's not hard to resolve this"
// So we have to write arabic string to pdf line by line..It will be like this
String s ="جملة بالعربي لتجربة الكلاس اللذي يساعد علي وصل الحروف بشكل صحيح";
Writer.showText(new StringBuilder(new ArabicShaping(reverseNumbersInString(ArabicShaping.LETTERS_SHAPE).shape(s))).reverse().toString());
// Note the previous line of code throws ArabicShapingExcpetion
Writer.endText();
Writer.close();
doc.save(new File("File_Test.pdf"));
doc.close();
}
}
Here is the output
I hope that I had gone over everything.
Update : After reversing make sure to reverse the numbers again in order to get the same proper number
Here is a couple of functions that could help
public static boolean isInt(String Input)
{
try{Integer.parseInt(Input);return true;}
catch(NumberFormatException e){return false;}
}
public static String reverseNumbersInString(String Input)
{
char[] Separated = Input.toCharArray();int i = 0;
String Result = "",Hold = "";
for(;i<Separated.length;i++ )
{
if(isInt(Separated[i]+"") == true)
{
while(i < Separated.length && (isInt(Separated[i]+"") == true || Separated[i] == '.' || Separated[i] == '-'))
{
Hold += Separated[i];
i++;
}
Result+=reverse(Hold);
Hold="";
}
else{Result+=Separated[i];}
}
return Result;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With