Set encoding when converting a text file to PDF using iText

Tags: java, itext

I'm trying to get iText to output my UTF-8 encoded text correctly. The input file contains symbols like ° and accented Latin characters (é, è, à, ...).

I haven't found a solution. This is the code I'm using:

BufferedReader input = null;
Document output = null;
System.out.println("Convert text file to pdf");
System.out.println("input  : " + args[0]);
System.out.println("output : " + args[1]);
try {
  // text file to convert to pdf as args[0]
  input = 
    new BufferedReader (new FileReader(args[0]));
  // letter 8.5x11
  //    see com.lowagie.text.PageSize for a complete list of page-size constants.
  output = new Document(PageSize.LETTER, 40, 40, 40, 40);
  // pdf file as args[1]
  PdfWriter.getInstance(output, new FileOutputStream (args[1]));

  output.open();
  output.addAuthor("RealHowTo");
  output.addSubject(args[0]);
  output.addTitle(args[0]);

  BaseFont courier = BaseFont.createFont(BaseFont.COURIER, BaseFont.CP1252, BaseFont.EMBEDDED);
  Font font = new Font(courier, 12, Font.NORMAL);
  Chunk chunk = new Chunk("",font);
  output.add(chunk); 

  String line = "";
  while(null != (line = input.readLine())) {
    System.out.println(line);
    Paragraph p = new Paragraph(line);
    p.setAlignment(Element.ALIGN_JUSTIFIED);
    output.add(p);
  }
  System.out.println("Done.");
  output.close();
  input.close();
  System.exit(0);
}
catch (Exception e) {
  e.printStackTrace();
  System.exit(1);
}
}

Any ideas would be appreciated.

Amira asked Jan 21 '14 09:01


2 Answers

When I look at your code, I see a number of things that are odd.

  1. You say you require UTF-8, but you create a BaseFont object using BaseFont.CP1252 instead of BaseFont.IDENTITY_H (which is the "encoding" you need when you work with Unicode).
  2. You use the standard Type 1 font Courier, which is a font that doesn't know how to render é,è,à... and a font that is never embedded. As documented, the BaseFont.EMBEDDED parameter is ignored in this case!
  3. You don't use this font with an object that has actual content. The actual content is put into a Paragraph that is created using the default font "Helvetica", a font that doesn't know how to render é,è,à...

To solve this, you need to create the Paragraph with an appropriate font. That is NOT a standard Type 1 font, but a TrueType font such as courier.ttf. You also need to use the appropriate encoding: BaseFont.IDENTITY_H.
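
The three points above could be applied roughly as follows. This is only a sketch, not a tested fix: it assumes the com.lowagie.text packages mentioned in the question's code comment (iText 2.x) are on the classpath, and it takes the path to a TrueType font file as a third command-line argument, which is an assumption added for illustration and not part of the original program. The class name TextToPdf is likewise hypothetical.

```java
import com.lowagie.text.Document;
import com.lowagie.text.Element;
import com.lowagie.text.Font;
import com.lowagie.text.PageSize;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.BaseFont;
import com.lowagie.text.pdf.PdfWriter;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class TextToPdf {
    public static void main(String[] args) throws Exception {
        // args[0] = UTF-8 text file, args[1] = PDF to write.
        // args[2] = path to a TrueType font file (assumption: any .ttf that
        // contains the glyphs you need, for example a monospaced font).
        String fontPath = args[2];

        Document output = new Document(PageSize.LETTER, 40, 40, 40, 40);
        PdfWriter.getInstance(output, new FileOutputStream(args[1]));
        output.open();

        // Points 1 and 2: a TrueType font with IDENTITY_H, embedded in the PDF.
        BaseFont base = BaseFont.createFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        Font font = new Font(base, 12, Font.NORMAL);

        // Decode the input explicitly as UTF-8 instead of the default charset.
        try (BufferedReader input = new BufferedReader(
                new InputStreamReader(new FileInputStream(args[0]), StandardCharsets.UTF_8))) {
            String line;
            while ((line = input.readLine()) != null) {
                // Point 3: pass the font to the Paragraph that holds the content.
                Paragraph p = new Paragraph(line, font);
                p.setAlignment(Element.ALIGN_JUSTIFIED);
                output.add(p);
            }
        }
        output.close();
    }
}
```

The key difference from the question's code is that the font is passed to each Paragraph, so the text is no longer drawn with the default Helvetica.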

Bruno Lowagie answered Sep 30 '22 16:09


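Both the reader and the writer should be set to use the UTF-8 character encoding in order to read and write UTF-8 characters properly. For example, for the reader (note that InputStreamReader wraps an InputStream, not a file name):

input = new BufferedReader(new InputStreamReader(new FileInputStream(args[0]), "UTF-8"));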
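
To see why the reader's charset matters, here is a small self-contained sketch that uses only the standard library. It decodes the same UTF-8 bytes twice, once as UTF-8 and once as ISO-8859-1, and compares the results. The class and method names (CharsetReadDemo, readLines) are made up for illustration, and an in-memory byte array stands in for the input file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class CharsetReadDemo {

    // Reads all lines from the stream, decoding its bytes with the given charset.
    static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // A degree sign and accented Latin letters, written as Unicode escapes
        // so the source stays pure ASCII. Encoded here as UTF-8 bytes.
        byte[] utf8Bytes = "25 \u00B0C\n\u00E9 \u00E8 \u00E0".getBytes(StandardCharsets.UTF_8);

        List<String> right = readLines(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        List<String> wrong = readLines(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1);

        // Decoded correctly, the first line has 5 characters.
        System.out.println(right.get(0).length()); // prints 5
        // Decoded with the wrong charset, the two-byte degree sign becomes two characters.
        System.out.println(wrong.get(0).length()); // prints 6
    }
}
```

In the question's program, FileReader decodes with the JVM's default charset, which is not guaranteed to be UTF-8, so passing the charset explicitly removes that dependency.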

Ivey answered Sep 30 '22 15:09