I'm using PDFBox 2.0.0-SNAPSHOT to build a PDF in Java. It works fine for very basic characters (e.g. [a-zA-Z0-9]), but I get encoding errors for slightly more advanced characters such as ’ (quoteright). Here's my code:
PDDocument pdf = new PDDocument();
PDPage page = new PDPage(PDRectangle.A4);
pdf.addPage(page);
PDPageContentStream contents = new PDPageContentStream(pdf, page);
PDFont font = PDType1Font.HELVETICA;
contents.beginText();
contents.setFont(font, 12);
// ...
String text = "’";
contents.showText(text);
contents.endText();
contents.close();
I get this exception:
Can't encode U+2019 in font Helvetica. Type 1 fonts only support 8-bit code points
I looked up the supported characters for non-embedded fonts in Section D.1 of the PDF specification, and this character should be supported.
Indeed, if I use this trick, I can insert the correct character:
// ...
// String text = "’";
// contents.showText(text);
byte[] commands = "(x) Tj ".getBytes();
commands[1] = (byte)146; // = 222 octal = quoteright in WinAnsi (145 = 221 octal is quoteleft)
contents.appendRawCommands(commands);
// ...
But this isn't really a practical solution. Aside from the inconvenience of manually looking up every character that might appear in the string, the appendRawCommands method is now deprecated.
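For reference, the byte value does not have to be looked up by hand. The following is a minimal sketch using only the JDK, on the assumption that the JVM's windows-1252 charset agrees with the PDF WinAnsiEncoding for the characters of interest (it does for the typographic quotes, but the two tables are not identical in every slot). The class name WinAnsiBytes is made up for illustration and is not part of PDFBox.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not a PDFBox class.
class WinAnsiBytes {

    /**
     * Encodes text as windows-1252 (assumed here to match WinAnsiEncoding
     * for the characters used). Throws instead of silently substituting '?'
     * when a character has no single-byte code.
     */
    static byte[] encode(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Not encodable as windows-1252: " + text, e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = encode("\u2019");
        // Mask with 0xFF to print the unsigned byte value.
        System.out.println(bytes[0] & 0xFF); // prints 146 (octal 222)
    }
}
```

This only removes the manual lookup. The bytes would still have to be written with the deprecated raw-command approach, and characters such as parentheses and backslashes would need escaping inside a PDF string.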
So, what's going on here? The answer above implies that showText should not have the issues of the old drawString method, but something clearly isn't working.
EDIT: As requested in the comments, here is the full stack trace of the exception:
Exception in thread "main" java.lang.IllegalArgumentException: Can't encode U+2019 in font Helvetica. Type 1 fonts only support 8-bit code points
at org.apache.pdfbox.pdmodel.font.PDType1Font.encode(PDType1Font.java:343)
at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:285)
at org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:314)
at com.fatfractal.test.PDFBoxTest.textWidth(PDFBoxTest.java:148)
at com.fatfractal.test.PDFBoxTest.showFlowingTextAt(PDFBoxTest.java:128)
at com.fatfractal.test.PDFBoxTest.build(PDFBoxTest.java:73)
at com.fatfractal.test.PDFBoxTest.main(PDFBoxTest.java:97)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Looking at the PDFBox code, it really seems like a bug. The PDType1Font.encode() method throws unconditionally if the code point is > 0xFF. If the logic proceeded in this case instead, the GlyphList would convert the "\u2019" character to "quoteright", which would then be a valid character in the font.
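To make the lookup order concrete, here is a minimal sketch of the two-step mapping described above: Unicode code point to glyph name first, then glyph name to a single-byte code. It is not PDFBox's actual implementation. The class GlyphNameEncoder and both tiny tables are hypothetical stand-ins for the real glyph list and font encoding, with the codes taken from the WinAnsi column of the PDF specification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not PDFBox code.
class GlyphNameEncoder {

    // Stand-in for a glyph list: Unicode code point -> glyph name.
    static final Map<Integer, String> UNICODE_TO_NAME = new HashMap<>();

    // Stand-in for a font encoding: glyph name -> WinAnsi code.
    static final Map<String, Integer> NAME_TO_CODE = new HashMap<>();

    static {
        UNICODE_TO_NAME.put(0x0041, "A");
        UNICODE_TO_NAME.put(0x2018, "quoteleft");
        UNICODE_TO_NAME.put(0x2019, "quoteright");

        NAME_TO_CODE.put("A", 0x41);
        NAME_TO_CODE.put("quoteleft", 0x91);
        NAME_TO_CODE.put("quoteright", 0x92);
    }

    /**
     * Resolves the glyph name before deciding whether the character is
     * encodable, rather than rejecting every code point above 0xFF up front.
     */
    static int encode(int codePoint) {
        String name = UNICODE_TO_NAME.get(codePoint);
        if (name == null) {
            throw new IllegalArgumentException(
                    String.format("No glyph name for U+%04X", codePoint));
        }
        Integer code = NAME_TO_CODE.get(name);
        if (code == null) {
            throw new IllegalArgumentException("Glyph not in encoding: " + name);
        }
        return code;
    }

    public static void main(String[] args) {
        System.out.println(encode(0x2019)); // prints 146
    }
}
```

The point of the sketch is only the ordering: the range check on the code point is replaced by a check on whether the resolved glyph name exists in the encoding.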
As @jtahlborn explained in his answer, PDType1Font.encode() is broken in the current 2.0.0 release candidate.
In contrast to the 1.x.x PDPageContentStream method drawString, though, the 2.0.0 release candidate method showText is encoding aware.
As a work-around, therefore, you could use a composite font with subset embedding instead, e.g. on a standard MS Windows installation:
InputStream fontStream = new FileInputStream("c:/Windows/Fonts/ARIALUNI.TTF");
PDType0Font font = PDType0Font.load(pdf, fontStream);
Using this font, your code will not fail for "’" because composite font classes do not have the bug observed in PDType1Font here.