I am hoping this question will become a comprehensive guide to PDF manipulation and rendering in Java. I have a fairly comprehensive implementation, built by stitching together multiple open source libraries, and I would like to improve upon it.
Background
My requirements and current implementation:
- Checking existing PDF documents for specific conditions (PDF version, password protection, font embedding, cross reference tables etc.) - Not implemented.
- Allow for the definition of AcroForm fields via page coordinates or some other mechanism. - Not implemented.
- Provide capability to iterate over form fields in a PDF, examine the field type and fill it with data - iText v 2.0.8
- Render the PDF to an image at different resolutions/DPI - two implementations (pdfrenderer and IcePDF)
- Render HTML/XHTML files to PDF - Flying Saucer xhtmlrenderer
- Do all the above as a library in a Java server environment (implying thread safety)
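For the first requirement (checking the PDF version), a minimal library-free sketch is to read the `%PDF-x.y` marker from the start of the file. The class and method names here are hypothetical, and this only reports the header version: a document can declare a later version in its catalog, so treat the result as a first check rather than a full conformance test.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of iText, pdfrenderer, IcePDF or Flying Saucer.
final class PdfHeaderVersion {
    // Header marker such as "%PDF-1.4" or "%PDF-2.0".
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d)\\.(\\d)");

    /** Returns the version declared in the header (e.g. "1.4") if found in the first 1024 bytes. */
    static Optional<String> headerVersion(byte[] pdfBytes) {
        int limit = Math.min(pdfBytes.length, 1024);
        // ISO-8859-1 maps each byte to exactly one char, so binary content cannot break decoding.
        String head = new String(pdfBytes, 0, limit, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(head);
        return m.find() ? Optional.of(m.group(1) + "." + m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n1 0 obj\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(headerVersion(sample).orElse("no header found")); // prints 1.4
    }
}
```

The method is stateless and works on a byte array, so it can be called concurrently from server threads without extra synchronization.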
What do I not like
I am dissatisfied with the following:
- iText licensing: New versions of iText are under the AGPL license which is a non-starter for my project (and commercial projects in general?). The fee for the commercial license is non-trivial (spanning usage based pricing of a few cents a document to tens of thousands for site licenses) and if I am going to pay the license fees for the software, I would like to do a full market search for the best product. The 2.x versions of iText work OK, but there are enough bugs in there.
- PDF version conformance: There are strange conformance issues when it comes to font embedding, cross reference tables etc. across these libraries to cause a reasonable amount of grief.
- Rendering output quality: The quality of rendering to PNG from these files suffers from a few problems in the areas of embedded fonts, images and layers.
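When comparing PNG output between renderers, it helps to pin down the pixel size to expect at a given DPI. PDF page dimensions are expressed in points, which by default are 1/72 inch, so the scale factor is dpi / 72. The sketch below uses hypothetical names and assumes the default unit and round-to-nearest; a given renderer may round differently.

```java
// Hypothetical helper for computing the raster size of a page at a target DPI.
final class PdfRasterSize {
    // Default PDF user space: 72 units (points) per inch.
    static final double POINTS_PER_INCH = 72.0;

    /** Converts a length in points to pixels at the given DPI, rounded to the nearest pixel. */
    static int toPixels(double points, double dpi) {
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }

    public static void main(String[] args) {
        double widthPt = 612, heightPt = 792; // US Letter: 8.5 x 11 inches
        for (int dpi : new int[] {72, 150, 300}) {
            System.out.println(dpi + " dpi -> "
                    + toPixels(widthPt, dpi) + " x " + toPixels(heightPt, dpi) + " px");
        }
    }
}
```

If two renderers produce images of different sizes for the same page and DPI, that alone can account for some of the visible quality difference before fonts or layers come into play.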
What I am hoping for
I am hoping to get some feedback from users and people who have researched PDF libraries. Please include as much of the following information as possible for completeness and posterity.
- is your answer/comment based on use or research
- name, version of the library and license (if commercial license, please include cost if possible)
- what do you use the library for
- what do you like about it / what is it good with
- what do you dislike about it / what is it not good with
- what is your overall impression
What is rendering in PDF?
PDF rendering is the term used to describe the translation and transport of (usually) web-based pages into PDF format directly on-screen and (usually) for onward use as a saved file, for despatch to a mobile device or for printing. (Source: Foxit Software.)
Is PDFBox open source?
Apache PDFBox is an open source pure-Java library that can be used to create, render, print, split, merge, alter, verify and extract text and meta-data of PDF files.
iText only costs you money if you actually make any money from the product you use it in. Which most people would consider fair. What are you comparing it against?
iText offers support through Stack Overflow for non-paying users, and premium support for paying customers.