How can one find and replace text inside a PDF document using PDFBox 2.0, they pulled the old example and it's syntax no longer works so I am wondering if it's still possible and if so what the best way to go about it is. Thanks!
You can try like this:
public static PDDocument replaceText(PDDocument document, String searchString, String replacement) throws IOException {
if (Strings.isEmpty(searchString) || Strings.isEmpty(replacement)) {
return document;
}
PDPageTree pages = document.getDocumentCatalog().getPages();
for (PDPage page : pages) {
PDFStreamParser parser = new PDFStreamParser(page);
parser.parse();
List tokens = parser.getTokens();
for (int j = 0; j < tokens.size(); j++) {
Object next = tokens.get(j);
if (next instanceof Operator) {
Operator op = (Operator) next;
//Tj and TJ are the two operators that display strings in a PDF
if (op.getName().equals("Tj")) {
// Tj takes one operator and that is the string to display so lets update that operator
COSString previous = (COSString) tokens.get(j - 1);
String string = previous.getString();
string = string.replaceFirst(searchString, replacement);
previous.setValue(string.getBytes());
} else if (op.getName().equals("TJ")) {
COSArray previous = (COSArray) tokens.get(j - 1);
for (int k = 0; k < previous.size(); k++) {
Object arrElement = previous.getObject(k);
if (arrElement instanceof COSString) {
COSString cosString = (COSString) arrElement;
String string = cosString.getString();
string = StringUtils.replaceOnce(string, searchString, replacement);
cosString.setValue(string.getBytes());
}
}
}
}
}
// now that the tokens are updated we will replace the page content stream.
PDStream updatedStream = new PDStream(document);
OutputStream out = updatedStream.createOutputStream();
ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
tokenWriter.writeTokens(tokens);
page.setContents(updatedStream);
out.close();
}
return document;
}
I spent much time on coming up with a solution for this and ended up acquiring an Acrobat DC subscription so that I could create fields as placeholders for the text to be replaced. These fields in my case, were for customer information and order details so it was not very complex data, but the document was filled with pages of business related conditions and had a very complex layout.
Then I simply did this, which may be suitable for you.
private void update() throws InvalidPasswordException, IOException {
Map<String, String> map = new HashMap<>();
map.put("fieldname", "value to update");
File template = new File("template.pdf");
PDDocument document = PDDocument.load(template);
List<PDField> fields = document.getDocumentCatalog().getAcroForm().getFields();
for (PDField field : fields) {
for (Map.Entry<String, String> entry : map.entrySet()) {
if (entry.getKey().equals(field.getFullyQualifiedName())) {
field.setValue(entry.getValue());
field.setReadOnly(true);
}
}
}
File out = new File("out.pdf");
document.save(out);
document.close();
}
YMMV
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With