How to find and replace text in an existing PDF file with PDFTK (or another command line application) [closed]

I have on each page of my PDF document a line with this string:

%REPLACE%

Which I'd like to find and replace with another string.

Does anyone know how to do this with some command line application such as PDFTK?

This person gave me an important clue; however, I'd like something more direct.

Thanks.

Roger asked Mar 26 '12 11:03



2 Answers

You can try to modify the content of your PDF as follows:

  1. Uncompress the text streams of the PDF:

    pdftk file.pdf output uncompressed.pdf uncompress

  2. Use sed to replace your text with another string:

    sed -e "s/ORIGINALSTRING/NEWSTRING/g" <uncompressed.pdf >modified.pdf

  3. If this attempt was successful, re-compress the PDF with pdftk:

    pdftk modified.pdf output recompressed.pdf compress

Note: This approach does not work every time, mainly because of font subsetting: the embedded font may contain only the glyphs used in the original document, so characters in the new string can be missing.
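The three steps can be wrapped in a small script. This is only a sketch: the function names `replace_bytes` and `pdf_replace` are made up for illustration, it assumes `pdftk`, `sed`, and a `grep` that supports `-a` are on the PATH, and it assumes the search and replacement strings contain no characters that are special to sed (such as `/`, `&`, or `.`). It also checks that the string appears literally in the uncompressed file before replacing it.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of pdftk.

# Replace every occurrence of $3 with $4 in file $1, writing to file $2.
# LC_ALL=C makes sed treat the input as raw bytes rather than UTF-8 text.
# Assumes $3 and $4 contain no sed-special characters (/, &, ., *, [ ...).
replace_bytes() {
    LC_ALL=C sed -e "s/$3/$4/g" <"$1" >"$2"
}

# Usage: pdf_replace input.pdf output.pdf OLDSTRING NEWSTRING
pdf_replace() {
    in=$1; out=$2; old=$3; new=$4
    tmpdir=$(mktemp -d) || return 1
    status=0

    # Step 1: uncompress the page streams so the text is searchable.
    pdftk "$in" output "$tmpdir/uncompressed.pdf" uncompress || status=1

    # The string may not be stored literally (font subsetting, hex strings).
    if [ "$status" -eq 0 ] &&
       ! LC_ALL=C grep -a -F -q -e "$old" "$tmpdir/uncompressed.pdf"; then
        echo "pdf_replace: '$old' not found as literal text" >&2
        status=1
    fi

    if [ "$status" -eq 0 ]; then
        # Step 2: replace the text. Step 3: re-compress the result.
        replace_bytes "$tmpdir/uncompressed.pdf" "$tmpdir/modified.pdf" \
            "$old" "$new" &&
        pdftk "$tmpdir/modified.pdf" output "$out" compress || status=1
    fi

    rm -rf "$tmpdir"
    return "$status"
}
```

After sourcing the file, a call such as `pdf_replace file.pdf recompressed.pdf '%REPLACE%' 'New text'` would run all three steps. If the "not found" message appears, the text is probably not stored as a plain string and this method will not help.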

Dingo answered Sep 17 '22 17:09



For a small change on just a few pages, Inkscape can do a good job. It can also fix some issues in diagrams and with table borders. Each page must be processed separately, though, and the pages stuck back together using pdfunite. (Unchanged page ranges can be extracted with pdfseparate.)

Inspiration: https://tatica.org/2015/07/13/edit-pdf-inkscape/
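The split and rejoin around the Inkscape edit might look roughly like this. It is a sketch under assumptions: `pdfseparate` and `pdfunite` come from poppler-utils, the helper names `page_files`, `split_pages`, and `join_pages` are invented here, and the file prefix is assumed to contain no whitespace. The `page_files` helper exists because a plain `page-*.pdf` glob would sort `page-10.pdf` before `page-2.pdf`.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Print "<prefix>-<n>.pdf" for pages $2..$3 in numeric order, one per line.
page_files() {
    i=$2
    while [ "$i" -le "$3" ]; do
        printf '%s-%d.pdf\n' "$1" "$i"
        i=$((i + 1))
    done
}

# Usage: split_pages input.pdf prefix
# Writes one file per page: prefix-1.pdf, prefix-2.pdf, ...
split_pages() {
    pdfseparate "$1" "$2-%d.pdf"
}

# Usage: join_pages prefix first last output.pdf
# Word splitting of the file list is intentional (no whitespace in prefix).
join_pages() {
    pdfunite $(page_files "$1" "$2" "$3") "$4"
}
```

For a 12-page document one could run `split_pages file.pdf page`, open the affected files (say `page-3.pdf`) in Inkscape, save them back as PDF under the same names, and then run `join_pages page 1 12 edited.pdf`.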

Joachim Wagner answered Sep 18 '22 17:09
