
How to program a text search and replace in PDF files

How can I programmatically search and replace text in a large number of PDF files? I would like to remove a URL that has been added to a set of files. I have been able to remove the link using JavaScript under Batch Processing in Adobe Pro, but the link text remains. I have seen recommendations to use text touchup, which works manually, but I don't want to modify 1300 files by hand.

asked Oct 21 '08 at 00:10 by rpilkey





4 Answers

Finding text in a PDF is inherently hard because of the graphical nature of the format: the letters you are searching for may not be contiguous in the file. That said, CAM::PDF has some search-and-replace capabilities and heuristics. Give changepagestring.pl a try and see if it works on your PDFs.
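To see why a plain search can miss text, consider a hypothetical fragment of an uncompressed page content stream in which the URL is drawn by a `TJ` array whose pieces are separated by kerning numbers. The Python sketch below shows a byte search failing where joining the string pieces first succeeds. The sample stream and both helper functions are made up for illustration, the regexes are simplified (no nested parentheses, no hex strings), and this is not how CAM::PDF implements its heuristics.

```python
import re

# Hypothetical fragment of an uncompressed page content stream. The URL
# "example.com" is drawn by a TJ array whose string pieces are separated
# by kerning numbers, so its bytes never appear contiguously.
CONTENT = b"BT /F1 12 Tf 72 720 Td [(Visit exa) -15 (mple.com) 20 (!)] TJ ET"

# "[ ... ] TJ" array (simplified: a "]" inside a string would confuse it).
TJ_ARRAY = re.compile(rb"\[(.*?)\]\s*TJ", re.DOTALL)
# Literal string "( ... )" with backslash escapes; nested parens not handled.
LITERAL = re.compile(rb"\(((?:\\.|[^\\)])*)\)", re.DOTALL)


def naive_contains(stream: bytes, needle: bytes) -> bool:
    """Plain byte search, like a text editor's Find."""
    return needle in stream


def tj_contains(stream: bytes, needle: bytes) -> bool:
    """Join the string pieces of each TJ array, then search the result."""
    for array in TJ_ARRAY.finditer(stream):
        pieces = (m.group(1) for m in LITERAL.finditer(array.group(1)))
        if needle in b"".join(pieces):
            return True
    return False


print(naive_contains(CONTENT, b"example.com"))  # False
print(tj_contains(CONTENT, b"example.com"))     # True
```

Even the joined search breaks down when a font uses a custom or subset encoding, because the string bytes then need not match the visible characters at all.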

To install and run:

    $ cpan install CAM::PDF   # start a new terminal if this is your first cpan module
    $ changepagestring.pl input.pdf oldtext newtext output.pdf
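With 1300 files, the command above has to be run once per file. A minimal Python sketch of such a batch loop follows. It assumes `changepagestring.pl` is on your `PATH` and takes `input.pdf oldtext newtext output.pdf` exactly in the order shown; the helper names `build_commands` and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_commands(pdf_paths, dst_dir, old_text, new_text):
    """One changepagestring.pl argv per input PDF (input old new output)."""
    dst = Path(dst_dir)
    return [
        ["changepagestring.pl", str(pdf), old_text, new_text, str(dst / Path(pdf).name)]
        for pdf in pdf_paths
    ]


def run_all(src_dir, dst_dir, old_text, new_text):
    """Process every *.pdf in src_dir; return (input, stderr) for failed runs."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    pdfs = sorted(Path(src_dir).glob("*.pdf"))  # case-sensitive on most systems
    failed = []
    for argv in build_commands(pdfs, dst_dir, old_text, new_text):
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append((argv[1], result.stderr.strip()))
    return failed
```

For example, `run_all("originals", "cleaned", "oldtext", "newtext")` writes processed copies to `cleaned/` and returns the runs that exited with an error. How the script reports a string it could not find is not checked here, so spot-check a few outputs.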
answered Sep 25 '22 at 18:09 by Chris Dolan



I had become desperate too: after installing 10 PDF editors, all of which cost money, I still had no success.

In the end, pdftk plus a text editor was enough:

Replace Text in PDF Files

  • Use pdftk to uncompress PDF page streams

    pdftk original.pdf output original.uncompressed.pdf uncompress

  • Replace the text within original.uncompressed.pdf using a text editor (sometimes this works, sometimes it doesn't)

  • Repair the modified (and now broken) PDF

    pdftk original.uncompressed.pdf output original.uncompressed.fixed.pdf

(from Joel Dare)
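These three steps can be scripted so they run over many files. The Python sketch below is one way to do it, under stated assumptions: `pdftk` is on the `PATH`, the old text appears as plain bytes inside a literal `(...)` string in the uncompressed file, and the replacement contains no unescaped parentheses or backslashes. `replace_in_literals` and `clean_pdf` are hypothetical names. As with the manual edit, the replacement changes byte offsets, and the final pdftk call is relied on to repair them.

```python
import re
import subprocess
from pathlib import Path

# A PDF literal string "( ... )" with backslash escapes; balanced unescaped
# parentheses inside the string are NOT handled by this simplified pattern.
LITERAL = re.compile(rb"\(((?:\\.|[^\\)])*)\)", re.DOTALL)


def replace_in_literals(data: bytes, old: bytes, new: bytes) -> tuple[bytes, int]:
    """Replace `old` with `new` inside literal strings only; return (data, count)."""
    if not old:
        raise ValueError("old must be non-empty")
    count = 0

    def fix(match: re.Match) -> bytes:
        nonlocal count
        body = match.group(1)
        count += body.count(old)
        return b"(" + body.replace(old, new) + b")"

    # Scans the whole file, so matching strings outside page content
    # (for example a link annotation's /URI) are rewritten as well.
    return LITERAL.sub(fix, data), count


def clean_pdf(src: Path, dst: Path, old: bytes, new: bytes) -> int:
    """Uncompress with pdftk, replace the bytes, then let pdftk repair the file."""
    tmp = dst.with_name(dst.stem + ".uncompressed.pdf")
    subprocess.run(["pdftk", str(src), "output", str(tmp), "uncompress"], check=True)
    data, count = replace_in_literals(tmp.read_bytes(), old, new)
    tmp.write_bytes(data)
    subprocess.run(["pdftk", str(tmp), "output", str(dst)], check=True)
    tmp.unlink()
    return count
```

`clean_pdf` returns how many occurrences were replaced. A result of 0 corresponds to the "sometimes it doesn't" case: the text is split across `TJ` pieces or stored in an encoding that plain bytes don't match, and needs a different approach.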

answered Sep 25 '22 at 18:09 by Larry



I just finished trying out Infix on a text laden with diacritics, hoping to produce a version in which characters with double or composed diacritics are replaced by alternatives with single diacritics. Infix is definitely a good solution for someone who doesn't want the trouble of understanding how programmatic solutions work. All the requested changes were made. I still need to figure out how to reflow words whose replacement changes the layout of the text.

answered Sep 21 '22 at 18:09 by sobusola



You can use the redaction feature in Adobe Acrobat Pro to find and replace all occurrences in a single document in one step. I'm not sure whether it can be automated to run over multiple files.

http://help.adobe.com/en_US/Acrobat/9.0/Professional/WS5E28D332-9FF7-4569-AFAD-79AD60092D4D.w.html

answered Sep 25 '22 at 18:09 by davr
