How can I build a PDF with searchable text from individual PNG images?

Tags: image, pdf, png

I have a series of PNG images and, for each one, an unformatted text version of its content. I'd like to make a PDF in which each image becomes a full page, with the corresponding text attached to that page so that searching for a word brings you to the pages containing it, even though the text itself is never displayed.

This is a one-shot job, so it doesn't have to be neat or scalable. I could use any language commonly available on a Linux system, or common command-line tools. (I also have a Windows system with Acrobat available, but there are nearly a thousand images, so anything manual wouldn't work.)

asked Nov 12 '22 12:11 by jon


1 Answer

One option would be to generate the PDF using Java and Apache FOP, but that might be more work than you're looking to do.

You might do better with iText; see the example of adding a PNG to a PDF generated with iText.

You will need to work out how to generate a layer in which to place your searchable text; I can't advise you on that step.
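
One way to approach the searchable-text step without a real layer is PDF text rendering mode 3 (`3 Tr`), which draws text invisibly while leaving it searchable. The sketch below is not iText or FOP; it is a stdlib-only Python illustration that writes a minimal PDF by hand. It assumes non-interlaced 8-bit grayscale or RGB PNGs with no alpha channel or palette, Latin-1 text, and a page size of one point per pixel. `read_png`, `pdf_string`, `stream`, and `build_pdf` are names made up for this example.

```python
import struct
import textwrap
import zlib


def read_png(data):
    """Return (width, height, colors, idat) for a simple 8-bit gray/RGB PNG."""
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    pos, idat, header = 8, b"", None
    while pos < len(data):
        length, kind = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if kind == b"IHDR":
            header = struct.unpack(">IIBBBBB", body)
        elif kind == b"IDAT":
            idat += body  # compressed pixel data may span several chunks
        pos += 12 + length  # 4 length + 4 type + body + 4 CRC
    width, height, depth, color_type, _, _, interlace = header
    if depth != 8 or color_type not in (0, 2) or interlace:
        raise ValueError("sketch handles only 8-bit gray/RGB, non-interlaced")
    return width, height, (1 if color_type == 0 else 3), idat


def pdf_string(text):
    """Encode text as a PDF literal string (assumption: Latin-1 is enough)."""
    raw = text.encode("latin-1", "replace")
    for ch in (b"\\", b"(", b")"):
        raw = raw.replace(ch, b"\\" + ch)
    return b"(" + raw + b")"


def stream(entries, payload):
    """Wrap payload as a PDF stream object body with the given dict entries."""
    return b"<< %s /Length %d >>\nstream\n%s\nendstream" % (
        entries, len(payload), payload)


def build_pdf(pages):
    """pages: list of (png_bytes, text) pairs. Returns the PDF as bytes."""
    objects = {
        1: b"<< /Type /Catalog /Pages 2 0 R >>",
        3: b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica "
           b"/Encoding /WinAnsiEncoding >>",
    }
    kids = []
    for index, (png, text) in enumerate(pages):
        page_no = 4 + 3 * index
        image_no, content_no = page_no + 1, page_no + 2
        w, h, colors, idat = read_png(png)
        space = b"/DeviceGray" if colors == 1 else b"/DeviceRGB"
        # The PNG's IDAT data is reused as-is: Flate plus PNG row predictors.
        objects[image_no] = stream(
            b"/Type /XObject /Subtype /Image /Width %d /Height %d "
            b"/ColorSpace %s /BitsPerComponent 8 /Filter /FlateDecode "
            b"/DecodeParms << /Predictor 15 /Colors %d "
            b"/BitsPerComponent 8 /Columns %d >>" % (w, h, space, colors, w),
            idat)
        # Draw the image over the whole page, then add the text invisibly.
        ops = [b"q %d 0 0 %d 0 0 cm /Im0 Do Q" % (w, h),
               b"BT 3 Tr /F1 6 Tf 7 TL 0 %d Td" % (h - 7)]
        ops += [pdf_string(line) + b" Tj T*"
                for line in textwrap.wrap(text, 100)]
        ops.append(b"ET")
        objects[content_no] = stream(b"", b"\n".join(ops))
        objects[page_no] = (
            b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %d %d] "
            b"/Resources << /XObject << /Im0 %d 0 R >> "
            b"/Font << /F1 3 0 R >> >> /Contents %d 0 R >>"
            % (w, h, image_no, content_no))
        kids.append(b"%d 0 R" % page_no)
    objects[2] = b"<< /Type /Pages /Count %d /Kids [%s] >>" % (
        len(kids), b" ".join(kids))

    # Serialize objects in numeric order and record offsets for the xref table.
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number in sorted(objects):
        offsets.append(len(out))
        out += b"%d 0 obj\n%s\nendobj\n" % (number, objects[number])
    xref_at = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref_at)
    return bytes(out)
```

To use it, you would pass a list of (PNG bytes, text) pairs read from your files and write the returned bytes to a `.pdf` file. The invisible text is laid out from the top of the page rather than aligned with the words in the image, so a search should land on the right page but the highlight will not match the picture.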

Here is how you can tell if a PDF contains text, which might help you with building one.

answered Nov 24 '22 23:11 by JoshDM