There is pdfgrep, which does exactly what its name suggests.
pdfgrep -R 'a pattern to search recursively from path' /some/path
I've used it for simple searches and it worked fine.
(There are packages in Debian, Ubuntu and Fedora.)
Since version 1.3.0, pdfgrep supports recursive search. This version has been available in Ubuntu since 12.10 (Quantal).
Your distribution should provide a utility called pdftotext:
find /path -name '*.pdf' -exec sh -c 'pdftotext "$1" - | grep --with-filename --label="$1" --color "your pattern"' sh {} \;
The "-" is necessary to have pdftotext output to stdout, not to files.
The --with-filename and --label= options put the file name in grep's output. The optional --color flag is nice: it tells grep to highlight matches when writing to a terminal.
(In Ubuntu, pdftotext is provided by the package xpdf-utils or poppler-utils.)
This method, using pdftotext and grep, has an advantage over pdfgrep if you want to use features of GNU grep that pdfgrep doesn't support. Note: pdfgrep 1.3.x does support the -C option for printing lines of context.
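As a sketch, assuming pdftotext from poppler-utils and GNU grep (--label is a GNU extension, not POSIX), the find/pdftotext approach can be wrapped in a small shell function. The name pdfsearch is hypothetical. Passing file names to the inner sh as arguments, instead of pasting {} into the command string, keeps names with spaces, quotes or $ from breaking the command, and -exec ... {} + starts one shell for many files instead of one per file.

```shell
#!/bin/sh
# Sketch: grep the extracted text of every PDF under a directory.
# "pdfsearch" is a hypothetical helper name, not an existing command.
# Usage: pdfsearch PATTERN [DIR]   (DIR defaults to the current directory)
pdfsearch() {
    pattern=$1
    dir=${2:-.}
    # File names reach the inner shell as "$@", never as part of the
    # script text, so unusual characters in names are handled safely.
    find "$dir" -type f -name '*.pdf' -exec sh -c '
        pattern=$1
        shift
        for f in "$@"; do
            # "-" sends the extracted text to stdout; error messages
            # from damaged PDFs are discarded.
            pdftotext "$f" - 2>/dev/null |
                grep --with-filename --label="$f" -- "$pattern"
        done
    ' sh "$pattern" {} +
}
```

Usage: pdfsearch 'your pattern' /path. Other GNU grep options, such as -i or -C 2, can be added to the grep line inside the loop.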
Recoll is a fantastic full-text GUI search application for Unix/Linux that supports dozens of different formats, including PDF. It can even pass the exact page number and search term of a query to the document viewer and thus allows you to jump to the result right from its GUI.
Recoll also comes with a viable command-line interface and a web-browser interface.
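My current version of pdfgrep (1.3.0) allows the following:
pdfgrep -HiR 'pattern' /path
Here -H prints the file name with each match, -i makes the search case-insensitive, and -R searches directories recursively. Run pdfgrep --help for the full list of options.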
It works well on my Ubuntu.
There is another utility called ripgrep-all, which is based on ripgrep. It can handle more than just PDF documents, such as Office documents and movies, and the author claims it is faster than pdfgrep.
The first command searches the current directory recursively; the second limits the search to PDF files:
rga 'pattern' .
rga --type pdf 'pattern' .