
How to search contents of multiple pdf files?

Tags: linux, grep, full-text-search, pdf, debian

There is pdfgrep, which does exactly what its name suggests.

pdfgrep -R 'a pattern to search recursively from path' /some/path

I've used it for simple searches and it worked fine.

(There are packages in Debian, Ubuntu and Fedora.)

pdfgrep has supported recursive search since version 1.3.0. That version has been available in Ubuntu since 12.10 (Quantal).

Your distribution should provide a utility called pdftotext:

find /path -name '*.pdf' -exec sh -c 'pdftotext "$1" - | grep --with-filename --label="$1" --color "your pattern"' sh {} \;

The "-" is necessary to make pdftotext write to stdout instead of a file. The --with-filename and --label= options put the file name in grep's output. The optional --color flag tells grep to highlight matches on the terminal. The file name is passed to sh as the positional argument $1 rather than substituted into the command string, so names containing quotes or other shell metacharacters are handled safely.

(In Ubuntu, pdftotext is provided by the package xpdf-utils or poppler-utils.)

This method, using pdftotext and grep, has an advantage over pdfgrep if you want to use features of GNU grep that pdfgrep doesn't support. Note: pdfgrep 1.3.x supports the -C option for printing lines of context.
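The find/pdftotext/grep pipeline above can be wrapped in a small shell function so that any GNU grep options are passed straight through. This is only a sketch: the name pdf_search is made up for illustration, it assumes pdftotext and GNU grep are installed, and its line-based loop does not cope with file names that contain newlines.

```shell
# Hypothetical helper: run GNU grep over the text of every PDF under DIR.
# Usage: pdf_search DIR [grep options...] PATTERN
pdf_search() {
    dir=$1
    shift
    # List regular files ending in .pdf, one per line.
    find "$dir" -type f -name '*.pdf' | while IFS= read -r file; do
        # "-" sends the extracted text to stdout; --label makes grep
        # report the PDF's name instead of "(standard input)".
        # grep exits non-zero when a file has no match; "|| true"
        # ignores that so the loop always moves on to the next file.
        pdftotext "$file" - </dev/null 2>/dev/null |
            grep --with-filename --label="$file" "$@" || true
    done
}
```

For example, pdf_search /some/path -i -w 'pattern' would do a case-insensitive whole-word search, using grep's own -i and -w options.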

Recoll is a fantastic full-text GUI search application for Unix/Linux that supports dozens of different formats, including PDF. It can even pass the exact page number and search term of a query to the document viewer and thus allows you to jump to the result right from its GUI.

Recoll also comes with a viable command-line interface and a web-browser interface.

My current version of pdfgrep (1.3.0) allows the following:

pdfgrep -HiR 'pattern' /path

According to pdfgrep --help:

-H: Print the file name for each match.
-i: Ignore case distinctions.
-R: Search directories recursively.

It works well on my Ubuntu system.
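If you run this kind of search often, the flags can be bundled into a small wrapper. The sketch below is illustrative only: pdf_find is a made-up name, and the extra -n flag (print the page number of each match) is assumed to be available in your pdfgrep build; drop it if pdfgrep --help does not list it.

```shell
# Hypothetical wrapper around the pdfgrep flags discussed above.
# Usage: pdf_find PATTERN [DIR]
pdf_find() {
    pattern=$1
    dir=${2:-.}   # default to the current directory
    if ! command -v pdfgrep >/dev/null 2>&1; then
        echo "pdfgrep not found; install it from your distribution's packages" >&2
        return 127
    fi
    # -H file name, -i ignore case, -R recurse into directories,
    # -n page number (assumed to be supported by the installed version)
    pdfgrep -H -i -R -n "$pattern" "$dir"
}
```

Calling pdf_find 'pattern' /path then behaves like the command above with page numbers added to each match.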

There is another utility called ripgrep-all, which is based on ripgrep.

It handles more than just PDF documents, including Office documents and movies, and its author claims it is faster than pdfgrep.

The first command below searches the current directory recursively; the second restricts the search to PDF files:

rga 'pattern' .
rga --type pdf 'pattern' .
