I'm looking for a command-line program that will print out the text of a PDF file, just like cat does for a text file.
I've found pdftotext, and that would be workable, but I'd prefer something that replicates the cat functionality because I want to pipe the output to grep. Thanks!
To print a document on the default printer, just use the lp command followed by the name of the file you want to print.
Type the command for Evince with your PDF file's name, file extension and its full path relative to the Home directory. For instance, if your PDF file is named "wages.pdf" and it is stored in the Documents directory, type "evince Documents/wages.
With command-line tools, we can easily automate searching a large number of files. However, we must note that PDF is a binary format, and plain text search commands such as grep and sed will not work as expected on PDF files.
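Since grep cannot read the PDF binary format directly, one way to search many files is to extract the text first and grep the result. The following is only a sketch: search_pdfs is a hypothetical helper name, and it assumes pdftotext is installed and that file names contain no newlines.

```shell
# search_pdfs PATTERN [DIR]: list PDFs under DIR whose extracted text matches PATTERN.
# Hypothetical helper; assumes pdftotext is on PATH.
# The body runs in a subshell ( ... ) so its variables stay local.
search_pdfs() (
    pattern=$1
    dir=${2:-.}
    find "$dir" -type f -name '*.pdf' | while IFS= read -r f; do
        # Send the extracted text to stdout ("-") and let grep -q test for a match.
        if pdftotext "$f" - 2>/dev/null | grep -q -e "$pattern"; then
            printf '%s\n' "$f"
        fi
    done
)
```

For example, search_pdfs wages ~/Documents would print the path of each PDF under ~/Documents whose text contains "wages".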
“Evince” is a program for opening and rendering a PDF document for viewing purposes only; it can be launched from a Linux terminal.
On the man page for pdftotext, I found this:
pdftotext [options] [PDF-file [text-file]]
Description
Pdftotext converts Portable Document Format (PDF) files to plain text.
Pdftotext reads the PDF file, PDF-file, and writes a text file, text-file. If text-file is not specified, pdftotext converts file.pdf to file.txt. If text-file is '-', the text is sent to stdout.
Thus, to output to stdout in order to pipe to grep, use this:
pdftotext mydoc.pdf - | grep mysearchterm
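If you want something that behaves more like cat, the command above could be wrapped in a small shell function. This is a minimal sketch, not part of pdftotext itself: the name pdfcat is made up for illustration, and it assumes pdftotext is on your PATH.

```shell
# pdfcat FILE...: print the text of each PDF to stdout, in order, like cat.
# Hypothetical wrapper; assumes pdftotext is on PATH.
# The body runs in a subshell ( ... ) so its variables stay local.
pdfcat() (
    status=0
    for f in "$@"; do
        # "-" as the text-file argument sends the extracted text to stdout.
        pdftotext "$f" - || status=1
    done
    # Report failure if any file could not be converted.
    exit "$status"
)
```

With that defined, pdfcat mydoc.pdf | grep mysearchterm reads much like the cat equivalent, and several PDFs can be passed at once.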
Maybe you can try this: https://github.com/luochen1990/nodejs-easy-pdf-parser
It is an npm package, so you need to install Node.js (and npm) to use it.
It can be used as a command line tool:
npm install -g easy-pdf-parser
pdf2text test.pdf > test.txt
This tool sorts text lines by their y coordinates, so it works well in most cases. It also handles Unicode and is cross-platform (for comparison, mingw64's pdftotext loses Unicode characters on Windows).