Print contents of a PDF to the command line

I'm looking for a command-line program that will print out the text of a PDF file, just like cat for a text file.

I've found pdftotext, and that would be workable, but I'd prefer something that behaves like cat, because I want to pipe the output to grep. Thanks!

asked Oct 10 '11 at 22:10 by andronikus




2 Answers

In the man page for pdftotext, I found this:

pdftotext [options] [PDF-file [text-file]]

Description

Pdftotext converts Portable Document Format (PDF) files to plain text.

Pdftotext reads the PDF file, PDF-file, and writes a text file, text-file. If text-file is not specified, pdftotext converts file.pdf to file.txt. If text-file is '-', the text is sent to stdout.

So, to send the text to stdout and pipe it to grep, pass '-' as the text file:

pdftotext mydoc.pdf - | grep mysearchterm
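
If you want something that feels more like cat, including several files at once, a small shell function is one way to wrap the command above. This is only a sketch: the name pdfcat is made up for illustration, and it assumes pdftotext is installed and on your PATH.

```shell
# pdfcat: print the text of one or more PDFs to stdout, like cat does for text files.
# "pdfcat" is a hypothetical helper name; it only wraps pdftotext with "-" as the output.
pdfcat() {
    for f in "$@"; do
        # "-" tells pdftotext to write the extracted text to stdout
        pdftotext "$f" - || return 1
    done
}

# Usage (file names and search term are placeholders):
#   pdfcat mydoc.pdf | grep mysearchterm
#   pdfcat a.pdf b.pdf | grep -i mysearchterm
```

The function stops at the first file pdftotext cannot convert, so a failure is not silently hidden in the middle of a pipeline.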
answered Sep 26 '22 at 10:09 by jsvk


You could also try easy-pdf-parser: https://github.com/luochen1990/nodejs-easy-pdf-parser

It is an npm package, so you need Node.js (and npm) installed to use it.

It can be used as a command line tool:

npm install -g easy-pdf-parser
pdf2text test.pdf > test.txt

This tool sorts text lines by their y coordinates, so it works well in most cases. It also handles Unicode correctly and is cross-platform (for comparison, mingw64's pdftotext loses Unicode characters on Windows).
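
Since the pdf2text example above redirects its output to a file, the text presumably goes to stdout, so it should also be possible to pipe it straight into grep. A minimal sketch under that assumption (pdf2grep is a made-up name, not part of the package):

```shell
# pdf2grep: search the extracted text of a single PDF.
# Hypothetical wrapper; assumes pdf2text (from easy-pdf-parser) writes text to stdout.
pdf2grep() {
    pattern=$1
    file=$2
    # "--" stops option parsing, so patterns starting with "-" are safe
    pdf2text "$file" | grep -- "$pattern"
}

# Usage (file name and search term are placeholders):
#   pdf2grep mysearchterm test.pdf
```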

answered Sep 23 '22 at 10:09 by luochen1990