What is the best Perl module to extract text from a PDF? [closed]

Question

What is the best way to extract text from a PDF in Perl?

Phssthpok · Accepted Answer

The CAM::PDF module is pretty useful for extracting text while keeping some information about where each piece of text came from in the document. It installs /usr/local/bin/getpdftext.pl, a script that demonstrates simple extraction. However, CAM::PDF can only read PDFs that are completely valid.
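A minimal sketch of per-page extraction with CAM::PDF, assuming the module is installed from CPAN and the input is a valid PDF. The helper name `extract_pages_cam` is hypothetical, and this is not a copy of getpdftext.pl; it only uses the module's `new`, `numPages` and `getPageText` methods.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list with one text string per page.
# Assumes CAM::PDF is installed from CPAN and the input PDF is well-formed.
sub extract_pages_cam {
    my ($file) = @_;
    no warnings 'once';    # $CAM::PDF::errstr is only mentioned once here

    # Loaded at run time, so this file compiles even where CAM::PDF is missing
    require CAM::PDF;

    # new() returns undef on failure and leaves the reason in $CAM::PDF::errstr
    my $pdf = CAM::PDF->new($file)
        or die "Cannot parse $file: $CAM::PDF::errstr";

    my @pages;
    for my $n (1 .. $pdf->numPages()) {
        # getPageText() can return undef for pages with no text content
        my $text = $pdf->getPageText($n);
        push @pages, defined $text ? $text : '';
    }
    return @pages;
}
```

For example, `my @pages = extract_pages_cam('foo.pdf');` gives the text of page 1 in `$pages[0]`. Because CAM::PDF is strict about validity, the call dies on malformed input, so wrap it in `eval` if you want to fall back to another tool.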

If you are dealing with ill-formed PDFs, you may need a more lenient parser, such as the pdftotext command-line tool (part of Xpdf and Poppler). By default it dumps foo.pdf to foo.txt, which you can then read into Perl.
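A minimal sketch of the pdftotext route, assuming the pdftotext binary is installed and on the PATH. The helper name `extract_text_pdftotext` is hypothetical; it runs the tool with an explicit output file and slurps the result back into a Perl string.

```perl
use strict;
use warnings;

# Hypothetical helper: converts $pdf with the pdftotext command-line tool
# and returns the extracted text as one string.
# Assumes pdftotext (from Poppler or Xpdf) is installed and on the PATH.
sub extract_text_pdftotext {
    my ($pdf) = @_;

    # Mirror pdftotext's default naming: foo.pdf -> foo.txt
    (my $txt = $pdf) =~ s/\.pdf\z//i;
    $txt .= '.txt';

    # List form of system() bypasses the shell, so spaces or quotes
    # in the filename need no escaping
    if (system('pdftotext', $pdf, $txt) != 0) {
        die $? == -1
            ? "Could not run pdftotext: $!\n"
            : 'pdftotext exited with status ' . ($? >> 8) . " for $pdf\n";
    }

    open my $fh, '<', $txt or die "Cannot open $txt: $!\n";
    my $text = do { local $/; <$fh> };    # slurp the whole file at once
    close $fh;
    return $text;
}
```

For example, `my $text = extract_text_pdftotext('foo.pdf');` leaves foo.txt on disk and returns its contents. pdftotext also accepts options such as `-layout` (keep the original physical layout) before the file names if the default reading order is not what you want.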

Tags: text, pdf, perl, extraction

user_78361084