
Ruby: Reading PDF files

I'm looking for a fast and reliable way to read/parse large PDF files in Ruby (on Linux and OSX).

So far I've found PDF-toolkit, a rather old and simple pdftotext wrapper, and PDF-reader, which was unable to read most of my files. Apart from that, both libraries provide exactly the functionality I'm looking for.

My question: Have I missed something? Is there a tool that is better suited (faster and more reliable) to solve my problem?

Javier asked Apr 21 '09 15:04




2 Answers

You might find Docsplit useful:

Docsplit is a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8 plain text, page images or thumbnails in any format, PDFs, single pages, and document metadata (title, author, number of pages...)

pw. answered Sep 25 '22 15:09


After trying different methods, I'm now using PDF-Toolkit. It's quite old, but it's fast, stable, and reliable. It doesn't really need to be new, either, because it just wraps the xpdf command-line utilities.
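
Since the wrapper only shells out to xpdf's pdftotext, the same idea can be sketched with nothing but Ruby's standard library. This is a rough sketch, not PDF-Toolkit's actual code: it assumes the `pdftotext` binary is installed and on the PATH, and the helper names `pdftotext_command` and `extract_text` are made up for illustration.

```ruby
require 'open3'

# Build the argument list for pdftotext (hypothetical helper).
# Passing an array instead of one shell string keeps file names
# containing spaces or quotes safe, because no shell is involved.
def pdftotext_command(pdf_path, first_page: nil, last_page: nil)
  cmd = ['pdftotext', '-enc', 'UTF-8']
  cmd += ['-f', first_page.to_s] if first_page # first page to convert
  cmd += ['-l', last_page.to_s] if last_page   # last page to convert
  cmd + [pdf_path, '-'] # "-" as the output file sends the text to stdout
end

# Run pdftotext and return the extracted text (hypothetical helper).
# Raises if the tool exits with a non-zero status.
def extract_text(pdf_path, **page_range)
  stdout, stderr, status =
    Open3.capture3(*pdftotext_command(pdf_path, **page_range))
  raise "pdftotext failed: #{stderr.strip}" unless status.success?
  stdout
end

# Usage, assuming a file named report.pdf exists:
#   text = extract_text('report.pdf', first_page: 1, last_page: 10)
```

For large files, limiting the page range with `-f` and `-l` lets you process the document in chunks instead of holding all of the text in memory at once.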

Javier answered Sep 23 '22 15:09