Ruby: Reading PDF files

Tags: ruby, pdf, ruby-on-rails, pdf-parsing

I'm looking for a fast and reliable way to read/parse large PDF files in Ruby (on Linux and OSX).

So far I've found the rather old and simple PDF-toolkit (a wrapper around pdftotext) and PDF-reader, which was unable to read most of my files. Both libraries, however, provide exactly the functionality I'm looking for.

My question: Have I missed something? Is there a tool that is better suited (faster and more reliable) to solve my problem?


asked Apr 21 '09 15:04

Javier

2 Answers

You might find Docsplit useful:

Docsplit is a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8 plain text, page images or thumbnails in any format, PDFs, single pages, and document metadata (title, author, number of pages...)
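A minimal sketch of Docsplit's Ruby API for the plain-text case, assuming the docsplit gem and its poppler (pdftotext) dependency are installed. The helper names `extract_text_with_docsplit` and `docsplit_text_path` are hypothetical, and the `<name>.txt` output file name is my understanding of Docsplit's default (whole-document, not per-page) behaviour, so check it against the version you install.

```ruby
# Hypothetical helpers around Docsplit's Ruby API.
# Assumes the docsplit gem and poppler's pdftotext are installed.

# Assumption: without a :pages option, Docsplit writes the whole document's
# text to <output_dir>/<pdf basename without extension>.txt
def docsplit_text_path(pdf_path, output_dir)
  File.join(output_dir, "#{File.basename(pdf_path, File.extname(pdf_path))}.txt")
end

# Extract the text of one PDF and return it as a String.
def extract_text_with_docsplit(pdf_path, output_dir)
  # Required here rather than at the top so docsplit_text_path can be
  # used without the gem being loaded.
  require 'docsplit'
  # ocr: false skips Tesseract OCR, which is much slower on large files
  Docsplit.extract_text(pdf_path, ocr: false, output: output_dir)
  File.read(docsplit_text_path(pdf_path, output_dir), encoding: 'UTF-8')
end
```

Usage, with an existing output directory: `text = extract_text_with_docsplit("large.pdf", "out")`.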


answered Sep 25 '22 15:09

pw.

After trying different methods, I'm now using PDF-Toolkit. It's quite old, but it's fast, stable and reliable. Besides, it doesn't need to be new, because it just wraps the xpdf command-line utilities.
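Since PDF-Toolkit is just a wrapper, the same idea can be sketched directly with Ruby's standard `Open3`, calling `pdftotext` (from xpdf or poppler) without a shell. This is not PDF-Toolkit's own API; `pdftotext_command` and `pdf_to_text` are hypothetical helper names, and it assumes `pdftotext` is on the `PATH`. Passing `-` as the output file makes `pdftotext` write to stdout, and `-f`/`-l` let you process a large file in page ranges.

```ruby
require 'open3'

# Build the pdftotext argument list (no shell, so file names need no escaping).
# "-" as the output file tells pdftotext to write the text to stdout.
def pdftotext_command(pdf_path, first_page: nil, last_page: nil, layout: false)
  cmd = ['pdftotext', '-enc', 'UTF-8']
  cmd << '-layout' if layout                      # keep the physical page layout
  cmd.push('-f', first_page.to_s) if first_page   # first page to convert
  cmd.push('-l', last_page.to_s) if last_page     # last page to convert
  cmd.push(pdf_path, '-')
end

# Run pdftotext and return the extracted text.
# Raises RuntimeError on a non-zero exit, Errno::ENOENT if pdftotext is missing.
def pdf_to_text(pdf_path, **options)
  stdout, stderr, status = Open3.capture3(*pdftotext_command(pdf_path, **options))
  raise "pdftotext failed (#{status.exitstatus}): #{stderr.strip}" unless status.success?
  stdout.force_encoding(Encoding::UTF_8)
end
```

Usage: `pdf_to_text("large.pdf", first_page: 1, last_page: 10)` returns the text of the first ten pages.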

answered Sep 23 '22 15:09

Javier
