
Opening .doc files in Ruby

Can I open a .doc file and get that file's contents using Ruby?

Leszek asked Jun 01 '11 00:06


1 Answer

I recently dealt with this in a project and wanted a lightweight library to extract the text from .doc, .docx and .pdf files. DocRipper wraps the Antiword, grep and Poppler (pdftotext) command-line tools to pull the text out of a file and return it as a UTF-8 string.

require 'doc_ripper'

dr = DocRipper::TextRipper.new('/path/to/file')
dr.text
# => "Document's text"
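
If you would rather not add a gem, the same idea can be sketched by shelling out to the tools directly. This is only a rough illustration, not DocRipper's actual code: the helper names `extraction_command` and `extract_text` are made up here, it assumes `antiword` and `pdftotext` are installed and on your PATH, and it leaves out .docx entirely.

```ruby
require 'open3'

# Hypothetical helper (not part of DocRipper): pick the command-line tool
# that prints a document's text on standard output, based on the extension.
def extraction_command(path)
  case File.extname(path).downcase
  when '.doc' then ['antiword', path]
  when '.pdf' then ['pdftotext', path, '-'] # "-" makes pdftotext write to stdout
  else raise ArgumentError, "unsupported file type: #{path}"
  end
end

# Hypothetical helper: run the chosen tool and return whatever it printed.
# Assumes the tool is installed; Open3 raises Errno::ENOENT if it is not.
def extract_text(path)
  command = extraction_command(path)
  # Passing the command as separate arguments avoids going through a shell,
  # so spaces or quotes in the path need no escaping.
  output, status = Open3.capture2(*command)
  raise "#{command.first} failed for #{path}" unless status.success?
  # Label the bytes as UTF-8; this assumes the tool emitted UTF-8.
  output.force_encoding('UTF-8')
end
```

Calling `extract_text('/path/to/file.doc')` would then hand back the text Antiword printed. Whether that output really is UTF-8 depends on how the tool and your locale are set up, so treat the encoding step as an assumption to check on your system.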
Paul answered Sep 17 '22 16:09