
Doc, rtf and txt reader in python

Like csv.reader(), are there any other functions that can read .rtf, .txt, or .doc files in Python?

Rajeev asked Jul 19 '10 06:07



2 Answers

You can read a text file with

txt = open("file.txt").read()
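A slightly more defensive variant of the one-liner above, as a minimal sketch: the helper name read_text_file and the UTF-8 default are assumptions for illustration, not part of the original answer. The with block closes the file even if reading fails.

```python
def read_text_file(path, encoding="utf-8"):
    """Return the full contents of a plain-text file as one string."""
    # The context manager closes the file handle when the block exits.
    # encoding is an assumption; pass the file's real encoding if it differs.
    with open(path, encoding=encoding) as f:
        return f.read()
```

Calling read_text_file("file.txt") gives the same string as the one-liner when the file is UTF-8 encoded.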

Try PyRTF for RTF files. I would think that reading MS Word .doc files is pretty unlikely unless you are on Windows and can use some of the native MS interfaces for reading those files. This article claims to show how to write scripts that interface with Word.

Jesse Dhillon answered Sep 28 '22 18:09



I've had a real headache trying to do this simple thing for Word and Writer documents.

There is a simple solution: call OpenOffice on the command line to convert your target document to text, then load the text into Python.

Other conversion tools I tried produced unreliable output, and the Python OOo libraries I looked at were too complex.

If you just want to get at the text so you can process it, use this on the Linux command line:

soffice --headless --convert-to txt:Text /path_to/document_to_convert.doc

(Call it from Python using subprocess if you want to automate it.)

It will create a text file you can simply load into Python.
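Automating the conversion with subprocess could look roughly like the sketch below. The function names build_convert_command and doc_to_text are hypothetical, and the --outdir option is an addition to the command above so the script knows where the .txt file lands. It assumes soffice is on your PATH and that the output is named after the input file with a .txt extension.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir):
    """Build the soffice command from the answer, plus --outdir."""
    return [
        "soffice", "--headless",
        "--convert-to", "txt:Text",
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def doc_to_text(doc_path, out_dir="."):
    """Convert a document with soffice and return the extracted text."""
    doc_path = Path(doc_path)
    out_dir = Path(out_dir)
    # check=True raises CalledProcessError if soffice exits with an error.
    subprocess.run(build_convert_command(doc_path, out_dir), check=True)
    # Assumption: the output keeps the input's base name with a .txt suffix.
    txt_path = out_dir / (doc_path.stem + ".txt")
    # The output encoding may vary by setup; errors="replace" avoids a
    # decode crash at the cost of substituting undecodable bytes.
    return txt_path.read_text(errors="replace")
```

For example, doc_to_text("/path_to/document_to_convert.doc") would run the conversion and return the text. Passing the arguments as a list rather than a single shell string means paths with spaces need no extra quoting.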

(Credit)

markling answered Sep 28 '22 18:09
