Like csv.reader(), are there any other functions which can read .rtf, .txt, or .doc files in Python?
You can read a text file with
with open("file.txt") as f:  # the with block closes the file automatically
    txt = f.read()
Try PyRTF for RTF files. I would think that reading MS Word .doc files is pretty unlikely unless you are on Windows and can use some of the native MS interfaces for reading those files. This article claims to show how to write scripts that interface with Word.
I've had a real headache trying to do this simple thing for Word and Writer documents.
There is a simple solution: call OpenOffice on the command line to convert your target document to text, then load the text into Python.
Other conversion tools I tried produced unreliable output, and the other Python OpenOffice libraries were too complex.
If you just want to get at the text so you can process it, use this on the Linux command line:
soffice --headless --convert-to txt:Text /path_to/document_to_convert.doc
(call it from Python using subprocess if you want to automate it).
It will create a text file you can simply load into Python.
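For automating this from Python, here is a minimal sketch of the subprocess approach. It assumes soffice is on your PATH. The function names build_convert_command and doc_to_text are made up for this example, and the --outdir option is added to the command above so the script knows where the output lands. The encoding of the resulting text file can vary by platform, so the UTF-8 read with errors="replace" is also an assumption.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir):
    """Build the soffice command line shown above, plus --outdir."""
    return [
        "soffice",
        "--headless",
        "--convert-to", "txt:Text",
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def doc_to_text(doc_path, out_dir="."):
    """Convert a document with soffice and return the extracted text."""
    doc_path = Path(doc_path)
    out_dir = Path(out_dir)
    # check=True raises CalledProcessError if soffice exits with an error.
    subprocess.run(build_convert_command(doc_path, out_dir), check=True)
    # soffice names the output after the input file, with a .txt extension.
    txt_path = out_dir / (doc_path.stem + ".txt")
    # Assumption: the output is readable as UTF-8; bad bytes are replaced.
    return txt_path.read_text(encoding="utf-8", errors="replace")
```

You would then call something like text = doc_to_text("/path_to/document_to_convert.doc", "/tmp") and process the returned string as usual.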
(Credit)