Is there a Python module for converting RTF to plain text? [closed]

Tags: python, text, rtf

Ideally, I'd like a module or library that doesn't require superuser access to install; I have limited privileges in my working environment.

Tony asked Aug 26 '09 20:08



2 Answers

I've been working on a library called Pyth, which can do this:

http://pypi.python.org/pypi/pyth/

Converting an RTF file to plaintext looks something like this:

    from pyth.plugins.rtf15.reader import Rtf15Reader
    from pyth.plugins.plaintext.writer import PlaintextWriter

    # Parse the RTF file into Pyth's document model
    doc = Rtf15Reader.read(open('sample.rtf'))

    # Render the document as plain text and print it
    print(PlaintextWriter.write(doc).getvalue())

Pyth can also generate RTF files, read and write XHTML, generate documents from Python markup à la Nevow's stan, and has limited experimental support for LaTeX and PDF output. Its RTF support is pretty robust -- we use it in production to read RTF files generated by various versions of Word, OpenOffice, Mac TextEdit, EIOffice, and others.

Brendon answered Oct 09 '22 11:10


OpenOffice has an RTF reader. You can use Python to script OpenOffice, see here for more info.

You could probably try using a COM object on Windows to read anything that smells like an MS binary format. I wouldn't recommend that, though.

Actually, parsing the raw data yourself probably won't be very hard; see this example written in .bat/QBasic.
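To give an idea of what parsing the raw data involves, here is a minimal sketch in pure Python (standard library only, so nothing to install). It is not a full RTF parser: the list of skipped destinations is a partial guess, `\'xx` bytes are assumed to be Windows-1252, and `\uN` Unicode escapes are not handled. The names `rtf_to_text`, `SKIPPED_DESTINATIONS` and `SPECIAL_CHARS` are made up for this example.

```python
import re

# Destinations whose contents are not body text (a partial, assumed list).
SKIPPED_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

# Control words that stand for a single character of output.
SPECIAL_CHARS = {"par": "\n", "line": "\n", "tab": "\t"}

TOKEN = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"    # control word, optional number, optional space delimiter
    r"|\\'([0-9a-fA-F]{2})"    # hex-escaped byte such as \'e9
    r"|\\([^a-z])"             # control symbol such as \{ \} \\ \*
    r"|([{}])"                 # group open / close
    r"|[\r\n]+"                # raw line breaks carry no meaning in RTF
    r"|(.)"                    # anything else is literal text
)


def rtf_to_text(rtf):
    """Strip RTF markup from a string and return the remaining text."""
    out = []
    stack = []        # the 'skipping' flag saved at each open group
    skipping = False  # True while inside a destination we do not want
    for match in TOKEN.finditer(rtf):
        word, _arg, hexbyte, symbol, brace, char = match.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            skipping = stack.pop() if stack else False
        elif symbol is not None:
            if symbol == "*":
                skipping = True  # \* marks an ignorable destination
            elif symbol in "{}\\" and not skipping:
                out.append(symbol)  # escaped literal brace or backslash
        elif word is not None:
            if word in SKIPPED_DESTINATIONS:
                skipping = True
            elif word in SPECIAL_CHARS and not skipping:
                out.append(SPECIAL_CHARS[word])
        elif hexbyte is not None:
            if not skipping:
                # Assumption: the document uses the Windows-1252 code page.
                out.append(bytes.fromhex(hexbyte).decode("cp1252", errors="replace"))
        elif char is not None and not skipping:
            out.append(char)
    return "".join(out)


sample = r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0 Hello, \b world\b0 !\par Second line}"
print(rtf_to_text(sample))
```

For real-world files produced by Word and friends, a proper library is likely to cope better than this; the sketch is only meant to show that the basic structure (groups, control words, escapes) is simple enough to walk with one regular expression.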

DocFrac is a free, open source converter between RTF, HTML and text. Windows, Linux, ActiveX and DLL versions are available. It will probably be pretty easy to wrap it up in Python.
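A wrapper could be as small as the sketch below, which shells out to the command-line tool with `subprocess`. The executable name `docfrac` and the `--from-rtf` / `--to-text` flags are assumptions here and should be checked against the documentation of the DocFrac build you have; the function names are made up for the example, and the output file is assumed to be readable as UTF-8.

```python
import subprocess


def build_docfrac_command(rtf_path, txt_path, executable="docfrac"):
    """Build the argument list for one RTF-to-text conversion.

    Assumption: the flag names below match your DocFrac version.
    """
    return [executable, "--from-rtf", rtf_path, "--to-text", txt_path]


def rtf_file_to_text(rtf_path, txt_path, executable="docfrac"):
    """Run DocFrac on rtf_path, then return the contents of txt_path."""
    # check=True raises CalledProcessError if the converter exits with an error.
    subprocess.run(build_docfrac_command(rtf_path, txt_path, executable), check=True)
    # Assumption: the converter's output can be decoded as UTF-8.
    with open(txt_path, encoding="utf-8", errors="replace") as handle:
        return handle.read()
```

Since `executable` can be a full path, this also works with a copy of the binary unpacked into your home directory, which fits the no-superuser constraint.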

RTF::TEXT::Converter - Perl extension for converting RTF into text (in case you have problems with DocFrac).

Official Rich Text Format (RTF) Specifications, version 1.7, by Microsoft.

Good luck (with the limited privileges in your working environment).

Paweł Polewicz answered Oct 09 '22 11:10