 

Python: convert Microsoft Office docs to plain text on Linux

Any recommendations for a method to convert .doc, .ppt, and .xls files to plain text on Linux using Python? Really, any method of conversion would be useful. I have already looked at using OpenOffice, but I would like a solution that does not require installing OpenOffice.

Tim asked Mar 26 '09 12:03


1 Answer

I'd go for a command-line solution, and then use Python's subprocess module to run the tools from your script.
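A minimal sketch of the subprocess part, assuming Python 3.7+ (for `capture_output`) and a converter that writes its text to standard output, as catdoc does by default. The helper name `run_converter` is hypothetical; undecodable bytes are replaced rather than raising, which is a design choice, not something the tools require.

```python
import subprocess


def run_converter(argv: list[str]) -> str:
    """Run a command-line converter and return its standard output as text.

    argv is the full command, e.g. ["catdoc", "report.doc"] (hypothetical file).
    check=True raises CalledProcessError if the tool exits with a non-zero status.
    """
    result = subprocess.run(
        argv,
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output using the locale's encoding
        errors="replace",     # don't crash on bytes that fail to decode
        check=True,
    )
    return result.stdout
```

Usage would look like `text = run_converter(["catdoc", "report.doc"])`, with `report.doc` standing in for a real file.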

Converters for MS Word (catdoc), Excel (xls2csv) and PowerPoint (catppt) can be found (in source form) here: http://vitus.wagner.pp.ru/software/catdoc/.
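To pick the right tool for each file, one option is to dispatch on the file extension. This is a sketch only: the mapping uses the three programs named above, each invoked with the file as its single argument, while `CONVERTERS` and `build_command` are hypothetical names. Any extension not in the mapping raises `ValueError`.

```python
import os

# Extension -> converter program from the catdoc suite.
CONVERTERS = {
    ".doc": "catdoc",
    ".xls": "xls2csv",
    ".ppt": "catppt",
}


def build_command(path: str) -> list[str]:
    """Return the command line that converts `path` to text on stdout."""
    ext = os.path.splitext(path)[1].lower()  # case-insensitive: ".DOC" -> ".doc"
    try:
        tool = CONVERTERS[ext]
    except KeyError:
        raise ValueError(f"no converter for extension {ext!r}") from None
    return [tool, path]
```

The resulting list can be passed straight to `subprocess.run`; keeping it as a list (rather than a shell string) avoids quoting problems with spaces in file names.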

I can't really comment on the usefulness of catppt, but catdoc and xls2csv work great!

But be sure to search your distribution's repositories first. On Ubuntu, for example, catdoc is just one quick apt-get away.
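Before calling the tools, a script can check that they are on the PATH using `shutil.which`. A sketch follows; the function name `missing_tools` is hypothetical. The install hint assumes the Ubuntu package is named `catdoc`, and that this one package also ships xls2csv and catppt is an assumption worth verifying on your distribution.

```python
import shutil


def missing_tools(tools=("catdoc", "xls2csv", "catppt")) -> list[str]:
    """Return the names of converter programs that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing converters:", ", ".join(missing))
        # Assumed package name; check your distribution's repositories.
        print("On Ubuntu, try: sudo apt-get install catdoc")
```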

ChristopheD answered Oct 23 '22 19:10