
Microsoft PowerPoint Python Parser [closed]

I am looking for a Python-based Microsoft Office parser, specifically one for PowerPoint.

I want to be able to parse PPT files in Python and extract things like text and images from them.

Is there a library available?

ramaz asked Jul 05 '10 17:07





2 Answers

I don't think there is such a library.

What you can do is use the pywin32 package to drive PowerPoint through its COM interface. This only works on Windows with PowerPoint installed.

Someone has written a very nice introduction to automating PowerPoint with the win32com module: http://www.s-anand.net/blog/automating-powerpoint-with-python/
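To give a rough idea of what the COM route looks like, here is a minimal sketch that pulls the text out of each slide. The function names `extract_slide_text` and `read_ppt_text` are my own, not part of pywin32, and I am assuming PowerPoint accepts plain booleans for the `Open` flags (ReadOnly, Untitled, WithWindow). Treat it as a starting point, not a tested solution.

```python
def extract_slide_text(presentation):
    """Collect the text of every text-bearing shape, keyed by slide number.

    `presentation` is expected to look like a PowerPoint COM Presentation:
    it exposes .Slides, each slide exposes .Shapes, and each shape exposes
    .HasTextFrame and .TextFrame.TextRange.Text.
    """
    result = {}
    for number, slide in enumerate(presentation.Slides, start=1):
        texts = []
        for shape in slide.Shapes:
            # Pictures, lines, etc. have no text frame; skip them.
            if shape.HasTextFrame and shape.TextFrame.HasText:
                texts.append(shape.TextFrame.TextRange.Text)
        result[number] = texts
    return result


def read_ppt_text(path):
    """Open `path` in PowerPoint via COM and return its text per slide.

    Hypothetical helper: needs Windows, pywin32, and an installed PowerPoint.
    `path` should be an absolute path.
    """
    import win32com.client  # imported here so the module loads on any OS

    app = win32com.client.Dispatch("PowerPoint.Application")
    # Assumed argument order: FileName, ReadOnly, Untitled, WithWindow
    presentation = app.Presentations.Open(path, True, False, False)
    try:
        return extract_slide_text(presentation)
    finally:
        presentation.Close()
        app.Quit()
```

Calling `read_ppt_text(r"C:\path\to\deck.ppt")` (the path is a placeholder) should give you a dict mapping slide numbers to lists of strings. Images would need a different approach, for example exporting shapes or slides through the same COM object model.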

Gary Kerr answered Sep 29 '22 13:09



You might find such a beast, but I'd bet against it; you're looking for two rare properties together.

You might instead consider using the OpenOffice SDK, which already has vast amounts of machinery for reading PowerPoint files, and abusing it for your purposes. It is all Java, not Python, but my guess is that the learning curve for Java is much shorter than the one for figuring out how to read PowerPoint files yourself.

Ira Baxter answered Sep 29 '22 14:09
