Using Python to extract images and text from a Word document

Tags: python, ms-word, image, extract, pywin32

I would like to run a script on a folder full of Word documents that reads through each document and pulls out the images and their captions (the text directly below each image). From the research I've done, I think pywin32 might be a viable solution. I know how to use pywin32 to find strings and pull them out, but I need help with the images. How can I read through a .docx file and have an event occur whenever an image is found? Thank you for any help! I am using Python 2.7.


asked Jun 14 '11 14:06

Preston Donovan

1 Answer

A .docx file is a ZIP archive, so it can be unzipped to extract the images. The embedded images are stored in the word/media/ folder inside the archive.
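Building on the unzip approach, here is a minimal sketch that reads the archive directly with the standard-library zipfile and xml.etree.ElementTree modules instead of driving Word through pywin32. It assumes Python 3 (the question mentions 2.7) and the usual Office Open XML layout: word/document.xml for the body, word/_rels/document.xml.rels for relationship ids, and word/media/ for the image files. The function names images_with_captions and scan_folder are hypothetical, and the caption rule ("the caption is the text of the paragraph right after the image's paragraph") is an assumption about how the documents are written. Images inside tables, headers, or legacy VML shapes are not handled.

```python
import glob
import os
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by WordprocessingML, DrawingML and package relationships.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
R = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"
PKG_REL = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def paragraph_text(para):
    """Concatenate the text of all <w:t> elements in a paragraph."""
    return "".join(t.text or "" for t in para.iter(W + "t"))


def images_with_captions(docx_path):
    """Yield (part_name, image_bytes, caption) for each embedded image.

    Assumption: the caption is the text of the paragraph that directly
    follows the paragraph containing the image.
    """
    with zipfile.ZipFile(docx_path) as zf:
        # Map relationship ids (e.g. "rId5") to ZIP part names
        # (e.g. "word/media/image1.png"); skip externally linked targets.
        rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
        parts = {}
        for rel in rels.iter(PKG_REL + "Relationship"):
            if rel.get("TargetMode") == "External":
                continue
            target = rel.get("Target")
            if target.startswith("/"):
                parts[rel.get("Id")] = target.lstrip("/")
            else:
                parts[rel.get("Id")] = posixpath.normpath(posixpath.join("word", target))

        body = ET.fromstring(zf.read("word/document.xml")).find(W + "body")
        # Only top-level paragraphs are scanned; images inside tables are skipped.
        paragraphs = body.findall(W + "p")
        for i, para in enumerate(paragraphs):
            # Each <a:blip r:embed="rIdN"/> is the "image found" event.
            for blip in para.iter(A + "blip"):
                part = parts.get(blip.get(R + "embed"))
                if part is None:
                    continue
                nxt = paragraphs[i + 1] if i + 1 < len(paragraphs) else None
                caption = paragraph_text(nxt) if nxt is not None else ""
                yield part, zf.read(part), caption.strip()


def scan_folder(folder, out_dir):
    """Save every image from every .docx in folder; return (path, caption) pairs."""
    os.makedirs(out_dir, exist_ok=True)
    found = []
    for path in sorted(glob.glob(os.path.join(folder, "*.docx"))):
        name = os.path.basename(path)
        if name.startswith("~$"):
            continue  # skip Word's temporary lock files, which are not ZIPs
        stem = os.path.splitext(name)[0]
        for part, data, caption in images_with_captions(path):
            out_path = os.path.join(out_dir, stem + "_" + posixpath.basename(part))
            with open(out_path, "wb") as fh:
                fh.write(data)
            found.append((out_path, caption))
    return found
```

Because images_with_captions is a generator, the loop body in scan_folder runs once per image, which gives the "event when an image is found" behaviour without opening Word. For example, scan_folder("reports", "extracted") would write the images to an extracted folder and return each saved path with its guessed caption; adjust paragraph_text or the next-paragraph rule if your captions use a different layout.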

answered Oct 12 '22 16:10

Kevin C.
