I'd like to search a Word 2007 file (.docx) for a text string, e.g., "some special phrase" that could/would be found from a search within Word.
Is there a way from Python to see the text? I have no interest in formatting - I just want to classify documents as having or not having "some special phrase".
To open the Find pane from the Edit View, press Ctrl+F, or click Home > Find. Find text by typing it in the Search the document for… box.
Finding Text in a Word DocTo search for text in Word, you'll need to access the “Navigation” pane. You can do so by selecting “Find” in the “Editing” group of the “Home” tab. An alternative method to accessing this pane is by using the Ctrl + F shortcut key on Windows or Command + F on Mac.
After reading your post above, I made a 100% native Python docx module to solve this specific problem.
# Import the module from docx import * # Open the .docx file document = opendocx('A document.docx') # Search returns true if found search(document,'your search string')
The docx module is at https://python-docx.readthedocs.org/en/latest/
More exactly, a .docx document is a Zip archive in OpenXML format: you have first to uncompress it.
I downloaded a sample (Google: some search term filetype:docx) and after unzipping I found some folders. The word folder contains the document itself, in file document.xml.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With