Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I search a word in a Word 2007 .docx file?

I'd like to search a Word 2007 file (.docx) for a text string, e.g., "some special phrase" that could/would be found from a search within Word.

Is there a way from Python to see the text? I have no interest in formatting - I just want to classify documents as having or not having "some special phrase".

like image 312
Gerry Avatar asked Sep 22 '08 17:09

Gerry


People also ask

How do I search for a word in DOCX?

To open the Find pane from the Edit View, press Ctrl+F, or click Home > Find. Find text by typing it in the Search the document for… box.

Can you search for keywords in a word document?

Finding Text in a Word DocTo search for text in Word, you'll need to access the “Navigation” pane. You can do so by selecting “Find” in the “Editing” group of the “Home” tab. An alternative method to accessing this pane is by using the Ctrl + F shortcut key on Windows or Command + F on Mac.


2 Answers

After reading your post above, I made a 100% native Python docx module to solve this specific problem.

# Import the module from docx import *  # Open the .docx file document = opendocx('A document.docx')  # Search returns true if found     search(document,'your search string') 

The docx module is at https://python-docx.readthedocs.org/en/latest/

like image 160
mikemaccana Avatar answered Oct 07 '22 11:10

mikemaccana


More exactly, a .docx document is a Zip archive in OpenXML format: you have first to uncompress it.
I downloaded a sample (Google: some search term filetype:docx) and after unzipping I found some folders. The word folder contains the document itself, in file document.xml.

like image 20
PhiLho Avatar answered Oct 07 '22 10:10

PhiLho