The oodocx module mentioned in the same page refers the user to an /examples folder that does not seem to be there.
I have read the documentation of python-docx 0.7.2, plus everything I could find on Stack Overflow on the subject, so please believe that I have done my “homework”.
Python is the only language I know (beginner+, maybe intermediate), so please do not assume any knowledge of C, Unix, xml, etc.
Task: Open an MS Word 2007+ document with a single line of text in it (to keep things simple) and replace any “key” word in Dictionary that occurs in that line of text with its dictionary value. Then close the document, keeping everything else the same.
Line of text (for example) “We shall linger in the chambers of the sea.”
    from docx import Document

    document = Document('/Users/umityalcin/Desktop/Test.docx')

    Dictionary = {'sea': "ocean"}

    sections = document.sections
    for section in sections:
        print(section.start_type)

    # Now, I would like to navigate, focus on, get to, whatever to the section that has my
    # single line of text and execute a find/replace using the dictionary above.
    # then save the document in the usual way.

    document.save('/Users/umityalcin/Desktop/Test.docx')
I am not seeing anything in the documentation that allows me to do this—maybe it is there but I don’t get it because everything is not spelled-out at my level.
I have followed other suggestions on this site and have tried to use earlier versions of the module (https://github.com/mikemaccana/python-docx) that is supposed to have "methods like replace, advReplace" as follows: I open the source-code in the python interpreter, and add the following at the end (this is to avoid clashes with the already installed version 0.7.2):
    document = opendocx('/Users/umityalcin/Desktop/Test.docx')
    words = document.xpath('//w:r', namespaces=document.nsmap)
    for word in words:
        if word in Dictionary.keys():
            print "found it", Dictionary[word]
            document = replace(document, word, Dictionary[word])
    savedocx(document, coreprops, appprops, contenttypes, websettings, wordrelationships, output, imagefiledict=None)
Running this produces the following error message:
NameError: name 'coreprops' is not defined
Maybe I am trying to do something that cannot be done—but I would appreciate your help if I am missing something simple.
If this matters, I am using the 64-bit version of Enthought's Canopy on OS X 10.9.3.
Python can create and modify Word documents, which have the .docx file extension, with the python-docx module. You can install the module by running pip install python-docx.
Word documents contain formatted text organized in three object levels: Run objects at the lowest level, Paragraph objects in the middle, and a Document object at the top. Because of this structure, these documents cannot be edited with ordinary text editors, but they can be manipulated in Python with the python-docx module.
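To make the three levels concrete, here is a minimal sketch that walks a document from the top down and lists the text of every run. The helper names `describe_structure` and `describe_file` are made up for this illustration; only `Document`, `.paragraphs`, `.runs` and `.text` come from python-docx.

```python
def describe_structure(document):
    """Return one (paragraph_index, run_index, run_text) tuple per run."""
    rows = []
    # Document -> Paragraph -> Run, from the highest level to the lowest.
    for p_index, paragraph in enumerate(document.paragraphs):
        for r_index, run in enumerate(paragraph.runs):
            rows.append((p_index, r_index, run.text))
    return rows


def describe_file(path):
    """Open the .docx file at `path` and describe its paragraphs and runs."""
    # Imported here so describe_structure() can be tried without the package.
    from docx import Document  # third-party: pip install python-docx
    return describe_structure(Document(path))
```

Calling `describe_file` on a document usually shows that a single visible sentence can be split into several runs, which matters for any find/replace attempt.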
Go to Home > Replace. Enter the word or phrase you want to replace in Find what. Enter your new text in Replace with. Choose Replace All to change all occurrences of the word or phrase.
UPDATE: There are a couple of paragraph-level functions that do a good job of this; they can be found on the GitHub site for python-docx.
The current version of python-docx does not have a search() function or a replace() function. These are requested fairly frequently, but an implementation for the general case is quite tricky and it hasn't risen to the top of the backlog yet.
Several folks have had success though, getting done what they need, using the facilities already present. Here's an example. It has nothing to do with sections by the way :)
    for paragraph in document.paragraphs:
        if 'sea' in paragraph.text:
            print(paragraph.text)
            paragraph.text = 'new text containing ocean'
To search in Tables as well, you would need to use something like:
    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    if 'sea' in paragraph.text:
                        paragraph.text = paragraph.text.replace("sea", "ocean")
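The two loops above can be combined and driven by a dictionary, as in the original question. This is only a sketch: `iter_paragraphs`, `replace_words` and `replace_in_file` are hypothetical helper names, not part of python-docx, and nested tables, headers and footers are not visited.

```python
def iter_paragraphs(document):
    """Yield body paragraphs, then paragraphs inside top-level table cells."""
    for paragraph in document.paragraphs:
        yield paragraph
    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    yield paragraph


def replace_words(document, mapping):
    """Replace each key of `mapping` with its value.

    Returns the number of paragraphs whose text changed.
    """
    changed = 0
    for paragraph in iter_paragraphs(document):
        new_text = paragraph.text
        for old, new in mapping.items():
            new_text = new_text.replace(old, new)  # plain substring match
        if new_text != paragraph.text:
            paragraph.text = new_text  # rewrites the whole paragraph
            changed += 1
    return changed


def replace_in_file(path, mapping):
    """Apply replace_words() to the .docx file at `path` and save it in place."""
    from docx import Document  # third-party: pip install python-docx
    document = Document(path)
    changed = replace_words(document, mapping)
    document.save(path)
    return changed
```

For the example in the question, `replace_in_file('/Users/umityalcin/Desktop/Test.docx', {'sea': 'ocean'})` would be the call. Because `str.replace` matches substrings, a key such as 'sea' also matches inside 'season'; a word-boundary regular expression would be needed to avoid that.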
If you pursue this path, you'll probably discover pretty quickly what the complexities are. If you replace the entire text of a paragraph, that will remove any character-level formatting, like a word or phrase in bold or italic.
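One way to keep character-level formatting, sketched here under the assumption that each search term sits entirely inside a single run, is to replace text run by run instead of assigning to `paragraph.text`. The function name `replace_in_runs` is made up for this example; `paragraph.runs` and `run.text` are the python-docx attributes it relies on.

```python
def replace_in_runs(paragraph, mapping):
    """Replace keys of `mapping` inside each run of `paragraph` separately.

    Each run keeps its own formatting (bold, italic, ...) because only its
    text is changed. Returns the number of runs that were modified.
    Limitation: a key that Word has split across two runs is not found.
    """
    changed = 0
    for run in paragraph.runs:
        new_text = run.text
        for old, new in mapping.items():
            new_text = new_text.replace(old, new)
        if new_text != run.text:
            run.text = new_text
            changed += 1
    return changed
```

It would be called once per paragraph, for example `for paragraph in document.paragraphs: replace_in_runs(paragraph, {'sea': 'ocean'})`. The split-run limitation is a large part of why a general-purpose replace is tricky: Word may break a word into several runs for reasons that are invisible in the rendered text.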
By the way, the code from @wnnmaw's answer is for the legacy version of python-docx and won't work at all with versions after 0.3.0.