
How to use python-docx to replace text in a Word document and save

The oodocx module mentioned in the same page refers the user to an /examples folder that does not seem to be there.
I have read the documentation of python-docx 0.7.2, plus everything I could find on Stack Overflow on the subject, so please believe that I have done my “homework”.

Python is the only language I know (beginner+, maybe intermediate), so please do not assume any knowledge of C, Unix, xml, etc.

Task: Open an MS Word 2007+ document with a single line of text in it (to keep things simple) and replace any “key” word in Dictionary that occurs in that line of text with its dictionary value. Then close the document, keeping everything else the same.

Line of text (for example) “We shall linger in the chambers of the sea.”

    from docx import Document

    document = Document('/Users/umityalcin/Desktop/Test.docx')

    Dictionary = {'sea': 'ocean'}

    sections = document.sections
    for section in sections:
        print(section.start_type)

    # Now, I would like to navigate, focus on, get to, whatever to the section that has my
    # single line of text and execute a find/replace using the dictionary above,
    # then save the document in the usual way.

    document.save('/Users/umityalcin/Desktop/Test.docx')

I am not seeing anything in the documentation that allows me to do this—maybe it is there but I don’t get it because everything is not spelled-out at my level.

I have followed other suggestions on this site and have tried to use earlier versions of the module (https://github.com/mikemaccana/python-docx) that is supposed to have "methods like replace, advReplace" as follows: I open the source-code in the python interpreter, and add the following at the end (this is to avoid clashes with the already installed version 0.7.2):

    document = opendocx('/Users/umityalcin/Desktop/Test.docx')
    words = document.xpath('//w:r', namespaces=document.nsmap)
    for word in words:
        if word in Dictionary.keys():
            print "found it", Dictionary[word]
            document = replace(document, word, Dictionary[word])
    savedocx(document, coreprops, appprops, contenttypes, websettings,
        wordrelationships, output, imagefiledict=None)

Running this produces the following error message:

NameError: name 'coreprops' is not defined

Maybe I am trying to do something that cannot be done—but I would appreciate your help if I am missing something simple.

If this matters, I am using the 64-bit version of Enthought's Canopy on OS X 10.9.3.

asked Jul 17 '14 at 14:07 by user2738815



1 Answer

UPDATE: There are a couple of paragraph-level functions that do a good job of this and can be found on the GitHub site for python-docx.

  1. This one will replace a regex-match with a replacement str. The replacement string will appear formatted the same as the first character of the matched string.
  2. This one will isolate a run such that some formatting can be applied to that word or phrase, like highlighting each occurrence of "foobar" in the text or perhaps making it bold or appear in a larger font.

The current version of python-docx does not have a search() function or a replace() function. These are requested fairly frequently, but an implementation for the general case is quite tricky and it hasn't risen to the top of the backlog yet.

Several folks have had success though, getting done what they need, using the facilities already present. Here's an example. It has nothing to do with sections by the way :)

    for paragraph in document.paragraphs:
        if 'sea' in paragraph.text:
            print(paragraph.text)
            paragraph.text = 'new text containing ocean'
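The loop above hard-codes the replacement text. Here is a minimal sketch of driving it from the dictionary in the question, assuming whole-word, case-sensitive matches are what is wanted. The names `replace_words` and `replace_in_paragraphs` are hypothetical helpers written for this illustration, not part of python-docx.

```python
import re


def replace_words(text, mapping):
    """Return text with every whole-word key of mapping replaced by its value."""
    if not mapping:
        return text
    # Longest keys first, so a key such as "sea salt" wins over "sea".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: mapping[match.group(0)], text)


def replace_in_paragraphs(paragraphs, mapping):
    """Rewrite each paragraph whose text changes; return how many were changed.

    Assigning paragraph.text discards character-level formatting
    (bold, italic, ...) inside that paragraph.
    """
    changed = 0
    for paragraph in paragraphs:
        new_text = replace_words(paragraph.text, mapping)
        if new_text != paragraph.text:
            paragraph.text = new_text
            changed += 1
    return changed


# Possible usage with python-docx (the file name is a placeholder):
#     from docx import Document
#     document = Document("Test.docx")
#     replace_in_paragraphs(document.paragraphs, {"sea": "ocean"})
#     document.save("Test.docx")
```

Because of the `\b` word boundaries, "seaside" is left alone when the key is "sea". Keys that begin or end with punctuation would need a different boundary rule.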

To search in Tables as well, you would need to use something like:

    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    if 'sea' in paragraph.text:
                        paragraph.text = paragraph.text.replace("sea", "ocean")
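The body loop and the table loop can be folded into one generator. This is a sketch only: `iter_paragraphs` and `replace_everywhere` are hypothetical names, and the code relies on both a document and a table cell exposing `.paragraphs` and `.tables`, which allows it to descend into nested tables as well. It yields body paragraphs first and table paragraphs afterwards, not in document order, and it does not visit headers or footers.

```python
def iter_paragraphs(container):
    """Yield every paragraph in container, descending into tables.

    container is any object with .paragraphs and .tables attributes,
    such as a python-docx Document or a table cell.
    """
    yield from container.paragraphs
    for table in container.tables:
        for row in table.rows:
            # Note: a merged cell may be reported more than once by row.cells,
            # in which case its paragraphs are yielded more than once too.
            for cell in row.cells:
                yield from iter_paragraphs(cell)


def replace_everywhere(document, old, new):
    """Replace old with new in every paragraph found; return paragraphs changed."""
    changed = 0
    for paragraph in iter_paragraphs(document):
        if old in paragraph.text:
            # Setting .text drops character-level formatting in the paragraph.
            paragraph.text = paragraph.text.replace(old, new)
            changed += 1
    return changed
```

Since a repeated replacement finds nothing left to change, visiting a merged cell twice is harmless for the document, although it can affect the returned count.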

If you pursue this path, you'll probably discover pretty quickly what the complexities are. If you replace the entire text of a paragraph, that will remove any character-level formatting, like a word or phrase in bold or italic.
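One way to keep the formatting is to edit run by run instead of replacing the whole paragraph. The sketch below is a partial workaround, not a general solution: it only finds matches that lie entirely inside a single run, and Word frequently splits what looks like one word across several runs. `replace_in_runs` and `spans_runs` are hypothetical helper names, not python-docx functions.

```python
def replace_in_runs(paragraph, old, new):
    """Replace old with new inside each run separately; return runs changed.

    Each run keeps its own formatting because only its text is altered.
    Matches that straddle a run boundary are not found.
    """
    changed = 0
    for run in paragraph.runs:
        if old in run.text:
            run.text = run.text.replace(old, new)
            changed += 1
    return changed


def spans_runs(paragraph, old):
    """Return True if old occurs in the joined run text but in no single run."""
    joined = "".join(run.text for run in paragraph.runs)
    return old in joined and not any(old in run.text for run in paragraph.runs)
```

If `spans_runs` returns True for a paragraph, the run-level approach cannot help there, and the choice is between replacing the whole paragraph text (losing formatting) and merging runs by hand.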

By the way, the code from @wnnmaw's answer is for the legacy version of python-docx and won't work at all with versions after 0.3.0.

answered Sep 18 '22 at 17:09 by scanny