python-docx insertion point

Tags:

python-3.x

cursor-position

python-docx

insertion

I am not sure if I've been missing anything obvious, but I have not found anything documented about how one would go about inserting Word elements (tables, for example) at a specific place in a document.

I am loading an existing MS Word .docx document by using:

my_document = Document('some/path/to/my/document.docx')

My use case would be to get the 'position' of a bookmark or section in the document and then proceed to insert tables below that point.

I'm thinking about an API that would allow me to do something along those lines:

insertion_point = my_document.bookmarks['bookmark_name'].position
my_document.add_table(rows=10, cols=3, position=insertion_point+1)

I saw that there are plans to implement something akin to the 'range' object of the MS Word API, which would effectively solve this problem. In the meantime, is there a way to tell the document object's methods where to insert the new elements?

Maybe I can glue some lxml code to find a node and pass that to these python-docx methods? Any help on this subject would be much appreciated! Thanks.

asked Jul 25 '14 21:07

Apteryx

2 Answers

Put [image] as a token in your template document, then replace it with the picture:

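from docx.shared import Inches

# `document` is an already opened docx.Document; `image_path` is the picture file.
for paragraph in document.paragraphs:
    if "[image]" in paragraph.text:
        # Remove the token, then append the picture in a new run.
        paragraph.text = paragraph.text.strip().replace("[image]", "")

        run = paragraph.add_run()
        run.add_picture(image_path, width=Inches(3))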

The paragraph can be in a table cell as well. Just find the cell and do the same as above.

answered Oct 04 '22 15:10

David Dehghan

I remembered an old adage, "use the source, Luke!", and figured it out. A post from the python-docx owner on the project's GitHub page also gave me a hint: https://github.com/python-openxml/python-docx/issues/7.

The full XML document model can be accessed through the Document object's _document_part._element property. It behaves exactly like an lxml etree element, so from there everything is possible.
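To make it clearer what that element tree wraps: a .docx file is a ZIP archive whose main content lives in word/document.xml. The sketch below reads that part using only the standard library. The helper name load_body is hypothetical, and the in-memory archive is a stripped-down stand-in (it holds only that one part, so Word itself would not open it). In recent python-docx releases the same tree is, I believe, reachable as document.element rather than through _document_part.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def load_body(docx_file):
    """Return the <w:body> element of a .docx given a path or file object.

    Hypothetical helper for illustration only; python-docx does the
    equivalent (and much more) when it opens a document.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return root.find(f"{{{W_NS}}}body")


# Build a tiny stand-in archive that contains only word/document.xml.
document_xml = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
    "<w:sectPr/>"
    "</w:body></w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", document_xml)
buffer.seek(0)

body = load_body(buffer)
# Strip the namespace to show the local names of the body's children.
child_tags = [child.tag.split("}")[1] for child in body]
print(child_tags)  # ['p', 'sectPr']
```

The body's children are the block-level items (paragraphs, tables) followed by the section properties, which is why an insertion point can be expressed as an index into that child list.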

To solve my specific insertion point problem, I created a temp docx.Document object which I used to store my generated content.

import docx
from docx.oxml.shared import qn
tmp_doc = docx.Document()

# Generate content in tmp_doc document
tmp_doc.add_heading('New heading', 1)
# more content generation using docx API.
# ...

# Reference the tmp_doc XML content
tmp_doc_body = tmp_doc._document_part._element.body
# You could pretty print it by using:
#print(docx.oxml.xmlchemy.serialize_for_reading(tmp_doc_body))

I then loaded my docx template (containing a bookmark named 'insertion_point') into a second docx.Document object.

doc = docx.Document('/some/path/example.docx')
doc_body = doc._document_part._element.body
#print(docx.oxml.xmlchemy.serialize_for_reading(doc_body))

The next step is searching the doc XML for the insertion point. I defined a small function for the task at hand, which returns the paragraph element that contains a named bookmark:

def get_bookmark_par_element(document, bookmark_name):
    """
    Return the named bookmark parent paragraph element. If no matching
    bookmark is found, the result is '1'. If an error is encountered, '2'
    is returned.
    """
    doc_element = document._document_part._element
    bookmarks_list = doc_element.findall('.//' + qn('w:bookmarkStart'))
    for bookmark in bookmarks_list:
        name = bookmark.get(qn('w:name'))
        if name == bookmark_name:
            par = bookmark.getparent()
            if not isinstance(par, docx.oxml.CT_P):
                return 2
            else:
                return par
    return 1
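Since the function above signals failure with the integers 1 and 2, a caller has to remember to compare the result against them before using it. One possible variation, sketched here under my own assumptions, returns None when the bookmark is missing and raises ValueError when it is not directly inside a paragraph. To keep the sketch runnable without python-docx it uses the standard library's ElementTree as a stand-in for lxml, so it needs a parent map where lxml offers getparent(). The names w and find_bookmark_paragraph are hypothetical.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(tag):
    """Expand a local name such as 'p' to its namespace-qualified form."""
    return f"{{{W_NS}}}{tag}"


def find_bookmark_paragraph(body, bookmark_name):
    """Return the <w:p> holding the named bookmark, or None if absent.

    Raises ValueError when the bookmark exists but is not directly
    inside a paragraph.
    """
    # ElementTree has no getparent(), so build a child -> parent map.
    parent_of = {child: parent for parent in body.iter() for child in parent}
    for bookmark in body.iter(w("bookmarkStart")):
        if bookmark.get(w("name")) != bookmark_name:
            continue
        parent = parent_of[bookmark]
        if parent.tag != w("p"):
            raise ValueError(
                f"bookmark {bookmark_name!r} is not inside a paragraph"
            )
        return parent
    return None


# Minimal hand-written body: the second paragraph carries the bookmark.
body = ET.fromstring(
    f'<w:body xmlns:w="{W_NS}">'
    "<w:p><w:r><w:t>Intro</w:t></w:r></w:p>"
    '<w:p><w:bookmarkStart w:id="0" w:name="insertion_point"/>'
    '<w:bookmarkEnd w:id="0"/></w:p>'
    "</w:body>"
)

found = find_bookmark_paragraph(body, "insertion_point")
missing = find_bookmark_paragraph(body, "no_such_bookmark")
print(list(body).index(found), missing)  # 1 None
```

With this shape, a simple "if found is None" check covers the missing-bookmark case, and an unexpected document structure fails loudly instead of passing an integer on to the insertion code.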

The newly defined function is then used to get the parent paragraph of the 'insertion_point' bookmark. Error handling is left to the reader.

bookmark_par = get_bookmark_par_element(doc, 'insertion_point')

We can now use bookmark_par's etree index to insert our tmp_doc generated content at the right place:

bookmark_par_parent = bookmark_par.getparent()
index = bookmark_par_parent.index(bookmark_par) + 1
for child in tmp_doc_body:
    bookmark_par_parent.insert(index, child)
    index = index + 1
bookmark_par_parent.remove(bookmark_par)
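One thing to watch with the loop above: the body of the temporary document normally ends with a w:sectPr (section properties) element, and the loop moves it along with the content. As far as I understand the schema, a body-level w:sectPr is only expected as the last child of the body, so it is probably safer to skip it. The following is a minimal sketch of that idea, again using the standard library's ElementTree as a stand-in for lxml so it runs without python-docx. The names insert_after and make_body are hypothetical.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SECT_PR = f"{{{W_NS}}}sectPr"


def insert_after(anchor_parent, anchor, source_body, remove_anchor=True):
    """Move source_body's block-level children to just after anchor."""
    index = list(anchor_parent).index(anchor) + 1
    # Iterate over a snapshot: source_body is modified inside the loop.
    for child in list(source_body):
        if child.tag == SECT_PR:
            continue  # section properties stay with the source document
        source_body.remove(child)
        anchor_parent.insert(index, child)
        index += 1
    if remove_anchor:
        anchor_parent.remove(anchor)


def make_body(inner_xml):
    """Wrap a WordprocessingML fragment in a <w:body> element."""
    return ET.fromstring(f'<w:body xmlns:w="{W_NS}">{inner_xml}</w:body>')


# Target: a paragraph, the anchor paragraph, then the section properties.
target = make_body("<w:p/><w:p/><w:sectPr/>")
# Source: generated content (a paragraph and a table) plus its own sectPr.
source = make_body("<w:p/><w:tbl/><w:sectPr/>")

first_par, anchor = target[0], target[1]
insert_after(target, anchor, source)

order = [child.tag.split("}")[1] for child in target]
print(order)  # ['p', 'p', 'tbl', 'sectPr']
```

The generated paragraph and table end up where the anchor paragraph used to be, and the target keeps a single w:sectPr at the end of its body. With lxml the explicit source_body.remove(child) call is unnecessary, because inserting an element into a new parent moves it.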

The document is now finalized, the generated content having been inserted at the bookmark location of an existing Word document.

# Save result
# print(docx.oxml.xmlchemy.serialize_for_reading(doc_body))
doc.save('/some/path/generated_doc.docx')

I hope this helps someone, as the documentation on this topic has yet to be written.

answered Oct 04 '22 15:10

Apteryx
