Python docx paragraph in textbox

Tags:

python

python-docx

Is there any way to access and manipulate text in an existing docx document in a textbox with python-docx?

I tried to find a keyword in all paragraphs in a document by iteration:

from docx import Document

doc = Document('test.docx')

for paragraph in doc.paragraphs:
    if '<DATE>' in paragraph.text:
        print('found date: ', paragraph.text)

It is found if placed in normal text, but not inside a textbox.

asked Apr 27 '16 11:04

Stefan

2 Answers

A workaround for textboxes that contain only formatted text is to use a floating, formatted table instead. It can be styled almost like a textbox (frames, colours, etc.) and is easily accessible through the python-docx API.

from docx import Document

doc = Document('test.docx')

for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            for paragraph in cell.paragraphs:
                if '<DATE>' in paragraph.text:
                    print('found date: ', paragraph.text)

answered Oct 06 '22 22:10

Stefan

Not via the API, not yet at least. You'd have to uncover the XML structure it lives in and go down to the lxml level and perhaps XPath to find it. Something like this might be a start:

# doc.element.body is the underlying lxml element; doc._body is a proxy
# object and has no xpath() method
body = doc.element.body
# assuming the distinguishing container element is w:txbxContent
text_box_p_elements = body.xpath('.//w:txbxContent//w:p')

In files saved by Word, the text box content typically sits in a w:txbxContent element, but you'd have to confirm that against your own document and sort out the rest of the XPath details. This approach will likely work. I use similar approaches frequently to work around features that aren't built into the API yet.

opc-diag is a useful tool for inspecting the XML. The basic approach is to create a minimal .docx file containing just the kind of thing you're trying to locate, then use opc-diag to inspect the XML Word generates when you save the file:

$ opc browse test.docx document.xml

http://opc-diag.readthedocs.org/en/latest/index.html
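If installing opc-diag isn't an option, a rough first look at the same XML is possible with the standard library alone, since a .docx file is a zip package. This is only a sketch: count_tags is a made-up helper name, and it assumes the main document part lives at word/document.xml, which is where Word normally puts it.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(docx_file, part='word/document.xml'):
    """Count how often each element tag occurs in one part of a .docx package.

    `docx_file` may be a path or a binary file-like object. Tags come back in
    ElementTree's '{namespace-uri}localname' form.
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read(part))
    return Counter(element.tag for element in root.iter())
```

Running this on a minimal file with a single text box and scanning the result for unfamiliar tag names is a quick way to see which container element to target in the XPath expression.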

answered Oct 07 '22 00:10

scanny
