Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do you keep table rows together in python-docx?

As an example, I have a generic script that outputs the default table styles using python-docx (this code runs fine):

import docx
d=docx.Document()
type_of_table=docx.enum.style.WD_STYLE_TYPE.TABLE

list_table=[['header1','header2'],['cell1','cell2'],['cell3','cell4']]
numcols=max(map(len,list_table))
numrows=len(list_table)

styles=(s for s in d.styles if s.type==type_of_table)


for stylenum,style in enumerate(styles,start=1):
    label=d.add_paragraph('{}) {}'.format(stylenum,style.name))
    label.paragraph_format.keep_with_next=True
    label.paragraph_format.space_before=docx.shared.Pt(18)
    label.paragraph_format.space_after=docx.shared.Pt(0)
    table=d.add_table(numrows,numcols)
    table.style=style
    for r,row in enumerate(list_table):
        for c,cell in enumerate(row):
            table.row_cells(r)[c].text=cell


d.save('tablestyles.docx')   

Next, I opened the document, highlighted a split table and under paragraph format, selected "Keep with next," which successfully prevented the table from being split across a page:

enter image description here

Here is the XML code of the non-broken table:

enter image description here

You can see the highlighted line shows the paragraph property that should be keeping the table together. So I wrote this function and stuck it in the code above the d.save('tablestyles.docx') line:

def no_table_break(document):
    tags=document.element.xpath('//w:p')
    for tag in tags:
        ppr=tag.get_or_add_pPr()
        ppr.keepNext_val=True


no_table_break(d)     

When I inspect the XML code the paragraph property tag is set properly and when I open the Word document, the "Keep with next" box is checked for all tables, yet the table is still split across pages. Am I missing an XML tag or something that's preventing this from working properly?

like image 897
LMc Avatar asked Jan 10 '17 21:01

LMc


People also ask

Which command is used for merging selected cells in Python?

You can also use Shift + M to merge multiple cells.

How do you insert a paragraph in Python?

To add a paragraph in a word document we make use the inbuilt method add_paragraph() to add a paragraph in the word document. By using this method we can even add paragraphs which have characters like '\n', '\t' and '\r'.


2 Answers

Ok, I also needed this. I think we were all making the incorrect assumption that the setting in Word's table properties (or the equivalent ways to achieve this in python-docx) was about keeping the table from being split across pages. It's not -- instead, it's simply about whether or not a table's rows can be split across pages.

Given that we know how successfully do this in python-docx, we can prevent tables from being split across pages by putting each table within the row of a larger master table. The code below successfully does this. I'm using Python 3.6 and Python-Docx 0.8.6

import docx
from docx.oxml.shared import OxmlElement
import os
import sys


def prevent_document_break(document):
    """https://github.com/python-openxml/python-docx/issues/245#event-621236139
       Globally prevent table cells from splitting across pages.
    """
    tags = document.element.xpath('//w:tr')
    rows = len(tags)
    for row in range(0, rows):
        tag = tags[row]  # Specify which <w:r> tag you want
        child = OxmlElement('w:cantSplit')  # Create arbitrary tag
        tag.append(child)  # Append in the new tag


d = docx.Document()
type_of_table = docx.enum.style.WD_STYLE_TYPE.TABLE

list_table = [['header1', 'header2'], ['cell1', 'cell2'], ['cell3', 'cell4']]
numcols = max(map(len, list_table))
numrows = len(list_table)

styles = (s for s in d.styles if s.type == type_of_table)

big_table = d.add_table(1, 1)
big_table.autofit = True

for stylenum, style in enumerate(styles, start=1):
    cells = big_table.add_row().cells
    label = cells[0].add_paragraph('{}) {}'.format(stylenum, style.name))
    label.paragraph_format.keep_with_next = True
    label.paragraph_format.space_before = docx.shared.Pt(18)
    label.paragraph_format.space_after = docx.shared.Pt(0)

    table = cells[0].add_table(numrows, numcols)
    table.style = style
    for r, row in enumerate(list_table):
        for c, cell in enumerate(row):
            table.row_cells(r)[c].text = cell

prevent_document_break(d)

d.save('tablestyles.docx')

# because I'm lazy...
openers = {'linux': 'libreoffice tablestyles.docx',
           'linux2': 'libreoffice tablestyles.docx',
           'darwin': 'open tablestyles.docx',
           'win32': 'start tablestyles.docx'}
os.system(openers[sys.platform])
like image 87
TSeymour Avatar answered Oct 19 '22 08:10

TSeymour


Have been straggling with the problem for some hours and finally found the solution worked fine for me. I just changed the XPath in the topic starter's code so now it looks like this:

def keep_table_on_one_page(doc):
    tags = self.doc.element.xpath('//w:tr[position() < last()]/w:tc/w:p')
    for tag in tags:
        ppr = tag.get_or_add_pPr()
        ppr.keepNext_val = True

The key moment is this selector

[position() < last()]

We want all but the last row in each table to keep with the next one

like image 24
DeadAd Avatar answered Oct 19 '22 06:10

DeadAd