How to read the contents of a table in an MS Word file using Python?

Tags:

python

ms-word

How can I read and process the contents of every cell of a table in a DOCX file?

I am using Python 3.2 on Windows 7 and PyWin32 to access the MS-Word Document.

I am a beginner, so I don't know the proper way to reach the table cells. So far I have only done this:

import win32com.client as win32
word = win32.gencache.EnsureDispatch('Word.Application')
word.Visible = False 
doc = word.Documents.Open("MyDocument")
Aashiq Hussain asked Apr 28 '12 19:04



1 Answer

Jumping in rather late in life, but I thought I'd put this out anyway: now (2015), you can use the pretty neat python-docx library: https://python-docx.readthedocs.org/en/latest/. Then:

from docx import Document

wordDoc = Document('<path to docx file>')

for table in wordDoc.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)
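
If you want to process the cells rather than just print them, one option is to collect each table into a list of rows first. This is only a sketch: the helper names table_to_rows and read_docx_tables are made up for illustration, and stripping whitespace from each cell is an assumption about what "process" means for your data. It relies only on the same tables, rows, cells and text attributes used in the loop above.

```python
def table_to_rows(table):
    """Return a table's cell text as a list of rows (each a list of strings).

    Works on any object exposing .rows, .cells and .text the way a
    python-docx table does. Stripping whitespace is an assumption.
    """
    return [[cell.text.strip() for cell in row.cells] for row in table.rows]


def read_docx_tables(path):
    """Open a .docx file and return one list of rows per table in it."""
    # Imported lazily so table_to_rows stays usable without python-docx.
    from docx import Document

    return [table_to_rows(table) for table in Document(path).tables]
```

Calling read_docx_tables('<path to docx file>') would then give a nested list indexed as tables[table_index][row_index][column_index].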
peterb answered Sep 22 '22 12:09
