When using Python docx how to enable spelling in output document?

I'm using the Python docx library to generate a document that I would like to be able to open and check for spelling errors. But when I open the document, none of the spelling errors are flagged (little red underlines), and none are identified if I run a spell check. If I edit a line, or cut and paste the content back into Word, the spelling functionality works as expected.

Is there some way to get the output document to display/recognize spelling errors automatically? I've played around with the "no_proof" flag, but that doesn't seem to help. (Using 64-bit Python 3.4, docx 8.6 on a Windows box, opening the output in Word 2013.)

Thanks for any ideas!

Code to reproduce:

from docx import Document
document = Document()
paragraph = document.add_paragraph('This has a spellling errror')
document.save(r'SpellingTester.docx')

Output in Word: (screenshot not preserved)

Colin Talbert asked Aug 22 '17 14:08

1 Answer

I would try using the document.settings object, which wraps the lxml element for the document's settings. As you can see from the documentation, the settings schema includes a hideSpellingErrors element.

Python DOCX Settings

<xsd:element name="hideSpellingErrors" type="CT_OnOff" minOccurs="0"/>

EDIT: After researching a little further, I would try something like this:

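import docx

document = docx.Document()

# Namespace prefix for WordprocessingML elements and attributes
DOCX = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'

# Find <w:proofState> in the settings and mark both checks as not yet done
element = document.settings.element.find(DOCX + 'proofState')
if element is not None:  # find() returns None when the element is absent
    element.attrib[DOCX + 'grammar'] = 'dirty'
    element.attrib[DOCX + 'spelling'] = 'dirty'

document.add_paragraph("This has a spellling errror")
document.save("test.docx")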

The DOCX namespace prefix could change, but it is easy to read from the lxml element. I don't know right now whether there is a more direct and cleaner way to do this, but this works for me.
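For what it's worth, the same idea can be wrapped in a small helper that copes with a missing proofState element. This is only a rough sketch, not a python-docx API: the name mark_proofing_dirty is made up for illustration, and the demo runs against a hand-written settings fragment parsed with the standard library rather than a real document.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, the same URI used in the answer's code.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def mark_proofing_dirty(settings_element):
    """Set w:spelling and w:grammar to 'dirty' on w:proofState, if present.

    Hypothetical helper (not part of python-docx). Returns True when the
    element was found and updated, False when there was nothing to update.
    """
    proof_state = settings_element.find("{%s}proofState" % W_NS)
    if proof_state is None:
        # No proofState child; leave the settings untouched.
        return False
    proof_state.set("{%s}spelling" % W_NS, "dirty")
    proof_state.set("{%s}grammar" % W_NS, "dirty")
    return True


# Hand-written stand-in for a settings part, for demonstration only.
SAMPLE_SETTINGS = (
    '<w:settings xmlns:w="%s">'
    '<w:proofState w:spelling="clean" w:grammar="clean"/>'
    "</w:settings>" % W_NS
)

settings = ET.fromstring(SAMPLE_SETTINGS)
updated = mark_proofing_dirty(settings)
proof_state = settings.find("{%s}proofState" % W_NS)
print(updated, proof_state.get("{%s}spelling" % W_NS))
```

Since the helper only relies on find() and set(), it should also accept document.settings.element from the code above, though I have only sketched it against the sample fragment here.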

More about proofState setting in docx:

  • Same problem here
  • Trying to do the opposite here
GendoIkari answered Sep 20 '22 08:09