
How can I set the language of text with python-docx?

I create Word files using the python-docx library. I want to be able to set different parts of the document to different languages. How can the language be set with python-docx? Preferably, I would like to do it at the run level, since I need different languages on the same line (I am creating a dual-language document). However, there does not seem to be any language attribute for runs, nor for paragraphs.

asked May 01 '16 13:05 by NiklasR




1 Answer

I think the language has to be set either via the document styles in word/styles.xml or at run level. Currently, python-docx has no API support for this task.

Referring to this answer, you can try the following code, which alters the properties directly in the oxml element objects. The paragraph p4 shows the run-level approach. (Tested with python-docx==0.8.10 and LibreOffice Writer with German and English language dictionaries.)

Note: The language field in the Core Document Properties is only metadata and is not used for global spell checking.

import docx # python-docx==0.8.10


doc = docx.Document()

# For new document (document-wide):
# Set the language value in the document's default run properties element.
styles_element = doc.styles.element
rpr_default = styles_element.xpath('./w:docDefaults/w:rPrDefault/w:rPr')[0]
lang_default = rpr_default.xpath('w:lang')[0]
lang_default.set(docx.oxml.shared.qn('w:val'),'de-DE')

title = doc.add_paragraph('Rechtschreibprüfung', style='Title')

p1 = doc.add_paragraph(
    'Das ist ein deutscher Satz. '
    'Die Rechtschreibprüfung sollte nichts anstreichen.',
    style='Normal'
    )

# For existing styles:
# For styles without a language value
# you can append one explicitly by
# iterating over those styles in the document.
for my_style in doc.styles:
    rpr = my_style.element.get_or_add_rPr()
    # Only add a w:lang element if the style does not have one yet.
    if not rpr.xpath('w:lang'):
        lang = docx.oxml.shared.OxmlElement('w:lang')
        lang.set(docx.oxml.shared.qn('w:val'),'de-DE')
        lang.set(docx.oxml.shared.qn('w:eastAsia'),'en-US')
        lang.set(docx.oxml.shared.qn('w:bidi'),'ar-SA')
        rpr.append(lang)

p2 = doc.add_paragraph(
    'This sentence is written in English. '
    'The automatic spell checking should complain, '
    'because all styles’ language was set to German.',
    style='Quote'
    )

# For addressing specific styles:
# Update (or append to) a specific style,
# e.g. in order to use multiple styles
# to handle more than one language per document.
body_style = doc.styles['Body Text']
body_rpr = body_style.element.get_or_add_rPr()
# The w:lang element exists here because the loop above added it to every style.
body_lang = body_rpr.xpath('w:lang')[0]
body_lang.set(docx.oxml.shared.qn('w:val'),'en-US')

p3 = doc.add_paragraph(
    'This sentence is written again in English. '
    'The automatic spell checking should not complain, '
    'because this style’s language now has been set to English.',
    style='Body Text'
    )

# Run Level:
# For mixing multiple languages
# within the same style per paragraph.
p4 = doc.add_paragraph(style='Body Text')
p4_text = p4.add_run()
p4_text.add_text(
    'On Run Level: This sentence is written once again in English. '
    'Spell check = OK | '
    )
# Add a new run with its language
# differing from the style's language value.
p4_text = p4.add_run()
p4_rpr = p4_text.element.get_or_add_rPr()
p4_run_lang = docx.oxml.shared.OxmlElement('w:lang')
p4_run_lang.set(docx.oxml.shared.qn('w:val'),'de-DE')
p4_run_lang.set(docx.oxml.shared.qn('w:eastAsia'),'en-US')
p4_run_lang.set(docx.oxml.shared.qn('w:bidi'),'ar-SA')
p4_rpr.append(p4_run_lang)
p4_text.add_text(
    'Und das ist noch einmal ein deutscher Satz. '
    'Rechtschreibprüfung = okay'
    )

doc.save('my-document.docx')
answered Oct 16 '22 14:10 by winkelband