Python docx extracting Font name and size

I want to write a Python program that checks some properties of an MS Word file (.docx), such as margins, font name, and font size. (Before going further, I should note that, honestly, I have no clue what I am doing.)

For the font part I've run into real problems.
According to https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html:

"A style can inherit properties from another style, somewhat similarly to how Cascading Style Sheets (CSS) works. Inheritance is specified using the base_style attribute. By basing one style on another, an inheritance hierarchy of arbitrary depth can be formed. A style having no base style inherits properties from the document defaults."

So I tried this code:

from docx import Document

d = Document('1.docx')
d_styles = d.styles

for st in d_styles:
    if st.name != "No List":  # ignore the numbering style
        print(st.type, st.name, st.base_style)
        # print(dir(st.base_style), '\n')  # dir(st.base_style) contains no font attribute

st.base_style returns None.

So, based on "A style having no base style inherits properties from the document defaults", the answer should lie in the document defaults. But I don't know how to reach them.

The snippets below also print None:

for st in d_styles:
    if st.name != "No List":  # ignore the numbering style
        print(st.font.name)
# Outputs: None

for para in d.paragraphs:
    for r in para.runs:
        print(r.font.name)
# Outputs: None

for para in d.paragraphs:
    print(para.style.font.name)
# Outputs: None

I've used these sources:
https://python-docx.readthedocs.io/en/latest/api/style.html
https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html


Edit:

I've tried to treat the Styles object as a dictionary:

for key, value in d_styles.items():
    print(key, value)
# ERROR: 'Styles' object has no attribute 'items'
print(d_styles.items())
# ERROR: 'Styles' object has no attribute 'items'
print(d_styles.keys())
# ERROR: 'Styles' object has no attribute 'keys'
print(d_styles.values())
# ERROR: 'Styles' object has no attribute 'values'

Even this piece of code prints None:

style = d.styles['Normal']
f = style.font
print(f.name)

AKLMI asked Jan 17 '26 12:01


1 Answer

According to the documentation:

The Styles object provides dictionary-style access to defined styles by name

And I think that is where the confusion comes from. Styles supports lookup by name (d_styles['Normal']), but it is not a real dictionary: it has no keys(), items(), or values() methods, and iterating over it yields the style objects themselves, not their names. Try the snippets below and see whether they solve your problem. For future reference, it is worth reading the styles documentation carefully.

To list the style names, use:

from docx import Document

d = Document('1.docx')
d_styles = d.styles
print([style.name for style in d_styles])

After that, you can look up each style by its name using d_styles['yourKey']. To print the names and the style objects together, try the snippet below.

from docx import Document

d = Document('1.docx')
d_styles = d.styles
for style in d_styles:
    # d_styles[style.name] reaches the same style through name lookup
    print(f'{style.name} : {d_styles[style.name]}')

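Keep in mind that each style is an object rather than a collection, so you read its attributes (name, type, base_style, font) instead of iterating over it. The snippet below prints the font that is set directly on each style.

from docx import Document

d = Document('1.docx')
d_styles = d.styles
for style in d_styles:
    print(f'{style.name} : {style.type}')
    # Numbering styles such as "No List" have no font attribute
    font = getattr(style, 'font', None)
    if font is not None:
        # None means the value is not set on this style and is inherited
        print(font.name, font.size)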
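
A None from style.font.name usually means the value is not set on that style and is inherited instead. Here is a minimal sketch of following the base_style chain that the question quotes, and of doing the same lookup for a run. The helper names inherited_font_attr and run_font_attr are my own and are not part of python-docx. The SimpleNamespace objects are hypothetical stand-ins so the example runs without a .docx file; real python-docx style, run, and paragraph objects expose the same attributes used here. The lookup order (direct run formatting, then the run's character style, then the paragraph style) is a simplified model of what Word does and ignores table styles.

```python
from types import SimpleNamespace


def inherited_font_attr(style, attr, max_depth=50):
    """Return the first non-None font attribute along style -> base_style -> ..."""
    for _ in range(max_depth):  # guard against a malformed, cyclic style chain
        if style is None:
            break
        font = getattr(style, "font", None)  # numbering styles have no font
        value = getattr(font, attr, None)
        if value is not None:
            return value
        style = getattr(style, "base_style", None)
    return None  # nothing set in the chain: the document defaults apply


def run_font_attr(run, paragraph, attr):
    """Check the run itself, then its character style, then the paragraph style."""
    direct = getattr(run.font, attr, None)
    if direct is not None:
        return direct
    for style in (run.style, paragraph.style):
        value = inherited_font_attr(style, attr)
        if value is not None:
            return value
    return None


# Hypothetical stand-ins that mimic the attributes of python-docx objects
normal = SimpleNamespace(
    name="Normal", font=SimpleNamespace(name="Calibri", size=None), base_style=None
)
heading = SimpleNamespace(
    name="Heading 1", font=SimpleNamespace(name=None, size=None), base_style=normal
)
print(inherited_font_attr(heading, "name"))

# With a real file (assumes python-docx is installed):
#   from docx import Document
#   d = Document('1.docx')
#   for para in d.paragraphs:
#       for r in para.runs:
#           print(run_font_attr(r, para, "name"))
```

If this still returns None for every run, no style in the chain sets the font, and the value comes from the document defaults.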

Play with keys and attributes a little bit and you'll find what you are looking for.
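
For the "document defaults" part of the question: as far as I know, python-docx does not expose them through a dedicated property, but a .docx file is a zip archive and the defaults are stored in word/styles.xml under w:docDefaults. Below is a rough sketch that reads them with the standard library only. The function name document_default_font and the shape of the returned dict are my own choices for illustration, not an existing API.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def document_default_font(docx_file):
    """Read the default run fonts and size from word/styles.xml in a .docx."""
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/styles.xml"))

    result = {"fonts": {}, "size_pt": None}
    rpr = root.find("w:docDefaults/w:rPrDefault/w:rPr", NS)
    if rpr is None:
        return result  # the file declares no run-level defaults

    fonts = rpr.find("w:rFonts", NS)
    if fonts is not None:
        # Drop the namespace part of attribute names such as w:ascii or w:asciiTheme
        result["fonts"] = {k.rsplit("}", 1)[-1]: v for k, v in fonts.attrib.items()}

    size = rpr.find("w:sz", NS)
    if size is not None:
        val = size.get(f"{{{W_NS}}}val")
        if val is not None:
            result["size_pt"] = int(val) / 2  # w:sz is stored in half-points

    return result


# Example call, assuming 1.docx is in the working directory:
#   print(document_default_font("1.docx"))
```

If the fonts dict only contains entries such as asciiTheme or hAnsiTheme, the defaults point at a theme font, and the actual typeface name is defined in word/theme/theme1.xml rather than in styles.xml.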

ARK1375 answered Jan 20 '26 04:01


