How would I extract metadata (e.g. FileSize, FileModifyDate, FileAccessDate) from a docx file?
You could use python-docx. A Document object has a core_properties property (not a method) that exposes the metadata stored inside the .docx package as 15 attributes, such as author, category, created and modified.
The code below extracts most of these attributes into a Python dictionary:
```python
import docx


def getMetaData(doc):
    """Copy the core properties of a python-docx Document into a dict."""
    metadata = {}
    prop = doc.core_properties
    metadata["author"] = prop.author
    metadata["category"] = prop.category
    metadata["comments"] = prop.comments
    metadata["content_status"] = prop.content_status
    metadata["created"] = prop.created  # datetime or None
    metadata["identifier"] = prop.identifier
    metadata["keywords"] = prop.keywords
    metadata["last_modified_by"] = prop.last_modified_by
    metadata["language"] = prop.language
    metadata["modified"] = prop.modified  # datetime or None
    metadata["subject"] = prop.subject
    metadata["title"] = prop.title
    metadata["version"] = prop.version
    return metadata


file_path = "example.docx"  # placeholder: path to your .docx file
doc = docx.Document(file_path)
metadata_dict = getMetaData(doc)
```
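A more compact variant reads every attribute by name with getattr, which also picks up the two properties the code above skips, last_printed and revision. This is only a sketch: the names core_properties_dict and CORE_PROPERTY_NAMES are my own, and the attribute list assumes the 15 properties that python-docx documents for its CoreProperties object.

```python
from typing import Any, Dict

# The 15 core-property attribute names exposed by python-docx's CoreProperties.
# (CORE_PROPERTY_NAMES itself is a hypothetical helper name.)
CORE_PROPERTY_NAMES = (
    "author", "category", "comments", "content_status", "created",
    "identifier", "keywords", "language", "last_modified_by",
    "last_printed", "modified", "revision", "subject", "title", "version",
)


def core_properties_dict(doc: Any) -> Dict[str, Any]:
    """Return every core property of a python-docx Document as a dict.

    Hypothetical helper: it only needs an object with a core_properties
    attribute, so it does not import docx itself.
    """
    props = doc.core_properties
    # getattr looks each property up by name, so none is accidentally left out.
    return {name: getattr(props, name) for name in CORE_PROPERTY_NAMES}
```

Usage: core_properties_dict(docx.Document(file_path)). Keeping the names in one tuple means adding or dropping a property is a one-line change.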
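FileSize, FileModifyDate and FileAccessDate from the question are file-system attributes rather than document properties, so core_properties does not report them. A minimal sketch using the standard library's os.stat, assuming you want the same key names as in the question (the helper name file_system_metadata is hypothetical):

```python
import os
from datetime import datetime, timezone
from typing import Any, Dict


def file_system_metadata(path: str) -> Dict[str, Any]:
    """Return the size and timestamps the operating system keeps for a file.

    Hypothetical helper; the keys mirror the field names used in the question.
    """
    st = os.stat(path)
    return {
        "FileSize": st.st_size,  # size in bytes
        # st_mtime / st_atime are seconds since the epoch; convert to UTC datetimes.
        "FileModifyDate": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        "FileAccessDate": datetime.fromtimestamp(st.st_atime, tz=timezone.utc),
    }
```

Usage: file_system_metadata(file_path). Some systems do not update the access time on every read (for example Linux mounts using noatime or relatime), so treat FileAccessDate as approximate.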