
How to identify and validate an OOXML file?

I need to be able to identify that a given file is an OOXML file based on the contents of the file, and not on the file's extension.

OOXML files are really a collection of XML and text files in a zip container, which means that I cannot use the file's magic number as it will just indicate that it is a zip file.

So what I'm really asking is: are there any files that are required to be present in an OOXML Open Packaging Convention (OPC) container? If so, the presence of such a file in an OPC container indicates that it is likely to be an OOXML file, and its absence indicates that it definitely is not one.

This question is the OOXML version of this ODF question.

jwaddell asked Jan 30 '26 03:01



2 Answers

Yes, there is a way. Go to OpenXMLDeveloper.org and download the PPTX "02: Open XML Packages" (Presentation 02). Slide 12 explains how to identify an Open XML document: look for document.xml, the .rels files, and the [Content_Types].xml file (most importantly its ContentType attributes). The important thing here is to rely on what's inside those files, not just on the package structure itself (Open Packaging Convention).
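As a rough illustration of that check, here is a minimal Python sketch using only the standard library. The function name `sniff_ooxml` and the `MAIN_TYPES` table are my own, not something from the presentation. It assumes the package carries both [Content_Types].xml and _rels/.rels, and it only recognizes the plain docx/xlsx/pptx main-part content types, so macro-enabled and template variants would come back as None unless you add their content types to the table.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the elements inside [Content_Types].xml
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

# Main-part content types for the three common OOXML document kinds.
# This table is illustrative and deliberately incomplete.
MAIN_TYPES = {
    "application/vnd.openxmlformats-officedocument"
    ".wordprocessingml.document.main+xml": "docx",
    "application/vnd.openxmlformats-officedocument"
    ".spreadsheetml.sheet.main+xml": "xlsx",
    "application/vnd.openxmlformats-officedocument"
    ".presentationml.presentation.main+xml": "pptx",
}


def sniff_ooxml(source):
    """Return 'docx', 'xlsx', 'pptx', or None for a path or file-like object.

    Hypothetical helper: it looks inside the ZIP rather than at the
    file name, and treats any failure to read the package as "not OOXML".
    """
    try:
        with zipfile.ZipFile(source) as zf:
            names = set(zf.namelist())
            # Assumption: a usable package has both of these parts.
            if "[Content_Types].xml" not in names or "_rels/.rels" not in names:
                return None
            root = ET.fromstring(zf.read("[Content_Types].xml"))
    except (zipfile.BadZipFile, ET.ParseError, OSError):
        return None

    # The main document part is normally declared via an Override element
    # whose ContentType attribute names the document kind.
    for override in root.iter(CT_NS + "Override"):
        kind = MAIN_TYPES.get(override.get("ContentType"))
        if kind is not None:
            return kind
    return None
```

Calling `sniff_ooxml("report.bin")` on a renamed Word file should return `"docx"`, while an ordinary ZIP archive or a non-ZIP file returns None. This is a heuristic for identification only; it does not validate the document against the schemas.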

Another great resource is Open XML Markup Explained. Chapter 1, followed by "Setting Up the Main Document", is a good place to learn about the structure of a Word docx. The structures of Excel and PowerPoint files are covered later on.

Todd Main answered Jan 31 '26 19:01



My answer is similar to the one I gave to your ODF question: look at the technical specification of the format.

Amber answered Jan 31 '26 20:01



