 

Why does the file utility identify Microsoft Word files as CDF? What is this CDF?

Tags: ms-word

I have some old Microsoft Word files (probably Word 97) lying around and noticed that the standard Unix file utility identifies them as "CDF". The output is actually more detailed than that, dumping the metadata as well, for example:

CDF V2 Document,
Little Endian, 
Os: Windows, 
Version 4.0, 
Code page: 1252, 
Title: ..., 
Author: ..., 
Template: Normal.dot, 
Last Saved By: ..., 
Revision Number: 1, 
Name of Creating Application: Microsoft Word 8.0, 
Create Time/Date: ..., 
Last Saved Time/Date: ..., 
Number of Pages: 1, 
Number of Words: 95, 
Number of Characters: 542, 
Security: 0

What does CDF stand for here? Is it a general container format, like RIFF for media files? I can't find anything useful on the web. "Channel Definition Format" and "Compound Document Format" are clearly not meant, as those Microsoft Word files are completely binary. For "Common Data Format" I can't find any connection. I looked through the source code of the file utility (the version that ships with FreeBSD), but could only find out that it has a dedicated readcdf.c which deals with this format.

T-Bull asked Feb 06 '11 17:02


1 Answer

The CDF in that output refers to Microsoft's Compound Document format, which is related to OLE/COM. It is a binary container designed for linking and embedding objects, for example Excel charts in Word documents.

See the historical (pre-XML) document specifications for MS Office; the specific file format description is titled "Windows Compound Binary File Format Specification".
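To check by hand whether a file is one of these compound files, you can look at the first bytes of its header. The following is only a minimal sketch: it assumes the header layout from the published compound file specification (an 8-byte signature, then version numbers, a byte-order mark and a sector shift at offsets 24 to 31). The function names are made up for illustration, and this is not how the file utility itself implements the check.

```python
import struct

# Signature that opens every compound file, per the published format spec.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_compound_file(header: bytes):
    """Return basic header fields if `header` starts a compound file, else None.

    Hypothetical helper for illustration; expects at least the first 32 bytes.
    """
    if len(header) < 32 or header[:8] != CFB_MAGIC:
        return None
    # Offsets 24..31: minor version, major version, byte-order mark, sector shift.
    minor, major, byte_order, sector_shift = struct.unpack_from("<HHHH", header, 24)
    return {
        "minor_version": minor,
        "major_version": major,
        "little_endian": byte_order == 0xFFFE,
        "sector_size": 1 << sector_shift,
    }


def sniff_path(path: str):
    """Read the first 32 bytes of the file at `path` and inspect them."""
    with open(path, "rb") as fh:
        return sniff_compound_file(fh.read(32))
```

Calling sniff_path on an old .doc file (for example a hypothetical old.doc) should return a dictionary of header fields, while a file in another format such as a ZIP-based .docx should give None.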

renick answered Oct 12 '22 23:10