Prior versions of Apple's iWork suite used a very simple document format: an index.apxl[z] file describing the document structure in a proprietary but fairly easy to understand schema.
iWork '13 has completely redone the format. Documents are still bundles, but what was in the index XML file is now encoded in a set of binary files with type suffix .iwa packed into Index.zip.
In Keynote, for example, there are the following iwa files:
AnnotationAuthorStorage.iwa
CalculationEngine.iwa
Document.iwa
DocumentStylesheet.iwa
MasterSlide-{n}.iwa
Metadata.iwa
Slide{m}.iwa
ThemeStylesheet.iwa
ViewState.iwa
Tables/DataList.iwa
for MasterSlides 1…n and Slides 1…m
The purpose of each of these is quite clear from the naming. The files even appear uncompressed, with essentially all content text directly visible as strings among the binary blobs (albeit with some RTF/NSAttributedString-like garbage in the midst of the readable ASCII characters).
I have posted the unpacked Index of a simple example Keynote document here: https://github.com/jrk/iwork-13-format.
However, the overall file format is non-obvious to me. Apple has a long history of using simple, platform-standard formats like plists for encoding most of their documents, but there is no clear type tag at the start of the files, and it is not obvious to me what these iwa files are.
Do these files ring any bells? Is there evidence they are in some reasonably comprehensible serialization format?
Rummaging through the Keynote app runtime and class dumps with F-Script, the only evidence I've found is for some use of Protocol Buffers in the serialization classes which seem to be used for iWork, e.g.: https://github.com/nst/iOS-Runtime-Headers/blob/master/PrivateFrameworks/iWorkImport.framework/TSPArchiverBase.h.
Quickly piping a few of the files through protoc --decode_raw with the first 0…16 bytes lopped off produced nothing obviously usable.
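For anyone who wants to repeat that experiment, here is a rough sketch of it in Python. It assumes protoc is installed and on the PATH; the function names and the pluggable decoder argument are my own, added only so the loop can be exercised without protoc.

```python
import subprocess
from typing import Callable, Dict, Optional


def decode_raw_with_protoc(payload: bytes) -> Optional[str]:
    """Feed payload to `protoc --decode_raw` on stdin.

    Returns the decoded text, or None if protoc rejects the input.
    Assumes a protoc binary is available on the PATH.
    """
    result = subprocess.run(
        ["protoc", "--decode_raw"], input=payload, capture_output=True
    )
    if result.returncode != 0:
        return None
    return result.stdout.decode("utf-8", errors="replace")


def try_offsets(
    data: bytes,
    max_offset: int = 16,
    decoder: Callable[[bytes], Optional[str]] = decode_raw_with_protoc,
) -> Dict[int, str]:
    """Drop the first 0..max_offset bytes and keep every offset that decodes."""
    hits = {}
    for offset in range(max_offset + 1):
        text = decoder(data[offset:])
        if text:
            hits[offset] = text
    return hits
```

Calling try_offsets(open("Slide1.iwa", "rb").read()) returns a dict mapping each offset that protoc accepted to its decoded text. Note that --decode_raw will happily "decode" some random byte strings, so a hit is only a hint, not proof.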
I've done some work reverse engineering the format and published my results here. I've written up a description of the format and provided a sample project as well.
Basically, the .iwa files are Protobuf streams compressed using Snappy.
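To give an idea of what the Snappy layer involves, here is a minimal pure-Python sketch of a raw Snappy block decompressor, written from my reading of the public Snappy format description. Treat it as an illustration rather than the code from my project: the function names are made up for this post, and that an .iwa payload is a plain Snappy block (rather than the framed stream format) is an assumption you should verify against real files.

```python
from typing import Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a little-endian base-128 varint; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def snappy_decompress_raw(buf: bytes) -> bytes:
    """Decompress one raw Snappy block (no stream framing, no checksums)."""
    expected_len, pos = read_varint(buf, 0)  # preamble: uncompressed length
    out = bytearray()
    while pos < len(buf):
        tag = buf[pos]
        pos += 1
        kind = tag & 0x03
        if kind == 0:
            # Literal: upper 6 bits hold (length - 1), or 60..63 meaning
            # that (length - 1) follows in the next 1..4 bytes.
            length = tag >> 2
            if length >= 60:
                extra = length - 59
                length = int.from_bytes(buf[pos:pos + extra], "little")
                pos += extra
            length += 1
            out += buf[pos:pos + length]
            pos += length
            continue
        if kind == 1:
            # Copy with 1-byte offset: 3-bit length, 11-bit offset.
            length = 4 + ((tag >> 2) & 0x07)
            offset = ((tag >> 5) << 8) | buf[pos]
            pos += 1
        elif kind == 2:
            # Copy with 2-byte little-endian offset.
            length = 1 + (tag >> 2)
            offset = int.from_bytes(buf[pos:pos + 2], "little")
            pos += 2
        else:
            # Copy with 4-byte little-endian offset.
            length = 1 + (tag >> 2)
            offset = int.from_bytes(buf[pos:pos + 4], "little")
            pos += 4
        if offset == 0 or offset > len(out):
            raise ValueError("invalid copy offset")
        # Copy byte by byte so overlapping copies repeat recent output.
        for _ in range(length):
            out.append(out[-offset])
    if len(out) != expected_len:
        raise ValueError("decompressed length does not match preamble")
    return bytes(out)
```

The decompressed bytes are what you would then try to parse as Protobuf messages.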
Hope this helps!
Interesting project, I like it! Here is what I have found so far.
The first 4 bytes of each of the iwa files appear to be a length, with a tweak. So it looks like there will not be any 'magic' to verify file type.
Look at Slide1.iwa:
First 4 bytes are 00 79 02 00
File size is 637 bytes
Take the first 00 off and reverse the remaining bytes: 00 02 79
0x000279 == 633
637 - 633 = 4 bytes that hold the size of the file.
This checks out for the 4 files I looked at: Slide1.iwa, Slide2.iwa, Document.iwa, DocumentStylesheet.iwa
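The same arithmetic as a small Python sketch. The function name is just for illustration, and the layout (one 00 byte followed by a 3-byte little-endian length) is only what I observed in those four files, not a confirmed specification.

```python
def read_iwa_length_prefix(data: bytes) -> int:
    """Return the length stored in bytes 1..3 of the 4-byte prefix.

    Byte 0 was 00 in every file examined; the next three bytes hold the
    length with the least significant byte first.
    """
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    return int.from_bytes(data[1:4], "little")


header = bytes([0x00, 0x79, 0x02, 0x00])  # first 4 bytes of Slide1.iwa
payload_length = read_iwa_length_prefix(header)
print(payload_length)      # 633
print(payload_length + 4)  # 637, the size of Slide1.iwa on disk
```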