
Reverse engineering iWork '13 formats

Prior versions of Apple's iWork suite used a very simple document format:

  • documents were bundles of resources (folders, zipped or not)
  • each bundle contained an index.apxl[z] file describing the document structure in a proprietary but fairly easy-to-understand XML schema

iWork '13 has completely redone the format. Documents are still bundles, but what was in the index XML file is now encoded in a set of binary files with type suffix .iwa packed into Index.zip.

In Keynote, for example, there are the following iwa files:

AnnotationAuthorStorage.iwa
CalculationEngine.iwa
Document.iwa
DocumentStylesheet.iwa
MasterSlide-{n}.iwa
Metadata.iwa
Slide{m}.iwa
ThemeStylesheet.iwa
ViewState.iwa
Tables/DataList.iwa

for MasterSlides 1…n and Slides 1…m
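
To check which .iwa files a particular document contains, a small Python sketch like the one below could list them. It assumes Index.zip is an ordinary zip archive that the standard zipfile module can open, which the description above suggests but does not guarantee; the function name list_iwa_members is made up for illustration.

```python
import zipfile


def list_iwa_members(index_zip):
    """Return the sorted names of all .iwa entries in an Index.zip archive.

    index_zip may be a filesystem path or a binary file-like object.
    Assumption: Index.zip is a standard zip archive.
    """
    with zipfile.ZipFile(index_zip) as archive:
        return sorted(
            name for name in archive.namelist() if name.endswith(".iwa")
        )


# Hypothetical usage (the path is an example, not taken from a real document):
#   for name in list_iwa_members("Example.key/Index.zip"):
#       print(name)
```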

The purpose of each file is fairly clear from its name. The files even appear to be uncompressed: essentially all of the document's text is directly visible as strings among the binary blobs (albeit interspersed with what looks like RTF/NSAttributedString-style garbage among the readable ASCII characters).

I have posted the unpacked Index of a simple example Keynote document here: https://github.com/jrk/iwork-13-format.

However, the overall file format is not obvious to me. Apple has a long history of using simple, platform-standard formats like plists for most of its documents, but these .iwa files have no clear type tag at the start, and I cannot tell what they are.

Do these files ring any bells? Is there evidence they are in some reasonably comprehensible serialization format?

Rummaging through the Keynote app runtime and class dumps with F-Script, the only evidence I've found is for some use of Protocol Buffers in the serialization classes which seem to be used for iWork, e.g.: https://github.com/nst/iOS-Runtime-Headers/blob/master/PrivateFrameworks/iWorkImport.framework/TSPArchiverBase.h.

Quickly piping a few of the files through protoc --decode_raw with the first 0…16 bytes lopped off produced nothing obviously usable.
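
For reference, the experiment above can be scripted roughly as follows. This is only a sketch: it assumes protoc is installed and on the PATH, and the helper names trimmed_candidates and try_decode_raw are mine, not part of any tool. It strips 0 to 16 leading bytes and feeds each remainder to protoc --decode_raw on standard input.

```python
import shutil
import subprocess


def trimmed_candidates(data, max_skip=16):
    """Yield (skip, remainder) pairs with 0..max_skip leading bytes removed."""
    for skip in range(max_skip + 1):
        yield skip, data[skip:]


def try_decode_raw(data, max_skip=16):
    """Run `protoc --decode_raw` on each trimmed candidate.

    Returns a dict mapping skip -> decoded text for every candidate that
    protoc accepted. Returns an empty dict if protoc is not installed.
    """
    protoc = shutil.which("protoc")
    if protoc is None:
        return {}
    results = {}
    for skip, chunk in trimmed_candidates(data, max_skip):
        proc = subprocess.run(
            [protoc, "--decode_raw"], input=chunk, capture_output=True
        )
        if proc.returncode == 0:
            results[skip] = proc.stdout.decode("utf-8", "replace")
    return results
```

A successful parse here is weak evidence at best, since short byte strings can decode as protobuf by accident.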

jrk, asked Oct 24 '13 16:10



2 Answers

I've done some work reverse engineering the format and published my results here. I've written up a description of the format and provided a sample project as well.

Basically, the .iwa files are Protobuf streams compressed using Snappy.
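
To make the Snappy part concrete, here is a minimal pure-Python decoder for a raw Snappy block, following the publicly documented Snappy block format. It is only a sketch: this answer does not say how the compressed data is framed inside an .iwa file, so the assumption that you already hold one raw Snappy block is mine, and the names read_varint and snappy_decompress_raw are illustrative. In practice a maintained Snappy library would be the safer choice.

```python
def read_varint(buf, pos):
    """Decode a little-endian base-128 varint; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def snappy_decompress_raw(data):
    """Decompress one raw Snappy block (no framing, no checksums)."""
    expected_len, pos = read_varint(data, 0)  # preamble: uncompressed length
    out = bytearray()
    while pos < len(data):
        tag = data[pos]
        pos += 1
        kind = tag & 0x03
        if kind == 0:
            # Literal: upper 6 bits hold (length - 1), or 60..63 meaning
            # that 1..4 following bytes hold (length - 1).
            length = tag >> 2
            if length >= 60:
                extra = length - 59
                length = int.from_bytes(data[pos:pos + extra], "little")
                pos += extra
            length += 1
            out += data[pos:pos + length]
            pos += length
            continue
        if kind == 1:
            # Copy with 11-bit offset: 3 length bits, 3 high offset bits in tag.
            length = ((tag >> 2) & 0x07) + 4
            offset = ((tag >> 5) << 8) | data[pos]
            pos += 1
        elif kind == 2:
            # Copy with 2-byte little-endian offset.
            length = (tag >> 2) + 1
            offset = int.from_bytes(data[pos:pos + 2], "little")
            pos += 2
        else:
            # Copy with 4-byte little-endian offset.
            length = (tag >> 2) + 1
            offset = int.from_bytes(data[pos:pos + 4], "little")
            pos += 4
        if offset == 0 or offset > len(out):
            raise ValueError("invalid copy offset")
        # Copy byte by byte so overlapping copies (offset < length) work.
        for _ in range(length):
            out.append(out[-offset])
    if len(out) != expected_len:
        raise ValueError("decompressed length does not match preamble")
    return bytes(out)


# Hand-built block: length 9, literal "abc", then copy 6 bytes from offset 3.
sample = bytes([0x09, 0x08, 0x61, 0x62, 0x63, 0x09, 0x03])
print(snappy_decompress_raw(sample))  # b'abcabcabc'
```

The decompressed bytes would then be what gets handed to a Protobuf parser.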

Hope this helps!

Sean Patrick O'Brien, answered Oct 18 '22 20:10



Interesting project, I like it! Here is what I have found so far.

The first 4 bytes of each .iwa file appear to be a header that holds a length, with a tweak: skip the first byte, and the next three bytes give the length of the rest of the file in little-endian order. So it looks like there is no 'magic' number to verify the file type.

Look at Slide1.iwa:
First 4 bytes are 00 79 02 00
File size is 637 bytes
Take the leading 00 off and reverse the remaining bytes: 00 02 79
0x000279 == 633
637 - 633 = 4, i.e. the 4 header bytes that hold the length.

This checks out for the 4 files I looked at: Slide1.iwa, Slide2.iwa, Document.iwa, DocumentStylesheet.iwa
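
The check above can be written as a short Python sketch. It assumes the layout observed in those four files (one leading byte followed by a 3-byte little-endian length); the function name split_iwa_header is made up, and the sample data is synthetic, built to match the Slide1.iwa numbers.

```python
def split_iwa_header(data):
    """Split .iwa bytes into (first_byte, declared_length, payload).

    Assumed layout, based only on the files examined above:
      byte 0     : leading byte (00 in the examples)
      bytes 1..3 : payload length, little-endian
      bytes 4..  : payload
    """
    if len(data) < 4:
        raise ValueError("need at least 4 header bytes")
    first_byte = data[0]
    declared_length = int.from_bytes(data[1:4], "little")
    return first_byte, declared_length, data[4:]


# Synthetic stand-in for Slide1.iwa: header 00 79 02 00 plus 633 payload bytes.
sample = bytes([0x00, 0x79, 0x02, 0x00]) + bytes(633)
first_byte, declared_length, payload = split_iwa_header(sample)
print(len(sample), declared_length, len(payload))  # 637 633 633
```

If the declared length is ever smaller than the remaining payload, that would suggest a file holds more than one such length-prefixed chunk, which the four files checked here do not settle either way.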

WMIF, answered Oct 18 '22 20:10
