What is the file format that the C++ MFC object CArchive writes to?

I am trying to read, in C#, a file written with CArchive. From what I can tell, the format is:

[length of next set of data][data]...etc

I'm still fuzzy on some of the data, though. How do I read in Date data? What about floats, ints, doubles, etc?

Also, [length of next set of data] could be a byte or word or dword. How do I know when it will be each? For instance, for a string "1.10" the data is:

04 31 2e 31 30

The 04 is the length, obviously, and the rest are the ASCII codes (in hex) for "1.10". Trivial. Later I have a string that is 41 characters long, but the [length] value is:

00 00 00 29

Why 4 bytes for the length? (0x29 = 41)

The main question is: Is there a spec for the format of CArchive output?

Mike Webb asked Jan 19 '12 19:01



2 Answers

To answer your question about strings: the length value stored in the archive is itself variable-length, depending on the length and encoding of the string. If the string is shorter than 255 characters, one byte is used for the length. If the string is 255 to 65533 characters long, 3 bytes are used: a 1-byte 0xFF marker followed by a 2-byte word. If the string is 65534 characters or longer, 7 bytes are used: a 3-byte 0xFF 0xFFFF marker followed by a 4-byte dword. (The word value 0xFFFE is not available as a length because it is reserved for the Unicode marker.) To make it even more complicated, if the string is Unicode-encoded, the length value is preceded by a 3-byte 0xFF 0xFFFE marker. So in any combination you will never see a 4-byte length by itself; what you showed has to be three 0x00 bytes belonging to something else, followed by a 1-byte string length of 0x29.
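The prefix layout described above can be illustrated with a small encoder. This is only a sketch, not MFC code: `encodeLengthPrefix` is a hypothetical name, multi-byte values are assumed to be little-endian (as on x86 Windows), and the two-byte form is assumed to stop below 0xFFFE so that value stays free for the Unicode marker. Lengths that do not fit the 4-byte form are out of scope.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of MFC): builds the length prefix for a
// string of `length` characters, following the layout described above.
std::vector<std::uint8_t> encodeLengthPrefix(std::uint32_t length, bool unicode) {
    // Larger lengths are outside the scope of this sketch.
    assert(length < 0xFFFFFFFFu);

    std::vector<std::uint8_t> out;

    // Assumption: words are stored low byte first (little-endian).
    auto putWord = [&out](std::uint16_t w) {
        out.push_back(static_cast<std::uint8_t>(w & 0xFF));
        out.push_back(static_cast<std::uint8_t>(w >> 8));
    };

    if (unicode) {
        out.push_back(0xFF);   // Unicode marker: byte 0xFF ...
        putWord(0xFFFE);       // ... then word 0xFFFE (bytes FE FF)
    }

    if (length < 0xFF) {
        // Short form: the length fits in a single byte.
        out.push_back(static_cast<std::uint8_t>(length));
    } else if (length < 0xFFFE) {
        // Medium form: 0xFF marker, then a 2-byte length.
        out.push_back(0xFF);
        putWord(static_cast<std::uint16_t>(length));
    } else {
        // Long form: 0xFF, word 0xFFFF, then a 4-byte length.
        out.push_back(0xFF);
        putWord(0xFFFF);
        putWord(static_cast<std::uint16_t>(length & 0xFFFF));
        putWord(static_cast<std::uint16_t>(length >> 16));
    }
    return out;
}
```

Under these assumptions, a 41-character Ansi string gets the single prefix byte 0x29, which matches the reading of the `00 00 00 29` sequence given above.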

So, the correct way to read a string is as follows:

Assume: string data is Ansi unless told otherwise.

  1. Read a byte. If its value is < 255, the string length is that value; go to 3. Otherwise, continue with 2.

  2. Read a word. If its value is 0xFFFE, the string data is Unicode; go to 1. Otherwise, if its value is < 65535, the string length is that value; go to 3. Otherwise, read a dword; the string length is its value; go to 3.

  3. Read as many 8-bit or 16-bit values as the string length says, depending on whether the string is Ansi or Unicode, and then convert to the desired encoding as needed.
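The three steps above could be sketched roughly as follows. The names `ByteReader`, `StringHeader`, `readStringHeader` and `readStringBytes` are made up for illustration and are not part of MFC, and the sketch assumes that words and dwords are stored little-endian. It is written in C++ for brevity; the same logic should carry over to a C# `BinaryReader`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical cursor over an in-memory copy of the archive bytes.
class ByteReader {
public:
    explicit ByteReader(std::vector<std::uint8_t> data) : data_(std::move(data)) {}

    std::uint8_t readByte() {
        if (pos_ >= data_.size()) throw std::out_of_range("unexpected end of data");
        return data_[pos_++];
    }
    // Assumption: multi-byte values are stored low byte first.
    std::uint16_t readWord() {
        const std::uint16_t lo = readByte();
        const std::uint16_t hi = readByte();
        return static_cast<std::uint16_t>(lo | (hi << 8));
    }
    std::uint32_t readDword() {
        const std::uint32_t lo = readWord();
        const std::uint32_t hi = readWord();
        return lo | (hi << 16);
    }

private:
    std::vector<std::uint8_t> data_;
    std::size_t pos_ = 0;
};

struct StringHeader {
    std::uint32_t length = 0;  // number of characters, not bytes
    int charSize = 1;          // 1 = Ansi, 2 = Unicode (16-bit units)
};

StringHeader readStringHeader(ByteReader& in) {
    StringHeader h;                      // Ansi unless told otherwise
    std::uint8_t b = in.readByte();      // step 1
    if (b < 0xFF) { h.length = b; return h; }

    std::uint16_t w = in.readWord();     // step 2
    if (w == 0xFFFE) {                   // Unicode marker: back to step 1
        h.charSize = 2;
        b = in.readByte();
        if (b < 0xFF) { h.length = b; return h; }
        w = in.readWord();
    }
    if (w < 0xFFFF) { h.length = w; return h; }

    h.length = in.readDword();           // 4-byte length
    return h;
}

// Step 3: returns the raw character bytes; converting them to the
// desired encoding is left to the caller.
std::vector<std::uint8_t> readStringBytes(ByteReader& in, const StringHeader& h) {
    assert(h.charSize == 1 || h.charSize == 2);
    const std::size_t total =
        static_cast<std::size_t>(h.length) * static_cast<std::size_t>(h.charSize);
    std::vector<std::uint8_t> bytes;
    bytes.reserve(total);
    for (std::size_t i = 0; i < total; ++i) bytes.push_back(in.readByte());
    return bytes;
}
```

Fed the bytes `04 31 2e 31 30` from the question, `readStringHeader` reports an Ansi string of length 4, and `readStringBytes` then returns the four bytes of "1.10".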

Remy Lebeau answered Oct 15 '22 13:10


According to the documentation:

The main CArchive implementation can be found in ARCCORE.CPP.

If you don't have the MFC source, see this.

wallyk answered Oct 15 '22 12:10