I am trying to read in with C# a file written with CArchive. From what I can tell the format is:
[length of next set of data][data]...etc
I'm still fuzzy on some of the data, though. How do I read in Date data? What about floats, ints, doubles, etc?
Also, [length of next set of data] could be a byte or word or dword. How do I know when it will be each? For instance, for a string "1.10" the data is:
04 31 2e 31 30
The 04 is the length, obviously, and the rest are hex values for 1.10. Trivial. Later I have a string that is 41 characters long, but the [length] value is:
00 00 00 29
Why 4 bytes for the length? (0x29 = 41)
The main question is: Is there a spec for the format of CArchive output?
A CArchive object can process not only primitive types but also objects of CObject-derived classes designed for serialization. A serializable class usually has a Serialize member function, and it usually uses the DECLARE_SERIAL and IMPLEMENT_SERIAL macros, as described under class CObject.
Serialization is the process of writing or reading an object to or from a persistent storage medium such as a disk file. Serialization is ideal for situations where it is desired to maintain the state of structured data (such as C++ classes or structures) during or after execution of a program.
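For the primitive types mentioned above, here is a rough Python sketch of reading raw values back. It assumes that the values were written with CArchive's `<<` operators as raw bytes with no length prefix, in little-endian order, with 4-byte int and DWORD as on Windows. The `read_primitive` helper and the type-name table are made up for this example. The reader has to know the order and types of the fields in advance, because nothing in the stream identifies them.

```python
import io
import struct

# Assumed layouts: little-endian, Windows sizes (int and DWORD are 4 bytes).
_FORMATS = {
    "BYTE": "<B",
    "WORD": "<H",
    "int": "<i",
    "DWORD": "<I",
    "float": "<f",
    "double": "<d",
}

def read_primitive(stream, type_name):
    """Read one raw primitive value from a binary stream (hypothetical helper)."""
    fmt = _FORMATS[type_name]
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("stream ended inside a %s value" % type_name)
    return struct.unpack(fmt, data)[0]

if __name__ == "__main__":
    # Simulate an archive holding an int followed by a double.
    blob = struct.pack("<i", 41) + struct.pack("<d", 1.10)
    stream = io.BytesIO(blob)
    print(read_primitive(stream, "int"))
    print(read_primitive(stream, "double"))
```

In C#, `BinaryReader` methods such as `ReadInt32`, `ReadSingle` and `ReadDouble` play the same role, since they also read little-endian values.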
To answer your question about strings, the length value that is stored in the archive is itself variable-length, depending on the length and encoding of the string. If the string is < 255 characters, one byte is used for the length. If the string is 255 - 65534 characters, 3 bytes are used: a 1-byte 0xFF marker followed by a 2-byte word. If the string is 65535+ characters, 7 bytes are used: a 3-byte 0xFF 0xFF 0xFF marker followed by a 4-byte dword. To make it even more complicated, if the string is Unicode encoded, the length value is preceded by a 3-byte 0xFF 0xFFFE marker. So in any combination, you will never see a 4-byte length by itself, which means what you showed has to be three 0x00 bytes belonging to something else, followed by a 1-byte string length of 0x29.
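To make the prefix sizes concrete, here is a minimal Python sketch that builds such a length prefix. The name `encode_length_prefix` is made up for this example, and the code is written from the description above rather than copied from the MFC source. It assumes multi-byte values are stored little-endian, so the word 0xFFFE appears in the file as the bytes FE FF. It also stops the 2-byte form just below 0xFFFE so that a length can never be mistaken for the Unicode marker, which is an assumption on my part.

```python
import struct

def encode_length_prefix(length, is_unicode=False):
    """Build a CArchive-style string length prefix (hypothetical helper)."""
    out = bytearray()
    if is_unicode:
        # 3-byte Unicode marker: byte 0xFF, then the word 0xFFFE.
        out += b"\xff" + struct.pack("<H", 0xFFFE)
    if length < 0xFF:
        # Short string: the length fits in a single byte.
        out += struct.pack("<B", length)
    elif length < 0xFFFE:
        # Assumption: 0xFFFE is kept free because it doubles as the marker.
        out += b"\xff" + struct.pack("<H", length)
    else:
        # Long string: 0xFF 0xFF 0xFF marker, then a 4-byte dword.
        out += b"\xff\xff\xff" + struct.pack("<I", length)
    return bytes(out)

if __name__ == "__main__":
    for n in (4, 41, 300, 70000):
        print(n, encode_length_prefix(n).hex())
    print("unicode 4", encode_length_prefix(4, is_unicode=True).hex())
```

A 41-character ANSI string therefore gets the single prefix byte 29, which matches the point that the three 00 bytes in the question are not part of the length.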
So, the correct way to read a string is as follows:
Assume: string data is Ansi unless told otherwise.
1. Read a byte. If its value is < 255, string length is the value, goto 3.
2. Read a word. If its value is 0xFFFE, string data is Unicode, goto 1. Otherwise, if its value is < 65535, string length is its value, goto 3. Otherwise, read a dword, string length is its value, goto 3.
3. Read string length number of 8bit or 16bit values, depending on whether string is Ansi or Unicode, and then convert to desired encoding as needed.
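The reading steps can be sketched in Python roughly as follows. This is only an illustration of the procedure described here, not the MFC code. The helper names `read_exact` and `read_carchive_string` are invented for the example. It assumes little-endian values, UTF-16LE for Unicode strings, and Windows-1252 as the ANSI code page, which you may need to change for your data.

```python
import io
import struct

def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended while reading a string")
    return data

def read_carchive_string(stream, ansi_encoding="cp1252"):
    """Read one length-prefixed string (hypothetical helper)."""
    char_size = 1  # assume ANSI unless a Unicode marker shows up
    while True:
        # Step 1: try a one-byte length.
        length = read_exact(stream, 1)[0]
        if length < 0xFF:
            break
        # Step 2: a word follows the 0xFF byte.
        (length,) = struct.unpack("<H", read_exact(stream, 2))
        if length == 0xFFFE:
            char_size = 2  # Unicode marker: start over at step 1
            continue
        if length == 0xFFFF:
            # Length did not fit in a word, so a dword follows.
            (length,) = struct.unpack("<I", read_exact(stream, 4))
        break
    # Step 3: read the characters and decode them.
    raw = read_exact(stream, length * char_size)
    return raw.decode("utf-16-le" if char_size == 2 else ansi_encoding)

if __name__ == "__main__":
    sample = bytes.fromhex("04312e3130")  # the bytes from the question
    print(read_carchive_string(io.BytesIO(sample)))
```

Fed the bytes `04 31 2e 31 30` from the question, this returns the string 1.10. The same control flow translates directly to C# with `BinaryReader.ReadByte`, `ReadUInt16` and `ReadUInt32`.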
According to the documentation:
The main CArchive implementation can be found in ARCCORE.CPP.
If you don't have the MFC source, see this.