Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Read binary data (from file) into a struct

Tags:

c

zip

byte

I'm reading binary data from a file, specifically from a zip file. (To know more about the zip format structure see http://en.wikipedia.org/wiki/ZIP_%28file_format%29)

I've created a struct that stores the data:

typedef struct {
                                            /*Start Size            Description                                 */
    int signatute;                          /*   0  4   Local file header signature = 0x04034b50                */
    short int version;                      /*   4  2   Version needed to extract (minimum)                     */
    short int bit_flag;                     /*   6  2   General purpose bit flag                                */
    short int compression_method;           /*   8  2   Compression method                                      */
    short int time;                         /*  10  2   File last modification time                             */
    short int date;                         /*  12  2   File last modification date                             */
    int crc;                                /*  14  4   CRC-32                                                  */
    int compressed_size;                    /*  18  4   Compressed size                                         */
    int uncompressed_size;                  /*  22  4   Uncompressed size                                       */
    short int name_length;                  /*  26  2   File name length (n)                                    */
    short int extra_field_length;           /*  28  2   Extra field length (m)                                  */
    char *name;                             /*  30  n   File name                                               */
    char *extra_field;                      /*30+n  m   Extra field                                             */

} ZIP_local_file_header;

The size returned by sizeof(ZIP_local_file_header) is 40, but if the sum of each field is calculated with sizeof operator the total size is 38.

If we have the next struct:

typedef struct {
    short int x;
    int y;
} FOO;

sizeof(FOO) returns 8 because the memory is allocated with 4 bytes every time. So, to allocate x are reserved 4 bytes (but the real size is 2 bytes). If we need another short int it will fill the resting 2 bytes of the previous allocation. But as we have an int it will be allocated plus 4 bytes and the empty 2 bytes are wasted.

To read data from file, we can use the function fread:

ZIP_local_file_header p;
fread(&p,sizeof(ZIP_local_file_header),1,file);

But as there're empty bytes in the middle, it isn't read correctly.

What can I do to sequentially and efficiently store data with ZIP_local_file_header wasting no bytes?

like image 444
rigon Avatar asked Oct 21 '10 14:10

rigon


2 Answers

In order to meet the alignment requirements of the underlying platform, structs may have "padding" bytes between members so that each member starts at a properly aligned address.

There are several ways around this: one is to read each element of the header separately using the appropriately-sized member:

fread(&p.signature, sizeof p.signature, 1, file);
fread(&p.version, sizeof p.version, 1, file);
...

Another is to use bit fields in your struct definition; these are not subject to padding restrictions. The downside is that bit fields must be unsigned int or int or, as of C99, _Bool; you may have to cast the raw data to the target type to interpret it correctly:

typedef struct {                 
    unsigned int signature          : 32;
    unsigned int version            : 16;                
    unsigned int bit_flag;          : 16;                
    unsigned int compression_method : 16;              
    unsigned int time               : 16;
    unsigned int date               : 16;
    unsigned int crc                : 32;
    unsigned int compressed_size    : 32;                 
    unsigned int uncompressed_size  : 32;
    unsigned int name_length        : 16;    
    unsigned int extra_field_length : 16;
} ZIP_local_file_header;

You may also have to do some byte-swapping in each member if the file was written in big-endian but your system is little-endian.

Note that name and extra field aren't part of the struct definition; when you read from the file, you're not going to be reading pointer values for the name and extra field, you're going to be reading the actual contents of the name and extra field. Since you don't know the sizes of those fields until you read the rest of the header, you should defer reading them until after you've read the structure above. Something like

ZIP_local_file_header p;
char *name = NULL;
char *extra = NULL;
...
fread(&p, sizeof p, 1, file);
if (name = malloc(p.name_length + 1))
{
    fread(name, p.name_length, 1, file);
    name[p.name_length] = 0;
}
if (extra = malloc(p.extra_field_length + 1))
{
    fread(extra, p.extra_field_length, 1, file);
    extra[p.extra_field_length] = 0;
}
like image 95
John Bode Avatar answered Sep 26 '22 03:09

John Bode


C structs are just about grouping related pieces of data together, they do not specify a particular layout in memory. (Just as the width of an int isn't defined either.) Little-endian/Big-endian is also not defined, and depends on the processor.

Different compilers, the same compiler on different architectures or operating systems, etc., will all layout structs differently.

As the file format you want to read is defined in terms of which bytes go where, a struct, although it looks very convenient and tempting, isn't the right solution. You need to treat the file as a char[] and pull out the bytes you need and shift them in order to make numbers composed of multiple bytes, etc.

like image 35
Adrian Smith Avatar answered Sep 26 '22 03:09

Adrian Smith