Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get size of uncompressed data in zlib?

I'm creating something that includes a file upload service of sorts, and I need to store data compressed with zlib's compress() function. I send it across the internet already compressed, but I need to know the uncompressed file size on the remote server. Is there any way I can figure out this information without uncompress()ing the data on the server first, just for efficiency? That's how I'm doing it now, but if there's a shortcut I'd love to take it.

By the way, why is it called uncompress? That sounds pretty terrible to me, I always thought it would be decompress...

like image 629
AriX Avatar asked May 30 '09 13:05

AriX


2 Answers

I doubt it. I don't believe this is something the underlying zlib libraries provide from memory (although it's been a good 7 or 8 years since I used it, the up-to-date docs don't seem to indicate this feature has been added).

One possibility would be to transfer another file which contained the uncompressed size (e.g., transfer both file.zip and file.zip.size) but that seems fraught with danger, especially if you get the size wrong.

Another alternative is, if the server uncompressing is time-expensive but doesn't have to be done immediately, to do it in a lower-priority background task (like with nice under Linux). But again, there may be drawbacks if the size checker starts running behind (too many uploads coming in).

And I tend to think of decompression in terms of "explosive decompression", not a good term to use :-)

like image 149
paxdiablo Avatar answered Nov 13 '22 08:11

paxdiablo


If you're uploading using the raw 'compress' format, then you won't have information on the size of the data that's being uploaded. Pax is correct in this regard.
You can store it as a 4 byte header at the start of the compression buffer - assuming that the file size doesn't exceed 4GB.
some C code as an example:

 uint8_t *compressBuffer = calloc(bufsize + sizeof (uLongf), 0);
 uLongf compressedSize = bufsize;
 *((uLongf *)compressBuffer) = filesize;
 compress(compressBuffer + sizeof (uLongf), &compressedSize, sourceBuffer, bufsize);

Then you send the complete compressBuffer of the size compressedSize + sizeof (uLongf). When you receive it on the server side you can use the following code to get the data back:

 // data is in compressBuffer, assume you already know compressed size.
 uLongf originalSize = *((uLongf *)compressBuffer);
 uint8_t *realCompressBuffer = compressBuffer + sizeof (uLongf);

If you don't trust the client to send the correct size then you will need to perform some sort of uncompressed data check on the server size. The suggestion of using uncompress to /dev/null is a reasonable one.
If you're uploading a .zip file, it contains a directory which tells you the size of the file when it's uncompressed. This information is built into the file format, again, though this is subject to malicious clients.

like image 25
Petesh Avatar answered Nov 13 '22 06:11

Petesh