
PHP: Is gzdeflate safe across multiple machines?

In the PHP manual there is a comment on gzdeflate saying:

gzcompress produces longer data because it embeds information about the encoding onto the string. If you are compressing data that will only ever be handled on one machine, then you don't need to worry about which of these functions you use. However, if you are passing data compressed with these functions to a different machine you should use gzcompress.

and then

running 50000 repetitions on various content, i found that gzdeflate() and gzcompress() both performed equally fast regardless content and compression level, but gzinflate() was always about twice as fast as gzuncompress().

For my purpose I am archiving data on a machine for future use. The data is read often, but written only once. In theory it will one day be moved onto another machine, if I change servers at some point, but that is a few years down the road.

Is it safe for me to use gzdeflate and gzinflate as opposed to gzcompress and gzuncompress?

My thinking is as follows: gzinflate is faster and this will help the server a lot since there will be lots of read requests. If at some point in the future I can't read the file then I should be able to figure out how to decompress the file and recompress it, right? It is not that the gzinflate will just magically not work one day, like the first comment appears to be saying. Even missing a 6 byte header I'm sure that it'll be expandable somehow.

Thoughts?

UPDATE -- Benchmark

10,000 iterations each:

gzdeflate took 19.158888816833 seconds and size 18521
gzinflate took 1.4803981781006 seconds
gzcompress took 19.376484870911 seconds and size 18527
gzuncompress took 1.6339199542999 seconds
gzencode took 20.015944004059 seconds and size 18539
gzdecode took 1.8822891712189 seconds
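
The script behind these numbers is not shown, so the following is only a sketch of how such a comparison could be run. The sample data, the iteration count, and compression level 9 are assumptions, not the values used for the figures above.

```php
<?php
// Hypothetical sample input; the benchmark above used different (unshown) data.
$data = str_repeat("Some moderately repetitive archived content. ", 500);
$iterations = 1000; // assumed; lower than the 10,000 quoted above
$level = 9;         // assumed compression level

$pairs = [
    'deflate' => ['gzdeflate', 'gzinflate'],
    'zlib'    => ['gzcompress', 'gzuncompress'],
    'gzip'    => ['gzencode', 'gzdecode'],
];

$results = [];
foreach ($pairs as $label => [$compress, $decompress]) {
    // Time the compression side.
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $packed = $compress($data, $level);
    }
    $compressTime = microtime(true) - $start;

    // Time the decompression side on the same compressed string.
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $unpacked = $decompress($packed);
    }
    $decompressTime = microtime(true) - $start;

    $results[$label] = [
        'size' => strlen($packed),
        'ok'   => $unpacked === $data, // round-trip sanity check
    ];
    printf("%s took %.4f s, size %d; %s took %.4f s\n",
        $compress, $compressTime, strlen($packed), $decompress, $decompressTime);
}
```

Timings from a loop like this depend heavily on the input and the machine, so they are only meaningful relative to each other on the same run.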
Alasdair asked Feb 22 '13 11:02


1 Answer

The comment is nonsense. You can use any of gzcompress, gzdeflate, or gzencode to produce compressed data that can be portably decompressed anywhere. Those functions only differ in the wrapper around the deflate data (RFC 1951). gzcompress has a zlib wrapper (RFC 1950), gzdeflate has no wrapper, and gzencode has a gzip wrapper (RFC 1952).
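
To make the wrapper point concrete, here is a small sketch (the sample string and level 6 are arbitrary choices for illustration). It strips the 2-byte zlib header and the 4-byte Adler-32 trailer from gzcompress output and feeds what is left to gzinflate.

```php
<?php
// Arbitrary sample input, chosen only for illustration.
$data = str_repeat("The quick brown fox jumps over the lazy dog.\n", 200);

$raw  = gzdeflate($data, 6);  // raw deflate stream (RFC 1951), no wrapper
$zlib = gzcompress($data, 6); // zlib wrapper (RFC 1950): 2-byte header + 4-byte Adler-32
$gzip = gzencode($data, 6);   // gzip wrapper (RFC 1952): header + CRC-32 + length

// Dropping the zlib header and trailer leaves a raw deflate stream
// that gzinflate() can read directly.
$payload    = substr($zlib, 2, -4);
$viaInflate = gzinflate($payload);

// The zlib trailer is the Adler-32 of the uncompressed data, big-endian.
$storedAdler   = bin2hex(substr($zlib, -4));
$computedAdler = hash('adler32', $data);

printf("raw=%d zlib=%d gzip=%d bytes\n", strlen($raw), strlen($zlib), strlen($gzip));
var_dump($viaInflate === $data);
var_dump($storedAdler === $computedAdler);
```

The size differences should line up with the question's benchmark, where gzcompress output is 6 bytes longer than gzdeflate output and gzencode output is 18 bytes longer.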

I would recommend not using gzdeflate, since no wrapper means no integrity check. gzdeflate should only be used when some other wrapper is being generated outside of that, e.g. for zip files, which also use the deflate format. The comment about speed is almost certainly false. The integrity check of gzuncompress() takes very little time compared to the decompression. You should do your own tests.
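
A contrived sketch of what the missing integrity check means in practice. It uses compression level 0 on the assumption that zlib then emits a stored (uncompressed) block, so flipping a bit in the last data byte leaves the deflate stream structurally valid. With real compression levels, corruption may or may not be caught by the deflate decoder itself.

```php
<?php
// Arbitrary sample input, chosen only for illustration.
$data = str_repeat("archived record\n", 100);

// Level 0 is an assumption made for this demo: the payload is stored
// as-is, so a flipped data bit cannot break the deflate structure.
$raw  = gzdeflate($data, 0);
$zlib = gzcompress($data, 0);

// Flip one bit in the last data byte of the raw stream.
$rawBad = $raw;
$last = strlen($rawBad) - 1;
$rawBad[$last] = chr(ord($rawBad[$last]) ^ 0x01);

// Flip the same bit in the zlib stream: the last byte before
// the 4-byte Adler-32 trailer.
$zlibBad = $zlib;
$pos = strlen($zlibBad) - 5;
$zlibBad[$pos] = chr(ord($zlibBad[$pos]) ^ 0x01);

$fromRaw  = gzinflate($rawBad);      // no checksum: returns altered data
$fromZlib = @gzuncompress($zlibBad); // checksum mismatch: returns false

var_dump(is_string($fromRaw) && $fromRaw !== $data); // silent corruption
var_dump($fromZlib === false);                       // corruption detected
```

If you do store raw gzdeflate output, keeping your own checksum of the original data next to it (for example from hash('crc32b', $data)) gives you back the check that the wrapper would have provided.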

From this one example I might be overgeneralizing, but I would say that you should completely ignore the comments in the PHP documentation. They are, to be generous, uninformed.

By the way, these functions are named in a horribly confusing way. Only gzencode should have "gz" in the name, since that is the only one of those that actually deals in the .gz format. gzcompress sounds like it compresses to the gzip format, but in fact it compresses to the zlib format.

Mark Adler answered Nov 02 '22 07:11