 

Why is *.tar.gz still much more common than *.tar.xz? [closed]

Whenever I see source packages or binaries compressed with gzip, I wonder whether there are still reasons to favor gz over xz (time travel to 2000 excluded). The savings of the LZMA compression algorithm are substantial, and decompression isn't orders of magnitude slower than gzip's.

asked Jun 27 '11 by soc


2 Answers

"Lowest Common Denominator". The extra space saved is rarely worth the loss of interoperability. Most embedded Linux systems have gzip, but not xz. Many old system as well. Gnu Tar which is the industry standard supports flags -z to process through gzip, and -j to process through bzip2, but some old systems don't support the -J flag for xz, meaning it requires 2-step operation (and a lot of extra diskspace for uncompressed .tar unless you use the syntax of |tar xf - - which many people don't know about.) Also, uncompressing the full filesystem of some 10MB from tar.gz on embedded ARM takes some 2 minutes and isn't really a problem. No clue about xz but bzip2 takes around 10-15 minutes. Definitely not worth the bandwidth saved.

answered Oct 11 '22 by SF.


The ultimate answer is accessibility, with a secondary answer of purpose. Reasons why XZ is not necessarily as suitable as Gzip:

  • Embedded and legacy systems are far more likely to lack sufficient memory to decompress LZMA/LZMA2 archives such as XZ; decompressing an archive created with xz -9 needs roughly 64 MiB for the dictionary alone. As an example, if XZ can shave 400 KiB (vs. Gzip) off of a package destined for an OpenWrt router, what good is the minor space saving if the router has only 16 MiB of RAM? A similar situation appears with very old computer systems. One might scoff at the thought of downloading and compiling the latest version of Bash on an ancient SPARCstation LX with 32 MB of RAM, but it happens.

  • Such systems usually have slow processors, and the increase in decompression time can be very high. Three extra seconds of decompression on your Core i5 can become a severe wait on a 200 MHz ARM core or a 50 MHz microSPARC. Gzip is extremely fast on such processors when compared to stronger compression methods such as XZ or even Bzip2.

  • Gzip is pretty much universally supported by every UNIX-like system (and nearly every non-UNIX-like system too) created in the past two decades. XZ availability is far more limited. Compressed data is useless without the ability to decompress it.

  • Higher compression takes a lot of time. If compression time matters more than compression ratio, Gzip beats XZ. Honestly, lzop is much faster than Gzip and still compresses okay, so applications that need the fastest compression possible and don't require Gzip's ubiquity should look at that instead. I routinely shuffle folders across a trusted LAN with commands such as "tar -c * | lzop -1 | socat -u - tcp-connect:192.168.0.101:4444" (the receiving end is sketched after this list), and Gzip could be used similarly over a much slower link (e.g. doing the same thing through an SSH tunnel over the Internet).
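
As referenced above, a minimal sketch of the receiving end of that LAN transfer (the port number matches the example; the destination directory /dest is illustrative):

    # Listen on TCP port 4444, decompress the lzop stream,
    # and unpack the tar stream into /dest:
    socat -u tcp-listen:4444,reuseaddr - | lzop -d | tar -xf - -C /dest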

Now, on the flip side, there are situations where XZ compression is vastly superior:

  • Sending data over slow links. The Linux 3.7 kernel source code is 34 MiB smaller in XZ format than in Gzip format. If you have a super fast connection, choosing XZ could mean saving one minute of download time; on a cheap DSL connection or a 3G cellular connection, it could shave an hour or more off the download time.

  • Shrinking backup archives. Compressing the source code of Apache's httpd-2.4.2 with "gzip -9" vs. "xz -9e" yields an XZ archive that is 62.7% the size of the Gzip archive. If the same compressibility exists in a data set you currently store as 100 GiB worth of .tar.gz archives, converting to .tar.xz archives (a recompression one-liner appears after this list) would cut a whopping 37.3 GiB off of the backup set. Copying this entire backup set to a USB 2.0 hard drive (which maxes out around 30 MiB/s) as Gzipped data would take 55 minutes, while the XZ-compressed version would finish about 20 minutes sooner. Assuming you'll be working with these backups on a modern desktop with plenty of CPU power, where the one-time compression cost isn't a serious problem, XZ compression generally makes more sense. Why shuffle around extra data if you don't need to?

  • Distributing large amounts of data that might be highly compressible. As previously mentioned, the Linux 3.7 source code is 67 MiB as .tar.xz and 101 MiB as .tar.gz; the uncompressed source is about 542 MiB and is almost entirely text. Source code (and text in general) is typically highly compressible because of the amount of redundancy in the contents, but compressors like Gzip operate with a much smaller dictionary (DEFLATE's window is only 32 KiB) and cannot exploit redundancy that lies beyond it.
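
As mentioned in the backup example above, here is a sketch of converting an existing backup set (filenames and paths are illustrative; assumes GNU tar and the xz tools are installed):

    # Recompress a gzip archive as xz without unpacking it to disk:
    gzip -dc backup.tar.gz | xz -9e > backup.tar.xz

    # Or create a .tar.xz directly:
    tar -cJf backup.tar.xz /data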

Ultimately, it all falls back to a four-way tradeoff: compressed size, compression/decompression speed, copying/transmission speed (reading the data from disk/network), and availability of the compressor/decompressor. The selection is highly dependent on the question "what are you planning to do with this data?"

Also check out this related post from which I learned some of the things I repeat here.

answered Oct 11 '22 by Jody Bruchon