I have a directory I’m archiving:
$ du -sh oldcode
1400848
$ tar cf oldcode.tar oldcode
So the directory is 1.4 GB. The file is significantly smaller, though:
$ ls -l oldcode.tar
-rw-r--r-- 1 ieure ieure 940339200 2002-01-30 10:33 oldcode.tar
Only 897 MB. It’s not compressed in any way:
$ file oldcode.tar
oldcode.tar: POSIX tar archive
Why is the tar file smaller than its contents?
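The advantages of tar: Tar itself does not compress anything; it only bundles files and directories into a single archive. Combined with a compressor such as gzip, the archive often shrinks by around 50%, depending on the content, which can drastically reduce the size of packaged files and folders. Tar preserves the attributes of files and directories, such as permissions and timestamps.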
A tar file has overhead because it also stores the information needed to recreate the files. If the content you add to the tar file is already compressed, you can end up with an archive that is bigger than the combined size in MiB of all the files.
Compressing a tar archive typically saves 50 percent or more, although the actual ratio depends heavily on the content.
There are varying reports of the maximum file size that tar can support, but it appears that it was originally 2 GB, later extended to 8 GB, and most recently extended to 68 GB.
You get a difference because of the way the filesystem works.
In a nutshell, your disk is divided into clusters. Each cluster has a fixed size of, let's say, 4 kilobytes. If you store a 1 KB file in such a cluster, the remaining 3 KB will be unused. The exact details vary with the kind of file system that you use, but most file systems work that way.
3 KB of wasted space is not much for a single file, but if you have lots of very small files, the waste can become a significant part of the disk usage.
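The arithmetic behind that example can be sketched in a few lines of shell. This is only an illustration: the function names `allocated_size` and `wasted_space` are made up here, and the 4096-byte default cluster size is an assumption taken from the example above. Real file systems may behave differently, for instance by storing very small files inline.

```shell
# Round a file size up to a whole number of clusters.
# $1 = file size in bytes, $2 = cluster size in bytes
# (hypothetical default of 4096, matching the example above).
allocated_size() {
    echo $(( ($1 + ${2:-4096} - 1) / ${2:-4096} * ${2:-4096} ))
}

# Bytes allocated on disk but not used by the file's contents.
wasted_space() {
    echo $(( $(allocated_size "$1" "${2:-4096}") - $1 ))
}

allocated_size 1024   # prints 4096
wasted_space 1024     # prints 3072
```

A 1 KB file therefore occupies a full 4 KB cluster, and 3 KB of that cluster holds no data.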
Inside the tar archive, the files are not stored in clusters but one after another. That's where the difference comes from.
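To get a rough feel for the size of the effect, the sketch below compares the two layouts for many small files. It assumes 4096-byte clusters on disk and the usual tar layout of one 512-byte header per file with the data padded to a multiple of 512 bytes. The function names and the figure of 1000 files of 1 KB each are invented for illustration, and the estimate ignores the end-of-archive blocks, directory entries, and any extra headers for long file names.

```shell
# Round $1 up to the next multiple of $2.
round_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# Space one file takes on disk, assuming 4096-byte clusters.
disk_member_size() {
    round_up "$1" 4096
}

# Space one file takes inside a tar archive:
# a 512-byte header plus the data padded to a multiple of 512 bytes.
tar_member_size() {
    echo $(( 512 + $(round_up "$1" 512) ))
}

files=1000   # hypothetical number of files
size=1024    # hypothetical size of each file in bytes

echo "on disk: $(( files * $(disk_member_size "$size") )) bytes"   # on disk: 4096000 bytes
echo "in tar:  $(( files * $(tar_member_size "$size") )) bytes"    # in tar:  1536000 bytes
```

Under these assumptions the archive is well under half the size of the files on disk, even though nothing has been compressed.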