How can I extract the size of the total uncompressed file data in a .tar.gz file from command line?
For some applications it is useful to determine the uncompressed size of a file that has been compressed by the gzip algorithm. From the command line this can be done by using the -l option of the gzip program.
If you need to check the contents of a compressed text file on Linux, you don't have to uncompress it first. Instead, you can use zcat (for gzip files) or bzcat (for bzip2 files) to decompress the contents to standard output while leaving the compressed file itself intact. The "cat" in each command name signals that, like cat, its purpose is to display content.
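A minimal sketch of that idea as a small shell function. The name `peek_compressed` and its optional line-count argument are hypothetical, chosen only for illustration. It picks zcat or bzcat from the file extension and shows only the first lines, so no decompressed copy is written and the compressed file is left as it was:

```sh
# peek_compressed FILE [LINES]  (hypothetical helper name)
# Print the first LINES (default 10) lines of a .gz or .bz2 text file
# without writing a decompressed copy to disk.
peek_compressed() {
    case $1 in
        *.gz)  zcat -- "$1" ;;   # gzip-compressed
        *.bz2) bzcat -- "$1" ;;  # bzip2-compressed
        *)     cat -- "$1" ;;    # anything else is shown as-is
    esac | head -n "${2:-10}"
}
```

For example, `peek_compressed notes.txt.gz 20` (a made-up file name) prints the first 20 lines of the decompressed text.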
This works for any file size:
zcat archive.tar.gz | wc -c
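The same pipeline wrapped in a reusable function, as a sketch; the name `gz_uncompressed_size` is hypothetical. It uses `gzip -dc`, which does the same thing as zcat for gzip files, and strips the leading spaces that some wc implementations print:

```sh
# gz_uncompressed_size FILE  (hypothetical helper name)
gz_uncompressed_size() {
    # Decompress to stdout and count the bytes; nothing is written to disk.
    # tr removes the leading spaces some wc implementations (e.g. BSD) add.
    gzip -dc -- "$1" | wc -c | tr -d ' '
}
```

Because it decompresses the whole stream, its running time grows with the size of the data, but the count it prints is exact regardless of size.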
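If the uncompressed data is smaller than 4 GiB, you can also use the -l option of gzip, which is much faster because it does not decompress anything. For a .tar.gz, the "uncompressed" column is the size of the whole .tar stream, including tar headers and padding, not just the file contents: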
$ gzip -l compressed.tar.gz
         compressed        uncompressed  ratio uncompressed_name
                132               10240  99.1% compressed.tar
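The 4 GiB limit comes from the gzip format itself: RFC 1952 stores the uncompressed length in the last four bytes of the file (the ISIZE field) as a little-endian value modulo 2^32, and gzip -l reports that stored value. The sketch below reads those four bytes directly with tail and od; the function name `gz_isize` is hypothetical, and it assumes a single-member gzip file (for concatenated gzip streams it only sees the last member's trailer):

```sh
# gz_isize FILE  (hypothetical helper name)
gz_isize() {
    # The last 4 bytes of the file are ISIZE; od prints them as unsigned
    # decimals in file order, independent of the host's byte order.
    # The command substitution is unquoted on purpose to split the 4 values.
    set -- $(tail -c 4 -- "$1" | od -A n -t u1)
    [ $# -eq 4 ] || return 1
    # ISIZE is little-endian: the first trailer byte is the least significant
    echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
}
```

For data of 4 GiB or more, the printed value is the true size minus a multiple of 4294967296, which is why the zcat | wc -c approach is the safe one for large archives.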
With GNU tar, this sums the sizes of the files stored in the archive, i.e. the total content size of the extracted files:
$ tar tzvf archive.tar.gz | sed 's/ \+/ /g' | cut -f3 -d' ' | sed '2,$s/^/+ /' | paste -sd' ' | bc
The output is given in bytes.
Explanation: tar tzvf lists the files in the archive in a verbose format similar to ls -l. The first sed collapses each run of spaces into a single space so that cut can pick out the file size, which is the third space-separated field in GNU tar's listing. The second sed puts a + in front of every size except the first, and paste joins the lines into a single sum expression, which bc then evaluates.
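awk can do the field extraction and the addition in one step, since it splits lines on runs of whitespace by default. A sketch, with the hypothetical function name `tar_content_size`. Like the pipeline above, it assumes GNU tar's listing layout, where the size is the third field; bsdtar (the default tar on macOS and the BSDs) prints an ls -l style line in which the size is usually the fifth field, so the field number is an optional second argument:

```sh
# tar_content_size ARCHIVE [FIELD]  (hypothetical helper name)
# Sum the size column of a verbose listing of a .tar.gz archive.
# FIELD defaults to 3, matching GNU tar's "perms owner/group size ..." lines.
tar_content_size() {
    tar -tzvf "$1" |
        awk -v f="${2:-3}" '{ total += $f } END { printf "%.0f\n", total }'
}
```

`printf "%.0f"` is used instead of `"%d"` because some older awk implementations clamp large integers printed with `%d`.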
Note that this total counts only file contents, not filesystem metadata, so the disk space the files occupy once extracted will be larger, potentially many times larger if the archive holds a lot of very small files, because each file takes up at least one filesystem block.
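To see that gap for a particular archive, one option is to extract it into a scratch directory and compare the apparent size with the space actually allocated. This is a sketch assuming GNU coreutils du (its -b and -B1 options are GNU extensions); the function name `extracted_footprint` is hypothetical. The allocated figure depends on the filesystem behind the mktemp directory (often /tmp, which may be tmpfs), so it only indicates what the files would take elsewhere:

```sh
# extracted_footprint ARCHIVE  (hypothetical helper name)
# Extract ARCHIVE into a scratch directory and report two byte counts:
#   apparent:  sum of file lengths plus directory sizes (du -b)
#   allocated: disk blocks actually used (du -B1)
# Requires GNU du and enough free space to hold the extracted files.
extracted_footprint() {
    ef_dir=$(mktemp -d) || return 1
    if ! tar -xzf "$1" -C "$ef_dir"; then
        rm -rf "$ef_dir"
        return 1
    fi
    ef_apparent=$(du -sb "$ef_dir" | cut -f1)
    ef_allocated=$(du -s -B1 "$ef_dir" | cut -f1)
    rm -rf "$ef_dir"
    printf 'apparent=%s allocated=%s\n' "$ef_apparent" "$ef_allocated"
}
```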