pigz is a parallel implementation of gzip for modern multi-processor, multi-core machines. It is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data. pigz was written by Mark Adler and uses the zlib and pthread libraries.
On many Unix-like systems, tar is the standard tool to package and compress files, and it ships with virtually every common Linux and BSD distribution. However, tar spends a lot of time on compression, because the compressors it invokes by default are single-threaded. Fortunately, tar can be told to use an external compression program instead.
Compressing and decompressing files with standard compression utilities like gzip is a single-threaded affair. For large files on fast disks, that single thread becomes the bottleneck.
bzip2 is still one of the most commonly used compression tools on Linux, but it works with only a single thread. lbzip2 is a multi-threaded implementation of bzip2, which should lead to much better performance on multi-core systems.
You can also use the tar flag "--use-compress-program=" to tell tar what compression program to use.
For example:
tar -c --use-compress-program=pigz -f tar.file dir_to_zip
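Extraction works with the same flag, since tar passes -d to the compression program when unpacking:
tar --use-compress-program=pigz -xf tar.file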
You can use pigz instead of gzip, which does gzip compression on multiple cores. Instead of using the -z option, you would pipe it through pigz:
tar cf - paths-to-archive | pigz > archive.tar.gz
By default, pigz uses the number of available cores, or eight if it could not query that. You can ask for more with -p n, e.g. -p 32. pigz has the same options as gzip, so you can request better compression with -9. E.g.
tar cf - paths-to-archive | pigz -9 -p 32 > archive.tar.gz
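For decompression, pipe in the other direction. Note that the gzip format keeps pigz from parallelizing decompression itself, though it still uses separate threads for reading, writing, and check calculation:
pigz -dc archive.tar.gz | tar xf -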
tar has an option to specify the compression program:
-I, --use-compress-program PROG
filter through PROG (must accept -d)
You can use a multi-threaded version of the compressor utility. The most popular multi-threaded replacements are pigz (instead of gzip) and pbzip2 (instead of bzip2). For instance:
$ tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 paths_to_archive
$ tar --use-compress-program=pigz -cf OUTPUT_FILE.tar.gz paths_to_archive
The compressor must accept -d. If your replacement utility lacks this parameter, or you need to specify additional options, then use pipes instead (adding options as necessary):
$ tar cf - paths_to_archive | pbzip2 > OUTPUT_FILE.tar.bz2
$ tar cf - paths_to_archive | pigz > OUTPUT_FILE.tar.gz
The output of the single-threaded and multi-threaded versions is compatible: you can compress with the multi-threaded version and decompress with the single-threaded version, and vice versa.
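For example, the archive produced by the pigz pipeline above can be unpacked with plain gzip:
$ gunzip -c OUTPUT_FILE.tar.gz | tar xf -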
For p7zip compression you need a small wrapper shell script like the following:
#!/bin/sh
case $1 in
  -d) 7za -txz -si -so e;;
   *) 7za -txz -si -so a .;;
esac 2>/dev/null
Save it as 7zhelper.sh, make it executable, and put it on your PATH. Here is an example of usage:
$ tar -I 7zhelper.sh -cf OUTPUT_FILE.tar.7z paths_to_archive
$ tar -I 7zhelper.sh -xf OUTPUT_FILE.tar.7z
Regarding multithreaded XZ support: if you are running version 5.2.0 or above of XZ Utils, you can utilize multiple cores for compression by setting -T or --threads to an appropriate value via the environment variable XZ_DEFAULTS (e.g. XZ_DEFAULTS="-T 0").
This is a fragment of the man page for the 5.1.0alpha version:
Multithreaded compression and decompression are not implemented yet, so this option has no effect for now.
However, this will not help with decompression of files that were not compressed with threading enabled. From the man page for version 5.2.2:
Threaded decompression hasn't been implemented yet. It will only work on files that contain multiple blocks with size information in block headers. All files compressed in multi-threaded mode meet this condition, but files compressed in single-threaded mode don't even if --block-size=size is used.
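For example, assuming xz 5.2.0 or newer is on your PATH and your tar supports -J, the following should compress using all available cores:
$ XZ_DEFAULTS="-T 0" tar -cJf OUTPUT_FILE.tar.xz paths_to_archive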
If you build tar from source, you can recompile it with the following configure parameters:
--with-gzip=pigz
--with-bzip2=lbzip2
--with-lzip=plzip
After recompiling tar with these options you can check the output of tar's help:
$ tar --help | grep "lbzip2\|plzip\|pigz"
-j, --bzip2 filter the archive through lbzip2
--lzip filter the archive through plzip
-z, --gzip, --gunzip, --ungzip filter the archive through pigz
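With such a build, the ordinary flags transparently run the parallel tools; for example, this now filters the archive through pigz:
$ tar -czf OUTPUT_FILE.tar.gz paths_to_archive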
You can use the shortcut -I for tar's --use-compress-program switch, and invoke pbzip2 for bzip2 compression on multiple cores:
tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 DIRECTORY_TO_COMPRESS/
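Extraction works with the same switch, since pbzip2 accepts -d:
tar -I pbzip2 -xf OUTPUT_FILE.tar.bz2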
If you want to have more flexibility with filenames and compression options, you can use:
find /my/path/ -type f \( -name "*.sql" -o -name "*.log" \) -exec \
tar -P --transform='s@/my/path/@@g' -cf - {} + | \
pigz -9 -p 4 > myarchive.tar.gz
find
find /my/path/ -type f \( -name "*.sql" -o -name "*.log" \) -exec
This command looks for the files you want to archive, in this case /my/path/*.sql and /my/path/*.log. The parentheses group the -name tests so that -exec applies to files matching either pattern. Add as many -o -name "pattern" clauses as you want.
-exec will execute the next command using the results of find: tar
tar
tar -P --transform='s@/my/path/@@g' -cf - {} +
--transform is a simple string replacement parameter. It strips the path of the files from the archive, so the tarball's root becomes the current directory when extracting. Note that you can't use the -C option to change directory, as you would lose the benefit of find: all files of the directory would be included.
-P tells tar to use absolute paths, so it doesn't trigger the warning "Removing leading `/' from member names". The leading '/' will be removed by --transform anyway.
-cf - tells tar to create an archive and write it to standard output (the - filename), so it can be piped to the compressor; the tarball's name is specified later, at the end of the pipeline.
{} + is replaced by all the files that find found previously.
pigz
pigz -9 -p 4
Use as many parameters as you want. In this case, -9 is the compression level and -p 4 is the number of cores dedicated to compression. If you run this on a heavily loaded web server, you probably don't want to use all available cores.
> myarchive.tar.gz
Finally, this redirects the compressed output into the archive file, giving the tarball its name.
A relatively newer (de)compression tool you might want to consider is zstandard. It does an excellent job of utilizing spare cores, and it has made some great trade-offs when it comes to compression ratio vs. (de)compression time. It is also highly tweak-able depending on your compression ratio needs.
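As a minimal sketch, assuming zstd is installed and your GNU tar is recent enough to accept a quoted command with arguments for -I: -T0 tells zstd to use all available cores, and the .tar.zst extension is just a convention:
tar -I 'zstd -T0' -cf archive.tar.zst paths_to_archive
tar -I zstd -xf archive.tar.zst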