 

Multiprocessor support for `xz`?

Tags:

xz

Is there a way to spread xz compression across multiple CPUs? I realize this doesn't appear to be possible with xz itself, but are there other utilities that implement the same compression algorithm and would make better use of the processors? I will be running this in scripts and utility apps on systems with 16+ processors, and it would be useful to use at least 4-8 of them to speed up compression.

asked Mar 07 '14 08:03 by ylluminate



1 Answer

Multiprocessor (multithreaded) compression support was added to xz in version 5.2, released in December 2014.

To enable it, add the -T option followed by either the number of worker threads to spawn, or 0 (-T0) to spawn as many threads as the OS reports CPUs:

    xz -T0 big.tar
    xz -T4 bigish.tar

The default single-threaded operation is equivalent to -T1.
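To illustrate the idea behind -T, here is a minimal Python sketch using only the standard library: split the input into chunks, compress each chunk on a worker thread with `lzma`, and concatenate the results. This is a swapped-in technique, not how xz itself implements -T (xz writes independent blocks inside a single .xz stream); the sketch instead produces concatenated .xz streams, which both `xz -d` and `lzma.decompress()` decode. `CHUNK_SIZE`, `compress_chunk` and `parallel_xz` are hypothetical names, and the speedup assumes CPython's `lzma` module releases the GIL while compressing.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size for illustration; xz -T picks its own block size.
CHUNK_SIZE = 1 << 20  # 1 MiB


def compress_chunk(chunk: bytes, preset: int) -> bytes:
    # Each chunk becomes a complete, independent .xz stream.
    return lzma.compress(chunk, format=lzma.FORMAT_XZ, preset=preset)


def parallel_xz(data: bytes, threads: int = 4, preset: int = 6,
                chunk_size: int = CHUNK_SIZE) -> bytes:
    """Compress `data` using `threads` worker threads.

    The input is split into fixed-size chunks that are compressed
    independently; the resulting .xz streams are concatenated in order.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        chunks = [b""]  # still emit one valid (empty) stream
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in input order, regardless of which finishes first.
        streams = list(pool.map(compress_chunk, chunks, [preset] * len(chunks)))
    return b"".join(streams)


if __name__ == "__main__":
    payload = b"some fairly repetitive text\n" * 100_000
    packed = parallel_xz(payload, threads=4)
    # lzma.decompress(), like `xz -d`, decodes concatenated streams one after another.
    assert lzma.decompress(packed) == payload
    print(f"{len(payload)} -> {len(packed)} bytes")
```

Because each chunk is compressed independently, matches cannot span chunk boundaries, so the ratio is slightly worse than single-threaded compression; larger chunks narrow that gap at the cost of less parallelism. xz -T makes a similar trade-off with its block size.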

I have found that using a couple of hyperthreads fewer than the total number my CPU has† gives a good balance between responsiveness and compression speed.

† So -T10 on my 6-core, 12-thread workstation.
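If a script needs to pick that thread count automatically, a minimal sketch in Python might look like the following. `xz_thread_count` and `xz_command` are hypothetical helpers, and the default of reserving two logical CPUs simply mirrors the rule of thumb above. `os.cpu_count()` counts logical CPUs (hyperthreads) and does not account for CPU affinity or container limits; on Linux, `len(os.sched_getaffinity(0))` is an alternative.

```python
import os
import shlex


def xz_thread_count(reserve: int = 2) -> int:
    """Logical CPUs minus `reserve`, but never fewer than one thread."""
    # os.cpu_count() counts logical CPUs (hyperthreads) and may return None.
    logical = os.cpu_count() or 1
    return max(1, logical - reserve)


def xz_command(path: str, reserve: int = 2) -> list[str]:
    """Build an argv list such as ['xz', '-T10', 'big.tar']."""
    return ["xz", f"-T{xz_thread_count(reserve)}", path]


if __name__ == "__main__":
    # On a 6-core, 12-thread machine this prints: xz -T10 big.tar
    print(shlex.join(xz_command("big.tar")))
```

The returned list can be passed directly to `subprocess.run(..., check=True)`, which avoids shell quoting issues with file names.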

As scai and Dzenly said in the comments:

If you want to use this in combination with tar, just call `export XZ_DEFAULTS="-T 0"` beforehand.

or use something like `XZ_OPT="-2 -T0"`.
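When tar is driven from a program rather than an interactive shell, the environment-variable approach carries over directly. A minimal Python sketch, assuming GNU tar with -J (xz) support; `tar_xz_invocation` is a hypothetical helper. The xz man page describes XZ_OPT as the variable for passing options when xz is run by another tool such as tar, while XZ_DEFAULTS is intended for user- or system-wide defaults.

```python
import os
import shlex


def tar_xz_invocation(archive: str, *paths: str, xz_opt: str = "-T0"):
    """Return (argv, env) for GNU tar's -J mode with XZ_OPT set.

    tar -J runs xz as a child process; xz reads extra options from XZ_OPT.
    A copy of the environment is returned so os.environ is left untouched.
    """
    env = dict(os.environ)
    env["XZ_OPT"] = xz_opt
    argv = ["tar", "-cJf", archive, *paths]
    return argv, env


if __name__ == "__main__":
    argv, env = tar_xz_invocation("backup.tar.xz", "src/", xz_opt="-2 -T0")
    # Equivalent shell: XZ_OPT='-2 -T0' tar -cJf backup.tar.xz src/
    print(f"XZ_OPT={shlex.quote(env['XZ_OPT'])} {shlex.join(argv)}")
    # To actually run it: subprocess.run(argv, env=env, check=True)
```

Scoping XZ_OPT to the child process, instead of exporting it, keeps the setting from leaking into unrelated xz invocations later in the same script.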

answered Sep 22 '22 09:09 by Mark Booth