I have multiple gz files with a total size of around 120 GB. I want to unzip (gunzip) those files into the same directory and remove the existing gz files. Currently we are doing it manually, and it takes a long time to decompress them using gzip -d <filename>.
Is there a way I can unzip those files in parallel, by creating a Python script or with any other technique? Currently these files are on a Linux machine.
One approach: first get the list of all files, then iterate over each file and extract it using the zipfile library, appending to a result file. You could use tempfile to avoid handling the temporary zip file yourself. To unzip a ZIP file in Python, use the ZipFile.extractall() method. extractall() takes path, members and pwd as arguments and extracts all the contents of the archive. (Note that zipfile handles .zip archives; plain .gz files are handled by the gzip module instead.)
How can I extract multiple gzip files in a directory and its subdirectories? gunzip extracts each file under its original name and stores it alongside the compressed file (here, in the current user's home directory, /home/username). gunzip *.gz will also work.
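If you'd rather gather those files from a directory tree in Python (for example, to feed the multiprocessing answer further down), here is a minimal sketch using pathlib; the directory path is only a placeholder:
import pathlib

# Collect every .gz file under the given directory, including subdirectories.
filenames = [str(p) for p in pathlib.Path('/home/username').rglob('*.gz')]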
ZIP is an archive file format that supports lossless data compression; it is used to lessen storage requirements and to improve transfer speed over standard connections. The ZipFile class provides a member function, extractall(), to extract all the data from a ZIP archive.
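A minimal sketch of that usage, assuming a placeholder archive name and destination directory:
import zipfile

# Open the archive and extract every member into output_dir.
with zipfile.ZipFile('archive.zip') as archive:
    archive.extractall('output_dir')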
The zipfile package can be used to extract files from a ZIP archive in Python, as shown above. For tar/tar.gz files we can use the tarfile module instead; it distinguishes the two types in order to use the proper extraction mode.
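The tar code referenced here wasn't included, so the following is only a rough sketch of what such a tarfile-based helper might look like, choosing the mode from the file extension:
import tarfile

def extract_tar(path, dest='.'):
    # Use the gzip-aware mode for .tar.gz / .tgz archives, plain mode for .tar.
    mode = 'r:gz' if path.endswith(('.tar.gz', '.tgz')) else 'r:'
    with tarfile.open(path, mode) as archive:
        archive.extractall(dest)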
If you'd like to extract only one file from it, e.g. indexProcessed.csv, you can use the next Python snippet:
import zipfile

path = '/home/myuser/Downloads/'
archive = zipfile.ZipFile(f'{path}archive.zip')
for file in archive.namelist():
    if file.startswith('indexProcessed.csv'):
        archive.extract(file, path)
You can do this very easily with multiprocessing Pools:
import gzip
import multiprocessing
import shutil

filenames = [
    'a.gz',
    'b.gz',
    'c.gz',
    ...  # the rest of your .gz files
]

def uncompress(path):
    # Write the decompressed data to the same path minus the '.gz' suffix.
    # (str.rstrip('.gz') would strip any trailing '.', 'g' or 'z' characters,
    # so slice the suffix off instead.)
    with gzip.open(path, 'rb') as src, open(path[:-len('.gz')], 'wb') as dest:
        shutil.copyfileobj(src, dest)

with multiprocessing.Pool() as pool:
    for _ in pool.imap_unordered(uncompress, filenames, chunksize=1):
        pass
This code will spawn a pool of worker processes (by default, one per CPU), and each process will extract one file at a time. Here I've chosen chunksize=1 to avoid stalling processes if some files are bigger than average.
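The question also asks to remove the original gz file afterwards, which the code above doesn't do. A small, hedged extension of uncompress that deletes the source only after the decompressed copy has been written (the function name is mine; os.remove is the standard-library call for deletion):
import gzip
import os
import shutil

def uncompress_and_remove(path):
    # Decompress first; only delete the original .gz once the copy has succeeded.
    with gzip.open(path, 'rb') as src, open(path[:-len('.gz')], 'wb') as dest:
        shutil.copyfileobj(src, dest)
    os.remove(path)

It can be passed to pool.imap_unordered exactly like uncompress above.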