I have a very large zip file that is split into multiple parts as split archives, with a single file within the archive. I do not have enough resources to combine these archives together or extract them (the raw text file is nearly 1TB).
I would like to parse the text file line by line, ideally using something like this:
    import zipfile

    for zipfilename in filenames:
        with zipfile.ZipFile(zipfilename) as z:
            with z.open(...) as f:
                for line in f:
                    print(line)
Is this possible? If so, how can I read the text file?
Thank you in advance for your help.
I'll take a stab.
If your zip files are true "split archives" as defined by the Zip file format, you won't be able to read them with either Python's zipfile library or the unzip terminal command.
If, on the other hand, you are dealing with a single zip archive that has been split using the split command or a similar byte-splitting device, you might be able to extract and read its contents on the fly in Python.
You will have to write a custom "file-like" class that implements the seek() and read() methods (and possibly others, such as tell()) and performs them on the split chunks.
seek() will need to work out which chunk contains the requested offset, open that chunk (if it is not the one already open) and perform a seek() on it using the difference between the requested offset and the chunk's starting offset.
read() will read from the chunk that is currently open and handle the end-of-file condition: when the current chunk runs out, it opens the next chunk and completes the read from it.
After you write and test this class, it will just be a matter of calling the ZipFile constructor passing an instance of your class as the "virtual zip" file object to open.
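Here is a minimal sketch of such a class, assuming Python 3 and chunks produced by plain byte-splitting (e.g. the split command). The names SplitFileReader and iter_lines are made up for illustration; they are not part of zipfile. I subclass io.RawIOBase so that read() comes for free from readinto(), and I keep the seek lazy: seek() only records the position, and the right chunk is opened on the next read.

```python
import io
import os
import zipfile
from bisect import bisect_right


class SplitFileReader(io.RawIOBase):
    """Read-only, seekable view over byte-split chunks (hypothetical helper)."""

    def __init__(self, paths):
        super().__init__()
        self._chunk = None   # file object of the chunk currently open
        self._index = None   # index of that chunk in self._paths
        self._pos = 0        # position in the virtual, concatenated file
        self._paths = list(paths)
        # self._starts[i] is the offset at which chunk i begins.
        self._starts = []
        total = 0
        for path in self._paths:
            self._starts.append(total)
            total += os.path.getsize(path)
        self._size = total

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_CUR:
            offset += self._pos
        elif whence == io.SEEK_END:
            offset += self._size
        elif whence != io.SEEK_SET:
            raise ValueError("invalid whence")
        if offset < 0:
            raise ValueError("negative seek position")
        self._pos = offset
        return self._pos

    def _open_chunk_at(self, pos):
        # Pick the chunk containing pos; reopen only when the chunk changes.
        index = bisect_right(self._starts, pos) - 1
        if index != self._index:
            if self._chunk is not None:
                self._chunk.close()
            self._chunk = open(self._paths[index], "rb")
            self._index = index
        self._chunk.seek(pos - self._starts[index])

    def readinto(self, buffer):
        view = memoryview(buffer).cast("B")
        filled = 0
        # Keep going across chunk boundaries until the buffer is full or EOF.
        while filled < len(view) and self._pos < self._size:
            self._open_chunk_at(self._pos)
            n = self._chunk.readinto(view[filled:])
            if not n:
                break
            filled += n
            self._pos += n
        return filled

    def close(self):
        if self._chunk is not None:
            self._chunk.close()
            self._chunk = None
        super().close()


def iter_lines(paths, member=None):
    """Yield the lines (as bytes) of one archive member without extracting it."""
    with SplitFileReader(paths) as raw, zipfile.ZipFile(raw) as z:
        name = member if member is not None else z.namelist()[0]
        with z.open(name) as f:
            for line in f:
                yield line
```

You would then loop over iter_lines(sorted_chunk_paths), passing the chunk paths in their original order, and decode each line yourself. I have not tried this on anything close to 1TB, so treat it as a starting point and test it on a small archive first.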