I have a giant file, let's call it one-csv-file.xz. It is an XZ-compressed CSV file.
How can I open and parse through the file without first decompressing it to disk? What if the file is, for example, 100 GB? Python cannot read all of that into memory at once, of course. Will it page or run out of memory?
How to Open an LZMA File. PeaZip and 7-Zip are two free programs for Windows and Linux that can decompress (extract) the contents of an LZMA file. The Unarchiver can open LZMA files on a Mac, and B1 Free Archiver is a similar LZMA file opener for Windows, Linux, macOS, and Android.
Reading and writing compressed files. lzma.open() opens an LZMA-compressed file in binary or text mode, returning a file object. The filename argument can be either an actual file name (given as a str, bytes or path-like object), in which case the named file is opened, or it can be an existing file object to read from or write to.
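As a rough illustration of the two forms the filename argument can take, here is a minimal sketch using the standard-library lzma module. The file name example.csv.xz and the sample payload are made up for the demo; the temporary directory is only there so the sketch cleans up after itself.

```python
import io
import lzma
import os
import tempfile

payload = "a,b\n1,2\n"  # made-up sample data

# Case 1: pass a file name. Text mode ("wt"/"rt") encodes and decodes for us.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv.xz")  # hypothetical file name
    with lzma.open(path, "wt", encoding="utf-8") as f:
        f.write(payload)
    with lzma.open(path, "rt", encoding="utf-8") as f:
        text_from_name = f.read()

# Case 2: pass an existing binary file object instead of a name.
buffer = io.BytesIO(lzma.compress(payload.encode("utf-8")))
with lzma.open(buffer, "rb") as f:
    bytes_from_object = f.read()
```

In binary mode the result is bytes; in text mode it is str, which is usually what a CSV parser wants.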
A file with the .lzma extension is a compressed archive created with the LZMA (Lempel-Ziv-Markov chain Algorithm) compression method. Such files are mainly found on Unix-like operating systems and, like ZIP archives, are used to reduce file size.
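You can iterate through an LZMAFile object line by line. The data is decompressed incrementally as you read, so the whole file is never held in memory at once, even if it is 100 GB.

    import lzma  # Python 3; try lzmaffi in Python 2

    # Open the raw file in binary mode: LZMAFile needs bytes, not text.
    with open('one-csv-file.xz', 'rb') as compressed:
        with lzma.LZMAFile(compressed) as uncompressed:
            # Each line is decompressed on demand and yielded as bytes.
            for line in uncompressed:
                do_stuff_with(line)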
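Since the question is about parsing CSV, one possible variation is to open the file in text mode with lzma.open and hand it to csv.reader, which also consumes its input lazily. This is only a sketch: the helper name iter_csv_rows and the UTF-8 default are assumptions, not something from the original answer.

```python
import csv
import lzma


def iter_csv_rows(path, encoding="utf-8"):
    """Yield rows (lists of strings) from an XZ-compressed CSV, one at a time."""
    # newline="" follows the csv module's recommendation for input files.
    with lzma.open(path, "rt", encoding=encoding, newline="") as f:
        # csv.reader pulls lines from f as needed, so memory use stays small.
        for row in csv.reader(f):
            yield row
```

Usage would look like `for row in iter_csv_rows('one-csv-file.xz'): do_stuff_with(row)`. Only one row and a small decompression buffer are in memory at any moment.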