How to open and read an LZMA file in memory

I have a giant file, let's call it one-csv-file.xz. It is an XZ-compressed CSV file.

How can I open and parse through the file without first decompressing it to disk? What if the file is, for example, 100 GB? Python cannot read all of that into memory at once, of course. Will it page or run out of memory?

asked Feb 22 '15 at 02:02 by Totes McGoats


1 Answer

You can iterate through an LZMAFile object. It decompresses incrementally as you read, so the whole file is never loaded into memory at once:

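import lzma  # python 3, try lzmaffi in python 2
with open('one-csv-file.xz', 'rb') as compressed:  # LZMAFile needs a binary file object
    with lzma.LZMAFile(compressed) as uncompressed:
        for line in uncompressed:  # each line is a bytes object
            do_stuff_with(line)  # placeholder for your own processing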
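Since the file is a CSV, you will probably want parsed rows rather than raw byte lines. Here is a minimal sketch that opens the archive with `lzma.open()` in text mode and feeds it to the standard `csv` module. It assumes the CSV is UTF-8 encoded; the function name `iter_csv_rows` and the sample file in the demo are hypothetical. Decompression still happens on demand, so memory use is bounded by the decoder's buffers and dictionary plus the longest row, not by the 100 GB file size.

```python
import csv
import lzma
import os
import tempfile


def iter_csv_rows(path, encoding="utf-8"):
    """Yield parsed CSV rows from an XZ-compressed file, one row at a time.

    lzma.open() in text mode ("rt") decompresses incrementally as the
    csv reader pulls lines, so the file is never fully loaded into memory.
    The encoding is an assumption; pass the one your file actually uses.
    """
    # newline="" lets the csv module handle line endings itself,
    # including newlines embedded in quoted fields.
    with lzma.open(path, mode="rt", encoding=encoding, newline="") as text:
        yield from csv.reader(text)


if __name__ == "__main__":
    # Self-contained demo: write a tiny XZ-compressed CSV, then stream it back.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.csv.xz")
        with lzma.open(sample, mode="wt", encoding="utf-8", newline="") as out:
            csv.writer(out).writerows([["id", "name"], ["1", "a,b"], ["2", "c"]])
        for row in iter_csv_rows(sample):
            print(row)
```

Because `iter_csv_rows` is a generator, you can process rows in a plain `for` loop or pass them to something like `csv.DictReader`-style logic without ever materializing the whole file as a list.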
answered Oct 05 '22 at 19:10 by MRocklin