Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to decompress lzma2 (.xz) and zstd (.zst) files into a folder using Python 3?

I have been working for a long time with .bz2 files. To unpack/decompress .bz2 files into a specific folder I have been using the following function:

destination_folder = 'unpacked/'
def decompress_bz2_to_folder(input_file):
    unpackedfile = bz2.BZ2File(input_file)
    data = unpackedfile.read()
    open(destination_folder, 'wb').write(data)

Recently I obtained a list of files with the .xz (not .tar.xz) and .zst extensions. My poor research skills told me that the former is lzma2 compression and the latter is Zstandard.

However, I couldn't find of an easy way to unpack the contents of these archives into a folder (like I do with the .bz2 files).

How can I:

  1. Unpack the contents of an .xz (lzma2) file into a folder using Python 3?
  2. Unpack the contents of a .zst (Zstandard) file into a folder using Python 3?

Important Note: I'm unpacking very large files, so it would be great if the solution takes into consideration any potential Memory Errors.

like image 319
Aventinus Avatar asked Mar 15 '19 13:03

Aventinus


1 Answers

The LZMA data can be decompressed using the lzma module, simply open the file with that module, then use shutil.copyfileobj() to efficiently copy the decompressed data to an output file without running into memory issues:

import lzma
import pathlib
import shutil

def decompress_lzma_to_folder(input_file):
    input_file = pathlib.Path(input_file)
    with lzma.open(input_file) as compressed:
        output_path = pathlib.Path(destination_dir) / input_file.stem
        with open(output_path, 'wb') as destination:
            shutil.copyfileobj(compressed, destination)
        

The Python standard library doesn't have any support for Zstandard compression yet, you can use either the zstandard (by IndyGreg from Mozilla and the Mercurial project) or zstd; the latter is perhaps too basic for your needs, while zstandard offers a streaming API specifically suited for reading files.

I'm using the zstandard library here to benefit from the copying API it implements, which lets you decompress and copy at the same time, similar to how shutil.copyfileobj() works:

import zstandard
import pathlib

def decompress_zstandard_to_folder(input_file):
    input_file = pathlib.Path(input_file)
    with open(input_file, 'rb') as compressed:
        decomp = zstandard.ZstdDecompressor()
        output_path = pathlib.Path(destination_dir) / input_file.stem
        with open(output_path, 'wb') as destination:
            decomp.copy_stream(compressed, destination)
like image 87
Martijn Pieters Avatar answered Oct 19 '22 09:10

Martijn Pieters