How do you unzip very large files in Python?

Using Python 2.4 and the built-in zipfile module, I cannot read very large zip files (greater than 1 or 2 GB) because it wants to store the entire contents of the uncompressed file in memory. Is there another way to do this (either with a third-party library or some other hack), or must I "shell out" and unzip it that way (which obviously isn't as cross-platform)?

Marc Novakowski asked Dec 03 '08 23:12


1 Answer

Here's an outline for decompressing large zip members by reading the raw compressed data yourself and feeding it to zlib.

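import struct
import zipfile
import zlib

doc = "archive.zip"  # placeholder path; set this to your zip file

src = open(doc, "rb")
zf = zipfile.ZipFile(src)
for m in zf.infolist():
    if m.filename.endswith("/"):
        continue  # directory entry, nothing to decompress

    # Examine the local file header: 30 fixed bytes, then the name and extra field
    print("%s %d %d" % (m.filename, m.header_offset, m.compress_size))
    src.seek(m.header_offset)
    fields = struct.unpack("<4s5H3L2H", src.read(30))
    name_len, extra_len = fields[9], fields[10]
    # Use the lengths from the local header: they can differ from the central
    # directory's, and the entry comment is stored only in the central directory.
    src.seek(name_len + extra_len, 1)  # 1 = relative to the current position

    # Build a decompression object for raw deflate data (no zlib header).
    # This outline assumes every member is deflate-compressed.
    decomp = zlib.decompressobj(-15)

    # Read and decompress in blocks so memory use stays bounded
    out = open(m.filename, "wb")
    remaining = m.compress_size
    while remaining > 0:
        block = src.read(min(remaining, 1024 * 1024))
        if not block:
            break  # truncated archive
        remaining -= len(block)
        out.write(decomp.decompress(block))
    out.write(decomp.flush())
    out.close()

zf.close()
src.close()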
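
For readers not tied to Python 2.4: ZipFile.open() was added in Python 2.6 and returns a file-like object that decompresses as it is read, so the manual header handling above may not be needed. The following is only a minimal sketch of that alternative, assuming Python 3; the function name extract_streaming and its chunk_size parameter are illustrative and not part of the original answer.

```python
import shutil
import zipfile


def extract_streaming(zip_path, member, dest_path, chunk_size=1024 * 1024):
    """Copy one archive member to dest_path in fixed-size chunks.

    extract_streaming and chunk_size are hypothetical names used for
    illustration only.
    """
    with zipfile.ZipFile(zip_path) as zf:
        # zf.open() yields a file-like object that decompresses on read,
        # so only about chunk_size bytes are held in memory at a time.
        with zf.open(member) as src, open(dest_path, "wb") as dst:
            shutil.copyfileobj(src, dst, chunk_size)
```

A call such as extract_streaming("archive.zip", "big.bin", "big.bin") would write the member to disk; both file names here are placeholders.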
S.Lott answered Oct 13 '22 06:10