Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to compress a large file in Python?

The problem I'm experiencing is the name of the stored file. The stored file isn't named with the original/uncompressed file name. Instead the stored file is named with the archive name (with the appended ".gz" extension).

Expected Result:

file.txt.gz           {archive name}
....file.txt          {stored file name}

Actual Result:

file.txt.gz           {archive name}
....file.txt.gz       {stored file name}

Reading through the gzip documentation (https://docs.python.org/2.7/library/gzip.html) example code:

import gzip
import shutil
with open('file.txt', 'rb') as f_in, gzip.open('file.txt.gz', 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)

How do I get the archive to store the file with the name "file.txt" instead of "file.txt.gz"?

like image 747
Jake Avatar asked Jun 24 '16 16:06

Jake


People also ask

How do I highly compress files in Python?

To create your own compressed ZIP files, you must open the ZipFile object in write mode by passing 'w' as the second argument. When you pass a path to the write() method of a ZipFile object, Python will compress the file at that path and add it into the ZIP file.

How do you compress a file that is too large?

You can make a large file a little smaller by compressing it into a zipped folder. In Windows, right-click the file or folder, go down to “send to,” and choose “Compressed (zipped) folder.” This will create a new folder that's smaller than the original.

How do I compress code in Python?

Algorithm for string compression in pythonPick the first character from the input string ( str ). Append it to the compressed string. Count the number of subsequent occurrences of the character (in str) and append the count to the compressed string if it is more than 1 only​.


1 Answers

You have to use gzip.GzipFile(); the shorthand gzip.open() won't do what you want.

Quoth the doc:

When fileobj is not None, the filename argument is only used to be included in the gzip file header, which may include the original filename of the uncompressed file. It defaults to the filename of fileobj, if discernible; otherwise, it defaults to the empty string, and in this case the original filename is not included in the header.

Try this:

import gzip
import shutil
with open('file.txt', 'rb') as f_in:
    with open('file.txt.gz', 'wb') as f_out:
        with gzip.GzipFile('file.txt', 'wb', fileobj=f_out) as f_out:
            shutil.copyfileobj(f_in, f_out)
like image 121
Robᵩ Avatar answered Sep 23 '22 17:09

Robᵩ