 

Using GZIP Module with Python


I'm trying to use the Python gzip module to simply uncompress several .gz files in a directory. Note that I do not want to read the files, only uncompress them. After searching this site for a while, I have this code segment, but it does not work:

```python
import gzip
import glob
import os
import shutil

for file in glob.glob(PATH_TO_FILE + "/*.gz"):
    #print file
    if os.path.isdir(file) == False:
        shutil.copy(file, FILE_DIR)
        # uncompress the file
        inF = gzip.open(file, 'rb')
        s = inF.read()
        inF.close()
```

The .gz files are in the correct location, and I can print the full path and filename with the print statement, but the gzip module isn't getting executed properly. What am I missing?

user3111358 asked Dec 17 '13 13:12




2 Answers

If you get no error, the gzip module is probably being executed properly, and the file is already getting decompressed.

The precise definition of "decompressed" depends on context:

I do not want to read the files, only uncompress them

The gzip module doesn't work like a desktop archiving program such as 7-Zip: you can't "uncompress" a file without "reading" it. Note that "reading" (in programming) usually just means "storing (temporarily) in the computer's RAM", not "opening the file in the GUI".
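To see that "reading" already performs the decompression, here is a minimal sketch (Python 3; the file name `example.txt.gz` and the sample bytes are made up for illustration) that creates a small .gz file in a temporary directory, reads it back with `gzip.open`, and checks that no new file appeared on disk:

```python
import gzip
import os
import tempfile

# Hypothetical sample data, used only for this demonstration.
original = b"hello, gzip\n" * 3

with tempfile.TemporaryDirectory() as tmp_dir:
    gz_path = os.path.join(tmp_dir, "example.txt.gz")

    # Create a small compressed file to experiment with.
    with gzip.open(gz_path, "wb") as f:
        f.write(original)

    # "Reading" decompresses the data into memory (the variable s) ...
    with gzip.open(gz_path, "rb") as f:
        s = f.read()

    # ... but does not create any new file on disk.
    files_on_disk = os.listdir(tmp_dir)

print(s == original)   # True: s holds the uncompressed bytes
print(files_on_disk)   # ['example.txt.gz']: only the original .gz file
```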

What you probably mean by "uncompress" (as in a desktop archiving program) is more precisely described (in programming) as "read an in-memory stream/buffer from a compressed file, and write it to a new file (and possibly delete the compressed file afterwards)".

```python
inF = gzip.open(file, 'rb')
s = inF.read()
inF.close()
```

With these lines, you're just reading the stream. If you expect a new "uncompressed" file to be created, you just need to write the buffer to a new file:

```python
with open(out_filename, 'wb') as out_file:
    out_file.write(s)
```
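Putting the reading and writing steps together, a minimal Python 3 sketch of the "desktop archiver" behaviour described above might look like this. The function name `gunzip_file` and its `remove_original` flag are hypothetical, not part of the gzip module, and the whole file is read into memory:

```python
import gzip
import os
import tempfile


def gunzip_file(gz_path, remove_original=False):
    """Decompress gz_path next to itself, e.g. data.csv.gz -> data.csv.

    Hypothetical helper: reads the whole file into memory, so it is only
    suitable for files that fit comfortably in RAM.
    """
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz file: %r" % gz_path)
    out_filename = gz_path[:-3]  # strip the ".gz" suffix

    # Step 1: read (decompress) the data into memory.
    with gzip.open(gz_path, "rb") as in_file:
        s = in_file.read()

    # Step 2: write the uncompressed bytes to a new file ('wb', since s is bytes).
    with open(out_filename, "wb") as out_file:
        out_file.write(s)

    # Step 3 (optional): delete the compressed file, as the gunzip tool does by default.
    if remove_original:
        os.remove(gz_path)
    return out_filename


# Usage demo with a made-up file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_gz = os.path.join(tmp_dir, "demo.txt.gz")
    with gzip.open(demo_gz, "wb") as f:
        f.write(b"some text\n")

    out_path = gunzip_file(demo_gz, remove_original=True)
    with open(out_path, "rb") as f:
        restored = f.read()
    remaining = sorted(os.listdir(tmp_dir))

print(restored)   # b'some text\n'
print(remaining)  # ['demo.txt']
```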

If you're dealing with very large files (larger than the amount of RAM you have), you'll need a different approach, such as decompressing and writing the data in chunks instead of calling read() once. But that is a topic for another question.
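For the large-file case, one common approach (a sketch, not the only option) is to stream the data instead of calling read() once: `shutil.copyfileobj` copies from the decompressing file object to the output file in fixed-size pieces, so memory use stays roughly bounded by the chunk size rather than the file size. The helper name `gunzip_streaming` and the chunk sizes are illustrative choices (Python 3):

```python
import gzip
import os
import shutil
import tempfile


def gunzip_streaming(gz_path, out_path, chunk_size=1024 * 1024):
    """Decompress gz_path into out_path, chunk_size bytes at a time."""
    with gzip.open(gz_path, "rb") as in_file, open(out_path, "wb") as out_file:
        # copyfileobj repeatedly reads up to chunk_size bytes and writes them,
        # so the full uncompressed content is never held in memory at once.
        shutil.copyfileobj(in_file, out_file, chunk_size)


# Usage demo with made-up data (small here, but the code path is the same
# for files much larger than RAM).
payload = bytes(range(256)) * 10000  # 2,560,000 bytes

with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "big.bin.gz")
    dst = os.path.join(tmp_dir, "big.bin")
    with gzip.open(src, "wb") as f:
        f.write(payload)

    # A small chunk size here just forces several read/write rounds.
    gunzip_streaming(src, dst, chunk_size=64 * 1024)

    with open(dst, "rb") as f:
        restored = f.read()

print(restored == payload)  # True
```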

loopbackbee answered Sep 19 '22 14:09


You're decompressing the file into the s variable and then doing nothing with it. You should stop searching Stack Overflow and read at least the Python tutorial. Seriously.

Anyway, there are several things wrong with your code:

  1. you need to STORE the unzipped data from s in a file.

  2. there's no need to copy the actual *.gz files, because your code unpacks the original gzip file, not the copy.

  3. you're using file as a variable name. It is not a reserved word, but in Python 2 it shadows the built-in file type. This is not an error, just a very bad practice.

This should probably do what you wanted:

```python
import gzip
import glob
import os

for gzip_path in glob.glob(PATH_TO_FILE + "/*.gz"):
    if not os.path.isdir(gzip_path):
        # uncompress the gzip_path INTO THE 's' variable
        with gzip.open(gzip_path, 'rb') as inF:
            s = inF.read()

        # get gzip filename (without directories)
        gzip_fname = os.path.basename(gzip_path)
        # get original filename (remove 3 characters from the end: ".gz")
        fname = gzip_fname[:-3]
        uncompressed_path = os.path.join(FILE_DIR, fname)

        # store uncompressed file data from 's' variable
        # ('wb', because s holds bytes, not text)
        with open(uncompressed_path, 'wb') as out_file:
            out_file.write(s)
```
Jan Spurny answered Sep 19 '22 14:09