How to tell if a file is gzip compressed?

How can I tell if a file is compressed?

You can check the extension. If you don't trust the extension, you have to look into the file and check for a known signature (magic number). A call to stat will not tell you whether an individual file is compressed: the compression flag it reports means that the file system is compressed.
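
As a rough sketch of the signature check described above: the function below compares the first bytes of a file against a small table of well-known signatures. The names SIGNATURES and guess_compression are made up for this illustration, and the table is deliberately short rather than exhaustive.

```python
# Leading byte signatures of a few common compressed formats.
# Illustrative only; many other formats exist.
SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
    b"PK\x03\x04": "zip",
}


def guess_compression(path):
    """Return a format name if the file starts with a known signature, else None."""
    longest = max(len(sig) for sig in SIGNATURES)
    with open(path, "rb") as f:
        head = f.read(longest)
    for sig, name in SIGNATURES.items():
        if head.startswith(sig):
            return name
    return None
```

A match only says that the file starts like that format; it does not prove that the rest of the file is intact.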

How do I test a gzip file?

If you just want to test whether or not the file is compressed, use gzip --list (redirect errors if you want) and check $?. The gzip -t command only returns an exit code to the shell saying whether the file passed the integrity test or not.
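
If you want to drive that exit-code check from Python, something like the following might do. It is only a sketch: it assumes a gzip executable is available on PATH, and passes_gzip_test is a name invented here. The file is fed through stdin so that the file name's suffix plays no role.

```python
import subprocess


def passes_gzip_test(path):
    """Return True if `gzip -t` exits with status 0 for the file's contents."""
    with open(path, "rb") as f:
        result = subprocess.run(
            ["gzip", "-t"],
            stdin=f,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,  # discard gzip's error messages
        )
    return result.returncode == 0
```

This is not portable to systems without the gzip command, so it is not a substitute for a pure-Python check.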

Is a file a gzip?

gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems, and intended for use by GNU (the "g" is from "GNU").

The magic number for gzip compressed files is 1f 8b. Although testing for this is not 100% reliable, it is highly unlikely that "ordinary text files" start with those two bytes—in UTF-8 it's not even legal.

Usually gzip compressed files sport the suffix .gz though. Even gzip(1) itself won't unpack files without it unless you --force it to. You could conceivably use that, but you'd still have to deal with a possible IOError (which you have to in any case).

One problem with your approach is that gzip.GzipFile() will not throw an exception if you feed it an uncompressed file; only a later read() will. This means that you would probably have to implement some of your program logic twice. Ugly.
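
A small demonstration of that behaviour, as a sketch only: it writes a throwaway plain-text file, opens it with gzip.GzipFile(), and records whether the first read() raises. The variable names are arbitrary.

```python
import gzip
import os
import tempfile

# Write a small plain-text file that is definitely not gzip data.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("just some plain text\n")
    plain_path = tmp.name

f = gzip.GzipFile(plain_path)  # no exception here: the header is not read yet
try:
    f.read(1)                  # the gzip header is parsed on the first read
    raised = False
except OSError:                # gzip.BadGzipFile (3.8+) is a subclass of OSError
    raised = True
finally:
    f.close()
    os.remove(plain_path)

print(raised)  # True if the error only surfaced at read()
```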

Is there a cross-platform, usable from Python way to determine if a file is gzip compressed or not?

The accepted answer explains how one can detect a gzip compressed file in general: test if the first two bytes are 1f 8b. However it does not show how to implement it in Python.

Here is one way:

def is_gz_file(filepath):
    with open(filepath, 'rb') as test_f:
        return test_f.read(2) == b'\x1f\x8b'
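
If the two-byte test feels too loose, one possible tightening is to also look at the third byte of the header, which the gzip format uses for the compression method (8 means deflate). The function name looks_like_gzip is made up for this sketch.

```python
def looks_like_gzip(filepath):
    """Check the two magic bytes plus the compression-method byte (8 = deflate)."""
    with open(filepath, 'rb') as f:
        header = f.read(3)
    return header == b'\x1f\x8b\x08'
```

This is still only a header check, so it cannot tell you whether the compressed data that follows is complete.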

Testing the magic number of a gzip file is the only reliable way to go. However, as of Python 3.7 there is no need to compare the bytes yourself anymore. The gzip module will compare the bytes for you and raise an exception if they do not match!

As of Python 3.7, this works:

import gzip

# input_file is the path of the file to check
with gzip.open(input_file, 'r') as fh:
    try:
        fh.read(1)  # reading forces the gzip header to be parsed
    except OSError:
        print('input_file is not a valid gzip file by OSError')

As of Python 3.8, this also works:

import gzip
with gzip.open(input_file, 'r') as fh:
    try:
        fh.read(1)
    except gzip.BadGzipFile:
        print('input_file is not a valid gzip file by BadGzipFile')
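
The two snippets above can be folded into one helper that returns a boolean. This is a sketch rather than a definitive implementation; is_valid_gzip is a name chosen here. Since gzip.BadGzipFile is a subclass of OSError, catching OSError covers both cases, and EOFError is caught as well because the gzip module raises it for streams that end too early.

```python
import gzip


def is_valid_gzip(path):
    """Return True if the start of the file can be read as gzip data."""
    try:
        with gzip.open(path, 'rb') as fh:
            fh.read(1)  # forces the header to be parsed
    except (OSError, EOFError):
        # OSError also covers a missing or unreadable file
        return False
    return True
```

Note that reading a single byte only validates the beginning of the file. To check the whole stream you would have to read it to the end, which costs a full decompression.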

The gzip module itself will raise an OSError if the file is not gzipped.

>>> with gzip.open('README.md', 'rb') as f:
...     f.read()
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Users/dennis/.asdf/installs/python/3.6.6/lib/python3.6/gzip.py", line 276, in read
    return self._buffer.read(size)
  File "/Users/dennis/.asdf/installs/python/3.6.6/lib/python3.6/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/Users/dennis/.asdf/installs/python/3.6.6/lib/python3.6/gzip.py", line 411, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'# ')

You can combine this approach with others to increase confidence, such as checking the mimetype, looking for a magic number in the file header (see other answers for an example), and checking the extension.

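import gzip
import pathlib

if '.gz' in pathlib.Path(filepath).suffixes:
    # some more inexpensive checks until confident we can attempt to decompress
    # ...
    try:
        with gzip.open(filepath, 'rb') as f:
            f.read(1)  # forces the gzip header to be parsed
    except OSError as e:
        ...  # not a valid gzip file, or the file could not be read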
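
One way the layered idea could look when written out in full, as a sketch under a few assumptions: probably_gzip is a made-up name, and the name-based step uses mimetypes.guess_type(), which reports the encoding 'gzip' for names ending in .gz.

```python
import gzip
import mimetypes


def probably_gzip(filepath):
    """Run cheap checks first, then an actual read as the final test."""
    # 1. Name-based guess: the encoding is 'gzip' for names ending in .gz.
    _type, encoding = mimetypes.guess_type(filepath)
    if encoding != 'gzip':
        return False
    try:
        # 2. Magic number in the file header.
        with open(filepath, 'rb') as f:
            if f.read(2) != b'\x1f\x8b':
                return False
        # 3. Let the gzip module parse the header and start decompressing.
        with gzip.open(filepath, 'rb') as f:
            f.read(1)
    except (OSError, EOFError):
        return False
    return True
```

Because the first step rejects anything without a .gz-style name, this version will miss gzip data stored under another name. Drop that step if the extension cannot be trusted.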

