
Read .tar.gz file in Python

I have a 25 GB text file. I compressed it to tar.gz, which reduced it to 450 MB. Now I want to read that file from Python and process the text data. For this I referred to another question, but in my case the code doesn't work. The code is as follows:

    import tarfile
    import numpy as np

    tar = tarfile.open("filename.tar.gz", "r:gz")
    for member in tar.getmembers():
        f = tar.extractfile(member)
        content = f.read()
        Data = np.loadtxt(content)

The error is as follows:

    Traceback (most recent call last):
      File "dataExtPlot.py", line 21, in <module>
        content = f.read()
    AttributeError: 'NoneType' object has no attribute 'read'

Also, is there any other way to do this?

KrunalParmar asked May 27 '16 04:05


1 Answer

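The docs tell us that extractfile() returns None if the member is not a regular file or link. An archive often contains directory entries as well as files, so the loop reaches a member that has no file contents to read.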
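To see this behaviour in isolation, here is a minimal sketch that builds a tiny archive in memory and checks what extractfile() gives back for each member. The member names `data` and `data/values.txt` and the helper names are made up for illustration; they are not from the question.

```python
import io
import tarfile


def build_demo_archive() -> bytes:
    """Build a tiny in-memory .tar.gz holding one directory and one text file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # A directory entry: it has a name but no file contents.
        dir_info = tarfile.TarInfo(name="data")
        dir_info.type = tarfile.DIRTYPE
        tar.addfile(dir_info)

        # A regular file entry with a small payload.
        payload = b"1 2 3\n4 5 6\n"
        file_info = tarfile.TarInfo(name="data/values.txt")
        file_info.size = len(payload)
        tar.addfile(file_info, io.BytesIO(payload))
    return buf.getvalue()


def describe_members(archive: bytes) -> dict:
    """Map each member name to whether extractfile() returned a file object."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            result[member.name] = tar.extractfile(member) is not None
    return result


RESULT = describe_members(build_demo_archive())
print(RESULT)
```

The directory entry maps to False and the text file maps to True, which matches the AttributeError in the question: the first member the loop met was probably a directory.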

One possible solution is to skip over the None results:

    import tarfile

    tar = tarfile.open("filename.tar.gz", "r:gz")
    for member in tar.getmembers():
        f = tar.extractfile(member)
        # extractfile() returns None for directories and other non-file members
        if f is not None:
            content = f.read()
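Since the uncompressed data is 25 GB, calling f.read() would try to hold all of it in memory at once. A possible alternative, sketched here under the assumption that the file is UTF-8 text, is to wrap the extracted file object in io.TextIOWrapper and process it line by line. The function name `iter_text_lines` is hypothetical.

```python
import io
import tarfile


def iter_text_lines(archive_path, encoding="utf-8"):
    """Yield decoded lines from every regular file in a .tar.gz archive.

    The encoding is an assumption; adjust it to match the actual data.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        # Iterating over the TarFile reads members lazily,
        # unlike getmembers(), which scans the whole archive first.
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, devices
            f = tar.extractfile(member)
            if f is None:
                continue
            # Decode the binary stream incrementally instead of read()-ing it all.
            with io.TextIOWrapper(f, encoding=encoding) as text:
                for line in text:
                    yield line.rstrip("\n")
```

Usage would look like `for line in iter_text_lines("filename.tar.gz"): ...`. Note also that np.loadtxt is documented as accepting a file name, a file object, or an iterable of lines, so passing it the lines (or the file object) is probably closer to the intent than passing the bytes returned by read(), although loading 25 GB into a single array may still not fit in memory.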
Raymond Hettinger answered Sep 20 '22 13:09