I have a 25 GB text file, so I compressed it to a .tar.gz archive, which came out at 450 MB. Now I want to read that file from Python and process the text data. For this I referred to a related question, but in my case the code doesn't work. The code is as follows:
    import tarfile
    import numpy as np

    tar = tarfile.open("filename.tar.gz", "r:gz")
    for member in tar.getmembers():
        f = tar.extractfile(member)
        content = f.read()
        Data = np.loadtxt(content)
The error is as follows:

    Traceback (most recent call last):
      File "dataExtPlot.py", line 21, in <module>
        content = f.read()
    AttributeError: 'NoneType' object has no attribute 'read'

Also, is there any other way to do this task?
To extract or uncompress .tar.gz files in Python, we can use the tarfile module from the standard library. This module can read and write tar archives, including gzip-compressed ones.
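A minimal round-trip sketch, assuming Python 3 and a throwaway archive built in a temporary directory (the member name data.txt is made up for illustration): write a small .tar.gz with mode "w:gz", then reopen it with "r:gz" and list its members.

```python
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "example.tar.gz")
    payload = b"1.0 2.0\n3.0 4.0\n"

    # "w:gz" writes a gzip-compressed tar archive.
    with tarfile.open(archive_path, "w:gz") as tar:
        info = tarfile.TarInfo(name="data.txt")  # hypothetical member name
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    # "r:gz" opens the same archive for reading with gzip decompression.
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()

print(names)  # ['data.txt']
```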
gzip.open(): this function opens a gzip-compressed file in binary or text mode and returns a file object. Its filename argument can be an actual file name (a str or bytes object) or an existing file object to read from or write to. By default the file is opened in 'rb' mode, i.e. reading binary data, but the mode parameter also accepts other modes, such as 'rt' for reading text or 'wb' and 'wt' for writing.
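This is also one answer to the "other method" question: if the 25 GB file is compressed on its own with gzip rather than packed into a tar archive, gzip.open() can read it directly, line by line. A sketch under that assumption, with a hypothetical file name and a tiny sample standing in for the real data:

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt.gz")  # hypothetical file name

    # "wt" writes text; gzip compresses it transparently.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write("1.0 2.0\n3.0 4.0\n")

    # "rt" reads text; iterating yields one decoded line at a time, so the
    # decompressed data never has to fit in memory all at once.
    line_count = 0
    total = 0.0
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line_count += 1
            total += sum(float(x) for x in line.split())

print(line_count, total)  # 2 10.0
```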
To extract a tar file, first open it with tarfile.open() and then call the extract() or extractall() method of the returned TarFile object.
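A sketch of extracting everything into a directory, assuming a small sample archive (data.txt and extracted are illustrative names). The filter argument only exists on Python versions that ship tarfile extraction filters, hence the hasattr check:

```python
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "example.tar.gz")
    payload = b"hello\n"

    # Build a small archive with one hypothetical member, "data.txt".
    with tarfile.open(archive_path, "w:gz") as tar:
        info = tarfile.TarInfo(name="data.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    out_dir = os.path.join(tmp, "extracted")
    with tarfile.open(archive_path, "r:gz") as tar:
        # The "data" filter refuses members that would land outside
        # out_dir (absolute paths, "..", and similar).
        if hasattr(tarfile, "data_filter"):
            tar.extractall(out_dir, filter="data")
        else:
            tar.extractall(out_dir)

    with open(os.path.join(out_dir, "data.txt"), "rb") as f:
        extracted = f.read()

print(extracted)  # b'hello\n'
```

Keep in mind that extracting writes the full decompressed data back to disk, which for this question means 25 GB again; reading members in place with extractfile() avoids that.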
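The docs tell us that extractfile() returns None if the member is not a regular file or a link. A directory entry in the archive, for example, gives None, and calling .read() on that None is what raises the AttributeError in the question.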
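To see where the None comes from, here is a sketch that builds an archive the way archiving a folder typically does, with an entry for the directory itself followed by its file (folder and data.txt are hypothetical names), and then calls extractfile() on every member:

```python
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "example.tar.gz")
    payload = b"1 2\n"

    with tarfile.open(archive_path, "w:gz") as tar:
        # A directory entry, as typically stored when a folder is archived.
        dir_info = tarfile.TarInfo(name="folder")
        dir_info.type = tarfile.DIRTYPE
        dir_info.mode = 0o755
        tar.addfile(dir_info)
        # A regular file inside that directory.
        file_info = tarfile.TarInfo(name="folder/data.txt")
        file_info.size = len(payload)
        tar.addfile(file_info, io.BytesIO(payload))

    results = {}
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            f = tar.extractfile(member)
            # None for the directory, a readable file object for the file.
            results[member.name] = None if f is None else f.read()

print(results)  # {'folder': None, 'folder/data.txt': b'1 2\n'}
```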
One possible solution is to skip over the None results:
    import tarfile

    tar = tarfile.open("filename.tar.gz", "r:gz")
    for member in tar.getmembers():
        f = tar.extractfile(member)
        # extractfile() returns None for directories and other non-regular members
        if f is not None:
            content = f.read()
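For a 25 GB member, f.read() would still pull all of the decompressed text into memory. A sketch of a lower-memory variant, assuming whitespace-separated numeric rows like the ones np.loadtxt() would parse: open the archive in "r|gz" stream mode, skip non-regular members with isfile(), and iterate over each file object line by line. The sample archive and member name are made up for illustration.

```python
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a small sample archive; "folder/data.txt" is a hypothetical name.
    archive_path = os.path.join(tmp, "example.tar.gz")
    payload = b"1.0 2.0\n3.0 4.0\n"
    with tarfile.open(archive_path, "w:gz") as tar:
        info = tarfile.TarInfo(name="folder/data.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    rows = []
    # "r|gz" treats the archive as a forward-only stream: members are visited
    # in order and each one must be consumed before the loop advances.
    with tarfile.open(archive_path, "r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links and other non-regular members
            f = tar.extractfile(member)
            # Iterating the binary file object yields one line (as bytes) at a
            # time, so the decompressed member is never held in memory at once.
            for raw_line in f:
                rows.append([float(x) for x in raw_line.decode("utf-8").split()])

print(rows)  # [[1.0, 2.0], [3.0, 4.0]]
```

If NumPy is needed anyway, np.loadtxt() is documented to accept a file object, so passing f directly instead of f.read() should also work, although the resulting array still has to fit in memory.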