seek() a file within a zip file in Python without passing it to memory

Is there any way to make a file inside a zip file seekable in Python without reading it into memory?

I tried the obvious procedure but I get an error since the file is not seekable:

In [74]: inputZipFile = zipfile.ZipFile("linear_g_LAN2A_F_3keV_1MeV_30_small.zip", 'r')

In [76]: inputCSVFile = inputZipFile.open(inputZipFile.namelist()[0], 'r')   

In [77]: inputCSVFile
Out[77]: <zipfile.ZipExtFile at 0x102f5fad0>

In [78]: inputCSVFile.se
inputCSVFile.seek      inputCSVFile.seekable  

In [78]: inputCSVFile.seek(0)
---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
<ipython-input-78-f1f9795b3d55> in <module>()
----> 1 inputCSVFile.seek(0)

UnsupportedOperation: seek
jbssm asked Oct 10 '12 14:10



2 Answers

There is no way to do so for all zip files. DEFLATE is a stream compression algorithm, which means an arbitrary part of the file cannot be decompressed without first decompressing everything before it. Seeking could possibly be implemented for entries that were stored without compression, but then you end up in the unfavorable position where some entries are seekable and others aren't.
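If the real constraint is memory rather than disk, one possible workaround is to stream the entry into a temporary file and seek on that instead. The sketch below is only an illustration under that assumption; `open_member_seekable` is a hypothetical helper name, not part of `zipfile`.

```python
import shutil
import tempfile
import zipfile


def open_member_seekable(zip_path, member_name):
    """Hypothetical helper: copy one archive member to a temp file and return it rewound."""
    # TemporaryFile opens in binary mode and is removed automatically when closed.
    tmp = tempfile.TemporaryFile()
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member_name) as member:
            # copyfileobj moves the data in fixed-size chunks, so the whole
            # decompressed member is never held in memory at once.
            shutil.copyfileobj(member, tmp)
    tmp.seek(0)
    return tmp
```

The returned object is an ordinary seekable binary file. The cost is that the entry is decompressed once in full and written to disk before the first seek.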

Ignacio Vazquez-Abrams answered Oct 21 '22 05:10


ZipExtFile is now seekable (as of Python 3.7, provided the underlying archive file is itself seekable):

https://bugs.python.org/issue22908
https://github.com/python/cpython/commit/066df4fd454d6ff9be66e80b2a65995b10af174f
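A minimal sketch of what this looks like in practice, assuming Python 3.7 or later. The archive is built in an `io.BytesIO` buffer only to keep the example self-contained, and the member name `data.csv` is made up. With a real file you would pass its path to `zipfile.ZipFile` instead.

```python
import io
import zipfile

# Build a tiny archive in memory so the example runs on its own.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("data.csv", "a,b\n1,2\n3,4\n")

with zipfile.ZipFile(buffer) as archive:
    with archive.open("data.csv") as member:
        first_line = member.readline()
        member.seek(0)                  # rewind to the start of the member
        rewound_line = member.readline()
        member.seek(4)                  # absolute offset into the decompressed data
        second_line = member.readline()
        position = member.tell()
```

Note that this does not contradict the point about DEFLATE: as far as I can tell from the implementation, a backward seek restarts decompression from the beginning of the entry and reads forward to the target offset, so it works but is not cheap on large entries.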

The_Pingu answered Oct 21 '22 03:10