I am having a peculiar problem when writing zip files through to_csv.
Using GZIP:
df.to_csv(path_or_buf = 'sample.csv.gz', compression="gzip", index = None, sep = ",", header=True, encoding='utf-8-sig')
gives a neat gzip file named 'sample.csv.gz', and inside it I get my CSV, 'sample.csv'.
However, things change when using ZIP:
df.to_csv(path_or_buf = 'sample.csv.zip', compression="zip", index = None, sep = ",", header=True, encoding='utf-8-sig')
gives a ZIP file named 'sample.csv.zip', but the CSV inside it is also named 'sample.csv.zip'. Removing the extra '.zip' from the inner file's name gives the CSV back.
How can I write a ZIP file without this issue? ZIP output is a requirement that I can't bypass. I am using Python 2.7 on a Windows 10 machine.
Thanks in advance for your help.
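To check which name pandas actually gave the file inside the archive, you can list the archive's members with the standard-library zipfile module. This is a minimal sketch that uses a small hypothetical DataFrame in place of the asker's df and assumes Python 3; the member name it prints depends on the pandas version, so no particular output is assumed here.

```python
import zipfile

import pandas as pd

# Hypothetical stand-in for the asker's DataFrame.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Same kind of call as in the question: ZIP compression, no member name given.
df.to_csv("sample.csv.zip", compression="zip", index=False, encoding="utf-8-sig")

# List the names of the members stored inside the archive.
with zipfile.ZipFile("sample.csv.zip") as zf:
    members = zf.namelist()
print(members)
```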
If you want to read a zipped or .tar.gz file into a pandas DataFrame, read_csv supports this directly.
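A minimal round-trip sketch, assuming pandas 1.0 or later (for archive_name) and a hypothetical two-row DataFrame: read_csv's default compression="infer" recognizes the ".zip" suffix, so no extra arguments are needed as long as the archive holds a single CSV.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Write a ZIP archive holding one CSV member named sample.csv.
df.to_csv("sample.zip", index=False,
          compression={"method": "zip", "archive_name": "sample.csv"})

# compression="infer" (the default) detects ZIP from the ".zip" suffix.
back = pd.read_csv("sample.zip")
print(back)
```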
One way to create a pandas DataFrame is with Python's built-in zip() function, which is unrelated to ZIP archives: zip the lists together into a list of tuples, or zip column names with lists to build a dictionary, and pass the result to the DataFrame constructor. In Python 3, zip() returns an iterator that produces one tuple at a time.
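Both variants can be sketched as follows; the names and ages lists are hypothetical example data.

```python
import pandas as pd

# Hypothetical example data.
names = ["alice", "bob"]
ages = [30, 25]

# zip() pairs the i-th items of each list: ("alice", 30), ("bob", 25).
rows = list(zip(names, ages))
df_rows = pd.DataFrame(rows, columns=["name", "age"])

# Alternatively, zip column names with column lists to build a dict.
columns = dict(zip(["name", "age"], [names, ages]))
df_dict = pd.DataFrame(columns)

print(df_rows)
print(df_dict)
```

The list-of-tuples form describes the data row by row, while the dict form describes it column by column; both yield the same table here.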
The DataFrame.to_csv() method writes a DataFrame as comma-separated values (CSV). If you pass a file path or file object as path_or_buf, the CSV data is written there; if you pass nothing (path_or_buf=None), the CSV data is returned as a string.
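A short sketch of both modes, using a hypothetical two-row DataFrame and an in-memory io.StringIO as the file object:

```python
import io

import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# No path_or_buf: to_csv() returns the CSV text as a string.
text = df.to_csv(index=False)

# A file object: to_csv() writes into it and returns None.
buf = io.StringIO()
result = df.to_csv(buf, index=False)

print(text.splitlines())  # ['id,name', '1,a', '2,b']
```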
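It is pretty straightforward in pandas since version 1.0.0: pass a dict as the compression option, and its archive_name key sets the name of the file inside the ZIP. Note that pandas 1.0 and later require Python 3, so this does not work on Python 2.7.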
filename = 'sample'
compression_options = dict(method='zip', archive_name=f'{filename}.csv')
df.to_csv(f'{filename}.zip', compression=compression_options, ...)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
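A self-contained version of the snippet above, as a sketch assuming pandas 1.0 or later and a hypothetical DataFrame; it then lists the archive's members with zipfile to confirm that the inner file is named sample.csv rather than sample.zip.

```python
import zipfile

import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

filename = "sample"
# archive_name sets the member name inside the ZIP archive.
compression_options = dict(method="zip", archive_name=f"{filename}.csv")
df.to_csv(f"{filename}.zip", compression=compression_options,
          index=False, encoding="utf-8-sig")

# Inspect the archive to confirm the member name.
with zipfile.ZipFile(f"{filename}.zip") as zf:
    members = zf.namelist()
print(members)  # ['sample.csv']
```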
As the thread linked in the comment discusses, ZIP's directory-like nature makes it hard to do what you want without making a lot of assumptions or complicating the arguments for to_csv().
If your goal is to write the data directly to a ZIP file, that's harder than you'd think.
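To illustrate the "directory-like" point, this sketch (plain zipfile, hypothetical member names and contents, built in memory) stores two CSV members in one archive. Because an archive can hold many named files, the writer has to choose a name for each one, which is exactly the name that to_csv must pick when it writes a ZIP.

```python
import io
import zipfile

# Build a ZIP in memory holding two CSV members (hypothetical contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.csv", "x\n1\n")
    zf.writestr("b.csv", "y\n2\n")

# Each member has its own name, much like files in a folder.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
print(names)  # ['a.csv', 'b.csv']
```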
If you can bear temporarily writing your data to the filesystem, you can use Python's zipfile module to put that file in a ZIP with the name you prefer, and then delete the file.
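import zipfile
import os
# write the CSV to a temporary file on disk
df.to_csv('sample.csv', index=False, sep=",", header=True, encoding='utf-8-sig')
# ZIP_DEFLATED compresses the data; the default, ZIP_STORED, does not
with zipfile.ZipFile('sample.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.write('sample.csv')  # stored in the archive as 'sample.csv'
# delete the temporary file
os.remove('sample.csv')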
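If the temporary file is undesirable, one alternative (a swapped-in technique, not part of the answer above) is to render the CSV to a string and add it with ZipFile.writestr(). This sketch assumes Python 3 and uses a hypothetical DataFrame; it encodes the text with 'utf-8-sig' itself so the file inside the archive keeps the BOM that the question's encoding argument would produce.

```python
import zipfile

import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Render the CSV as text, then encode it with a UTF-8 BOM.
csv_bytes = df.to_csv(index=False).encode("utf-8-sig")

# writestr() adds a member straight from bytes; no temporary file is created.
with zipfile.ZipFile("sample.zip", "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.csv", csv_bytes)
```

This works with any pandas version that can return to_csv output as a string, at the cost of holding the whole CSV in memory.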
Since pandas 1.0.0, to_csv() accepts a dict for compression, which lets you set the name of the file inside the archive.
Example in one line:
df.to_csv('sample.zip', compression={'method': 'zip', 'archive_name': 'sample.csv'})