Python pandas to_csv zip format

I am having a peculiar problem when writing zip files through to_csv.

Using GZIP:

df.to_csv(path_or_buf = 'sample.csv.gz', compression="gzip", index = None, sep = ",", header=True, encoding='utf-8-sig')

gives a neat gzip file named 'sample.csv.gz', and inside it I get my CSV, 'sample.csv'.

However, things change when using ZIP:

df.to_csv(path_or_buf = 'sample.csv.zip', compression="zip", index = None, sep = ",", header=True, encoding='utf-8-sig')

gives a zip file named 'sample.csv.zip', but the CSV inside it is also named 'sample.csv.zip'. Removing the extra '.zip' from that inner file name gives the CSV back.

How can I write a .zip file without this issue? Zip output is a requirement I can't bypass. I am using Python 2.7 on a Windows 10 machine.

Thanks in advance for help.

asked Mar 13 '19 05:03

Kshitiz

3 Answers

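Since pandas 1.0.0 this is straightforward: pass a dict as the compression argument and set archive_name to the name the CSV should have inside the archive:

filename = 'sample'
compression_options = dict(method='zip', archive_name=f'{filename}.csv')
# Add your other to_csv arguments (index, sep, encoding, ...) as needed.
df.to_csv(f'{filename}.zip', compression=compression_options)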

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
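
To check that the file inside the archive really gets the intended name, something along these lines can be used. This is a minimal sketch, assuming pandas 1.0.0 or newer (which runs only on Python 3); the DataFrame contents and the temporary directory are made up for illustration.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

filename = "sample"
zip_path = os.path.join(tempfile.mkdtemp(), filename + ".zip")

# 'archive_name' sets the name of the CSV stored inside the ZIP.
compression_options = dict(method="zip", archive_name=filename + ".csv")
df.to_csv(zip_path, compression=compression_options, index=False)

# List the members of the archive to confirm the inner file name.
with zipfile.ZipFile(zip_path) as zf:
    member_names = zf.namelist()
print(member_names)

# pandas can read the single-member archive back directly.
round_trip = pd.read_csv(zip_path, compression="zip")
print(round_trip)
```

Reading the archive back with read_csv is a quick sanity check that the consumer of the zip will see an ordinary CSV.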

answered Oct 20 '22 07:10

Pero

As the thread linked in the comment discusses, ZIP's directory-like nature makes it hard to do what you want without making a lot of assumptions or complicating the arguments for to_csv.

If your goal is to write the data directly to a ZIP file, that's harder than you'd think.

If you can bear temporarily writing your data to the filesystem, you can use Python's zipfile module to put that file in a ZIP with the name you preferred, and then delete the file.


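import os
import zipfile

# Write the CSV to a temporary file on disk first.
df.to_csv('sample.csv', index=False, sep=',', header=True, encoding='utf-8-sig')

# Add the file to the archive under its own name. ZIP_DEFLATED compresses it;
# the zipfile default, ZIP_STORED, would store it uncompressed.
with zipfile.ZipFile('sample.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.write('sample.csv')

# Remove the temporary CSV.
os.remove('sample.csv')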
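
If a temporary file on disk is undesirable, one alternative is to build the CSV in memory and hand it to zipfile directly with writestr. This is a minimal sketch, assuming Python 3 and a small made-up DataFrame; the temporary directory is only there to keep the example self-contained.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

zip_path = os.path.join(tempfile.mkdtemp(), "sample.zip")

# With no path given, to_csv returns the CSV as a string.
csv_text = df.to_csv(index=False, sep=",", header=True)

# Encode manually; 'utf-8-sig' prepends the BOM, matching the encoding
# used in the question.
csv_bytes = csv_text.encode("utf-8-sig")

# writestr stores the bytes under the given member name, so the name
# inside the archive is independent of the archive's own file name.
with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.csv", csv_bytes)
```

This holds the whole CSV in memory, so it suits data that fits comfortably in RAM.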

answered Oct 20 '22 08:10

Joe Germuska

Since pandas 1.0.0, the compression argument of to_csv() accepts a dict, which lets you set the name of the file inside the archive.

Example in one line:

df.to_csv('sample.zip', compression={'method': 'zip', 'archive_name': 'sample.csv'})

answered Oct 20 '22 07:10

washolive

Tags: python, pandas, python-2.7