What is the difference between saving a pandas DataFrame to pickle and to CSV?

Tags:

python

pandas

csv

pickle

I am learning pandas in Python. I saw a tutorial that shows two ways to save a pandas DataFrame:

1. `df.to_csv('sub.csv')`, and to open it, `pd.read_csv('sub.csv')`
2. `df.to_pickle('sub.pkl')`, and to open it, `pd.read_pickle('sub.pkl')`

The tutorial says `to_pickle` saves the DataFrame to disk. I am confused by this, because when I use `to_csv`, a CSV file appears in the folder, which I assume means it is also saved to disk, right?

In general, why would we save a DataFrame using `to_pickle` rather than saving it to CSV, txt, or another format?


asked Feb 13 '18 15:02

KevinKim

2 Answers

csv

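✅ human readable
✅ cross-platform
⛔ slower to save and load
⛔ usually takes more disk space
⛔ doesn't preserve types in some cases (for example, datetime and categorical columns come back as plain strings unless you tell `read_csv` how to parse them)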
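
To make the last point concrete, here is a minimal sketch of a CSV round trip. The column names and values are made up for illustration, and the round trip goes through an in-memory buffer rather than a real file.

```python
import io

import pandas as pd

# Hypothetical example data: a datetime column, a categorical column,
# and a string column whose values happen to look like numbers.
df = pd.DataFrame({
    "when": pd.to_datetime(["2018-02-13", "2018-02-14"]),
    "grade": pd.Categorical(["a", "b"]),
    "code": ["007", "042"],
})

# Write the frame as CSV text and read it straight back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
from_csv = pd.read_csv(buffer)

print(df.dtypes)        # dtypes before saving
print(from_csv.dtypes)  # dtypes after the round trip
print(from_csv["code"].tolist())  # leading zeros are gone: [7, 42]
```

After the round trip, `when` is no longer a datetime column, `grade` is no longer categorical, and `code` has been parsed as integers. Arguments such as `parse_dates` and `dtype` on `read_csv` can restore some of this, but you have to supply them yourself.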

pickle

✅ fast saving/loading
✅ usually less disk space than csv
⛔ not human readable
⛔ Python only
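
As a rough sketch of the pickle round trip, the snippet below saves a small made-up DataFrame to a temporary directory and loads it back. The file name `sub.pkl` follows the question; everything else is an illustrative assumption.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example data with dtypes that CSV would not keep.
df = pd.DataFrame({
    "when": pd.to_datetime(["2018-02-13", "2018-02-14"]),
    "grade": pd.Categorical(["a", "b"]),
    "code": ["007", "042"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sub.pkl")
    df.to_pickle(path)                  # writes a binary file to disk
    file_exists = os.path.exists(path)  # True: it is a real file, like the CSV
    from_pickle = pd.read_pickle(path)

print(from_pickle.dtypes)      # same dtypes as the original frame
print(from_pickle.equals(df))  # the loaded frame matches the original
```

One caveat worth keeping in mind: `read_pickle` should only be used on files you trust, because unpickling can execute arbitrary code.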

Also take a look at the parquet format (`to_parquet`, `read_parquet`)

✅ fast saving/loading
✅ usually less disk space than pickle
✅ supported by many platforms
⛔ not human readable
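
A minimal sketch of a Parquet round trip is shown below. It assumes a Parquet engine (`pyarrow` or `fastparquet`) is installed, since pandas needs one for `to_parquet` and `read_parquet`. The data and the file name `sub.parquet` are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example data with simple numeric and boolean columns.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "score": [0.5, 1.5, 2.5],
    "passed": [False, True, True],
})

from_parquet = None
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sub.parquet")
    try:
        df.to_parquet(path)
        from_parquet = pd.read_parquet(path)
    except ImportError:
        # pandas raises ImportError when no Parquet engine is available.
        print("No Parquet engine installed (pyarrow or fastparquet is needed).")

if from_parquet is not None:
    print(from_parquet.dtypes)  # column types survive the round trip
```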

answered Sep 21 '22 19:09

artoby

Pickle is a serialized way of storing a pandas DataFrame. Basically, you are writing the exact in-memory representation of the DataFrame to disk. This means the column types and the index are the same when you load it back. If you save to `csv` instead, you are just storing the values as comma-separated text. Depending on your data set, some information (such as column types) may be lost when you load it back up.

You can read more about the pickle library in the Python documentation.
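
The point about the index can be sketched as follows. This is an illustrative example with made-up data and file names, comparing what comes back from each format for a DataFrame with a named datetime index.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical example data: one column and a named DatetimeIndex.
df = pd.DataFrame(
    {"value": [10, 20]},
    index=pd.to_datetime(["2018-02-13", "2018-02-14"]),
)
df.index.name = "day"

# CSV: the index is written as an ordinary first column of text.
csv_text = df.to_csv()
from_csv = pd.read_csv(io.StringIO(csv_text))

# CSV again, this time telling read_csv how to rebuild the index.
recovered = pd.read_csv(io.StringIO(csv_text), index_col="day", parse_dates=["day"])

# Pickle: the index comes back as it was, with no extra arguments.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sub.pkl")
    df.to_pickle(path)
    from_pickle = pd.read_pickle(path)

print(type(from_csv.index).__name__)     # RangeIndex; "day" is now a column
print(type(recovered.index).__name__)    # DatetimeIndex, after extra arguments
print(type(from_pickle.index).__name__)  # DatetimeIndex
```

With CSV, the reader has to know which column was the index and how to parse it. With pickle, that information travels with the file.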

answered Sep 21 '22 19:09

Gabriel A
