Why is `pandas.read_csv` not the reciprocal of `pandas.DataFrame.to_csv`?

Tags:

python

pandas

dataframe

It seems strange to me that `pandas.read_csv` is not the direct inverse of `df.to_csv`. In the session below, which uses only the default settings, the DataFrame that is read back differs from the original by an extra "Unnamed: 0" column.

In [1]: import pandas as pd

In [2]: orig_df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); orig_df
Out[2]: 
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50

[4 rows x 3 columns]

In [3]: orig_df.to_csv('test.csv')

In [4]: final_df = pd.read_csv('test.csv'); final_df
Out[4]: 
   Unnamed: 0  AAA  BBB  CCC
0           0    4   10  100
1           1    5   20   50
2           2    6   30  -30
3           3    7   40  -50

[4 rows x 4 columns]

It seems the default `read_csv` should instead behave like

In [6]: final2_df = pd.read_csv('test.csv', index_col=0); final2_df
Out[6]: 
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50

[4 rows x 3 columns]

or the default `to_csv` should instead behave like

In [8]: orig_df.to_csv('test2.csv', index=False)

which, when read back, gives

In [9]: pd.read_csv('test2.csv')
Out[9]: 
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50

[4 rows x 3 columns]

(Perhaps this should instead be sent to the developers, but I am genuinely interested in why this is the default behavior. Hopefully it can also help someone else avoid the confusion I had.)
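The two workarounds above can be checked side by side without touching the disk. This is only a minimal sketch: it swaps the `test.csv` files for in-memory `io.StringIO` buffers, and the names `buf`, `buf_no_index`, `default_read`, `with_index`, and `without_index` are mine, not part of the original session.

```python
import io

import pandas as pd

orig_df = pd.DataFrame({'AAA': [4, 5, 6, 7],
                        'BBB': [10, 20, 30, 40],
                        'CCC': [100, 50, -30, -50]})

# to_csv writes the index by default, as a first column with an empty header.
buf = io.StringIO()
orig_df.to_csv(buf)

# Default read: the index column comes back as ordinary data.
buf.seek(0)
default_read = pd.read_csv(buf)
print(default_read.columns.tolist())  # ['Unnamed: 0', 'AAA', 'BBB', 'CCC']

# Workaround 1: tell read_csv that the first column is the index.
buf.seek(0)
with_index = pd.read_csv(buf, index_col=0)

# Workaround 2: do not write the index in the first place.
buf_no_index = io.StringIO()
orig_df.to_csv(buf_no_index, index=False)
buf_no_index.seek(0)
without_index = pd.read_csv(buf_no_index)

# Both workarounds recover the original three columns.
print(with_index.columns.tolist())
print(without_index.columns.tolist())
```

Workaround 1 keeps whatever index the DataFrame had, which matters when the index carries information. Workaround 2 discards it and lets `read_csv` build a fresh default index.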

369

asked Jul 24 '15 22:07

Steven C. Howell

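1 Answer

Thanks for the tip to post to the GitHub page, @EdChum. This led me to the `pandas.DataFrame.from_csv` function, which is indeed the reciprocal of `pandas.DataFrame.to_csv`. (Note that `DataFrame.from_csv` was deprecated in pandas 0.21 and removed in pandas 1.0, so the session below only runs on older versions.)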

In [6]: final_df = pd.DataFrame.from_csv('test.csv')

In [7]: final_df
Out[7]: 
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50

[4 rows x 3 columns]
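Where `DataFrame.from_csv` is not available, its behavior can be approximated with `read_csv`. As far as I know, `from_csv` differed from `read_csv` mainly in two defaults, `index_col=0` and `parse_dates=True`. The helper below, `read_like_from_csv`, is a hypothetical name of my own and a sketch under that assumption, not a pandas function. A date index is used here to show what `parse_dates=True` adds.

```python
import io

import pandas as pd


def read_like_from_csv(source):
    """Hypothetical helper: read_csv with the defaults from_csv used to apply."""
    # index_col=0 treats the first column as the index;
    # parse_dates=True asks pandas to try parsing that index as dates.
    return pd.read_csv(source, index_col=0, parse_dates=True)


# A small frame with a daily DatetimeIndex (illustrative data only).
dated_df = pd.DataFrame({'AAA': [4, 5, 6, 7]},
                        index=pd.date_range('2015-07-24', periods=4))

# Round trip through an in-memory CSV buffer.
buf = io.StringIO()
dated_df.to_csv(buf)
buf.seek(0)
restored = read_like_from_csv(buf)

print(restored)
print(type(restored.index).__name__)
```

For the integer-indexed `test.csv` in the question, `pd.read_csv('test.csv', index_col=0)` on its own is enough.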

112

answered Oct 21 '22 12:10

Steven C. Howell
