Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is `pandas.read_csv` not the reciprocal of `pandas.DataFrame.to_csv`?

It seems strange to me that pandas.read_csv is not a direct reciprocal function to df.to_csv. In this illustration, notice how when using all the default settings the original and final DataFrames differ by the "Unnamed" column.

In [1]: import pandas as pd

In [2]: orig_df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); orig_df
Out[2]: 
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50

[4 rows x 3 columns]

In [3]: orig_df.to_csv('test.csv')

In [4]: final_df = pd.read_csv('test.csv'); final_df
Out[4]: 
   Unnamed: 0  AAA  BBB  CCC
0           0    4   10  100
1           1    5   20   50
2           2    6   30  -30
3           3    7   40  -50

[4 rows x 4 columns]

It seems the default read_csv should instead be

In [6]: final2_df = pd.read_csv('test.csv', index_col=0); final2_df
Out[7]: 
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50

[4 rows x 3 columns]

or the default to_csv should instead be

In [8]: df.to_csv('test2.csv', index=False)

which when read gives

In [9]: pd.read_csv('test2.csv')
Out[9]: 
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50

[4 rows x 3 columns]

(Perhaps this should instead be sent to the developer/s but I am genuinely interested why this is the default behavior. Hopefully it also can help someone else avoid the confusion I had).

like image 369
Steven C. Howell Avatar asked Jul 24 '15 22:07

Steven C. Howell


People also ask

What does To_csv do in pandas?

Pandas DataFrame to_csv() function converts DataFrame into CSV data. We can pass a file object to write the CSV data into a file. Otherwise, the CSV data is returned in the string format.

Does pandas To_csv overwrite?

If the file already exists, it will be overwritten. If no path is given, then the Frame will be serialized into a string, and that string will be returned.

What is the difference between Read_table and read_csv in pandas?

The difference between read_csv() and read_table() is almost nothing. In fact, the same function is called by the source: read_csv() delimiter is a comma character. read_table() is a delimiter of tab \t .

Is read_csv a DataFrame?

read_csv is used to load a CSV file as a pandas dataframe. In this article, you will learn the different features of the read_csv function of pandas apart from loading the CSV file and the parameters which can be customized to get better output from the read_csv function.


1 Answers

Thanks for the tip to post to the github page @EdChum. This led me to the pandas.DataFrame.from_csv function which is indeed the reciprocal of pandas.DataFrame.to_csv.

In [6]: final_df = pd.DataFrame.from_csv('test.csv')

In [7]: final_df
Out[7]: 
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50

[4 rows x 3 columns]
like image 112
Steven C. Howell Avatar answered Oct 21 '22 12:10

Steven C. Howell