
Pandas to_csv call is prepending a comma

Tags: python, pandas, csv

I have a data file, apples.csv, that has headers like:

"id","str1","str2","str3","num1","num2"

I read it into a dataframe with pandas:

apples = pd.read_csv('apples.csv', sep=",")

Then I do some processing on it, but ignore that (I have it all commented out and my overall issue still occurs, so that processing is irrelevant here).

I then save it out:

apples.to_csv('bananas.csv',columns=["id","str1","str2","str3","num1","num2"])

Now, looking at bananas.csv, its headers are:

,id,str1,str2,str3,num1,num2

The quotes are gone (which I don't really care about, as it doesn't affect anything in the file), and there is now a leading comma. Every subsequent row also has an additional column at the front, so the file is saved with 7 columns. But if I do:

print(len(apples.columns))

Immediately prior to saving, it shows 6 columns...

I am normally in Java/Perl/R, and less experienced with Python and particularly Pandas, so I am not sure if this is "yeah, it just does that" or what the issue is - but I have spent amusingly long trying to figure this out and cannot find it via searching.

How can I get it to not do that prepending of a comma, and maybe as important - why is it doing it?

Asked by omgponies, Jun 02 '15 20:06



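1 Answer

Pass index=False to to_csv so that it doesn't write the index values to your CSV. The default is index=True, which is why you see this output: the extra first column is the DataFrame's index, and the leading comma is that column's empty header. The index is not one of the DataFrame's columns, so len(apples.columns) still reports 6. See the docs.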

So this:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a':np.arange(5), 'b':np.arange(5)})
df.to_csv(r'c:\data\t.csv')

results in

,a,b
0,0,0
1,1,1
2,2,2
3,3,3
4,4,4

Whilst this:

df.to_csv(r'c:\data\t.csv', index=False)

results in this:

a,b
0,0
1,1
2,2
3,3
4,4

The index is written by default to cover the situation where you have meaningful index values that you want to save.
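As a rough illustration of when keeping the index is useful, here is a minimal sketch. The data and the choice of "id" as the index are made up for this example and are not taken from the question's file. It writes to a string instead of a file (to_csv returns the CSV text when no path is given) and reads the result back with index_col so the saved index is restored rather than turned into an extra column.

```python
import io

import pandas as pd

# Hypothetical data: "id" is a meaningful index here, not a regular column.
df = pd.DataFrame(
    {"str1": ["a", "b"], "num1": [1, 2]},
    index=pd.Index([101, 102], name="id"),
)

# With no path argument, to_csv returns the CSV text as a string.
with_index = df.to_csv()                # index written, header uses its name
without_index = df.to_csv(index=False)  # index dropped entirely

# An unnamed default index produces the leading comma from the question.
unnamed = pd.DataFrame({"a": [0, 1]}).to_csv()

# Reading back: index_col tells read_csv which column is the saved index.
restored = pd.read_csv(io.StringIO(with_index), index_col="id")

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
print(unnamed.splitlines()[0])
```

If the index was written without a name, index_col=0 on read_csv should serve the same purpose when loading the file again.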

Answered by EdChum, Oct 18 '22 19:10