Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pivoting a DataFrame in Pandas for output to CSV

This is a simple question for which answer are surprisingly difficult to find online. Here's the situation:

>>> A
[('hey', 'you', 4), ('hey', 'not you', 5), ('not hey', 'you', 2), ('not hey', 'not you', 6)]
>>> A_p = pandas.DataFrame(A)
>>> A_p
         0        1  2
0      hey      you  4
1      hey  not you  5
2  not hey      you  2
3  not hey  not you  6
>>> B_p = A_p.pivot(0,1,2)
>>> B_p
1        not you  you
0                    
hey            5    4
not hey        6    2

This isn't quite what's suggested in the documentation for pivot -- there, it shows results without the 1 and 0 in the upper-left-hand corner. And that's what I'm looking for, a DataFrame object that prints as

         not you  you
hey            5    4
not hey        6    2

The problem is that the normal behavior results in a csv file whose first line is

0,not you,you

when I really want

not you, you

When the normal csv file (with the preceding "0,") reads into R, it doesn't properly set the column and row names from the frame object, resulting in painful manual manipulation to get it in the right format. Is there a way to get pivot to give me a DataFrame object without that additional information in the upper-left corner?

like image 1000
user1467068 Avatar asked Oct 07 '22 11:10

user1467068


1 Answers

Well, you have:

In [17]: B_p.to_csv(sys.stdout)
0,not you,you
hey,5.0,4.0
not hey,6.0,2.0

In [18]: B_p.to_csv(sys.stdout, index=False)
not you,you
5.0,4.0
6.0,2.0

But I assume you want the row names. Setting the index name to None (B_p.index.name = None) gives a leading comma:

In [20]: B_p.to_csv(sys.stdout)
,not you,you
hey,5.0,4.0
not hey,6.0,2.0

This roughly matches (ignoring quoted strings) what R writes in write.csv when row.names=TRUE:

"","a","b"
"foo",0.720538259472741,-0.848304940318957
"bar",-0.64266667412325,-0.442441171401282
"baz",-0.419181615269841,-0.658545964124229
"qux",0.881124313748992,0.36383198969179
"bar2",-1.35613767310069,-0.124014006180608

Any of these help?

EDIT: Added the index_label=False option today which does what you want:

In [2]: df
Out[2]: 
       A  B
one    1  4
two    2  5
three  3  6

In [3]: df.to_csv('foo.csv', index_
index_exp     index_label=  index_name=   

In [3]: df.to_csv('foo.csv', index_name=False)

In [4]: 
11:24 ~/code/pandas  (master)$ R

R version 2.14.0 (2011-10-31)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

re> read.csv('foo.csv')
      A B
one   1 4
two   2 5
three 3 6
like image 182
Wes McKinney Avatar answered Oct 12 '22 10:10

Wes McKinney