
pandas pivoting a dataframe, duplicate rows [duplicate]

I'm having a little trouble with pivoting in pandas. The DataFrame I'm working on has the columns dates, location and data, and looks like:

dates    location    data
date1       A         X
date2       A         Y
date3       A         Z
date1       B         XX
date2       B         YY

Basically, I'm trying to pivot on location to end up with a dataframe like:

dates   A    B    C
date1   X    XX   etc...
date2   Y    YY
date3   Z    ZZ 
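For reference, here is a minimal sketch of the sample data and the intended pivot, assuming pandas and assuming the dates are plain strings. The sample above has no date3 row for B, so that cell comes out as NaN rather than ZZ.

```python
import pandas as pd

# Sample data from the question; dates are assumed to be plain strings.
df = pd.DataFrame({
    "dates":    ["date1", "date2", "date3", "date1", "date2"],
    "location": ["A", "A", "A", "B", "B"],
    "data":     ["X", "Y", "Z", "XX", "YY"],
})

# One row per distinct date, one column per location.
wide = df.pivot(index="dates", columns="location", values="data")
print(wide)
```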

Unfortunately, when I pivot, the resulting index is identical to the original dates column (one entry per original row, with repeated dates), and I get:

dates  A   B   C
date1  X   NA  etc...
date2  Y   NA
date3  Z   NA
date1  NA  XX
date2  NA  YY
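One possible cause, sketched under a hypothetical assumption that the question does not confirm: pivot only merges rows whose dates are exactly equal, so dates that merely print the same (here, B rows carrying a trailing space) each get their own index entry. Normalising the key before pivoting collapses them.

```python
import pandas as pd

# Hypothetical: the B rows carry a trailing space ("date1 "), so they print
# like the A rows but are different index values.
df = pd.DataFrame({
    "dates":    ["date1", "date2", "date3", "date1 ", "date2 "],
    "location": ["A", "A", "A", "B", "B"],
    "data":     ["X", "Y", "Z", "XX", "YY"],
})

bad = df.pivot(index="dates", columns="location", values="data")
print(len(bad))  # 5: every "date" is a distinct key

# Stripping whitespace makes look-alike dates identical keys.
df["dates"] = df["dates"].str.strip()
good = df.pivot(index="dates", columns="location", values="data")
print(len(good))  # 3
```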

Does anyone know how I can fix this to get the DataFrame format I'm looking for?

I'm currently calling pivot like this:

df.pivot(index="dates", columns="location")

because I have a number of data columns I want to pivot and don't want to list each one as an argument. I believe that, by default, pivot uses all the remaining columns in the DataFrame as values. Thanks.
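A minimal sketch of what omitting values does, assuming pandas; "data2" is a hypothetical second value column standing in for the other data columns. Every remaining column is pivoted, and the result has two-level columns of the form (value column, location).

```python
import pandas as pd

# "data2" is a hypothetical extra value column.
df = pd.DataFrame({
    "dates":    ["date1", "date2", "date3", "date1", "date2"],
    "location": ["A", "A", "A", "B", "B"],
    "data":     ["X", "Y", "Z", "XX", "YY"],
    "data2":    [1, 2, 3, 10, 20],
})

# values omitted: all columns other than index/columns are pivoted,
# producing MultiIndex columns such as ("data", "A") and ("data2", "B").
wide = df.pivot(index="dates", columns="location")
print(wide.columns.tolist())

# Select one value column to get a flat A/B frame for it.
print(wide["data2"])
```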

asked by tomas, Feb 20 '23 22:02


1 Answer

How are you calling DataFrame.pivot and what datatype is your dates column?
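A quick way to check the datatype question, sketched under the assumption (hypothetical) that the dates arrive as strings, e.g. from a CSV: inspect the dtype and the repr of the unique values, then parse everything into a single datetime64 column before pivoting so that equal dates are equal keys.

```python
import pandas as pd

# Hypothetical input: dates stored as strings.
df = pd.DataFrame({
    "dates":    ["2000-01-01", "2000-01-02", "2000-01-03",
                 "2000-01-01", "2000-01-02"],
    "location": ["A", "A", "A", "B", "B"],
    "data":     ["X", "Y", "Z", "XX", "YY"],
})

# See what pandas actually stored; repr() exposes stray whitespace
# and mixed Python types that print identically otherwise.
print(df["dates"].dtype)
print([repr(v) for v in df["dates"].unique()])

# Parse into one datetime64 column, then pivot.
df["dates"] = pd.to_datetime(df["dates"])
wide = df.pivot(index="dates", columns="location", values="data")
print(wide)
```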

Suppose I have a DataFrame similar to yours, where the dates column contains datetime objects:

In [52]: df
Out[52]: 
       data                dates loc
0  0.870900  2000-01-01 00:00:00   A
1  0.344999  2000-01-02 00:00:00   A
2  0.001729  2000-01-03 00:00:00   A
3  1.565684  2000-01-01 00:00:00   B
4 -0.851542  2000-01-02 00:00:00   B


In [53]: df.pivot(index='dates', columns='loc', values='data')
Out[53]: 
loc                A         B
dates                         
2000-01-01  0.870900  1.565684
2000-01-02  0.344999 -0.851542
2000-01-03  0.001729       NaN
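The session above can be reproduced as a script, a minimal sketch assuming a recent pandas, where (as of 2.0) pivot requires keyword arguments. The optional last step turns the dates index back into a column and drops the "loc" label on the column axis, which gives the flat "dates A B" layout from the question.

```python
import pandas as pd

# Rebuild the DataFrame from the session above with real datetime64 dates.
df = pd.DataFrame({
    "data":  [0.870900, 0.344999, 0.001729, 1.565684, -0.851542],
    "dates": pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03",
                             "2000-01-01", "2000-01-02"]),
    "loc":   ["A", "A", "A", "B", "B"],
})

# Keyword arguments: index rows, column labels, cell values.
wide = df.pivot(index="dates", columns="loc", values="data")

# Optional: move dates back into a column and drop the "loc" axis name.
flat = wide.reset_index().rename_axis(columns=None)
print(flat)
```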
answered by Chang She, Feb 25 '23 14:02