Pandas: Pivoting with multi-index data

I have two dataframes that look like this:

rating
   BMW  Fiat  Toyota
0    7     2       3
1    8     1       8
2    9    10       7
3    8     3       9

own
   BMW  Fiat  Toyota
0    1     1       0
1    0     1       1
2    0     0       1
3    0     1       1

I'm ultimately trying to get a pivot table of the mean rating by usage (owned or not) and brand, something like this:

            BMW  Fiat  Toyota
Usage                        
0      8.333333    10       3
1      7.000000     2       8

My approach was to merge the datasets like this:

Measure  Rating                Own              
Brand       BMW  Fiat  Toyota  BMW  Fiat  Toyota
0             7     2       3    1     1       0
1             8     1       8    0     1       1
2             9    10       7    0     0       1
3             8     3       9    0     1       1

I then tried to create a pivot table using rating as the values, own as the rows and brand as the columns, but I kept running into key errors. I have also tried unstacking either the measure or the brand level, but I can't seem to use row index names as pivot keys.

What am I doing wrong? Is there a better approach to this?

Brendon McLean asked Oct 17 '13 08:10


1 Answer

I'm not an expert in Pandas, so the solution may be more clumsy than you want, but:

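import pandas as pd

rating = pd.DataFrame({"BMW": [7, 8, 9, 8], "Fiat": [2, 1, 10, 3], "Toyota": [3, 8, 7, 9]})
own = pd.DataFrame({"BMW": [1, 0, 0, 0], "Fiat": [1, 1, 0, 1], "Toyota": [0, 1, 1, 1]})

# Flatten each frame to long form: level_0 = brand, level_1 = row, value = cell
r = rating.unstack().reset_index(name='value')
o = own.unstack().reset_index(name='value')
# r and o share the same row order, so their columns line up positionally
res = pd.DataFrame({"Brand": r["level_0"], "Rating": r["value"], "Own": o["value"]})
# Average the rating per (Own, Brand) pair, then spread Brand into columns
res = res.groupby(["Own", "Brand"]).mean().reset_index()
res.pivot(index="Own", columns="Brand", values="Rating")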

# result
# Brand       BMW  Fiat  Toyota
# Own                          
# 0      8.333333    10       3
# 1      7.000000     2       8
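
For what it's worth, the merged layout from the question can probably be reshaped directly as well. The following is only a sketch, not a tested recommendation: it assumes the two frames are joined with pd.concat using keys, and the variable names merged, long_form and table are my own. Stacking the Brand level moves it into the rows, which leaves plain "Rating" and "Own" columns that pivot_table can use as keys.

```python
import pandas as pd

rating = pd.DataFrame({"BMW": [7, 8, 9, 8], "Fiat": [2, 1, 10, 3], "Toyota": [3, 8, 7, 9]})
own = pd.DataFrame({"BMW": [1, 0, 0, 0], "Fiat": [1, 1, 0, 1], "Toyota": [0, 1, 1, 1]})

# Rebuild the two-level column layout from the question: (Measure, Brand).
merged = pd.concat([rating, own], axis=1, keys=["Rating", "Own"])
merged.columns.names = ["Measure", "Brand"]

# Move the Brand level from the columns into the rows, so every row is one
# (respondent, brand) pair with ordinary "Rating" and "Own" columns.
long_form = merged.stack("Brand").reset_index("Brand")

# Mean rating per (Own, Brand), with Brand spread back out into columns.
table = long_form.pivot_table(index="Own", columns="Brand", values="Rating", aggfunc="mean")
print(table)
```

The key errors in the question most likely come from asking pivot for "Own" or "Brand" while they are still column levels rather than columns; once the frame is in long form they are ordinary keys.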

Another solution, although it does not generalize very well (you can use a for loop, but you have to know which values occur in the own dataframe):

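d = []
for o in (0, 1):
    # keep ratings where own == o; all other cells become NaN
    t = rating[own == o]
    t["own"] = o
    d.append(t)

# mean() skips NaN, so each brand is averaged only over matching rows
res = pd.concat(d).groupby("own").mean()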
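
If the values in own are not known in advance, the loop could perhaps be driven by whatever values actually occur in the frame. This is a rough sketch under that assumption; the names levels and res are just for illustration, and it uses rating.where(...) in place of the boolean indexing above, which should mask the same cells.

```python
import pandas as pd

rating = pd.DataFrame({"BMW": [7, 8, 9, 8], "Fiat": [2, 1, 10, 3], "Toyota": [3, 8, 7, 9]})
own = pd.DataFrame({"BMW": [1, 0, 0, 0], "Fiat": [1, 1, 0, 1], "Toyota": [0, 1, 1, 1]})

# Collect the distinct values present in `own` instead of hard-coding (0, 1).
levels = sorted(pd.unique(own.to_numpy().ravel()))

# For each value, blank out (NaN) the ratings where `own` differs, then take
# the per-brand mean; mean() skips NaN by default.
res = pd.DataFrame({o: rating.where(own == o).mean() for o in levels}).T
res.index.name = "Own"
print(res)
```
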
Roman Pekar answered Sep 23 '22 07:09