 

Pivoting pandas dataframe by rank on id

I'm trying to pivot my pandas DataFrame on 'rank', with one row per 'id':

print(df)

     id  rank  year  
0   key0  1    2011  
1   key0  2    2012  
2   key0  3    2013  
3   key1  1    2014  
4   key1  2    2015  
5   key1  3    2016  
6   key2  1    2017 
7   key2  2    2018 
8   key2  3    2019 
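To try the answers below, the sample frame can be rebuilt directly. This is a minimal sketch that assumes 'year' is an integer column, as the printout suggests:

```python
import pandas as pd

# Three ids, each with ranks 1..3 and one year per (id, rank) pair
df = pd.DataFrame({
    "id": ["key0"] * 3 + ["key1"] * 3 + ["key2"] * 3,
    "rank": [1, 2, 3] * 3,
    "year": list(range(2011, 2020)),
})
print(df)
```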

Depending on max('rank'), I want to create that many 'rank'/'year' column pairs and fill them in ascending rank order:

print(df)

     id  rank1  year1  rank2  year2  rank3   year3  
0   key0   1     2011    2     2012    3      2013
1   key1   1     2014    2     2015    3      2016  
2   key2   1     2017    2     2018    3      2019

I tried my own solution. It works, but I have ~2M rows and it is not very efficient:

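import pandas as pd
# Long format: one row per (id, rank, variable); melts every column except id and rank
df2 = df.melt(id_vars=["id", "rank"], value_vars=[c for c in df.columns if c not in ["id", "rank"]])
# Target column names such as year1, year2, year3 (suffix is the rank itself)
df2["col_name"] = df2["variable"] + df2["rank"].astype(str)
df2["value"] = df2["value"].fillna(0)
df2 = pd.pivot_table(df2, index=["id"], columns=["col_name"], values="value", aggfunc="max")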

I know this is not optimal and it uses a lot of memory, which is why I'm asking for a better solution.

Thanks in advance

asked Sep 15 '20 at 08:09 by JBSH



2 Answers

Use DataFrame.sort_values with DataFrame.pivot, sort the resulting column MultiIndex by rank with DataFrame.sort_index, and then flatten it with f-strings:

df1 = (df.sort_values(['id','rank'])
         .pivot(index="id",columns="rank", values=["year","rank"])
         .sort_index(axis=1, level=1))
df1.columns = [f'{a}{b}' for a, b in df1.columns]
df1 = df1.reset_index()
print (df1)
     id  rank1  year1  rank2  year2  rank3  year3
0  key0      1   2011      2   2012      3   2013
1  key1      1   2014      2   2015      3   2016
2  key2      1   2017      2   2018      3   2019
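A self-contained version of the approach above, with the sample data built inline (a sketch; the data is assumed to match the question's printout):

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["key0"] * 3 + ["key1"] * 3 + ["key2"] * 3,
    "rank": [1, 2, 3] * 3,
    "year": list(range(2011, 2020)),
})

# Pivot both year and rank; columns become a MultiIndex like ("year", 1)
wide = (df.sort_values(["id", "rank"])
          .pivot(index="id", columns="rank", values=["year", "rank"])
          # Sort by rank (level 1) so the pairs come out as rank1, year1, rank2, ...
          .sort_index(axis=1, level=1))
# Flatten ("year", 1) -> "year1"
wide.columns = [f"{name}{r}" for name, r in wide.columns]
wide = wide.reset_index()
print(wide)
```

Within each rank, sort_index also orders the remaining level alphabetically, which is why 'rank' lands before 'year' in each pair.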
answered Oct 14 '22 at 14:10 by jezrael



While this doesn't reproduce the exact output, a simpler approach is to pivot directly:

df.pivot(index="id", columns="rank", values="year")

rank     1     2     3
id                    
key0  2011  2012  2013
key1  2014  2015  2016
key2  2017  2018  2019
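One caveat with DataFrame.pivot: it raises a ValueError if any (id, rank) pair appears more than once, because it cannot choose which value to keep. This sketch uses hypothetical data with a duplicated rank and falls back to pivot_table with an aggregation (keeping the max year here is an arbitrary choice). pivot_table does extra groupby work, so it is only worth using when duplicates can actually occur:

```python
import pandas as pd

# Hypothetical data: key0 has two rows with rank 1
df = pd.DataFrame({
    "id": ["key0", "key0", "key0", "key1"],
    "rank": [1, 1, 2, 1],
    "year": [2011, 2010, 2012, 2014],
})

pivot_failed = False
try:
    df.pivot(index="id", columns="rank", values="year")
except ValueError as exc:
    # Duplicate (id, rank) entries: pivot cannot reshape
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table aggregates duplicates; missing (id, rank) pairs become NaN
wide = df.pivot_table(index="id", columns="rank", values="year", aggfunc="max")
print(wide)
```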

I personally don't like numeric column headers, so I would rename them:

df.pivot(index="id", columns="rank", values="year").rename(columns="rank_{}".format)

rank  rank_1  rank_2  rank_3
id                          
key0  2011    2012    2013  
key1  2014    2015    2016  
key2  2017    2018    2019 
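If the rank columns aren't needed (they only repeat 1..n), the single-value pivot can also be given the question's yearN names and have 'id' turned back into a regular column. This sketch uses add_prefix and rename_axis; dropping the rank columns is an assumption about what the asker needs:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["key0"] * 3 + ["key1"] * 3 + ["key2"] * 3,
    "rank": [1, 2, 3] * 3,
    "year": list(range(2011, 2020)),
})

wide = (df.pivot(index="id", columns="rank", values="year")
          .add_prefix("year")          # 1 -> "year1", 2 -> "year2", ...
          .rename_axis(columns=None)   # drop the leftover "rank" axis label
          .reset_index())              # turn id back into a regular column
print(wide)
```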
answered Oct 14 '22 at 13:10 by Cameron Riddell
