Convert a pandas DataFrame with multiple rows per ID into one row per ID using Python?

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'device_id': ['0', '0', '1', '1', '2', '2'],
                   'p_food':    [0.2, 0.1, 0.3, 0.5, 0.1, 0.7],
                   'p_phone':   [0.8, 0.9, 0.7, 0.5, 0.9, 0.3]})
print(df)

Output:

  device_id  p_food  p_phone
0         0     0.2      0.8
1         0     0.1      0.9
2         1     0.3      0.7
3         1     0.5      0.5
4         2     0.1      0.9
5         2     0.7      0.3

How can I transform it into the following, with one row per device_id?

df2 = pd.DataFrame({'device_id' : ['0','1','2'],
                   'p_food_1'    : [0.2,0.3,0.1],
                   'p_food_2'    : [0.1,0.5,0.7],
                   'p_phone_1'   : [0.8,0.7,0.9],                    
                   'p_phone_2'   : [0.9,0.5,0.3]
                  })
print(df2)

Output:

  device_id  p_food_1  p_food_2  p_phone_1  p_phone_2
0         0       0.2       0.1        0.8        0.9
1         1       0.3       0.5        0.7        0.5
2         2       0.1       0.7        0.9        0.3

I tried to do this with groupby, apply, and agg, but I still could not get this result.

Update

My final code (the merged columns get pandas' default _x and _y suffixes):

df.drop_duplicates('device_id', keep='first').merge(df.drop_duplicates('device_id', keep='last'),on='device_id')
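
As a sketch of how the same merge could produce the _1/_2 column names from the question directly: `merge` accepts a `suffixes` argument that replaces the default `_x`/`_y`. The variable names `first`, `last`, and `merged` are illustrative only, and the approach assumes each device_id has exactly two rows (any rows between the first and last would be dropped).

```python
import pandas as pd

df = pd.DataFrame({'device_id': ['0', '0', '1', '1', '2', '2'],
                   'p_food':    [0.2, 0.1, 0.3, 0.5, 0.1, 0.7],
                   'p_phone':   [0.8, 0.9, 0.7, 0.5, 0.9, 0.3]})

# First and last row of each device (assumes two rows per device_id).
first = df.drop_duplicates('device_id', keep='first')
last = df.drop_duplicates('device_id', keep='last')

# suffixes=('_1', '_2') names the overlapping columns p_food_1, p_food_2, ...
merged = first.merge(last, on='device_id', suffixes=('_1', '_2'))

# The merge yields p_food_1, p_phone_1, p_food_2, p_phone_2;
# reorder to match the layout asked for in the question.
merged = merged[['device_id', 'p_food_1', 'p_food_2', 'p_phone_1', 'p_phone_2']]
print(merged)
```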

I appreciate su79eu7k's and A-Za-z's time and effort.
Words are not enough to express my gratitude.

Dondon Jie asked May 02 '17 02:05



2 Answers

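If you are still looking for an answer using groupby:

# Select the value columns with a list (double brackets); newer pandas
# versions no longer accept the tuple form ['p_food', 'p_phone'].
df = df.groupby('device_id')[['p_food', 'p_phone']].apply(lambda x: pd.DataFrame(x.values)).unstack().reset_index()
df.columns = df.columns.droplevel()
df.columns = ['device_id', 'p_food_1', 'p_food_2', 'p_phone_1', 'p_phone_2']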

You get:

  device_id  p_food_1  p_food_2  p_phone_1  p_phone_2
0         0       0.2       0.1        0.8        0.9
1         1       0.3       0.5        0.7        0.5
2         2       0.1       0.7        0.9        0.3
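
A possible generalization, offered as a sketch and not as part of the answer above: numbering the rows inside each group with `cumcount` and then pivoting avoids hard-coding the column names, and should also cope with more than two rows per device. The helper column `n` and the variable `wide` are names chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'device_id': ['0', '0', '1', '1', '2', '2'],
                   'p_food':    [0.2, 0.1, 0.3, 0.5, 0.1, 0.7],
                   'p_phone':   [0.8, 0.9, 0.7, 0.5, 0.9, 0.3]})

# Number the rows within each device: 1, 2, ... (hypothetical helper column 'n').
numbered = df.assign(n=df.groupby('device_id').cumcount() + 1)

# One row per device, one column per (value column, row number) pair.
wide = numbered.pivot(index='device_id', columns='n',
                      values=['p_food', 'p_phone'])

# Flatten the two-level column index: ('p_food', 1) becomes 'p_food_1'.
wide.columns = [f'{col}_{n}' for col, n in wide.columns]
wide = wide.reset_index()
print(wide)
```

If some devices have fewer rows than others, the missing positions would come out as NaN.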
Vaishali answered Oct 12 '22 16:10



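# Pair each device's first row with every row of that device, then keep
# only the last pairing, i.e. (first row, last row).
df_m = (df.drop_duplicates('device_id', keep='first')
          .merge(df, on='device_id')
          .drop_duplicates('device_id', keep='last')
          [['device_id', 'p_food_x', 'p_food_y', 'p_phone_x', 'p_phone_y']]
          .reset_index(drop=True))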

print(df_m)

  device_id  p_food_x  p_food_y  p_phone_x  p_phone_y
0         0       0.2       0.1        0.8        0.9
1         1       0.3       0.5        0.7        0.5
2         2       0.1       0.7        0.9        0.3
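
The same first-row/last-row idea can probably also be written as a single aggregation. The following is only a sketch under the assumption that each device has exactly two rows with no missing values (`first` and `last` in a groupby skip nulls); the variable name `wide_fl` and the mapping of `first`/`last` to `_1`/`_2` are choices made here, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'device_id': ['0', '0', '1', '1', '2', '2'],
                   'p_food':    [0.2, 0.1, 0.3, 0.5, 0.1, 0.7],
                   'p_phone':   [0.8, 0.9, 0.7, 0.5, 0.9, 0.3]})

# Take the first and last value of every column within each device.
wide_fl = df.groupby('device_id').agg(['first', 'last'])

# Columns are pairs such as ('p_food', 'first'); flatten them to p_food_1 / p_food_2.
wide_fl.columns = [f"{col}_{1 if stat == 'first' else 2}"
                   for col, stat in wide_fl.columns]
wide_fl = wide_fl.reset_index()
print(wide_fl)
```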
su79eu7k answered Oct 12 '22 14:10
