Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas row values to column headers

I have a daraframe like this

df = pd.DataFrame({'id1':[1,1,1,1,2,2,2],'id2':[1,1,1,1,2,2,2],'value':['a','b','c','d','a','b','c']})

   id1  id2 value
0    1    1     a
1    1    1     b
2    1    1     c
3    1    1     d
4    2    2     a
5    2    2     b
6    2    2     c

I need to transform into this form

   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0

There can be any number of levels in the value variables for each id ranging from 1 to 10. if the level is not present for that id it should be 0 else 1.

I am using anaconda python 3.5, windows 10

like image 836
Gowtham M Avatar asked Jun 24 '17 05:06

Gowtham M


1 Answers

If need output 1 and 0 only for presence of value:

You can use get_dummies with Series created by set_index, but then is necessary groupby + GroupBy.max:

df = pd.get_dummies(df.set_index(['id1','id2'])['value'])
       .groupby(level=[0,1])
       .max()
       .reset_index()
print (df)
   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0

Another solution with groupby, size and unstack, but then is necesary compare with gt and convert to int by astype. Last reset_index and rename_axis:

df = df.groupby(['id1','id2', 'value'])
      .size()
      .unstack(fill_value=0)
      .gt(0)
      .astype(int)
      .reset_index()
      .rename_axis(None, axis=1)
print (df)
   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0

If need count values:

df = pd.DataFrame({'id1':[1,1,1,1,2,2,2],
                   'id2':[1,1,1,1,2,2,2],
                   'value':['a','b','a','d','a','b','c']})

print (df)
   id1  id2 value
0    1    1     a
1    1    1     b
2    1    1     a
3    1    1     d
4    2    2     a
5    2    2     b
6    2    2     c

df = df.groupby(['id1','id2', 'value'])
       .size()
       .unstack(fill_value=0)
       .reset_index()
       .rename_axis(None, axis=1)
print (df)
   id1  id2  a  b  c  d
0    1    1  2  1  0  1
1    2    2  1  1  1  0

Or:

df = df.pivot_table(index=['id1','id2'], columns='value', aggfunc='size', fill_value=0)
      .reset_index()
      .rename_axis(None, axis=1)
print (df)
   id1  id2  a  b  c  d
0    1    1  2  1  0  1
1    2    2  1  1  1  0
like image 72
jezrael Avatar answered Sep 21 '22 16:09

jezrael