I have a DataFrame like this:
df = pd.DataFrame({'id1':[1,1,1,1,2,2,2],'id2':[1,1,1,1,2,2,2],'value':['a','b','c','d','a','b','c']})
   id1  id2 value
0    1    1     a
1    1    1     b
2    1    1     c
3    1    1     d
4    2    2     a
5    2    2     b
6    2    2     c
I need to transform it into this form:
   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0
The value column can have any number of levels for each id, ranging from 1 to 10. If a level is not present for that id it should be 0, otherwise 1.
I am using anaconda python 3.5, windows 10
If you only need 1 and 0 to mark the presence of a value:
You can use get_dummies on a Series created by set_index, but then groupby + GroupBy.max is necessary to collapse the result to one row per (id1, id2) pair:
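# parentheses are needed so the method chain can span several lines
df = (pd.get_dummies(df.set_index(['id1','id2'])['value'])
        .groupby(level=[0,1])
        .max()
        .astype(int)   # newer pandas versions return booleans from get_dummies
        .reset_index())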
print (df)
   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0
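To see why the groupby + max step is needed, it can help to look at the intermediate result. The following is only a small sketch using the sample data from the question; the names dummies and collapsed are made up for illustration, and dtype=int assumes pandas 0.23 or newer:

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'c', 'd', 'a', 'b', 'c']})

# get_dummies produces one indicator row per ORIGINAL row,
# so the (id1, id2) index still contains duplicates here
dummies = pd.get_dummies(df.set_index(['id1', 'id2'])['value'], dtype=int)
print(dummies.shape)    # (7, 4)

# taking the max per (id1, id2) collapses those rows:
# a column is 1 if the value occurred at least once for that pair
collapsed = dummies.groupby(level=['id1', 'id2']).max()
print(collapsed.shape)  # (2, 4)
```

Using sum instead of max at this point would give counts rather than presence flags.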
Another solution (again starting from the original df) uses groupby, size and unstack, but then it is necessary to compare with gt and convert to int with astype. Finally, call reset_index and rename_axis:
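# parentheses are needed so the method chain can span several lines
df = (df.groupby(['id1','id2', 'value'])
        .size()                    # count rows per (id1, id2, value)
        .unstack(fill_value=0)     # values become columns, missing ones are 0
        .gt(0)                     # True where the value occurred at least once
        .astype(int)
        .reset_index()
        .rename_axis(None, axis=1))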
print (df)
   id1  id2  a  b  c  d
0    1    1  1  1  1  1
1    2    2  1  1  1  0
If you need to count the values instead:
df = pd.DataFrame({'id1':[1,1,1,1,2,2,2],
                   'id2':[1,1,1,1,2,2,2],
                   'value':['a','b','a','d','a','b','c']})
print (df)
   id1  id2 value
0    1    1     a
1    1    1     b
2    1    1     a
3    1    1     d
4    2    2     a
5    2    2     b
6    2    2     c
# parentheses are needed so the method chain can span several lines
df = (df.groupby(['id1','id2', 'value'])
        .size()
        .unstack(fill_value=0)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
   id1  id2  a  b  c  d
0    1    1  2  1  0  1
1    2    2  1  1  1  0
Or, starting again from the original (long-format) df, with pivot_table:
df = (df.pivot_table(index=['id1','id2'], columns='value', aggfunc='size', fill_value=0)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
   id1  id2  a  b  c  d
0    1    1  2  1  0  1
1    2    2  1  1  1  0
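As a further alternative that is not part of the solutions above, pd.crosstab should give the same count table, and clipping the counts at 1 turns them into presence flags. This is only a sketch under that assumption; the names counts, presence and value_cols are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'id1': [1, 1, 1, 1, 2, 2, 2],
                   'id2': [1, 1, 1, 1, 2, 2, 2],
                   'value': ['a', 'b', 'a', 'd', 'a', 'b', 'c']})

# crosstab counts how often each value occurs per (id1, id2) pair
counts = (pd.crosstab([df['id1'], df['id2']], df['value'])
            .reset_index()
            .rename_axis(None, axis=1))
print(counts)

# cap the counts at 1 to get 0/1 presence flags, leaving the id columns alone
value_cols = counts.columns.difference(['id1', 'id2'])
presence = counts.copy()
presence[value_cols] = counts[value_cols].clip(upper=1)
print(presence)
```

Deriving the presence table from the counts keeps a single aggregation step for both outputs.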