I am trying to do a cross tab based on one column where a third column matches. Take the example data:
df = pd.DataFrame({'demographic': ['A', 'B', 'B', 'A', 'C', 'C'],
                   'id_match': ['101', '101', '201', '201', '26', '26'],
                   'time': ['10', '10', '16', '16', '1', '1']})
Where id_match matches, I want to find the sum of the time values for the cross tab of the demographic column. The output would look like this:
    A   B  C
A   0  52  0
B  52   0  0
C   0   0  2
Hopefully that makes sense, comment if not. Thanks J
You can solve this using merge and crosstab:
u = df.reset_index()
v = u.merge(u, on='id_match').query('index_x != index_y')
r = pd.crosstab(v.demographic_x,
                v.demographic_y,
                v.time_x.astype(int) + v.time_y.astype(int),
                aggfunc='sum')
print(r)
demographic_y     A     B    C
demographic_x
A               NaN  52.0  NaN
B              52.0   NaN  NaN
C               NaN   NaN  4.0
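Note that C/C comes out as 4 here, not the 2 shown in the question. The self-merge pairs each row with every other row sharing its id_match, so every pair shows up twice, once in each direction. For A/B that is what fills both off-diagonal cells, but for C/C both directions land in the same cell. Below is a minimal sketch, assuming you want same-demographic pairs counted only once. The names same, keep and result are just illustrative, and the rest follows the code above.

```python
import pandas as pd

df = pd.DataFrame({'demographic': ['A', 'B', 'B', 'A', 'C', 'C'],
                   'id_match': ['101', '101', '201', '201', '26', '26'],
                   'time': ['10', '10', '16', '16', '1', '1']})

u = df.reset_index()
v = u.merge(u, on='id_match')

# Drop rows paired with themselves, and for pairs within the same
# demographic keep only one direction (index_x < index_y).
same = v.demographic_x == v.demographic_y
keep = (v.index_x != v.index_y) & ~(same & (v.index_x > v.index_y))
v = v[keep]

r = pd.crosstab(v.demographic_x,
                v.demographic_y,
                v.time_x.astype(int) + v.time_y.astype(int),
                aggfunc='sum')

# Assumes the sums are whole numbers, as in the example data.
result = r.fillna(0).astype(int)
print(result)
```

With the example data this gives 52 for A/B and B/A and 2 for C/C. If you actually want both directions counted on the diagonal as well, the original filter (index_x != index_y) is the one to keep.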
If you need the NaNs filled in with zeros, you can use fillna:
r.fillna(0, downcast='infer')
demographic_y   A   B  C
demographic_x
A               0  52  0
B              52   0  0
C               0   0  4
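One caveat: as far as I know, the downcast argument of fillna is deprecated in recent pandas releases (2.2 and later), so the call above may emit a warning there. A small sketch that avoids it by filling and then casting explicitly. The frame r is rebuilt by hand here just so the snippet runs on its own, and filled is an illustrative name. It assumes the sums are whole numbers, since astype(int) would truncate any fractional values.

```python
import numpy as np
import pandas as pd

# Stand-in for the crosstab result r shown above.
r = pd.DataFrame([[np.nan, 52.0, np.nan],
                  [52.0, np.nan, np.nan],
                  [np.nan, np.nan, 4.0]],
                 index=pd.Index(['A', 'B', 'C'], name='demographic_x'),
                 columns=pd.Index(['A', 'B', 'C'], name='demographic_y'))

# Fill the missing cells, then cast back to integers explicitly.
filled = r.fillna(0).astype(int)
print(filled)
```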