I have data of the following form:
```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan]
})
print(df)
#    group param
# 0      1     a
# 1      1     a
# 2      2     b
# 3      3   NaN
# 4      3     a
# 5      3     a
# 6      4   NaN
```
Non-null values within a group are always the same. I want to count each group's non-null value (where one exists) exactly once, and then get the total counts for each value.
I'm currently doing this in the following (clunky and inefficient) way:
```python
param = []
for _, group in df[df.param.notnull()].groupby('group'):
    param.append(group.param.unique()[0])
print(pd.DataFrame({'param': param}).param.value_counts())
# a    2
# b    1
```
I'm sure there's a way to do this more cleanly and without using a loop, but I just can't seem to work it out. Any help would be much appreciated.
I think you can use SeriesGroupBy.nunique:

```python
print(df.groupby('param')['group'].nunique())
# param
# a    2
# b    1
# Name: group, dtype: int64
```
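For completeness, a sketch of an equivalent approach (not part of the original answer): drop the null rows, keep one row per (group, param) pair, and count the remaining param values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan]
})

# Drop NaN rows, then deduplicate (group, param) pairs so each
# group's value is counted only once before tallying.
counts = df.dropna().drop_duplicates().param.value_counts()
print(counts)
# a    2
# b    1
```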
Another solution with unique: create a new DataFrame with DataFrame.from_records, reshape it to a Series with stack, and finally apply value_counts:

```python
a = df[df.param.notnull()].groupby('group')['param'].unique()
print(pd.DataFrame.from_records(a.values.tolist()).stack().value_counts())
# a    2
# b    1
# dtype: int64
```
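To make the chain easier to follow, here is a sketch of the intermediate steps on the same sample data (variable names `a` and `flat` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan]
})

# Step 1: one array of unique param values per group (NaN rows excluded).
a = df[df.param.notnull()].groupby('group')['param'].unique()
# group 1 -> ['a'], group 2 -> ['b'], group 3 -> ['a']

# Step 2: build a DataFrame with one row per group (it would be wider
# than one column if a group ever had several unique values), then
# stack it back into a flat Series and count.
flat = pd.DataFrame.from_records(a.values.tolist()).stack()
print(flat.value_counts())
# a    2
# b    1
# dtype: int64
```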