Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count unique values using pandas groupby

I have data of the following form:

df = pd.DataFrame({     'group': [1, 1, 2, 3, 3, 3, 4],     'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan] }) print(df)  #    group param # 0      1     a # 1      1     a # 2      2     b # 3      3   NaN # 4      3     a # 5      3     a # 6      4   NaN 

Non-null values within groups are always the same. I want to count the non-null value for each group (where it exists) once, and then find the total counts for each value.

I'm currently doing this in the following (clunky and inefficient) way:

param = [] for _, group in df[df.param.notnull()].groupby('group'):     param.append(group.param.unique()[0]) print(pd.DataFrame({'param': param}).param.value_counts())  # a    2 # b    1 

I'm sure there's a way to do this more cleanly and without using a loop, but I just can't seem to work it out. Any help would be much appreciated.

like image 818
user1684046 Avatar asked Jan 01 '17 11:01

user1684046


1 Answers

I think you can use SeriesGroupBy.nunique:

print (df.groupby('param')['group'].nunique()) param a    2 b    1 Name: group, dtype: int64 

Another solution with unique, then create new df by DataFrame.from_records, reshape to Series by stack and last value_counts:

a = df[df.param.notnull()].groupby('group')['param'].unique() print (pd.DataFrame.from_records(a.values.tolist()).stack().value_counts()) a    2 b    1 dtype: int64 
like image 139
jezrael Avatar answered Sep 28 '22 07:09

jezrael