 

Adding a grouped, aggregate nunique column to pandas dataframe

I want to add an aggregate, grouped, nunique column to my pandas dataframe but not aggregate the entire dataframe. I'm trying to do this in one line and avoid creating a new aggregated object and merging that, etc.

My df has the columns track, type, and id. I want the number of unique ids for each track/type combination as a new column in the table, without collapsing the track/type combinations in the resulting df: same number of rows, one more column.

Something like this isn't working:

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].nunique()

Nor is this:

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(nunique)

This last form works with some aggregating functions but not others. The following works (but is meaningless on my dataset):

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(sum)

In R this is easily done in data.table with:

df[, n_unique_id := uniqueN(id), by = c('track', 'type')]

Thanks!

wbarts asked May 01 '17 21:05



1 Answer

df.groupby(['track', 'type'])['id'].transform(nunique)

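This implies that a name nunique exists in your namespace and refers to a function. Unless you have defined one yourself, it does not, so Python raises a NameError before pandas is ever called. transform(sum) only works because sum is a Python built-in. transform accepts either a function or a string naming a method it knows, and 'nunique' is one of those strings.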
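A minimal sketch of that difference, using the same toy data as the demo in this answer. The variable names (grouped, bare_name_error, by_string) are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "track": list("11112222"),
    "type": list("AAAABBBB"),
    "id": list("XXYZWWWW"),
})
grouped = df.groupby(["track", "type"])["id"]

# The bare name is looked up by Python itself; no variable called
# nunique exists here, so the lookup fails before transform runs.
try:
    grouped.transform(nunique)
    bare_name_error = None
except NameError as exc:
    bare_name_error = exc

# The string is resolved by pandas to its own nunique implementation.
by_string = grouped.transform("nunique")

print(type(bare_name_error).__name__)
print(by_string.tolist())
```

The quotes are the whole fix: the string is handed to pandas, the bare name is not.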

As pointed out by @root, the methods pandas uses for the transformations named by these strings are often optimized, and should generally be preferred to passing your own functions. This is true even for NumPy functions in some cases.

For example, transform('sum') should be preferred over transform(sum).
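As a rough illustration (not a benchmark), the string form and a hand-written callable return the same values; only the code path inside pandas differs. The lambda below is an assumed stand-in for "your own function":

```python
import pandas as pd

df = pd.DataFrame({
    "track": list("11112222"),
    "type": list("AAAABBBB"),
    "id": list("XXYZWWWW"),
})
grouped = df.groupby(["track", "type"])["id"]

# String name: pandas dispatches to its built-in groupby nunique.
by_string = grouped.transform("nunique")

# Custom callable: called once per group, and the scalar it returns
# is broadcast back to every row of that group.
by_callable = grouped.transform(lambda s: s.nunique())

same_values = by_string.tolist() == by_callable.tolist()
print(same_values)
```

On a frame this small the speed difference is negligible; the preference for the string form matters more as the number of groups grows.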

Try this instead:

df.groupby(['track', 'type'])['id'].transform('nunique')

Demo:

import pandas as pd

df = pd.DataFrame(dict(
    track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))
print(df)

  id track type
0  X     1    A
1  X     1    A
2  Y     1    A
3  Z     1    A
4  W     2    B
5  W     2    B
6  W     2    B
7  W     2    B

df.groupby(['track', 'type'])['id'].transform('nunique')

0    3
1    3
2    3
3    3
4    1
5    1
6    1
7    1
Name: id, dtype: int64
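To get the one-liner the question asks for, the transformed Series can be assigned straight back to the dataframe, since it shares the original index. The sketch below does that and, for comparison, builds the same column with the aggregate-then-merge route the question wanted to avoid. The column name n_unique_id comes from the question; n_unique_id_merged, counts, and merged are hypothetical names for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "track": list("11112222"),
    "type": list("AAAABBBB"),
    "id": list("XXYZWWWW"),
})
rows_before = len(df)

# One line: the result has one value per original row, so it aligns
# with df's index and the row count is unchanged.
df["n_unique_id"] = df.groupby(["track", "type"])["id"].transform("nunique")

# Two-step equivalent: aggregate to one row per group, then merge back.
counts = (
    df.groupby(["track", "type"])["id"]
    .nunique()
    .rename("n_unique_id_merged")
    .reset_index()
)
merged = df.merge(counts, on=["track", "type"], how="left")

print(df)
```

Both routes should give the same numbers; transform simply skips the intermediate aggregated object.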
piRSquared answered Nov 10 '22 10:11