Adding a grouped, aggregate nunique column to pandas dataframe

Tags:

python

pandas

dataframe

aggregate

pandas-groupby

I want to add an aggregate, grouped, nunique column to my pandas dataframe but not aggregate the entire dataframe. I'm trying to do this in one line and avoid creating a new aggregated object and merging that, etc.

My df has track, type, and id. I want the number of unique ids for each track/type combination as a new column in the table (but without collapsing the track/type combos in the resulting df). Same number of rows, 1 more column.

Something like this isn't working:

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].nunique()

Nor is this:

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(nunique)

This last one works with some aggregating functions but not others. The following works (but is meaningless on my dataset):

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(sum)

In R, this is easily done in data.table with:

df[, n_unique_id := uniqueN(id), by = c('track', 'type')]

Thanks!


asked May 01 '17 21:05

wbarts

1 Answer

df.groupby(['track', 'type'])['id'].transform(nunique)

This implies that there is a name nunique in the namespace that refers to some function. There isn't one, so Python raises a NameError before pandas is even called. transform accepts either a function or a string naming an operation it knows about, and 'nunique' is one of those strings.
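A small sketch of the point above, reusing the demo data from this answer: a bare nunique fails because the name is never defined, while the string 'nunique', the method pd.Series.nunique, or a lambda all produce the same per-row counts. The variable names (grouped, by_name, and so on) are illustrative only.

```python
import pandas as pd

# Same small frame as the demo in this answer
df = pd.DataFrame(dict(
    track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))
grouped = df.groupby(['track', 'type'])['id']

# A bare `nunique` is never defined, so Python raises NameError
# before pandas is even called
try:
    grouped.transform(nunique)  # noqa: F821
except NameError as exc:
    print('NameError:', exc)

# Each of these works: a string pandas recognises, or a real callable
by_name = grouped.transform('nunique')
by_method = grouped.transform(pd.Series.nunique)
by_lambda = grouped.transform(lambda s: s.nunique())

print(by_name.tolist())  # → [3, 3, 3, 3, 1, 1, 1, 1]
print(by_name.tolist() == by_method.tolist() == by_lambda.tolist())  # → True
```

The string form is the one to prefer; the callables are shown only to make clear that transform needs either a recognised name or an actual function object.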

As pointed out by @root, the methods pandas dispatches to for these strings are often optimized, so they should generally be preferred over passing your own functions. In some cases this is true even when passing numpy functions.

For example, transform('sum') should be preferred over transform(sum).
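To illustrate, here is a minimal comparison on a hypothetical numeric column named score (the question's id column holds strings, so it is not a good fit for summing). Both spellings give the same numbers; the string form lets pandas use its own grouped sum. Depending on the pandas version, passing the builtin sum may also emit a FutureWarning, which this sketch suppresses.

```python
import warnings

import pandas as pd

# Hypothetical numeric column `score`, added only so that summing makes sense
df = pd.DataFrame(dict(
    track=list('11112222'), type=list('AAAABBBB'),
    score=[1, 2, 3, 4, 10, 20, 30, 40]))
grouped = df.groupby(['track', 'type'])['score']

# Preferred: the string lets pandas dispatch to its own grouped sum
fast = grouped.transform('sum')

# Also works, but passes Python's builtin `sum`; some pandas versions
# warn about this, so the warning is silenced for the comparison
with warnings.catch_warnings():
    warnings.simplefilter('ignore', FutureWarning)
    slow = grouped.transform(sum)

print(fast.tolist())  # → [10, 10, 10, 10, 100, 100, 100, 100]
print(fast.tolist() == slow.tolist())  # → True
```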

Try this instead:

df.groupby(['track', 'type'])['id'].transform('nunique')
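Applied to the question's goal, a minimal sketch: because transform returns a Series aligned with the original index, it can be assigned straight back as the new column, keeping the same number of rows. This uses the demo data from this answer and the question's column name n_unique_id.

```python
import pandas as pd

df = pd.DataFrame(dict(
    track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))

# transform('nunique') returns one value per original row, indexed like df,
# so it can be assigned directly as a new column (no aggregate + merge)
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform('nunique')

print(df.shape)  # → (8, 4)
print(df['n_unique_id'].tolist())  # → [3, 3, 3, 3, 1, 1, 1, 1]
```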

Demo:

import pandas as pd

df = pd.DataFrame(dict(
    track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))
print(df)

  track type id
0     1    A  X
1     1    A  X
2     1    A  Y
3     1    A  Z
4     2    B  W
5     2    B  W
6     2    B  W
7     2    B  W

df.groupby(['track', 'type'])['id'].transform('nunique')

0    3
1    3
2    3
3    3
4    1
5    1
6    1
7    1
Name: id, dtype: int64
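For comparison, a sketch of why the question's first attempt doesn't fit and what the aggregate-then-merge route it wanted to avoid would look like. Plain .nunique() returns one row per (track, type) group, indexed by those keys rather than by the original rows, so it cannot be mapped back onto each row by direct assignment; merging it back works but takes extra steps and gives the same numbers as transform('nunique'). The names agg, merged, and via_transform are illustrative.

```python
import pandas as pd

df = pd.DataFrame(dict(
    track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))

# Plain aggregation: one row per (track, type) group, with a MultiIndex
# of the group keys instead of df's original row index
agg = df.groupby(['track', 'type'])['id'].nunique()
print(agg)

# The aggregate-then-merge route: turn the keys back into columns and join
merged = df.merge(agg.rename('n_unique_id').reset_index(),
                  on=['track', 'type'], how='left')

# transform gives the same per-row numbers in one step
via_transform = df.groupby(['track', 'type'])['id'].transform('nunique')
print((merged['n_unique_id'] == via_transform).all())  # → True
```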

answered Nov 10 '22 10:11

piRSquared
