Python pandas unique value ignoring NaN

Tags: python, null, pandas, unique, group-by

I want to use unique in groupby aggregation, but I don't want nan in the unique result.

An example dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 1, np.nan, 3, 3], 'b': [0, 0, 1, 1, 1, 1, 1],
    'c': ['foo', np.nan, 'bar', 'foo', 'baz', 'foo', 'bar']})

       a  b    c
0 1.0000  0  foo
1 2.0000  0  NaN
2 1.0000  1  bar
3 1.0000  1  foo
4    nan  1  baz
5 3.0000  1  foo
6 3.0000  1  bar

And the groupby:

df.groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})

Its result is:

       a                             c                      
     min    max           unique first last           unique
b                                                           
0 1.0000 2.0000       [1.0, 2.0]   foo  foo       [foo, nan]
1 1.0000 3.0000  [1.0, nan, 3.0]   bar  bar  [bar, foo, baz]

But I want it without nan:

       a                        c                      
     min    max      unique first last           unique
b                                                           
0 1.0000 2.0000  [1.0, 2.0]   foo  foo            [foo]
1 1.0000 3.0000  [1.0, 3.0]   bar  bar  [bar, foo, baz]

How can I do that? Of course I have several columns to aggregate and every column needs different aggregation functions, so I don't want to do the unique aggregations one-by-one and separately from other aggregations.

723

asked Sep 14 '17 12:09

ragesz

3 Answers

Define a function:

def unique_non_null(s):
    return s.dropna().unique()

Then use it in the aggregation:

df.groupby('b').agg({
    'a': ['min', 'max', unique_non_null], 
    'c': ['first', 'last', unique_non_null]
})
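A self-contained sketch of the approach above, run on the question's data. One labeled assumption: this variant returns a plain Python list instead of the NumPy array that unique() gives, because recent pandas versions may reject an aggregation function that returns an array for a numeric column (with "ValueError: Must produce aggregated value"). Converting to a list is a defensive choice, not part of the original answer. pandas labels the custom sub-column with the function's name, so it appears as unique_non_null rather than unique.

```python
import numpy as np
import pandas as pd

# The example DataFrame from the question.
df = pd.DataFrame({
    'a': [1, 2, 1, 1, np.nan, 3, 3],
    'b': [0, 0, 1, 1, 1, 1, 1],
    'c': ['foo', np.nan, 'bar', 'foo', 'baz', 'foo', 'bar'],
})


def unique_non_null(s):
    # Drop missing values, then keep each remaining value once, in order of
    # first appearance. Returning a list (not an ndarray) is an assumption made
    # to avoid pandas rejecting array results for numeric columns.
    return list(s.dropna().unique())


result = df.groupby('b').agg({
    'a': ['min', 'max', unique_non_null],
    'c': ['first', 'last', unique_non_null],
})
print(result)
```

Each cell of the custom columns holds a list, so result.loc[1, ('c', 'unique_non_null')] is ['bar', 'foo', 'baz'].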

140

answered Oct 14 '22 02:10

IanS

This will work for what you need:

df.ffill().groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})

Because you use min, max and unique, repeated values do not concern you.

answered Oct 14 '22 01:10

zipa

Update 23 November 2020

This answer is terrible; don't use it. Please refer to @IanS's answer.

Earlier

Try ffill

df.ffill().groupby('b').agg({'a': ['min', 'max', 'unique'], 'c': ['first', 'last', 'unique']})

      c                          a                 
  first last           unique  min  max      unique
b                                                  
0   foo  foo            [foo]  1.0  2.0  [1.0, 2.0]
1   bar  bar  [bar, foo, baz]  1.0  3.0  [1.0, 3.0]

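If NaN is the first element of a group, the above solution breaks: ffill copies the previous group's last value into that group (or, for the very first row of the frame, leaves the NaN in place).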
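A small sketch of that failure, using hypothetical data (not from the question) in which group b == 1 starts with a NaN in column a. Because ffill runs over the whole frame rather than per group, the NaN inherits the last value of group 0 and corrupts both max and unique for group 1. Dropping NaN inside each group, as in @IanS's answer, keeps group 1 to its own values. Grouping first with df.groupby('b').ffill() would avoid the leak but would leave that leading NaN in place.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group b == 1 starts with a NaN in column 'a'.
df = pd.DataFrame({
    'a': [1.0, 9.0, np.nan, 2.0, 3.0],
    'b': [0, 0, 1, 1, 1],
})

# ffill runs over the whole frame, so row 2 inherits 9.0 from group b == 0.
leaky = df.ffill().groupby('b')['a'].agg(['min', 'max', 'unique'])
print(leaky)

# Dropping NaN within each group keeps group b == 1 to its own values.
safe = df.groupby('b')['a'].agg(lambda s: list(s.dropna().unique()))
print(safe)
```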

answered Oct 14 '22 02:10

Bharath
