Pandas groupby: get any non-NaN value per group

Tags:

python

python-3.x

pandas

I'm trying to perform a groupby on a table where, within each group of the groupby key, every value in a column is either the correct value or NaN. E.g.:

    id country    name
0    1  France    None
1    1  France  Pierre
2    2    None   Marge
3    1    None  Pierre
4    3     USA     Jim
5    3    None     Jim
6    2      UK    None
7    4   Spain  Alvaro
8    2    None   Marge
9    3    None     Jim
10   4   Spain    None
11   3    None     Jim

I just want to get the values for each of the 4 people, which should never clash, e.g.:

   country    name
id                
1   France  Pierre
2       UK   Marge
3      USA     Jim
4    Spain  Alvaro

I've tried:

df.groupby('id').first()
df.groupby('id').nth(0, dropna='any')  # also tried dropna='all'

and even

df.groupby('id').apply(lambda x: x.loc[x.first_valid_index()])

All to no avail. What am I missing?

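EDIT: to help you make the example dataframe for testing:

import pandas as pd

# Each id has the same country/name everywhere, except where the value is missing (None)
df = pd.DataFrame({
    'id': [1, 1, 2, 1, 3, 3, 2, 4, 2, 3, 4, 3],
    'country': ['France', 'France', None, None, 'USA', None, 'UK', 'Spain', None, None, 'Spain', None],
    'name': [None, 'Pierre', 'Marge', 'Pierre', 'Jim', 'Jim', None, 'Alvaro', 'Marge', 'Jim', None, 'Jim'],
})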

asked Mar 21 '19 16:03

Jim Eisenberg

2 Answers

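Pandas groupby.first returns the first non-null value in each group but, at least in the pandas version used here, it does not treat None as null. Convert None to NaN first:

import numpy as np

df.fillna(np.nan).groupby('id').first()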

   country    name
id                
1   France  Pierre
2       UK   Marge
3      USA     Jim
4    Spain  Alvaro
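
For convenience, here is a self-contained sketch of this approach, assuming the example frame from the question. Whether the fillna step is still needed may depend on the pandas version, but it does no harm when it is not. The variable name result is only illustrative.

```python
import numpy as np
import pandas as pd

# Example frame from the question: within each id, a value is either correct or missing
df = pd.DataFrame({
    'id': [1, 1, 2, 1, 3, 3, 2, 4, 2, 3, 4, 3],
    'country': ['France', 'France', None, None, 'USA', None, 'UK', 'Spain', None, None, 'Spain', None],
    'name': [None, 'Pierre', 'Marge', 'Pierre', 'Jim', 'Jim', None, 'Alvaro', 'Marge', 'Jim', None, 'Jim'],
})

# Turn None into NaN, then take the first non-null value per column within each id
result = df.fillna(np.nan).groupby('id').first()
print(result)
```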

answered Oct 28 '22 13:10

Vaishali

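It is also possible to pass dropna=True so that None values are skipped. The keyword was accepted by the pandas version current when this was written; newer releases may reject it: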

df.groupby('id').first(dropna=True)

   country    name
id                
1   France  Pierre
2       UK   Marge
3      USA     Jim
4    Spain  Alvaro
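
If the installed pandas does not accept that keyword, the same idea can be spelled out by hand. The following is only a sketch, not the library's own implementation: first_valid is a hypothetical helper (not a pandas function) that drops nulls from each column of each group and keeps the first remaining entry.

```python
import numpy as np
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    'id': [1, 1, 2, 1, 3, 3, 2, 4, 2, 3, 4, 3],
    'country': ['France', 'France', None, None, 'USA', None, 'UK', 'Spain', None, None, 'Spain', None],
    'name': [None, 'Pierre', 'Marge', 'Pierre', 'Jim', 'Jim', None, 'Alvaro', 'Marge', 'Jim', None, 'Jim'],
})


def first_valid(s):
    """Hypothetical helper: first non-null entry of a Series, or NaN if there is none."""
    valid = s.dropna()  # dropna treats both None and NaN as missing
    return valid.iloc[0] if not valid.empty else np.nan


# agg calls first_valid once per column within each id group
result = df.groupby('id').agg(first_valid)
print(result)
```

This is slower than the built-in first() on large frames because it runs a Python function per group and column, but it makes the null handling explicit.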

answered Oct 28 '22 14:10

ALollz
