Pandas groupby give any non nan values

I'm trying to perform a groupby on a table where, within each group, every value in a column is either the correct value or NaN. E.g.:

    id country    name
0    1  France    None
1    1  France  Pierre
2    2    None   Marge
3    1    None  Pierre
4    3     USA     Jim
5    3    None     Jim
6    2      UK    None
7    4   Spain  Alvaro
8    2    None   Marge
9    3    None     Jim
10   4   Spain    None
11   3    None     Jim

I just want to get the values for each of the 4 people, which should never clash, e.g.:

   country    name
id                
1   France  Pierre
2       UK   Marge
3      USA     Jim
4    Spain  Alvaro
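
The "never clash" assumption can be checked before aggregating. The following is a minimal sketch, assuming the example data from this question; the names `counts` and `no_clash` are illustrative only. It counts the distinct non-null values per id and column, so any count above 1 would indicate a clash.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 1, 3, 3, 2, 4, 2, 3, 4, 3],
    'country': ['France', 'France', None, None, 'USA', None,
                'UK', 'Spain', None, None, 'Spain', None],
    'name': [None, 'Pierre', 'Marge', 'Pierre', 'Jim', 'Jim',
             None, 'Alvaro', 'Marge', 'Jim', None, 'Jim'],
})

# nunique() ignores missing values by default, so each cell holds the
# number of distinct non-null values seen for that id and column.
counts = df.groupby('id').nunique()

# True when no id has more than one distinct value in any column.
no_clash = bool((counts <= 1).all().all())
print(no_clash)
```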

I've tried:

groupby().first()
groupby.nth(0,dropna='any'/'all')

and even

groupby().apply(lambda x: x.loc[x.first_valid_index()])

All to no avail. What am I missing?

EDIT: to help you make the example dataframe for testing:

import pandas as pd

df = pd.DataFrame({'id':[1,1,2,1,3,3,2,4,2,3,4,3],'country':['France','France',None,None,'USA',None,'UK','Spain',None,None,'Spain',None],'name':[None,'Pierre','Marge','Pierre','Jim','Jim',None,'Alvaro','Marge','Jim',None,'Jim']})
Jim Eisenberg asked Mar 21 '19 16:03


2 Answers

Pandas groupby.first returns the first non-null value in each group, but it does not treat None as null. Convert None to NaN first:

df.fillna(np.nan).groupby('id').first()

   country    name
id                
1   France  Pierre
2       UK   Marge
3      USA     Jim
4    Spain  Alvaro
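
For a copy-paste test, here is a self-contained sketch of the line above, assuming the example dataframe from the question and the usual `np`/`pd` import aliases. The variable name `result` is illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 1, 3, 3, 2, 4, 2, 3, 4, 3],
    'country': ['France', 'France', None, None, 'USA', None,
                'UK', 'Spain', None, None, 'Spain', None],
    'name': [None, 'Pierre', 'Marge', 'Pierre', 'Jim', 'Jim',
             None, 'Alvaro', 'Marge', 'Jim', None, 'Jim'],
})

# Turn None into NaN so that first() skips the missing entries,
# then take the first remaining value per id in each column.
result = df.fillna(np.nan).groupby('id').first()
print(result)
```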
Vaishali answered Oct 28 '22 13:10


Another possibility is to specify dropna when the values are None:

df.groupby('id').first(dropna=True)

   country    name
id                
1   France  Pierre
2       UK   Marge
3      USA     Jim
4    Spain  Alvaro
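
In case first() does not accept a dropna argument in your pandas version, a different technique that avoids it is to aggregate with a small custom function. This is only a sketch, assuming the example dataframe from the question; `first_valid` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 1, 3, 3, 2, 4, 2, 3, 4, 3],
    'country': ['France', 'France', None, None, 'USA', None,
                'UK', 'Spain', None, None, 'Spain', None],
    'name': [None, 'Pierre', 'Marge', 'Pierre', 'Jim', 'Jim',
             None, 'Alvaro', 'Marge', 'Jim', None, 'Jim'],
})

def first_valid(series):
    """Hypothetical helper: first non-null entry, or None if all are null."""
    valid = series.dropna()  # dropna() removes both None and NaN
    return valid.iloc[0] if not valid.empty else None

# agg() calls first_valid once per group for each non-key column.
result = df.groupby('id').agg(first_valid)
print(result)
```

This is likely slower than the built-in first() on large tables, since the helper runs in Python for every group and column.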
ALollz answered Oct 28 '22 14:10