I'm trying to perform a groupby on a table where, within each group of the groupby key, every value is either the correct one or NaN. For example:
    id country    name
0    1  France    None
1    1  France  Pierre
2    2    None   Marge
3    1    None  Pierre
4    3     USA     Jim
5    3    None     Jim
6    2      UK    None
7    4   Spain  Alvaro
8    2    None   Marge
9    3    None     Jim
10   4   Spain    None
11   3    None     Jim
I just want the values for each of the 4 people, which should never clash, e.g.:
   country    name
id
1   France  Pierre
2       UK   Marge
3      USA     Jim
4    Spain  Alvaro
I've tried:
groupby().first()
groupby.nth(0,dropna='any'/'all')
and even
groupby().apply(lambda x: x.loc[x.first_valid_index()])
All to no avail. What am I missing?
EDIT: to help you make the example dataframe for testing:
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 1, 3, 3, 2, 4, 2, 3, 4, 3],
    'country': ['France', 'France', None, None, 'USA', None, 'UK', 'Spain', None, None, 'Spain', None],
    'name': [None, 'Pierre', 'Marge', 'Pierre', 'Jim', 'Jim', None, 'Alvaro', 'Marge', 'Jim', None, 'Jim'],
})
Some background on the two methods involved: groupby().first() does not strictly return the first row of each group. For each column, it returns the first non-NaN value in the group.
You can use pandas groupby to group the data on one or more columns and compute statistics such as count, mean, median, std, min and max. Sometimes the first, last, or nth value in each group is also useful.
groupby().nth(n) returns the nth row of each group; pass 0 to get the first row. Unlike first(), nth(0) takes that row as it is, so any missing values in it are kept.
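To make the difference between first() and nth(0) concrete, here is a minimal sketch. The frame `demo` and its values are made up for illustration and are not the question's data; NaN is used for the missing entries so that None handling does not get in the way.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data): every row has one missing value.
demo = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "country": ["France", np.nan, np.nan, "UK"],
    "name": [np.nan, "Pierre", "Marge", np.nan],
})

grouped = demo.groupby("id")

# first(): for each column, the first non-NaN value within the group.
first_values = grouped.first()

# nth(0): the row at position 0 of each group, missing values included.
# The index and columns of this result differ between pandas versions,
# so it is best to compare column values rather than the exact layout.
first_rows = grouped.nth(0)

print(first_values)
print(first_rows)
```

Here first() fills each column from whichever row has a value, while nth(0) simply picks the first row of each group and keeps its gaps.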
pandas groupby().first() returns the first non-null value in each column, but does not support None. Try converting None to NaN first (with numpy imported as np):
df.fillna(np.nan).groupby('id').first()
   country    name
id
1   France  Pierre
2       UK   Marge
3      USA     Jim
4    Spain  Alvaro
It is also possible to specify dropna when the values are None (this keyword may not be accepted by first() in newer pandas releases):
df.groupby('id').first(dropna=True)
   country    name
id
1   France  Pierre
2       UK   Marge
3      USA     Jim
4    Spain  Alvaro
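If neither variant behaves as expected on your pandas version, a more explicit fallback is to aggregate each column yourself. This is only a sketch of a different technique (a custom aggregation instead of first()), and the helper name `first_valid` is made up for illustration. It relies on Series.dropna(), which drops both None and NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 1, 3, 3, 2, 4, 2, 3, 4, 3],
    "country": ["France", "France", None, None, "USA", None,
                "UK", "Spain", None, None, "Spain", None],
    "name": [None, "Pierre", "Marge", "Pierre", "Jim", "Jim",
             None, "Alvaro", "Marge", "Jim", None, "Jim"],
})


def first_valid(series):
    """Return the first non-null entry of one group's column, or NaN if there is none."""
    valid = series.dropna()  # drops None as well as NaN
    return valid.iloc[0] if not valid.empty else np.nan


# agg calls first_valid once per column and per group.
result = df.groupby("id").agg(first_valid)
print(result)
```

This is slower than the built-in first() on large frames because it runs a Python function for every group, but it makes the handling of missing values explicit.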