Finding unique rows of a pandas DataFrame column for which all the values of a second column are NaN

Question

Hi, I am struggling with the following problem:

Given a DataFrame with columns name and variable, I would like to create two lists:

list_names_nan, containing the names for which all the values in the variable column are NaN
list_names_not_nan, containing the names for which at least one value in the variable column is not NaN

Below is an example:

import pandas
import numpy

df = pandas.DataFrame(data=[['x',1],['y',2],['x',4],['z',numpy.nan],
                            ['x',numpy.nan],['y',3],['x',numpy.nan],['z',numpy.nan],],
                            columns=['name','variable'])
df:
  name  variable
0    x       1.0
1    y       2.0
2    x       4.0
3    z       NaN
4    x       NaN
5    y       3.0
6    x       NaN
7    z       NaN

The desired output is:

list_names_nan = ['z']
list_names_not_nan = ['x', 'y']

Shubham Sharma · Accepted Answer

Use Series.isna to create a boolean mask, group that mask by the name column with Series.groupby, and aggregate with all. The resulting mask m is True for each name whose values are all NaN, so m and ~m select the nan and not_nan names respectively:

m = df['variable'].isna().groupby(df['name']).all()
nan, not_nan = m[m].index.tolist(), m[~m].index.tolist()

Result:

['z']  # nan
['x', 'y'] # not_nan
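
For readers who want to see the intermediate steps, here is a minimal sketch that spells out the one-liner above on the example data from the question. The names is_missing, all_missing, counts, alt_nan and alt_not_nan are illustrative and not part of the original answer. The second half shows a possible alternative based on GroupBy.count, which counts only non-NaN values; it is offered as an equivalent check under the assumption that the name column itself contains no missing values.

```python
import numpy as np
import pandas as pd

# Example data from the question
df = pd.DataFrame(
    data=[['x', 1], ['y', 2], ['x', 4], ['z', np.nan],
          ['x', np.nan], ['y', 3], ['x', np.nan], ['z', np.nan]],
    columns=['name', 'variable'],
)

# Step 1: row-wise mask, True where 'variable' is NaN
is_missing = df['variable'].isna()

# Step 2: per name, True only if every row of that name is NaN
all_missing = is_missing.groupby(df['name']).all()

# Step 3: split the group labels (the index) using the mask
list_names_nan = all_missing[all_missing].index.tolist()
list_names_not_nan = all_missing[~all_missing].index.tolist()

# Alternative sketch: count() ignores NaN, so a count of 0 means
# that every value for that name is missing
counts = df.groupby('name')['variable'].count()
alt_nan = counts[counts == 0].index.tolist()
alt_not_nan = counts[counts > 0].index.tolist()

print(list_names_nan, list_names_not_nan)
```

Both approaches return the names in sorted order, because groupby sorts the group keys by default. Note that groupby also drops rows whose name is NaN by default, so such rows would appear in neither list.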

Tags: python, pandas, dataframe, pandas-groupby

gabboshow
