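Hi, I am struggling with the following problem:
given a dataframe with columns name and variable, I would like to create two lists: one with the names whose variable values are all NaN, and one with the names that have at least one non-NaN value.
An example is below: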
import pandas
import numpy
df = pandas.DataFrame(data=[['x', 1], ['y', 2], ['x', 4], ['z', numpy.nan],
                            ['x', numpy.nan], ['y', 3], ['x', numpy.nan], ['z', numpy.nan]],
                      columns=['name', 'variable'])
df:
  name  variable
0    x       1.0
1    y       2.0
2    x       4.0
3    z       NaN
4    x       NaN
5    y       3.0
6    x       NaN
7    z       NaN
The desired output is:
list_names_nan = ['z']
list_names_not_nan = ['x', 'y']
Use Series.isna to create a boolean mask, then use Series.groupby to group this mask by name and aggregate with all. The resulting mask m is True only for names whose values are all NaN, so it can be used to select the nan and not_nan names:
m = df['variable'].isna().groupby(df['name']).all()
nan, not_nan = m[m].index.tolist(), m[~m].index.tolist()
Result:
['z'] # nan
['x', 'y'] # not_nan
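For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach with the intermediate mask split out. The variable name is_missing and the pd/np aliases are my own choices for illustration and do not come from the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=[['x', 1], ['y', 2], ['x', 4], ['z', np.nan],
          ['x', np.nan], ['y', 3], ['x', np.nan], ['z', np.nan]],
    columns=['name', 'variable'],
)

# Row-level flag: True where 'variable' is NaN
is_missing = df['variable'].isna()

# Group the flags by name; all() is True only if every row of that name is NaN
m = is_missing.groupby(df['name']).all()

# Index labels where the mask is True / False
list_names_nan = m[m].index.tolist()
list_names_not_nan = m[~m].index.tolist()

print(list_names_nan)
print(list_names_not_nan)
```

Note that x ends up in the not_nan list even though some of its rows are NaN, because all only flags a name when every one of its values is missing.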