 

Selecting Pandas rows where more than 1 column is not NaN [duplicate]

Tags:

python

pandas

I have a dataframe set up in the following way:

header_1 | header_2 | header_3 | header_4
a        | b        | NaN      | NaN
b        | c        | 9        | 10
x        | y        | NaN      | 8
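For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the sample table as a DataFrame. It assumes the first row is the header and that the blank cells are NaN. One thing worth knowing: pandas stores an integer column that contains NaN as float64, so the "integer" columns header_3 and header_4 actually hold floats here.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample data; the empty header_3/header_4 cells are NaN.
df = pd.DataFrame({
    'header_1': ['a', 'b', 'x'],
    'header_2': ['b', 'c', 'y'],
    'header_3': [np.nan, 9, np.nan],
    'header_4': [np.nan, 10, 8],
})

# Integer columns that contain NaN are upcast to float64.
print(df.dtypes)
```

This is why the filtered values print as 9.0 and 10.0 rather than 9 and 10.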

How can I select, using column positions (because the column names change), the rows where header_3 and header_4 are BOTH not NaN? header_3 and header_4 are integers.

Thank you

asked Dec 11 '22 at 04:12 by Macterror



1 Answer

If the columns can be defined in a list, check the filtered columns for non-missing values with notnull, then use DataFrame.all(axis=1) to check that every value in each row is True:

cols = ['header_3', 'header_4']

df = df[df[cols].notnull().all(axis=1)]
print(df)
#   header_1 header_2  header_3  header_4
# 1        b        c       9.0      10.0

# Without a separate cols list, pass the column names inline:
# df[df[['header_3', 'header_4']].notnull().all(axis=1)]
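To make the "check all Trues per row" step visible, here is a self-contained sketch on the question's sample data (rebuilt by hand, assuming blank cells are NaN) that prints the intermediate boolean mask before filtering with it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'header_1': ['a', 'b', 'x'],
    'header_2': ['b', 'c', 'y'],
    'header_3': [np.nan, 9, np.nan],
    'header_4': [np.nan, 10, 8],
})

cols = ['header_3', 'header_4']

# notnull() gives one boolean per cell; all(axis=1) collapses each row
# to True only if every selected column in that row is non-missing.
mask = df[cols].notnull().all(axis=1)
print(mask.tolist())  # [False, True, False]

result = df[mask]
print(result)
```

Row x is dropped even though header_4 is 8, because header_3 is still NaN in that row.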

To select by the last two columns, use iloc, which selects by position:

df = df[df.iloc[:, -2:].notnull().all(axis=1)]
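Since the question says the column names change, here is a small sketch showing that the positional version keeps working after renaming. The new names w, x, y, z are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'header_1': ['a', 'b', 'x'],
    'header_2': ['b', 'c', 'y'],
    'header_3': [np.nan, 9, np.nan],
    'header_4': [np.nan, 10, 8],
})

# Simulate the column names changing (hypothetical new names).
renamed = df.copy()
renamed.columns = ['w', 'x', 'y', 'z']

# iloc[:, -2:] takes the last two columns by position, whatever they are called.
result = renamed[renamed.iloc[:, -2:].notnull().all(axis=1)]
print(result)
```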

It is also possible to specify the columns by their integer positions:

# Python counts positions from 0, so [2, 3] are the 3rd and 4th columns
df = df[df.iloc[:, [2,3]].notnull().all(axis=1)]
# or use loc with the column names directly:
# df[df.loc[:, ['header_3', 'header_4']].notnull().all(axis=1)]
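A quick sketch, on the question's sample data, checking that the positional iloc selection and the label-based loc selection return the same rows. It also includes DataFrame.dropna with subset, an alternative not mentioned in the answer: it drops any row with a NaN in the listed columns, and building subset from df.columns keeps it positional.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'header_1': ['a', 'b', 'x'],
    'header_2': ['b', 'c', 'y'],
    'header_3': [np.nan, 9, np.nan],
    'header_4': [np.nan, 10, 8],
})

# Positional: 3rd and 4th columns.
by_position = df[df.iloc[:, [2, 3]].notnull().all(axis=1)]
# Label-based: same columns by name.
by_label = df[df.loc[:, ['header_3', 'header_4']].notnull().all(axis=1)]
# Alternative: drop rows with any NaN in the 3rd/4th columns.
by_dropna = df.dropna(subset=df.columns[[2, 3]])

print(by_position.equals(by_label), by_position.equals(by_dropna))
```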

Or, if there are only two columns, chain the conditions with & (bitwise AND):

df = df[df['header_3'].notnull() & df['header_4'].notnull()]
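A sketch showing that the & version gives the same mask as notnull().all(axis=1), and that it can also be written by position with iloc when the names are unknown. No extra parentheses are needed here because the operands are method calls. With comparisons such as df['header_3'] > 5, each side must be wrapped in parentheses, since & binds tighter than >.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'header_1': ['a', 'b', 'x'],
    'header_2': ['b', 'c', 'y'],
    'header_3': [np.nan, 9, np.nan],
    'header_4': [np.nan, 10, 8],
})

# By name, combining two boolean Series element-wise.
mask_and = df['header_3'].notnull() & df['header_4'].notnull()
# Same idea by position (3rd and 4th columns).
mask_pos = df.iloc[:, 2].notnull() & df.iloc[:, 3].notnull()
# Reference: the all(axis=1) approach.
mask_all = df[['header_3', 'header_4']].notnull().all(axis=1)

print(mask_and.tolist(), mask_pos.tolist(), mask_all.tolist())
result = df[mask_pos]
```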
answered Dec 21 '22 at 22:12 by jezrael
