How to remove rows in a dataframe with more than x number of Null values? [duplicate]

I am trying to remove the rows of a DataFrame that have more than 7 null values. Please suggest an efficient way to do this.

tia asked Dec 04 '18 16:12


2 Answers

If I understand correctly, you want to remove a row only if it contains more than 7 NaNs:

df = df[df.isnull().sum(axis=1) <= 7]

This keeps only the rows that have 7 or fewer NaNs and removes every row with more than 7.
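To make the boundary case visible, here is a minimal sketch of the mask approach. The small frame and the names max_nulls, null_counts and filtered are made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical 4-row, 8-column frame, used only for illustration.
df = pd.DataFrame([
    [np.nan] * 8,                                          # 8 nulls
    [np.nan] * 7 + [5.0],                                  # 7 nulls
    [6.0, 7.0, 8.0, 9.0] + [np.nan] * 4,                   # 4 nulls
    [np.nan] * 2 + [11.0, 12.0, 13.0, 14.0, 15.0, 16.0],   # 2 nulls
])

max_nulls = 7  # rows with more nulls than this are dropped

# Count the nulls in each row, then keep rows at or below the limit.
null_counts = df.isnull().sum(axis=1)
filtered = df[null_counts <= max_nulls]

print(null_counts.tolist())     # per-row null counts
print(filtered.index.tolist())  # labels of the rows that survive
```

The row with exactly 7 nulls is kept here, because the requirement is "more than 7". With a strict comparison (< 7) that row would be dropped as well.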

Mayank Porwal answered Sep 30 '22 19:09


dropna has a thresh argument. Because thresh counts non-NA values, subtract the maximum number of nulls you want to allow (here 7) from the number of columns.

thresh : int, optional Require that many non-NA values.

df.dropna(thresh=df.shape[1]-7, axis=0)

Sample Data:

print(df)
     0    1     2     3     4     5     6     7
0  NaN  NaN   NaN   NaN   NaN   NaN   NaN   NaN
1  NaN  NaN   NaN   NaN   NaN   NaN   NaN   5.0
2  6.0  7.0   8.0   9.0   NaN   NaN   NaN   NaN
3  NaN  NaN  11.0  12.0  13.0  14.0  15.0  16.0

df.dropna(thresh=df.shape[1]-7, axis=0)
     0    1     2     3     4     5     6     7
1  NaN  NaN   NaN   NaN   NaN   NaN   NaN   5.0
2  6.0  7.0   8.0   9.0   NaN   NaN   NaN   NaN
3  NaN  NaN  11.0  12.0  13.0  14.0  15.0  16.0
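
As a rough check that the thresh arithmetic matches the "more than 7 nulls" rule, the sketch below rebuilds the sample data and compares dropna against an explicit null-count mask. The helper name drop_rows_with_many_nulls and its max_nulls parameter are hypothetical, introduced only for this example.

```python
import numpy as np
import pandas as pd


def drop_rows_with_many_nulls(df, max_nulls):
    """Keep rows with at most `max_nulls` nulls (hypothetical helper)."""
    # thresh is the minimum number of non-NA values a row needs to be kept.
    min_non_null = df.shape[1] - max_nulls
    return df.dropna(thresh=min_non_null, axis=0)


# The sample data from this answer, rebuilt by hand.
df = pd.DataFrame([
    [np.nan] * 8,
    [np.nan] * 7 + [5.0],
    [6.0, 7.0, 8.0, 9.0] + [np.nan] * 4,
    [np.nan] * 2 + [11.0, 12.0, 13.0, 14.0, 15.0, 16.0],
])

by_thresh = drop_rows_with_many_nulls(df, max_nulls=7)
by_mask = df[df.isnull().sum(axis=1) <= 7]

# Both approaches should select the same rows.
print(by_thresh.equals(by_mask))
```

Note that dropna returns a new DataFrame, so assign the result back (df = df.dropna(...)) if you want to keep it.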

ALollz answered Sep 30 '22 18:09