I am trying to remove the rows in the data frame with more than 7 null values. Please suggest something that is efficient to achieve this.
If I understand correctly, you need to remove rows only if total nan's in a row is more than 7
:
df = df[df.isnull().sum(axis=1) < 7]
This will keep only rows which have nan
's less than 7 in the dataframe, and will remove all having nan's > 7.
dropna
has a thresh
argument. Subtract your desired number from the number of columns.
thresh : int, optional Require that many non-NA values.
df.dropna(thresh=df.shape[1]-7, axis=0)
print(df)
0 1 2 3 4 5 6 7
0 NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN 5.0
2 6.0 7.0 8.0 9.0 NaN NaN NaN NaN
3 NaN NaN 11.0 12.0 13.0 14.0 15.0 16.0
df.dropna(thresh=df.shape[1]-7, axis=0)
0 1 2 3 4 5 6 7
1 NaN NaN NaN NaN NaN NaN NaN 5.0
2 6.0 7.0 8.0 9.0 NaN NaN NaN NaN
3 NaN NaN 11.0 12.0 13.0 14.0 15.0 16.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With