I have a DataFrame with 200 columns, most of which contain NaNs. I would like to select all columns with no NaNs, or at least the columns with the fewest NaNs. I've tried dropping with a threshold and using notnull(), but without success. Any ideas?
df.dropna(thresh=2, inplace=True)
df_notnull = df[df.notnull()]
Example DataFrame:
col1 col2 col3
23 45 NaN
54 39 NaN
NaN 45 76
87 32 NaN
The output should look like:
df.dropna(axis=1, thresh=2)
col1 col2
23 45
54 39
NaN 45
87 32
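As a minimal sketch, the example above can be reproduced as follows. The frame is rebuilt by hand from the sample in the question, and the variable name result is only illustrative. Note that dropna() works on rows by default, so df.dropna(thresh=2) removes rows rather than columns, and df[df.notnull()] only masks values, returning a frame of the same shape.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "col1": [23, 54, np.nan, 87],
    "col2": [45, 39, 45, 32],
    "col3": [np.nan, np.nan, 76, np.nan],
})

# axis=1 makes dropna() act on columns; thresh=2 keeps only the
# columns that have at least 2 non-NaN values.
result = df.dropna(axis=1, thresh=2)
print(result)
```

Here col3 has a single non-NaN value, so it is the only column dropped.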
To extract rows or columns with missing values, you can use the isnull() or isna() method of pandas.DataFrame and pandas.Series to check whether each element is a missing value. isnull() is an alias for isna(), so the two are used in the same way.
You can use df.isnull().sum(). It shows every column together with its total number of NaNs.
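One way to turn those per-column counts into a selection is sketched below. The sample frame and the max_nans cutoff are assumptions made for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd

# Small sample frame (assumed for illustration).
df = pd.DataFrame({
    "col1": [23, 54, np.nan, 87],
    "col2": [45, 39, 45, 32],
    "col3": [np.nan, np.nan, 76, np.nan],
})

# Number of NaNs in each column: col1 has 1, col2 has 0, col3 has 3.
nan_counts = df.isnull().sum()

# Hypothetical cutoff: keep columns with at most this many NaNs.
max_nans = 1
selected = df.loc[:, nan_counts <= max_nans]

# Alternatively, keep only the column(s) with the smallest NaN count.
fewest = df.loc[:, nan_counts == nan_counts.min()]
```

Sorting with nan_counts.sort_values() can help when picking a sensible cutoff for a frame with many columns.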
You can keep only the columns that are not entirely NaN using:
df = df[df.columns[~df.isnull().all()]]
Or
null_cols = df.columns[df.isnull().all()]
df.drop(null_cols, axis = 1, inplace = True)
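The snippets above only remove columns in which every value is NaN. If the goal is to keep columns that contain no NaN at all, a sketch along these lines may help (the sample frame is assumed for illustration):

```python
import numpy as np
import pandas as pd

# Small sample frame (assumed for illustration).
df = pd.DataFrame({
    "col1": [23, 54, np.nan, 87],
    "col2": [45, 39, 45, 32],
    "col3": [np.nan, np.nan, 76, np.nan],
})

# dropna() defaults to how="any", so with axis=1 it drops every
# column that contains at least one NaN.
no_nan = df.dropna(axis=1)

# Equivalent selection using a boolean mask over the columns.
no_nan_mask = df.loc[:, df.notna().all()]
```

Both forms return a new DataFrame and leave df unchanged.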
If you wish to remove columns based on a certain percentage of NaNs, say columns where more than 90% of the data is null:
cols_to_delete = df.columns[df.isnull().sum()/len(df) > .90]
df.drop(cols_to_delete, axis = 1, inplace = True)
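The same idea can be written without modifying the frame in place, since the mean of a boolean mask is the fraction of True values. This is a sketch only: the sample frame and the max_fraction value are assumptions chosen so that the small example drops a column.

```python
import numpy as np
import pandas as pd

# Small sample frame (assumed for illustration).
df = pd.DataFrame({
    "col1": [23, 54, np.nan, 87],
    "col2": [45, 39, 45, 32],
    "col3": [np.nan, np.nan, 76, np.nan],
})

# Fraction of NaNs per column: 0.25 for col1, 0.0 for col2, 0.75 for col3.
nan_fraction = df.isnull().mean()

# Hypothetical cutoff: keep columns whose NaN fraction is at most 50%.
max_fraction = 0.5
kept = df.loc[:, nan_fraction <= max_fraction]
```

Returning a new frame rather than using inplace=True keeps the original data available for comparison.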