Suppose we have a dataframe with the columns 'Race', 'Age', 'Name'
. I want to create two 2 DF's:
1) Without missing values in columns 'Race' and 'Age'
2) Only with missing values in columns 'Race' and 'Age'
I wrote the following code
first_df = df[df[columns].notnull()]
second_df= df[df[columns].isnull()]
However this code does not work. I solved this problem using this code
first_df= df[df['Race'].isnull() & df['Age'].isnull()]
second_df = df[df['Race'].isnull() & df['Age'].isnull()]
But what if there are 10 columns ? Is there a way to write this code without logical operators, using only columns
list ?
If select multiple columns get boolean DataFrame
, then is necessary test if all columns are Trues by DataFrame.all
or test if at least one True per rows by DataFrame.any
:
first_df = df[df[columns].notnull().all(axis=1)]
second_df= df[df[columns].isnull().all(axis=1)]
You can also use ~
for invert mask:
mask = df[columns].notnull().all(axis=1)
first_df = df[mask]
second_df= df[~mask]
Step 1 : Make a new dataframe having dropped the missing data (NaN, pd.NaT, None) you can filter out incomplete rows. DataFrame.dropna drops all rows containing at least one field with missing data
Assume new df as DF_updated and earlier as DF_Original
Step 2 : Now our solution DF will be difference between two DFs. It can be found by pd.concat([DF_Original,DF_updated]).drop_duplicates(keep=False)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With