Pandas. Selecting rows with missing values in multiple columns

Question

Suppose we have a dataframe with the columns 'Race', 'Age', 'Name'. I want to create two 2 DF's:
1) Without missing values in columns 'Race' and 'Age'
2) Only with missing values in columns 'Race' and 'Age'

I wrote the following code

first_df = df[df[columns].notnull()]
second_df= df[df[columns].isnull()]

However this code does not work. I solved this problem using this code

first_df= df[df['Race'].isnull() & df['Age'].isnull()]
second_df = df[df['Race'].isnull() & df['Age'].isnull()]

But what if there are 10 columns ? Is there a way to write this code without logical operators, using only columns list ?

jezrael · Accepted Answer

If select multiple columns get boolean DataFrame, then is necessary test if all columns are Trues by DataFrame.all or test if at least one True per rows by DataFrame.any:

first_df = df[df[columns].notnull().all(axis=1)]
second_df= df[df[columns].isnull().all(axis=1)]

You can also use ~ for invert mask:

mask = df[columns].notnull().all(axis=1)
first_df = df[mask]
second_df= df[~mask]

Amit Chauhan · Answer

Step 1 : Make a new dataframe having dropped the missing data (NaN, pd.NaT, None) you can filter out incomplete rows. DataFrame.dropna drops all rows containing at least one field with missing data

Assume new df as DF_updated and earlier as DF_Original

Step 2 : Now our solution DF will be difference between two DFs. It can be found by pd.concat([DF_Original,DF_updated]).drop_duplicates(keep=False)

Pandas. Selecting rows with missing values in multiple columns

Tags:

python

pandas

Rustem Sadykov

2 Answers

jezrael

Amit Chauhan

Recent Activity

Donate For Us

Pandas. Selecting rows with missing values in multiple columns

Tags:

python

pandas

Rustem Sadykov

2 Answers

jezrael

Amit Chauhan

Related questions

Recent Activity

Donate For Us