Pandas. Selecting rows with missing values in multiple columns

Tags: python, pandas

Suppose we have a DataFrame with the columns 'Race', 'Age' and 'Name'. I want to create two DataFrames:
1) one without missing values in columns 'Race' and 'Age'
2) one with only missing values in columns 'Race' and 'Age'

I wrote the following code:

columns = ['Race', 'Age']
first_df = df[df[columns].notnull()]
second_df = df[df[columns].isnull()]

However, this code does not work. I solved the problem with this code:

first_df = df[df['Race'].notnull() & df['Age'].notnull()]
second_df = df[df['Race'].isnull() & df['Age'].isnull()]

But what if there are 10 columns? Is there a way to write this without chaining logical operators, using only the list of columns?

Asked Apr 19 '20 07:04 by Rustem Sadykov


2 Answers

Selecting multiple columns returns a boolean DataFrame, so the mask has to be reduced to one value per row: use DataFrame.all to test whether all columns are True, or DataFrame.any to test whether at least one column per row is True:

first_df = df[df[columns].notnull().all(axis=1)]
second_df = df[df[columns].isnull().all(axis=1)]
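The .all(axis=1) reduction is what turns the mask into a row filter. A minimal sketch on a small made-up DataFrame (the data is hypothetical) contrasts indexing with the boolean DataFrame itself, which pandas treats like DataFrame.where and so keeps every row, against indexing with the reduced per-row boolean Series:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 0 is complete, row 1 misses both
# checked columns, row 2 misses only 'Age'.
df = pd.DataFrame({
    'Race': ['A', None, 'B'],
    'Age': [30, np.nan, np.nan],
    'Name': ['x', 'y', 'z'],
})
columns = ['Race', 'Age']

# Indexing with a boolean DataFrame masks cells (like DataFrame.where):
# every row is kept, and cells whose mask is False or absent become NaN.
masked = df[df[columns].notnull()]

# Reducing the mask to one boolean per row filters rows instead.
first_df = df[df[columns].notnull().all(axis=1)]
second_df = df[df[columns].isnull().all(axis=1)]

print(masked.shape)              # (3, 3): nothing was filtered
print(first_df.index.tolist())   # [0]
print(second_df.index.tolist())  # [1]
```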

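You can also use ~ to invert the mask. Note that ~mask selects rows with at least one missing value in columns, which is broader than second_df above (rows where all of them are missing):

mask = df[columns].notnull().all(axis=1)
first_df = df[mask]
second_df = df[~mask]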
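The difference between ~mask and isnull().all(axis=1) only shows up for rows that are partly missing. A sketch on hypothetical data separates the three cases and also checks that ~mask matches isnull().any(axis=1), which follows from De Morgan's law:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 complete, row 1 missing both checked
# columns, row 2 missing only 'Age'.
df = pd.DataFrame({
    'Race': ['A', None, 'B'],
    'Age': [30, np.nan, np.nan],
    'Name': ['x', 'y', 'z'],
})
columns = ['Race', 'Age']

mask = df[columns].notnull().all(axis=1)

complete = df[mask]                                 # no missing values in columns
any_missing = df[~mask]                             # at least one missing value
all_missing = df[df[columns].isnull().all(axis=1)]  # every checked column missing

# ~(all not-null) is the same test as (any null).
same = df[df[columns].isnull().any(axis=1)]

print(complete.index.tolist())     # [0]
print(any_missing.index.tolist())  # [1, 2]
print(all_missing.index.tolist())  # [1]
print(any_missing.equals(same))    # True
```

Pick the variant that matches the definition of "missing" you need: all(axis=1) on isnull() for rows where every listed column is empty, any(axis=1) (or ~mask) for rows where at least one is.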

Answered Nov 03 '22 03:11 by jezrael


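Step 1: Create a new DataFrame with the rows containing missing data (NaN, pd.NaT, None) dropped. DataFrame.dropna drops every row containing at least one missing field; to check only certain columns, pass subset=columns.

Call the new DataFrame DF_updated and the original one DF_Original.

Step 2: The rows with missing data are the difference between the two DataFrames, which can be found with pd.concat([DF_Original, DF_updated]).drop_duplicates(keep=False). This returns rows with at least one missing value (the complement of DF_updated), not only rows where every checked column is missing. It also drops incomplete rows that are exact duplicates of one another in DF_Original.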
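A sketch of the two steps on hypothetical data, passing subset=columns so that only 'Race' and 'Age' are checked. The data deliberately contains two identical incomplete rows to show the duplicate caveat. The last step swaps in an index-based alternative that is not part of this answer; it assumes DF_Original has unique index labels:

```python
import numpy as np
import pandas as pd

# Hypothetical data; rows 1 and 3 are identical and both incomplete.
DF_Original = pd.DataFrame({
    'Race': ['A', None, 'B', None],
    'Age': [30, np.nan, np.nan, np.nan],
    'Name': ['x', 'y', 'z', 'y'],
})
columns = ['Race', 'Age']

# Step 1: drop rows with a missing value in any of the checked columns.
DF_updated = DF_Original.dropna(subset=columns)

# Step 2: rows present in both frames appear twice and are removed,
# but identical incomplete rows (1 and 3) are removed as well.
diff = pd.concat([DF_Original, DF_updated]).drop_duplicates(keep=False)

# Alternative (not from the answer): drop DF_updated's index labels.
# Assumes the index labels of DF_Original are unique.
incomplete = DF_Original.drop(DF_updated.index)

print(DF_updated.index.tolist())  # [0]
print(diff.index.tolist())        # [2]
print(incomplete.index.tolist())  # [1, 2, 3]
```

Comparing by index rather than by row values keeps duplicated incomplete rows intact.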

Answered Nov 03 '22 03:11 by Amit Chauhan