Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select data when specific columns have null value in pandas

Tags:

python

pandas

I have a dataframe where there are 2 date fields I want to filter and see rows when any one of the date field is null.

ID          Date1       Date2
58844880    04/11/16    NaN
59745846    04/12/16    04/14/16
59743311    04/13/16    NaN
59745848    04/14/16    04/11/16
59598413    NaN         NaN
59745921    04/14/16    04/14/16
59561199    04/15/16    04/15/16
NaN         04/16/16    04/16/16
59561198    NaN         04/17/16

It should look like below

ID          Date1       Date2
58844880    04/11/16    NaN
59743311    04/13/16    NaN
59598413    NaN         NaN
59561198    NaN         04/17/16

Tried the code df = (df['Date1'].isnull() | df['Date1'].isnull())

like image 493
yasin mohammed Avatar asked Nov 03 '16 18:11

yasin mohammed


2 Answers

Use boolean indexing:

mask = df['Date1'].isnull() | df['Date2'].isnull()
print (df[mask])
           ID     Date1     Date2
0  58844880.0  04/11/16       NaN
2  59743311.0  04/13/16       NaN
4  59598413.0       NaN       NaN
8  59561198.0       NaN  04/17/16

Timings:

#[900000 rows x 3 columns]
df = pd.concat([df]*100000).reset_index(drop=True)

In [12]: %timeit (df[df['Date1'].isnull() | df['Date2'].isnull()])
10 loops, best of 3: 89.3 ms per loop

In [13]: %timeit (df[df.filter(like='Date').isnull().any(1)])
10 loops, best of 3: 146 ms per loop
like image 120
jezrael Avatar answered Oct 05 '22 03:10

jezrael


Quickly see if either column has any null values

df.isnull().any()

Count rows that have any null values

df.isnull().sum()

Get rows with null values

(1) Create truth table of null values (i.e. create dataframe with True/False in each column/cell, according to whether it has null value)

truth_table = df.isnull()

(2) Create truth table that shows conclusively which rows have any null values

conclusive_truth_table = truth_table.any(axis='columns')

(3) isolate/show rows that have any null values

df[conclusive_truth_table]

(1)-(3) put it all together

df[df.isnull().any(axis='columns')]

Alternatively

Isolate rows that have null values in any specified column

df.loc[:,['Date1','Date2']].isnull().any(axis='columns')

Isolate rows that have null values in BOTH specified columns

df[ df.loc[ :,['Date1','Date2'] ].isnull().sum(axis=1) == 2]
like image 32
markling Avatar answered Oct 05 '22 05:10

markling