Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to select rows with one or more nulls from a pandas DataFrame without listing columns explicitly?

I have a dataframe with ~300K rows and ~40 columns. I want to find out if any rows contain null values - and put these 'null'-rows into a separate dataframe so that I could explore them easily.

I can create a mask explicitly:

mask = False for col in df.columns:      mask = mask | df[col].isnull() dfnulls = df[mask] 

Or I can do something like:

df.ix[df.index[(df.T == np.nan).sum() > 1]] 

Is there a more elegant way of doing it (locating rows with nulls in them)?

like image 674
Lev Selector Avatar asked Jan 09 '13 22:01

Lev Selector


People also ask

How do you get rows which has null values in pandas?

In order to check null values in Pandas DataFrame, we use isnull() function this function return dataframe of Boolean values which are True for NaN values.

How do I keep only null values in pandas?

To filter down just to the rows and columns that contain a NaN value, use loc and specify difference axis values using any() (the default value is 0 for columns). It returns a dataframe where there is a NaN in either the row or the column.


1 Answers

[Updated to adapt to modern pandas, which has isnull as a method of DataFrames..]

You can use isnull and any to build a boolean Series and use that to index into your frame:

>>> df = pd.DataFrame([range(3), [0, np.NaN, 0], [0, 0, np.NaN], range(3), range(3)]) >>> df.isnull()        0      1      2 0  False  False  False 1  False   True  False 2  False  False   True 3  False  False  False 4  False  False  False >>> df.isnull().any(axis=1) 0    False 1     True 2     True 3    False 4    False dtype: bool >>> df[df.isnull().any(axis=1)]    0   1   2 1  0 NaN   0 2  0   0 NaN 

[For older pandas:]

You could use the function isnull instead of the method:

In [56]: df = pd.DataFrame([range(3), [0, np.NaN, 0], [0, 0, np.NaN], range(3), range(3)])  In [57]: df Out[57]:     0   1   2 0  0   1   2 1  0 NaN   0 2  0   0 NaN 3  0   1   2 4  0   1   2  In [58]: pd.isnull(df) Out[58]:         0      1      2 0  False  False  False 1  False   True  False 2  False  False   True 3  False  False  False 4  False  False  False  In [59]: pd.isnull(df).any(axis=1) Out[59]:  0    False 1     True 2     True 3    False 4    False 

leading to the rather compact:

In [60]: df[pd.isnull(df).any(axis=1)] Out[60]:     0   1   2 1  0 NaN   0 2  0   0 NaN 
like image 151
DSM Avatar answered Oct 21 '22 07:10

DSM