
Identify leading and trailing NAs in pandas DataFrame

Tags: python, pandas

Is there a way to identify leading and trailing NAs in a pandas.DataFrame?

Currently I do the following, but it does not seem straightforward:

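import pandas as pd
df = pd.DataFrame(dict(a=[0.1, 0.2, 0.2],
                       b=[None, 0.1, None],
                       c=[0.1, None, 0.1]))
# leading NAs: no valid value seen yet when scanning each column top-down
lead_na = (df.isnull() == False).cumsum() == 0
# trailing NAs: same scan on the reversed frame, then restore the row order
trail_na = (df.iloc[::-1].isnull() == False).cumsum().iloc[::-1] == 0
trail_lead_nas = lead_na | trail_na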

Any ideas how this could be expressed more efficiently?

Answer:

%timeit df.ffill().isna() | df.bfill().isna()
The slowest run took 29.24 times longer than the fastest. This could mean that 
an intermediate result is being cached.
31 ms ± 25.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit ((df.isnull() == False).cumsum() == 0) | ((df.iloc[::-1].isnull() ==False).cumsum().iloc[::-1] == 0)
255 ms ± 66.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
MMCM_ asked Jan 20 '20 09:01





1 Answer

How about this:

df.ffill().isna() | df.bfill().isna()

Out[769]:
       a      b      c
0  False   True  False
1  False  False  False
2  False   True  False
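
Why this works: a forward fill can only leave NaN where no earlier valid value exists in the column (the leading NAs), and a backward fill can only leave NaN where no later valid value exists (the trailing NAs). As a quick sanity check, here is a minimal sketch comparing it with the cumsum approach from the question on the same sample frame. The names fill_mask and cumsum_mask are made up for illustration, and notna() is used in place of isnull() == False.

```python
import pandas as pd

df = pd.DataFrame({"a": [0.1, 0.2, 0.2],
                   "b": [None, 0.1, None],
                   "c": [0.1, None, 0.1]})

# ffill leaves NaN only above the first valid value of each column,
# bfill leaves NaN only below the last valid value.
fill_mask = df.ffill().isna() | df.bfill().isna()

# The cumulative-sum formulation from the question, for comparison.
lead_na = df.notna().cumsum() == 0
trail_na = df.notna().iloc[::-1].cumsum().iloc[::-1] == 0
cumsum_mask = lead_na | trail_na

# Both masks should flag exactly the same cells.
print(fill_mask.equals(cumsum_mask))
```

Note that the NaN in the middle of column c is not flagged by either version, since it has valid values on both sides.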

Timings on a larger frame (the sample repeated 1000 times):

df = pd.concat([df] * 1000, ignore_index=True)

In [134]: %%timeit
     ...: lead_na = (df.isnull() == False).cumsum() == 0
     ...: trail_na = (df.iloc[::-1].isnull() == False).cumsum().iloc[::-1] == 0
     ...: trail_lead_nas = lead_na | trail_na
     ...: 
11.8 ms ± 105 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [135]: %%timeit
     ...: df.ffill().isna() | df.bfill().isna()
     ...: 
2.1 ms ± 50 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
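
If you want to reproduce the comparison outside IPython, something along these lines with the standard-library timeit module should do. This is only a rough sketch: the helper names cumsum_mask and fill_mask and the repeat counts are my own choices, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"a": [0.1, 0.2, 0.2],
                   "b": [None, 0.1, None],
                   "c": [0.1, None, 0.1]})
df = pd.concat([df] * 1000, ignore_index=True)  # 3000 rows


def cumsum_mask(frame):
    """Cumulative-sum approach from the question."""
    lead_na = frame.notna().cumsum() == 0
    trail_na = frame.notna().iloc[::-1].cumsum().iloc[::-1] == 0
    return lead_na | trail_na


def fill_mask(frame):
    """ffill/bfill approach from this answer."""
    return frame.ffill().isna() | frame.bfill().isna()


# Best of 3 repeats, 10 calls each, reported as seconds per call.
runs = 10
t_cumsum = min(timeit.repeat(lambda: cumsum_mask(df), number=runs, repeat=3)) / runs
t_fill = min(timeit.repeat(lambda: fill_mask(df), number=runs, repeat=3)) / runs

print(f"cumsum:      {t_cumsum * 1e3:.2f} ms per call")
print(f"ffill/bfill: {t_fill * 1e3:.2f} ms per call")
```

On the repeated frame only the first and the last row of column b count as leading or trailing; the NaNs in between have valid values on both sides and stay False in both versions.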
Andy L. answered Oct 08 '22 19:10
