Is there a way to identify leading and trailing NAs in a pandas.DataFrame?
Currently I do the following, but it does not seem very straightforward:
import pandas as pd
df = pd.DataFrame(dict(a=[0.1, 0.2, 0.2],
                       b=[None, 0.1, None],
                       c=[0.1, None, 0.1]))
lead_na = (df.isnull() == False).cumsum() == 0
trail_na = (df.iloc[::-1].isnull() == False).cumsum().iloc[::-1] == 0
trail_lead_nas = lead_na | trail_na
Any ideas how this could be expressed more efficiently?
Answer:
%timeit df.ffill().isna() | df.bfill().isna()
The slowest run took 29.24 times longer than the fastest. This could mean that
an intermediate result is being cached.
31 ms ± 25.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit ((df.isnull() == False).cumsum() == 0) | ((df.iloc[::-1].isnull() == False).cumsum().iloc[::-1] == 0)
255 ms ± 66.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
To check for null values in a pandas DataFrame, use the isnull() function; it returns a DataFrame of Boolean values that are True wherever the original value is NaN.
How about this:
df.ffill().isna() | df.bfill().isna()
Out[769]:
       a      b      c
0  False   True  False
1  False  False  False
2  False   True  False
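Why this works, as far as I can tell: a forward fill can only fill a NaN that has a valid value somewhere above it, so whatever is still NaN after ffill() must be a leading NaN. In the same way, whatever is still NaN after bfill() must be a trailing NaN. Here is a small sketch that checks the one-liner against the cumsum approach from the question on the same sample frame. The name expected is mine, and I replaced (df.isnull() == False) with df.notna(), which should be equivalent:

```python
import pandas as pd

df = pd.DataFrame(dict(a=[0.1, 0.2, 0.2],
                       b=[None, 0.1, None],
                       c=[0.1, None, 0.1]))

# Still NaN after a forward fill: nothing valid above, so it is a leading NaN.
lead_na = df.ffill().isna()
# Still NaN after a backward fill: nothing valid below, so it is a trailing NaN.
trail_na = df.bfill().isna()
trail_lead_nas = lead_na | trail_na

# The cumsum approach from the question, written with notna() for comparison.
valid = df.notna()
expected = (valid.cumsum() == 0) | (valid.iloc[::-1].cumsum().iloc[::-1] == 0)

# Both approaches should flag exactly the same cells.
assert (trail_lead_nas == expected).all().all()
print(trail_lead_nas)
```

Note that the NaN in the middle of column c is not flagged, because it has valid values on both sides. A column that is entirely NaN would be flagged in every row by both approaches.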
df = pd.concat([df] * 1000, ignore_index=True)
In [134]: %%timeit
...: lead_na = (df.isnull() == False).cumsum() == 0
...: trail_na = (df.iloc[::-1].isnull() == False).cumsum().iloc[::-1] == 0
...: trail_lead_nas = lead_na | trail_na
...:
11.8 ms ± 105 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [135]: %%timeit
...: df.ffill().isna() | df.bfill().isna()
...:
2.1 ms ± 50 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)