I have a largely empty DataFrame of poorly formatted dates that I converted into datetime format.
import pandas as pd
from io import StringIO
data = StringIO("""issue_date,issue_date_dt
,
,
19600215.0,1960-02-15
,
,""")
df = pd.read_csv(data, parse_dates=[1])
Which produces
issue_date issue_date_dt
0 NaN NaT
1 NaN NaT
2 19600215.0 1960-02-15
3 NaN NaT
4 NaN NaT
I'd expect that I could use df.any() to find whether there was a value in a row or column. axis=0
behaves as expected:
df.any(axis=0)
issue_date True
issue_date_dt True
dtype: bool
But axis=1
just returns False for every row:
df.any(axis=1)
0 False
1 False
2 False
3 False
4 False
dtype: bool
I'm not entirely sure why this is occurring[1]; my best guess is that the mixed datatypes across the columns cause this unexpected result, as any
works as expected along axis 0
. However, I would argue that the workaround is actually a better approach anyway, as it makes immediately clear to a reader what exactly you are checking for.
This could potentially be a bug; if you agree, I would recommend opening an issue on the pandas
GitHub page.
The workaround is straightforward: use notnull
to build a homogeneous mask of type bool
, then call any
on that mask rather than on a DataFrame containing mixed types:
df.notnull().any(axis=1)
0 False
1 False
2 True
3 False
4 False
dtype: bool
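As a usage example (a sketch, assuming the same sample data as above), the boolean mask from notnull can be used to keep only the rows that contain at least one value, which is equivalent to dropna(how="all"):

```python
import pandas as pd
from io import StringIO

data = StringIO("""issue_date,issue_date_dt
,
,
19600215.0,1960-02-15
,
,""")
df = pd.read_csv(data, parse_dates=[1])

# Build a homogeneous boolean mask, then keep rows with at least one value
mask = df.notnull().any(axis=1)
non_empty = df[mask]

# dropna(how="all") drops rows where every value is missing,
# so it produces the same result as filtering with the mask
assert non_empty.equals(df.dropna(how="all"))
```

This keeps the intent explicit: you are asking "does this row have any non-null value?", rather than relying on how any() coerces mixed dtypes to booleans.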
[1] This appears to have been recognized as a bug