
Pandas any() returning false with true values present

I have a largely empty DataFrame of poorly formatted dates that I converted into datetime format.

from io import StringIO

import pandas as pd

data = StringIO("""issue_date,issue_date_dt
,
,
19600215.0,1960-02-15
,
,""")

df = pd.read_csv(data, parse_dates=[1])

This produces:

    issue_date  issue_date_dt
0   NaN         NaT
1   NaN         NaT
2   19600215.0  1960-02-15
3   NaN         NaT
4   NaN         NaT

I'd expect to be able to use df.any() to find whether there is a value in a row or column. With axis=0 it behaves as expected:

df.any(axis=0)

issue_date       True
issue_date_dt    True
dtype: bool

But with axis=1 it returns False for every row, every time:

df.any(axis=1)

0    False
1    False
2    False
3    False
4    False
dtype: bool
jesseWUT asked Oct 10 '18 03:10


1 Answer

I'm not entirely sure why this is occurring[1]. My best guess is that the differing data types within each row (one float column, one datetime column) cause this unexpected result, since any works as expected along axis 0. However, I would argue that the workaround is actually the better approach anyway, because it makes immediately clear to a reader what exactly you are checking for.
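To make the mixed-type point concrete, here is a minimal sketch that rebuilds the question's frame and compares its column dtypes with those of the null mask. It only shows the dtype difference; it does not prove that mixed dtypes are the cause of the any(axis=1) result.

```python
from io import StringIO

import pandas as pd

# Rebuild the frame from the question.
data = StringIO("""issue_date,issue_date_dt
,
,
19600215.0,1960-02-15
,
,""")
df = pd.read_csv(data, parse_dates=[1])

# The original frame mixes types: issue_date is float64,
# issue_date_dt is a datetime64 dtype.
print(df.dtypes)

# The null mask is homogeneous: every column is bool.
mask = df.notnull()
print(mask.dtypes)
```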


This could potentially be a bug; if you agree, I would recommend opening an issue on the pandas GitHub page.

The workaround is straightforward: use notnull so that any runs on a homogeneous mask of type bool, rather than on a DataFrame containing mixed types:

df.notnull().any(axis=1)

0    False
1    False
2     True
3    False
4    False
dtype: bool
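A minimal sketch of the workaround in context, assuming the goal is to keep only rows that hold at least one value. The variable name has_value is illustrative. notna() is an alias of notnull(), and DataFrame.dropna(how="all") is a built-in shortcut that drops rows in which every value is missing, so it should select the same rows here.

```python
from io import StringIO

import pandas as pd

# Same data as in the question.
data = StringIO("""issue_date,issue_date_dt
,
,
19600215.0,1960-02-15
,
,""")
df = pd.read_csv(data, parse_dates=[1])

# True for each row that has at least one non-null cell (NaN and NaT count as null).
has_value = df.notnull().any(axis=1)
print(has_value.tolist())  # → [False, False, True, False, False]

# Use the mask to keep only the rows that contain a value.
rows_with_values = df[has_value]
print(rows_with_values)

# Equivalent shortcut: drop rows in which every value is missing.
print(df.dropna(how="all"))
```

The explicit mask is handy when you need the per-row booleans themselves; dropna(how="all") is shorter when you only want the filtered frame.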

[1] This appears to have been recognized as a bug.

user3483203 answered Nov 05 '22 19:11