Greater/less than comparisons between Pandas DataFrames/Series

Question

How can I perform comparisons between DataFrames and Series? I'd like to mask elements in a DataFrame/Series that are greater/less than elements in another DataFrame/Series.

For instance, the following doesn't replace elements greater than the row mean with NaN, although I expected it to:

>>> import numpy as np
>>> import pandas as pd
>>> x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})
>>> x[x > x.mean(axis=1)] = np.nan
>>> x
   a  b
0  1  3
1  2  4

If we look at the boolean array created by the comparison, it is really weird:

>>> x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})
>>> x > x.mean(axis=1)
       a      b      0      1
0  False  False  False  False
1  False  False  False  False

I don't understand by what logic the resulting boolean array is like that. I'm able to work around this problem by using transpose:

>>> (x.T > x.mean(axis=1).T).T
       a     b
0  False  True
1  False  True

But I believe there is some "correct" way of doing this that I'm not aware of. And at least I'd like to understand what is going on.

EdChum · Accepted Answer

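The problem here is that the comparison aligns the Series' index with the DataFrame's column labels. The Series returned by x.mean(axis=1) is indexed by 0 and 1, while the columns are a and b, so no labels match: every comparison is False and the result gains the extra columns 0 and 1. If you use .gt and pass axis=0, the Series is aligned with the DataFrame's index instead, and you get the result you want: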

In [203]:
x.gt(x.mean(axis=1), axis=0)

Out[203]:
       a     b
0  False  True
1  False  True
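
To finish the original task, the same axis=0 comparison can be used as the mask. This is a minimal sketch, assuming the goal is to replace every element greater than its row mean with NaN. It uses DataFrame.mask rather than boolean-indexed assignment, which is a choice made for this example and not part of the answer above; the names row_mean, above and masked are illustrative only.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})

# Row means as a Series indexed like x's rows: 0 -> 2.0, 1 -> 3.0
row_mean = x.mean(axis=1)

# axis=0 aligns the Series with x's index, so each row is compared
# against its own mean rather than against the column labels.
above = x.gt(row_mean, axis=0)

# mask() returns a new frame with NaN wherever the condition is True;
# x itself is left unchanged.
masked = x.mask(above)
print(masked)
```

The other flexible comparison methods (.lt, .ge, .le, .eq, .ne) accept the same axis argument, so "less than" masks work the same way.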

You can see what I mean when you perform the comparison with the underlying NumPy array, which carries no index to align on:

In [205]:
x > x.mean(axis=1).values

Out[205]:
       a      b
0  False  False
1  False   True

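Here you can see that the default axis for the comparison is the columns: the array [2., 3.] is matched against columns a and b position by position, so each column is compared with a different row's mean, which gives a different result.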
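
If you would rather stay with raw arrays, the row-wise comparison can also be sketched with plain NumPy broadcasting by reshaping the means into a column. This is only an illustration under that assumption; the names means, by_column and by_row are made up for the example.

```python
import pandas as pd

x = pd.DataFrame(data={'a': [1, 2], 'b': [3, 4]})
means = x.mean(axis=1).values  # 1-D array of row means, shape (2,)

# Shape (2,) is matched against the two columns position by position,
# which reproduces the "different result" shown above.
by_column = x > means

# Shape (2, 1) broadcasts across the columns, so each row is compared
# with its own mean. The comparison is done on the raw values and the
# labels are re-attached afterwards.
by_row = pd.DataFrame(x.values > means[:, None],
                      index=x.index, columns=x.columns)

# Check that this agrees with the label-aware .gt(..., axis=0) call.
print(by_row.equals(x.gt(x.mean(axis=1), axis=0)))
```

The .gt(..., axis=0) form is usually the safer choice, since it aligns on labels and does not depend on the rows being in the same order.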

Tags:

python

pandas

Jaakko Luttinen

1 Answers

EdChum
