In pandas, I'd like to create a computed column that's a boolean operation on two other columns.
In pandas, it's easy to add together two numerical columns. I'd like to do something similar with the logical operator AND. Here's my first try:
    In [1]: d = pandas.DataFrame([{'foo': True, 'bar': True}, {'foo': True, 'bar': False}, {'foo': False, 'bar': False}])

    In [2]: d
    Out[2]:
         bar    foo
    0   True   True
    1  False   True
    2  False  False

    In [3]: d.bar and d.foo  ## can't
    ...
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So I guess logical operators don't work quite the same way as numeric operators in pandas. I tried doing what the error message suggests and using bool():
    In [258]: d.bar.bool() and d.foo.bool()  ## spoiler: this doesn't work either
    ...
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I found a way that works by casting the boolean columns to int, adding them together and evaluating as a boolean.
    In [4]: (d.bar.apply(int) + d.foo.apply(int)) > 0  ## Logical OR
    Out[4]:
    0     True
    1     True
    2    False
    dtype: bool

    In [5]: (d.bar.apply(int) + d.foo.apply(int)) > 1  ## Logical AND
    Out[5]:
    0     True
    1    False
    2    False
    dtype: bool
This is convoluted. Is there a better way?
The operators are: | for or, & for and, and ~ for not. Comparisons combined with them must be grouped using parentheses, because & and | bind more tightly than comparison operators: by default Python does not evaluate an expression such as df.A > 2 & df.B < 3 as (df.A > 2) & (df.B < 3).
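As a minimal sketch of the parenthesization point above (the frame `df` with numeric columns `A` and `B` is a hypothetical example, not data from the question), the following shows the grouped forms working and the ungrouped form failing:

```python
import pandas as pd

# Hypothetical frame with numeric columns A and B, for illustration only.
df = pd.DataFrame({"A": [1, 3, 5], "B": [1, 2, 4]})

# Parenthesize each comparison so that & and | combine two boolean Series.
both = (df.A > 2) & (df.B < 3)
either = (df.A > 2) | (df.B < 3)
neither = ~either

# Without parentheses, 2 & df.B is evaluated first; the resulting chained
# comparison then asks for the truth value of a Series and raises ValueError.
try:
    df.A > 2 & df.B < 3
    unparenthesized_failed = False
except ValueError:
    unparenthesized_failed = True
```

The ungrouped expression fails with the same "truth value of a Series is ambiguous" error shown in the question, which is why the parentheses are needed.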
To get all combinations of columns, we can use the itertools.product function, which computes the Cartesian product of input iterables. To compute the product of an iterable with itself, use the optional repeat keyword argument to specify the number of repetitions.
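A small sketch of the repeat argument, assuming the column names are simply the two from the question's frame (the variable names here are illustrative):

```python
from itertools import product

# Column names taken from the question's DataFrame; any iterable works.
columns = ["foo", "bar"]

# repeat=2 pairs the iterable with itself, equivalent to product(columns, columns).
pairs = list(product(columns, repeat=2))
```

Note that this includes pairs of a column with itself and both orderings of each pair.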
Grouping by Multiple Columns
You can do this by passing a list of column names to groupby instead of a single string value.
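For illustration, here is a sketch with a hypothetical `sales` frame (the columns `city`, `year` and `amount` are made up for this example):

```python
import pandas as pd

# Hypothetical data, not from the question.
sales = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "year": [2020, 2020, 2020, 2021],
    "amount": [1, 2, 3, 4],
})

# Passing a list of column names groups by each distinct (city, year) pair.
totals = sales.groupby(["city", "year"])["amount"].sum()
```

The result is a Series indexed by a MultiIndex of the grouping columns.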
Yes, there is a better way! Just use the & element-wise logical AND operator:
    d.bar & d.foo

    0     True
    1    False
    2    False
    dtype: bool
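Since the question asks for a computed column, here is a minimal sketch that stores the results back on the frame. The new column names (`both`, `either`, `not_foo`) are made up for illustration:

```python
import pandas as pd

# Same data as in the question.
d = pd.DataFrame([
    {"foo": True, "bar": True},
    {"foo": True, "bar": False},
    {"foo": False, "bar": False},
])

# Element-wise boolean operators return a new boolean Series,
# which can be assigned directly as a column.
d["both"] = d.bar & d.foo      # logical AND
d["either"] = d.bar | d.foo    # logical OR
d["not_foo"] = ~d.foo          # logical NOT
```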
Alternatively, you could just multiply for AND or add for OR, without the conversion and extra comparison you used.
AND operation:
d.foo * d.bar
OR operation:
d.foo + d.bar
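A quick way to convince yourself that the arithmetic forms match the logical operators on the question's data is sketched below. The `astype(bool)` calls are a defensive assumption on my part, in case the arithmetic result is not already of bool dtype in your pandas version:

```python
import pandas as pd

# Same data as in the question.
d = pd.DataFrame([
    {"foo": True, "bar": True},
    {"foo": True, "bar": False},
    {"foo": False, "bar": False},
])

# Arithmetic forms, coerced to bool so the comparison below is dtype-safe.
and_via_mul = (d.foo * d.bar).astype(bool)
or_via_add = (d.foo + d.bar).astype(bool)

# Compare against the element-wise logical operators.
and_matches = and_via_mul.tolist() == (d.foo & d.bar).tolist()
or_matches = or_via_add.tolist() == (d.foo | d.bar).tolist()
```

The & and | operators state the intent more directly, so the arithmetic forms are probably best treated as a curiosity rather than the default.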