How to automatically group conditions together in python?

Question

I'm trying to automatically group conditions together in python. The difficulty lies in that if there are several conditions, like 100 conditions, it would be tedious to "AND" all of these by hand. How can I use a loop to achieve this?

import pandas as pd
s1 = pd.Series([1,2,3,4,5,6])
s2 = pd.Series([5,6,7,8,9,10])
s3 = pd.Series([11,12,5,7,8,2])
df = pd.DataFrame({'A': s1,'B': s2,'C': s3})

condition1 = df['A'] > 3
condition2 = df['B'] > 6
condition3 = df['C'] > 5
# AND Operation ->>> Can be achieved with a loop?
select = condition1 & condition2 & condition3

jpp · Accepted Answer

A couple of points to note:

You can use a list or dictionary to store a variable number of variables.
Your Boolean series function as NumPy arrays; you can therefore combine a collection of series and use np.ndarray.all (or pd.DataFrame.all) to calculate their intersection.

You can use a list of Boolean series with NumPy or Pandas:

conditions = [df['A'] > 3,
              df['B'] > 6,
              df['C'] > 5]

# all equivalent
select = pd.concat(conditions, axis=1).all(axis=1)
select = np.logical_and.reduce(conditions)
select = np.array(conditions).all(axis=0)

print(select)

array([False, False, False,  True,  True, False], dtype=bool)

Equivalently, if you wish to name your Boolean filters, you can use a dictionary:

conditions = {1: df['A'] > 3,
              2: df['B'] > 6,
              3: df['C'] > 5}

select = np.array(list(conditions.values())).all(axis=0)

Performance benchmarking

Performance will be very data-dependent, you should also try reduce as per @Kopytok's solution and check performance with your data.

df = pd.concat([df]*1000)

conditions = [df['A'] > 3,
              df['B'] > 6,
              df['C'] > 5]

conditions = conditions*100

%timeit reduce(lambda x, y: x & y, conditions)     # 104 ms per loop
%timeit np.logical_and.reduce(conditions)          # 104 ms per loop
%timeit np.array(conditions).all(axis=0)           # 99.4 ms per loop
%timeit pd.concat(conditions, axis=1).all(axis=1)  # 34.6 ms per loop

koPytok · Answer

You can achive it by creating a list of conditions and using reduce:

from functools import reduce

conditions = [
    df['A'] > 3,
    df['B'] > 6,
    df['C'] > 5,
]

total_condition = reduce(lambda x, y: x & y, conditions)

Test case:

d = pd.DataFrame(np.random.randint(1, 5, (700000, 3)), columns=["a", "b", "c"])

conditions = [
    d["a"] > 2,
    d["c"] > 1,
    d["b"] > 2,
]*100

Using reduce:

from functools import reduce

%timeit reduce(lambda x, y: x & y, conditions)
> 547 ms ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Using np.concat + df.all():

%timeit pd.concat(conditions, axis=1).all(1)
> 4.19 s ± 367 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

How to automatically group conditions together in python?

Tags:

python

pandas

logical-operators

numpy

zonelincosmos

2 Answers

jpp

koPytok

Recent Activity

Donate For Us

How to automatically group conditions together in python?

Tags:

python

pandas

logical-operators

numpy

zonelincosmos

2 Answers

jpp

koPytok

Related questions

Recent Activity

Donate For Us