Efficiently search for combinations of list values in a pandas DataFrame

Question

I'm iterating over multiple pandas data frames where on each iteration I extract 2 lists. Then, for each 2 lists combination, I need to find the row in another Data Frame where each combination of all values in the 2 lists appears.

Simplified example:

The df looks like this (any D_val P_val may appear only once):

D_val | P_val | D_PCode  
A     | 1     | 25  
D     | 1     | 21  
C     | 2     | 32  
B     | 4     | 35  
D     | 2     | 126  
B     | 1     | 3  
C     | 1     | 312

For a single iteration with the following lists -

list1 = [1,2]  
list2 = [A,B,C]

I expect to get the following list:
[25, 3, 312, 32]

Explanation:

for [A,1] - 25  
for [A,2] - nothing  
For [B,1] - 3

etc.

I implemented it using nested loops, but with multiple iterations, large data frame and long lists it takes far too long.

Any suggestions how to efficiently implement it? I'm happy to use other data structures if needed.

BENY · Accepted Answer

We just using isin twice

df1=df.loc[df.D_val.isin(list2)&df.P_val.isin(list1)]
df1
Out[211]: 
  D_val  P_val  D_PCode
0     A      1       25
2     C      2       32
5     B      1        3
6     C      1      312

Dani Mesejo · Answer

You can use itertools.product to generate all possible values, then use isin:

from itertools import product

import pandas as pd

data = [['A', 1, 25],
        ['D', 1, 21],
        ['C', 2, 32],
        ['B', 4, 35],
        ['D', 2, 126],
        ['B', 1, 3],
        ['C', 1, 312]]

df = pd.DataFrame(data=data, columns=['D_val', 'P_val', 'D_PCode'])

list1 = [1, 2]
list2 = ['A', 'B', 'C']
lookup = set(product(list2, list1))

mask = df[['D_val', 'P_val']].apply(tuple, axis=1).isin(lookup)
result = df[mask].D_PCode

print(result)

Output

0     25
2     32
5      3
6    312
Name: D_PCode, dtype: int64

Or just use isin directly on both columns, for instance:

list1 = [1, 2]
list2 = ['A', 'B', 'C']

result = df[df.D_val.isin(list2) & df.P_val.isin(list1)].D_PCode
print(result)

Output

0     25
2     32
5      3
6    312
Name: D_PCode, dtype: int64

Efficiently search for combinations of list values in a pandas DataFrame

Tags:

python

list

pandas

dataframe

combinations

Ganitph

2 Answers

BENY

Dani Mesejo

Recent Activity

Donate For Us

Efficiently search for combinations of list values in a pandas DataFrame

Tags:

python

list

pandas

dataframe

combinations

Ganitph

2 Answers

BENY

Dani Mesejo

Related questions

Recent Activity

Donate For Us