Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently search for combinations of list values in a pandas DataFrame

I'm iterating over multiple pandas data frames where on each iteration I extract 2 lists. Then, for each 2 lists combination, I need to find the row in another Data Frame where each combination of all values in the 2 lists appears.

Simplified example:

The df looks like this (any D_val P_val may appear only once):

D_val | P_val | D_PCode  
A     | 1     | 25  
D     | 1     | 21  
C     | 2     | 32  
B     | 4     | 35  
D     | 2     | 126  
B     | 1     | 3  
C     | 1     | 312  

For a single iteration with the following lists -

list1 = [1,2]  
list2 = [A,B,C]  

I expect to get the following list:
[25, 3, 312, 32]

Explanation:

for [A,1] - 25  
for [A,2] - nothing  
For [B,1] - 3  

etc.

I implemented it using nested loops, but with multiple iterations, large data frame and long lists it takes far too long.

Any suggestions how to efficiently implement it? I'm happy to use other data structures if needed.

like image 800
Ganitph Avatar asked Jan 31 '26 13:01

Ganitph


2 Answers

We just using isin twice

df1=df.loc[df.D_val.isin(list2)&df.P_val.isin(list1)]
df1
Out[211]: 
  D_val  P_val  D_PCode
0     A      1       25
2     C      2       32
5     B      1        3
6     C      1      312
like image 158
BENY Avatar answered Feb 03 '26 07:02

BENY


You can use itertools.product to generate all possible values, then use isin:

from itertools import product

import pandas as pd

data = [['A', 1, 25],
        ['D', 1, 21],
        ['C', 2, 32],
        ['B', 4, 35],
        ['D', 2, 126],
        ['B', 1, 3],
        ['C', 1, 312]]

df = pd.DataFrame(data=data, columns=['D_val', 'P_val', 'D_PCode'])

list1 = [1, 2]
list2 = ['A', 'B', 'C']
lookup = set(product(list2, list1))

mask = df[['D_val', 'P_val']].apply(tuple, axis=1).isin(lookup)
result = df[mask].D_PCode

print(result)

Output

0     25
2     32
5      3
6    312
Name: D_PCode, dtype: int64

Or just use isin directly on both columns, for instance:

list1 = [1, 2]
list2 = ['A', 'B', 'C']

result = df[df.D_val.isin(list2) & df.P_val.isin(list1)].D_PCode
print(result)

Output

0     25
2     32
5      3
6    312
Name: D_PCode, dtype: int64
like image 32
Dani Mesejo Avatar answered Feb 03 '26 08:02

Dani Mesejo