I'm iterating over multiple pandas data frames where on each iteration I extract 2 lists. Then, for each 2 lists combination, I need to find the row in another Data Frame where each combination of all values in the 2 lists appears.
Simplified example:
The df looks like this (any D_val P_val may appear only once):
D_val | P_val | D_PCode
A | 1 | 25
D | 1 | 21
C | 2 | 32
B | 4 | 35
D | 2 | 126
B | 1 | 3
C | 1 | 312
For a single iteration with the following lists -
list1 = [1,2]
list2 = [A,B,C]
I expect to get the following list:
[25, 3, 312, 32]
Explanation:
for [A,1] - 25
for [A,2] - nothing
For [B,1] - 3
etc.
I implemented it using nested loops, but with multiple iterations, large data frame and long lists it takes far too long.
Any suggestions how to efficiently implement it? I'm happy to use other data structures if needed.
We just using isin twice
df1=df.loc[df.D_val.isin(list2)&df.P_val.isin(list1)]
df1
Out[211]:
D_val P_val D_PCode
0 A 1 25
2 C 2 32
5 B 1 3
6 C 1 312
You can use itertools.product to generate all possible values, then use isin:
from itertools import product
import pandas as pd
data = [['A', 1, 25],
['D', 1, 21],
['C', 2, 32],
['B', 4, 35],
['D', 2, 126],
['B', 1, 3],
['C', 1, 312]]
df = pd.DataFrame(data=data, columns=['D_val', 'P_val', 'D_PCode'])
list1 = [1, 2]
list2 = ['A', 'B', 'C']
lookup = set(product(list2, list1))
mask = df[['D_val', 'P_val']].apply(tuple, axis=1).isin(lookup)
result = df[mask].D_PCode
print(result)
Output
0 25
2 32
5 3
6 312
Name: D_PCode, dtype: int64
Or just use isin directly on both columns, for instance:
list1 = [1, 2]
list2 = ['A', 'B', 'C']
result = df[df.D_val.isin(list2) & df.P_val.isin(list1)].D_PCode
print(result)
Output
0 25
2 32
5 3
6 312
Name: D_PCode, dtype: int64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With