extracting values from dataframe1 using conditions set in dataframe2 (pandas, python)

Tags:

python

pandas

dataframe

I have two DataFrames (df1 and df2). I'm trying to figure out how to use conditions from df2 to extract values from df1, and then use the extracted values in df2.

df1 = the DataFrame to extract values from

df2 = the DataFrame holding the extraction conditions, and where the extracted values are used

Conditions: df2.HJ == df1.HJ, and df2.JK names the P column of df1 to read from

Example: if a df2 row has HJ = 99 and JK = P3, then Ans = 67 (taken from df1)

df1

╔════╦════╦══════╦══════╦══════╦══════╗
║ HJ ║ P1 ║  P2  ║  P3  ║  P4  ║  P5  ║
╠════╬════╬══════╬══════╬══════╬══════╣
║  5 ║ 51 ║  33  ║  21  ║  31  ║  13  ║
║ 11 ║ 66 ║  45  ║  21  ║  49  ║  58  ║
║ 21 ║  7 ║  55  ║  56  ║  67  ║  73  ║
║ 99 ║  0 ║  76  ║  67  ║  98  ║  29  ║
║ 15 ║ 11 ║  42  ║  79  ║  27  ║  54  ║
╚════╩════╩══════╩══════╩══════╩══════╝

df2

╔════╦════╗
║ HJ ║ JK ║
╠════╬════╣
║ 99 ║ P1 ║
║ 11 ║ P5 ║
║  5 ║ P3 ║
║ 21 ║ P2 ║
║ 11 ║ P3 ║
╚════╩════╝

Expected result for df2 after extraction from df1:

╔════╦════╦═══════╗
║ HJ ║ JK ║  Ans  ║
╠════╬════╬═══════╣
║ 99 ║ P1 ║    0  ║
║ 11 ║ P5 ║   58  ║
║  5 ║ P3 ║   21  ║
║ 21 ║ P2 ║   55  ║
║ 11 ║ P3 ║   21  ║
╚════╩════╩═══════╝

Code for df1:

import pandas as pd

data = {'HJ': [5, 11, 21, 99, 15],
        'P1': [51, 66, 7, 0, 11],
        'P2': [33, 45, 55, 76, 42],
        'P3': [21, 21, 56, 67, 79],
        'P4': [31, 49, 67, 98, 27],
        'P5': [13, 58, 73, 29, 54]}
df1 = pd.DataFrame(data)

Code for df2:

data = {'HJ': [99, 11, 5, 21, 11],
        'JK': ['P1', 'P5', 'P3', 'P2', 'P3']}
df2 = pd.DataFrame(data)
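The example condition (HJ = 99, JK = P3) can be read as a label-based lookup in which HJ acts as the row label and the JK value as the column label. The sketch below rebuilds df1 from the data above and reads a single cell that way; the name `lut` is just an illustrative variable, not part of the question.

```python
import pandas as pd

# df1 from the question
df1 = pd.DataFrame({'HJ': [5, 11, 21, 99, 15],
                    'P1': [51, 66, 7, 0, 11],
                    'P2': [33, 45, 55, 76, 42],
                    'P3': [21, 21, 56, 67, 79],
                    'P4': [31, 49, 67, 98, 27],
                    'P5': [13, 58, 73, 29, 54]})

# Use HJ as the row labels; P1..P5 remain the column labels
lut = df1.set_index('HJ')

# Read the single cell at row HJ == 99, column 'P3'
ans = lut.at[99, 'P3']
print(ans)  # 67
```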

Regards, and thank you.

===========

Update

@Scott Boston's solution works:

df2['ans'] = df1.set_index('HJ').lookup(df2['HJ'], df2['JK'])

However, a KeyError: 'One or more row labels was not found' is raised when one or more of the labels in df2 do not exist in df1. Is there any way to overcome this problem?
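One way to see which rows of df2 cause the KeyError is to check, before any lookup, whether each HJ value exists in df1 and each JK value names one of df1's P columns. This is a sketch only: the last two rows of df2 below are hypothetical, invented to show one missing row label (HJ = 42) and one missing column label (P9).

```python
import pandas as pd

# df1 from the question
df1 = pd.DataFrame({'HJ': [5, 11, 21, 99, 15],
                    'P1': [51, 66, 7, 0, 11],
                    'P2': [33, 45, 55, 76, 42],
                    'P3': [21, 21, 56, 67, 79],
                    'P4': [31, 49, 67, 98, 27],
                    'P5': [13, 58, 73, 29, 54]})

# Hypothetical df2: the last two rows are invented to trigger the error
# (HJ=42 does not exist in df1, and 'P9' is not a column of df1)
df2 = pd.DataFrame({'HJ': [99, 11, 5, 21, 11, 42, 5],
                    'JK': ['P1', 'P5', 'P3', 'P2', 'P3', 'P3', 'P9']})

# Candidate value columns: every df1 column except the HJ key itself
value_cols = df1.columns.drop('HJ')

# True where both the row label and the column label exist in df1
valid = df2['HJ'].isin(df1['HJ']) & df2['JK'].isin(value_cols)

# Rows that a strict label-based lookup would fail on
print(df2[~valid])
```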


asked Jun 09 '20 16:06

ManOnTheMoon

1 Answer

Use pd.DataFrame.lookup after set_index:

df2['ans'] = df1.set_index('HJ').lookup(df2['HJ'], df2['JK'])
print(df2)

Output:

   HJ  JK  ans
0  99  P1    0
1  11  P5   58
2   5  P3   21
3  21  P2   55
4  11  P3   21
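Note that `DataFrame.lookup` was deprecated in pandas 1.2.0 and removed in pandas 2.0, so the line above raises an AttributeError on current versions. Below is a minimal sketch of an equivalent under the assumption that every label exists (as lookup itself required), built from `Index.get_indexer` and NumPy fancy indexing. It assumes the HJ values in df1 are unique; the names `lut`, `row_pos` and `col_pos` are illustrative only.

```python
import pandas as pd

# df1 and df2 from the question
df1 = pd.DataFrame({'HJ': [5, 11, 21, 99, 15],
                    'P1': [51, 66, 7, 0, 11],
                    'P2': [33, 45, 55, 76, 42],
                    'P3': [21, 21, 56, 67, 79],
                    'P4': [31, 49, 67, 98, 27],
                    'P5': [13, 58, 73, 29, 54]})
df2 = pd.DataFrame({'HJ': [99, 11, 5, 21, 11],
                    'JK': ['P1', 'P5', 'P3', 'P2', 'P3']})

# HJ becomes the row index; it must be unique for get_indexer to work
lut = df1.set_index('HJ')

# Convert labels to integer positions; get_indexer returns -1 for a missing label
row_pos = lut.index.get_indexer(df2['HJ'])
col_pos = lut.columns.get_indexer(df2['JK'])

# Mirror lookup's strictness: -1 would otherwise silently select the last row/column
if (row_pos < 0).any() or (col_pos < 0).any():
    raise KeyError('One or more labels in df2 were not found in df1')

# NumPy fancy indexing picks one value per (row, column) pair
df2['ans'] = lut.to_numpy()[row_pos, col_pos]
print(df2)
```

Because `get_indexer` marks missing labels with -1, the explicit check is what keeps this as strict as lookup; masking the -1 positions instead of raising would be one way to tolerate missing labels.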

With lookup, you have to filter the inputs first, so that only labels that actually exist in df1 are passed to it:

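# Keep only the rows whose HJ exists in df1 and whose JK is a column of df1
df2m = df2[df2['HJ'].isin(df1['HJ']) & df2['JK'].isin(df1.columns)].copy()

# lookup now only sees labels that exist, so no KeyError is raised
df2m['ans'] = df1.set_index('HJ').lookup(df2m['HJ'], df2m['JK'])

# Put the filtered-out rows back; their 'ans' is NaN
df2 = df2m.combine_first(df2)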
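An alternative that tolerates missing labels without a separate filtering step, and that also works on pandas 2.x where lookup no longer exists, is to reshape df1 to long form with `melt` and left-merge it onto df2. This is a sketch, not the answer's method; the last two rows of df2 are hypothetical, added only to show what happens when a row label (HJ = 42) or a column label (P9) is missing.

```python
import pandas as pd

# df1 from the question
df1 = pd.DataFrame({'HJ': [5, 11, 21, 99, 15],
                    'P1': [51, 66, 7, 0, 11],
                    'P2': [33, 45, 55, 76, 42],
                    'P3': [21, 21, 56, 67, 79],
                    'P4': [31, 49, 67, 98, 27],
                    'P5': [13, 58, 73, 29, 54]})

# df2 from the question plus two hypothetical rows with missing labels
df2 = pd.DataFrame({'HJ': [99, 11, 5, 21, 11, 42, 5],
                    'JK': ['P1', 'P5', 'P3', 'P2', 'P3', 'P3', 'P9']})

# Reshape df1 to long form: one row per (HJ, P-column) pair
long1 = df1.melt(id_vars='HJ', var_name='JK', value_name='ans')

# Left merge keeps every df2 row in order; unmatched pairs get NaN in 'ans'
result = df2.merge(long1, on=['HJ', 'JK'], how='left')
print(result)
```

A left merge keeps every df2 row in its original order, and pairs with no match in df1 get NaN in `ans` (which also turns that column into float).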

answered Oct 07 '22 14:10

Scott Boston
