I have two dateframe (df1 & df2), i'm trying to figure out how to use conditions from df2 to extract values from df1 and use the extracted values in df2.
df1 = values to exact from
df2 = conditions for exaction and df where the extracted values are used
conditions: df2.HJ = df1HJ & df2.JK = df1 P colum
example if df2(df2.HJ = 99 & df2.JK = P3); Ans = 67 (from df1)
df1
╔════╦════╦══════╦══════╦══════╦══════╗ ║ HJ ║ P1 ║ P2 ║ P3 ║ P4 ║ P5 ║ ╠════╬════╬══════╬══════╬══════╬══════╣ ║ 5 ║ 51 ║ 33 ║ 21 ║ 31 ║ 13 ║ ║ 11 ║ 66 ║ 45 ║ 21 ║ 49 ║ 58 ║ ║ 21 ║ 7 ║ 55 ║ 56 ║ 67 ║ 73 ║ ║ 99 ║ 0 ║ 76 ║ 67 ║ 98 ║ 29 ║ ║ 15 ║ 11 ║ 42 ║ 79 ║ 27 ║ 54 ║ ╚════╩════╩══════╩══════╩══════╩══════╝
df2
╔════╦════╗ ║ HJ ║ JK ║ ╠════╬════╣ ║ 99 ║ P1 ║ ║ 11 ║ P5 ║ ║ 5 ║ P3 ║ ║ 21 ║ P2 ║ ║ 11 ║ P3 ║ ╚════╩════╝
expected result for df2 after exaction from df1
╔════╦════╦═══════╗ ║ HJ ║ JK ║ Ans ║ ╠════╬════╬═══════╣ ║ 99 ║ P1 ║ 0 ║ ║ 11 ║ P5 ║ 58 ║ ║ 5 ║ P3 ║ 21 ║ ║ 21 ║ P2 ║ 55 ║ ║ 11 ║ P3 ║ 21 ║ ╚════╩════╩═══════╝
code for df1
import pandas as pd
import numpy as np
data = {'HJ':[5,11,21,99,15],
'P1':[51,66,7,0,11]
,'P2':[ 33,45,55 ,76 ,42]
,'P3':[ 21 ,21 ,56 ,67 ,79]
,'P4':[ 31 ,49 ,67 ,98 ,27]
,'P5':[ 13 ,58 ,73 ,29 ,54]}
df1 = pd.DataFrame(data)
code for df2
data = {'HJ':[99,11,5,21,11],
'JK':['P1','P5','P3','P2','P3']}
df2 = pd.DataFrame(data)
Regards Thank you
===========
@Scott Boston's solution works:
df2['ans'] = df1.set_index('HJ').lookup(df2['HJ'], df2['JK'])
However, a KeyError: 'One or more row labels was not found' appears when there is/are labels not found. Is there any way to overcome this problem?
You can extract a column of pandas DataFrame based on another value by using the DataFrame. query() method. The query() is used to query the columns of a DataFrame with a boolean expression.
Select Data Using Location Index (. This means that you can use dataframe. iloc[0:1, 0:1] to select the cell value at the intersection of the first row and first column of the dataframe. You can expand the range for either the row index or column index to select more data.
Use pd.DataFrame.lookup
after set_index
:
df2['ans'] = df1.set_index('HJ').lookup(df2['HJ'], df2['JK'])
print(df2)
Output:
HJ JK ans
0 99 P1 0
1 11 P5 58
2 5 P3 21
3 21 P2 55
4 11 P3 21
Using lookup, you have to filter your inputs to lookup first:
df2m = df2[df2['HJ'].isin(df1['HJ']) & df2['JK'].isin(df1.columns)].copy()
df2m['ans'] = df1.set_index('HJ').lookup(df2m['HJ'],df2m['JK'])
df2.update(df2m)
df2m.combine_first(df2)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With