 

Generate a new column based on other columns' value

Tags:

pandas

Here is my sample input (df) and expected output (df1):

df=pd.DataFrame({'A_flag': [1, 1,1], 'B_flag': [1, 1,0],'C_flag': [0, 1,0],'A_value': [5, 3,7], 'B_value': [2, 7,4],'C_value': [4, 2,5]})

df1=pd.DataFrame({'A_flag': [1, 1,1], 'B_flag': [1, 1,0],'C_flag': [0, 1,0],'A_value': [5, 3,7], 'B_value': [2, 7,4],'C_value': [4, 2,5], 'Final':[3.5,3,7]})

I want to generate a new column called 'Final' whose value depends on how many of A_flag, B_flag and C_flag equal 1:

(a) If all three flags equal 1, then 'Final' = median of (A_value, B_value, C_value)

(b) If exactly two flags equal 1, then 'Final' = mean of the two corresponding values

(c) If only one flag equals 1, then 'Final' = that corresponding value

For example: in row 1, A_flag = 1 and B_flag = 1, so 'Final' = (A_value + B_value) / 2 = (5 + 2) / 2 = 3.5. In row 2, all three flags are 1, so 'Final' = median of (3, 7, 2) = 3. In row 3, only A_flag = 1, so 'Final' = A_value = 7.
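The three rules can be written as a single formula. Using notation introduced here only for illustration, with f_ij the flag and v_ij the value of row i for j in {A, B, C}, the median of two numbers is their mean and the median of one number is the number itself, so cases (a), (b) and (c) are all the same median:

```latex
\text{Final}_i = \operatorname{median}\{\, v_{ij} : f_{ij} = 1,\; j \in \{A, B, C\} \,\}
% row 1: median{5, 2} = (5 + 2)/2 = 3.5
% row 2: median{3, 7, 2} = 3
% row 3: median{7} = 7
```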

I tried the following:

df.loc[df[['A_flag','B_flag','C_flag']].eq(1).sum(axis=1)==3, "Final"] = df[['A_value','B_value','C_value']].median(axis=1)

df.loc[df[['A_flag','B_flag','C_flag']].eq(1).sum(axis=1)==2, "Final"] = ...  # mean of the two flagged values?
df.loc[df[['A_flag','B_flag','C_flag']].eq(1).sum(axis=1)==1, "Final"] = ...  # the single flagged value?

I don't know how to select only the flagged value columns for the second and third scenarios.
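For reference, a minimal sketch of one way to fill in the two missing branches while keeping the df.loc structure above. It assumes the value columns are listed in the same order as the flag columns; the names flag_cols, value_cols, is_set, n_set and selected are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "A_flag": [1, 1, 1], "B_flag": [1, 1, 0], "C_flag": [0, 1, 0],
    "A_value": [5, 3, 7], "B_value": [2, 7, 4], "C_value": [4, 2, 5],
})

flag_cols = ["A_flag", "B_flag", "C_flag"]
value_cols = ["A_value", "B_value", "C_value"]  # assumed: same order as flag_cols

is_set = df[flag_cols].eq(1)   # True where a flag equals 1
n_set = is_set.sum(axis=1)     # how many flags are set in each row

# Keep only the flagged values (the others become NaN). The mask is passed as a
# NumPy array so it is applied by position, because the column labels differ.
selected = df[value_cols].where(is_set.to_numpy())

# median, mean and sum skip NaN, so each branch only sees the flagged values
df.loc[n_set == 3, "Final"] = selected.median(axis=1)
df.loc[n_set == 2, "Final"] = selected.mean(axis=1)
df.loc[n_set == 1, "Final"] = selected.sum(axis=1)  # exactly one non-NaN value left
print(df)
```

Each row ends up with 3.5, 3.0 and 7.0 respectively, matching df1.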

asked Jan 20 '26 20:01 by Derek


2 Answers

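Assuming the flag and value columns appear in the same order, you can first select the flag-like and value-like columns, then mask the entries of the value columns whose flag is 0, and finally take the median along axis=1. The median covers all three cases: the median of two values is their mean, and the median of a single value is that value. The .to_numpy() call is needed because the two frames have different column labels, so the mask has to be applied by position rather than by label.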

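flag = df.filter(like='_flag')    # A_flag, B_flag, C_flag
value = df.filter(like='_value')  # A_value, B_value, C_value (same order)

# Hide the values whose flag is 0, then take the row-wise median of what is left
df['median'] = value.mask(flag.eq(0).to_numpy()).median(axis=1)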

   A_flag  B_flag  C_flag  A_value  B_value  C_value  median
0       1       1       0        5        2        4     3.5
1       1       1       1        3        7        2     3.0
2       1       0       0        7        4        5     7.0
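If you would rather not depend on the column order, one possible variation (a sketch, not part of the original answer) is to strip the _flag/_value suffixes so both frames share the labels A, B, C. DataFrame.where then aligns the mask by label, so no .to_numpy() is needed. str.removesuffix assumes Python 3.9 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "A_flag": [1, 1, 1], "B_flag": [1, 1, 0], "C_flag": [0, 1, 0],
    "A_value": [5, 3, 7], "B_value": [2, 7, 4], "C_value": [4, 2, 5],
})

# Strip the suffixes so both frames are labelled "A", "B", "C"
flag = df.filter(like="_flag").rename(columns=lambda c: c.removesuffix("_flag"))
value = df.filter(like="_value").rename(columns=lambda c: c.removesuffix("_value"))

# where() aligns the boolean frame on column labels, so A pairs with A, B with B,
# C with C, even if the value columns were stored in a different order
df["Final"] = value.where(flag.eq(1)).median(axis=1)
print(df)
```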
answered Jan 23 '26 20:01 by Shubham Sharma


When the result depends on several columns of the same row, the easiest approach is usually to define a function that handles one row and then apply it to the DataFrame with df.apply(..., axis=1). I think in your case this might work:

import pandas as pd

df = pd.DataFrame(
    {
        "A_flag": [1, 1, 1],
        "B_flag": [1, 1, 0],
        "C_flag": [0, 1, 0],
        "A_value": [5, 3, 7],
        "B_value": [2, 7, 4],
        "C_value": [4, 2, 5],
    }
)

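def make_final_column(row):
    # Pair each flag with its corresponding value
    pairs = [(row['A_flag'], row['A_value']), (row['B_flag'], row['B_value']), (row['C_flag'], row['C_value'])]
    # Keep only the values whose flag is 1
    selected = [value for flag, value in pairs if flag == 1]
    # The median of two values is their mean and the median of one value is that value,
    # so a single median covers all three rules (an empty selection gives NaN)
    return pd.Series(selected, dtype=float).median()


df["Final"] = df.apply(make_final_column, axis=1)
print(df)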
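The A/B/C pairs are hard-coded in the row function. A hypothetical variation, assuming every X_flag column has a matching X_value column, reads the prefixes from the column names so additional pairs need no code change. It takes the median of the selected values, which matches rules (a) to (c); the prefixes list is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A_flag": [1, 1, 1], "B_flag": [1, 1, 0], "C_flag": [0, 1, 0],
    "A_value": [5, 3, 7], "B_value": [2, 7, 4], "C_value": [4, 2, 5],
})

# Prefixes such as "A", "B", "C", read from the *_flag columns.
# Assumption: every X_flag column has a matching X_value column.
prefixes = [col[: -len("_flag")] for col in df.columns if col.endswith("_flag")]


def make_final_column(row):
    # Values whose flag equals 1, in prefix order
    selected = [row[f"{p}_value"] for p in prefixes if row[f"{p}_flag"] == 1]
    # Median of the selected values: the mean for two, the value itself for one,
    # and NaN when no flag is set
    return pd.Series(selected, dtype=float).median()


df["Final"] = df.apply(make_final_column, axis=1)
print(df)
```

Because apply calls the Python function once per row, a vectorized version such as the masking approach is likely to be faster on large frames.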
answered Jan 23 '26 21:01 by Kushim


