Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

new column in pandas dataframe failed to get the expected values basis if conditions on multiple columns

I have a pandas dataframe that has data as in the below table:

Negative  Positive  Neutral
True      False     False
True      False     False
False     False     True
False     True      False
True      False     False
False     True      False
True      False     False
True      False     False

What I am doing is to create a new column ("Overall") and depending upon conditions that if the row value for column "Positive" is True, Overall column gets the value "Positive", if Column "Negative" is True, then Overall will take "Negative" otherwise "Neutral" values:

def flag_df(df):
    if (df["Negative"] == "True") and (df["Positive"] == "False") and (df["Neutral"] == "False"):
        return "Negative"
    elif (df["Negative"] == "False") and (df["Positive"] == "True") and (df["Neutral"] == "False"):
        return "Positive"
    else :
        return "Neutral"

fdf['Overall'] = fdf.apply(flag_df, axis = 1)

but unfortunately, I don't know what I did wrong, all the observations in "Overall" column is coming out to be "Neutral":

Negative    Positive    Neutral     Overall
True           False     False      Neutral
True           False     False      Neutral
False          False     True       Neutral
False          True      False      Neutral
True           False     False      Neutral
False          True      False      Neutral
True           False     False      Neutral
True           False     False      Neutral

Can someone please let me know where I did wrong?

like image 806
loveR Avatar asked Dec 06 '25 05:12

loveR


2 Answers

If all columns are boolean and always only one True per row is possible use DataFrame.dot:

print (df.dtypes)
Negative    bool
Positive    bool
Neutral     bool
dtype: object

df['Overall'] = df.dot(df.columns)
print (df)
   Negative  Positive  Neutral   Overall
0      True     False    False  Negative
1      True     False    False  Negative
2     False     False     True   Neutral
3     False      True    False  Positive
4      True     False    False  Negative
5     False      True    False  Positive
6      True     False    False  Negative
7      True     False    False  Negative

If multiple columns names use:

cols = ['Negative', 'Positive', 'Neutral']
df['Overall'] = df[cols].dot(pd.Index(cols))

Or:

df1 = df[cols]
df['Overall'] = df1.dot(df1.columns)

Your solution should be changed by numpy.select:

m1 = df["Negative"] & ~df["Positive"] & ~df["Neutral"]
m2 = ~df["Negative"] & df["Positive"] & ~df["Neutral"]

df['Overall'] = np.select([m1, m2], ['Negative','Positive'], default='Neutral')
print (df)
   Negative  Positive  Neutral   Overall
0      True     False    False  Negative
1      True     False    False  Negative
2     False     False     True   Neutral
3     False      True    False  Positive
4      True     False    False  Negative
5     False      True    False  Positive
6      True     False    False  Negative
7      True     False    False  Negative

If possible multiple match - Trues per rows is possible add separator to columns anmes and then remove last ,:

print (df)
   Negative  Positive  Neutral
0      True     False     True
1      True     False    False
2     False     False     True
3     False      True    False
4      True     False    False
5     False      True    False
6      True     False    False
7      True     False    False


df['Overall'] = df.dot(df.columns + ',').str.rstrip(',')
print (df)
   Negative  Positive  Neutral           Overall
0      True     False     True  Negative,Neutral
1      True     False    False          Negative
2     False     False     True           Neutral
3     False      True    False          Positive
4      True     False    False          Negative
5     False      True    False          Positive
6      True     False    False          Negative
7      True     False    False          Negative
like image 78
jezrael Avatar answered Dec 08 '25 18:12

jezrael


IIUC, you could use idxmax:

df['Overall'] = df.idxmax(axis=1)
print(df)

Output

   Negative  Positive  Neutral   Overall
0      True     False    False  Negative
1      True     False    False  Negative
2     False     False     True   Neutral
3     False      True    False  Positive
4      True     False    False  Negative
5     False      True    False  Positive
6      True     False    False  Negative
7      True     False    False  Negative
like image 40
Dani Mesejo Avatar answered Dec 08 '25 19:12

Dani Mesejo



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!