I have two dataframes df1
df2
with the same numbers of rows and columns and variables, and I'm trying to compare the boolean variable choice
in the two dataframes. Then use if/else
to manipulate the data. But something seems wrong when I try to compare the boolean var.
Here are my dataframes sample and codes:
#df1
v_100 choice #boolean
7 True
0 True
7 False
2 True
#df2
v_100 choice #boolean
1 False
2 True
74 True
6 True
def lastTwoTrials_outcome():
df1 = df.iloc[5::6, :] #df1 and df2 are extracted from the same dataframe first
df2 = df.iloc[4::6, :]
if df1['choice'] != df2['choice']: # if "choice" is different in the two dataframes
df1['v_100'] = (df1['choice'] + df2['choice']) * 0.5
Here's the error:
if df1['choice'] != df2['choice']:
File "path", line 818, in wrapper
raise ValueError(msg)
ValueError: Can only compare identically-labeled Series objects
I found the same error here, and an answer suggests to sort_index
first, but I don't really understand why though? Can anyone explain more in detail please (if that's the correct solution)?
Thanks!
The error happens because you compare two pandas.Series objects with different indices. A simple solution would be to compare just the values in the series. Try it:
if df1['choice'].values != df2['choice'].values
I think you need reset_index
for same index values and then comapare - for create new column is better use mask
or numpy.where
:
Also instead +
use |
because working with booleans.
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
(df1['choice'] + df2['choice']) * 0.5)
df1['v_100'] = np.where(df1['choice'] != df2['choice'],
(df1['choice'] | df2['choice']) * 0.5,
df1['choice'])
Samples:
print (df1)
v_100 choice
5 7 True
6 0 True
7 7 False
8 2 True
print (df2)
v_100 choice
4 1 False
5 2 True
6 74 True
7 6 True
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
print (df1)
v_100 choice
0 7 True
1 0 True
2 7 False
3 2 True
print (df2)
v_100 choice
0 1 False
1 2 True
2 74 True
3 6 True
df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
(df1['choice'] | df2['choice']) * 0.5)
print (df1)
v_100 choice
0 0.5 True
1 1.0 True
2 0.5 False
3 1.0 True
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With