
Error "Can only compare identically-labeled Series objects" and sort_index

I have two dataframes, df1 and df2, with the same number of rows and columns and the same variables, and I'm trying to compare the boolean variable choice in the two dataframes, then use if/else to manipulate the data. But something seems wrong when I try to compare the boolean variable.

Here are sample dataframes and my code:

#df1
v_100     choice #boolean
7          True
0          True
7          False
2          True

#df2
v_100     choice #boolean
1          False
2          True
74         True
6          True

def lastTwoTrials_outcome():
     df1 = df.iloc[5::6, :] #df1 and df2 are extracted from the same dataframe first
     df2 = df.iloc[4::6, :]

     if df1['choice'] != df2['choice']:  # if "choice" is different in the two dataframes
         df1['v_100'] = (df1['choice'] + df2['choice']) * 0.5

Here's the error:

if df1['choice'] != df2['choice']:
File "path", line 818, in wrapper
raise ValueError(msg)
ValueError: Can only compare identically-labeled Series objects

I found the same error here, and an answer suggests calling sort_index first, but I don't really understand why. Can anyone explain in more detail, please (if that's the correct solution)?

Thanks!

Lumos asked Jun 27 '17 05:06



2 Answers

The error happens because you compare two pandas.Series objects with different indices. A simple solution would be to compare just the values in the series. Try it:

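# The elementwise comparison returns a boolean array; reduce it with .any() (or .all()) before using it in `if`
if (df1['choice'].values != df2['choice'].values).any():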
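To make the cause concrete, here is a minimal sketch that reproduces the error and the positional workaround. The Series `s1` and `s2` are hypothetical stand-ins for `df1['choice']` and `df2['choice']`, built from the sample data with the index labels the slicing would plausibly leave behind.

```python
import pandas as pd

# Hypothetical data mirroring the samples: same length, different index labels
s1 = pd.Series([True, True, False, True], index=[5, 6, 7, 8])
s2 = pd.Series([False, True, True, True], index=[4, 5, 6, 7])

try:
    s1 != s2  # pandas refuses to compare Series whose indexes differ
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Comparing the underlying NumPy arrays ignores labels and works by position
differs = s1.values != s2.values

# A boolean array cannot be used directly in `if`; reduce it to one value first
any_differs = bool(differs.any())
```

Note that `.values` pairs rows purely by position, so it is only appropriate when both Series are known to have the same length and row order.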
Poe Dator answered Oct 20 '22 01:10



I think you need reset_index so that both dataframes have the same index values, and then compare. To create the new column, it is better to use mask or numpy.where.

Also, use | instead of +, because you are working with booleans:

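import numpy as np

# Give both frames the same 0..n-1 index so the Series can be compared
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)

# Option 1: Series.mask replaces values where the condition is True
df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
                                  (df1['choice'] | df2['choice']) * 0.5)

# Option 2: numpy.where chooses between two alternatives elementwise
df1['v_100'] = np.where(df1['choice'] != df2['choice'],
                        (df1['choice'] | df2['choice']) * 0.5,
                        df1['choice'])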
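On the sort_index question: sorting only helps when both Series carry the same labels in a different order. When the labels themselves differ, as they do for two different slices of one dataframe, the index has to be replaced instead. The sketch below illustrates the difference; all four Series are made-up examples, not the asker's data.

```python
import pandas as pd

# Hypothetical Series with the same labels in a different order
a = pd.Series([True, False, True], index=[2, 0, 1])
b = pd.Series([True, True, False], index=[0, 1, 2])

# Sorting puts both indexes in the same order, so the comparison is allowed
same_labels_result = a.sort_index() != b.sort_index()

# Different labels (like df.iloc[5::6] vs df.iloc[4::6]): sorting cannot help
c = pd.Series([True, False, True], index=[5, 11, 17])
d = pd.Series([True, True, False], index=[4, 10, 16])
try:
    c.sort_index() != d.sort_index()
    sort_was_enough = True
except ValueError:
    sort_was_enough = False

# reset_index(drop=True) replaces both indexes with 0..n-1, pairing rows by position
by_position = c.reset_index(drop=True) != d.reset_index(drop=True)
```

Here `sort_was_enough` ends up `False`, which is why reset_index rather than sort_index fits the situation in the question.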

Samples:

print (df1)
   v_100  choice
5      7    True
6      0    True
7      7   False
8      2    True

print (df2)
   v_100  choice
4      1   False
5      2    True
6     74    True
7      6    True

df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
print (df1)
   v_100  choice
0      7    True
1      0    True
2      7   False
3      2    True

print (df2)
   v_100  choice
0      1   False
1      2    True
2     74    True
3      6    True

df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
                                  (df1['choice'] | df2['choice']) * 0.5)

print (df1)
   v_100  choice
0    0.5    True
1    1.0    True
2    0.5   False
3    1.0    True
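Putting the pieces together, the function from the question could be sketched roughly as follows. The name `last_two_trials_outcome`, the 12-row `df`, and the choice to leave `v_100` unchanged where the two choices agree are all assumptions made for illustration; the question does not say what should happen in that case.

```python
import numpy as np
import pandas as pd

def last_two_trials_outcome(df):
    # Pair rows by position: both slices get a fresh 0..n-1 index
    df1 = df.iloc[5::6, :].reset_index(drop=True)
    df2 = df.iloc[4::6, :].reset_index(drop=True)

    differs = df1['choice'] != df2['choice']
    # Assumption: rows where the choices agree keep their original v_100
    df1['v_100'] = np.where(differs,
                            (df1['choice'] | df2['choice']) * 0.5,
                            df1['v_100'])
    return df1

# Hypothetical frame: rows 4 and 5 differ in choice, rows 10 and 11 agree
df = pd.DataFrame({
    'v_100': range(12),
    'choice': [True] * 4 + [False, True] + [True] * 6,
})
result = last_two_trials_outcome(df)
```

This sketch assumes both slices have the same number of rows, which holds when the length of `df` is a multiple of 6.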
jezrael answered Oct 19 '22 23:10
