Consider this simple setup:
    x = pd.Series([1, 2, 3], index=list('abc'))
    y = pd.Series([2, 3, 3], index=list('bca'))

    x
    a    1
    b    2
    c    3
    dtype: int64

    y
    b    2
    c    3
    a    3
    dtype: int64
As you can see, the indexes are the same, just in a different order.
Now, consider a simple logical comparison using the equality (==) operator:
    x == y
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
This throws a ValueError, most likely because the indexes do not match. On the other hand, calling the equivalent eq method works:
    x.eq(y)
    a    False
    b     True
    c     True
    dtype: bool
OTOH, the == operator also works if y is first reordered to match x:
    x == y.reindex_like(x)
    a    False
    b     True
    c     True
    dtype: bool
My understanding was that the function and operator comparison should do the same thing, all other things equal. What is eq doing that the operator comparison doesn't?
Viewing the whole traceback for a Series comparison with mismatched indexes, particularly focusing on the exception message:
    In [1]: import pandas as pd
    In [2]: x = pd.Series([1, 2, 3], index=list('abc'))
    In [3]: y = pd.Series([2, 3, 3], index=list('bca'))
    In [4]: x == y
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-4-73b2790c1e5e> in <module>()
    ----> 1 x == y

    /usr/lib/python3.7/site-packages/pandas/core/ops.py in wrapper(self, other, axis)
       1188
       1189         elif isinstance(other, ABCSeries) and not self._indexed_same(other):
    -> 1190             raise ValueError("Can only compare identically-labeled "
       1191                              "Series objects")
       1192

    ValueError: Can only compare identically-labeled Series objects
we see that this is a deliberate implementation decision. Also, this is not unique to Series objects - DataFrames raise a similar error.
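As a minimal sketch of that check, reusing x and y from the question: the label sets match, but the order-sensitive Index.equals reports the indexes as different, and both the Series and the DataFrame comparison raise ValueError. The helper raises_value_error and the column name "v" are hypothetical, introduced only for this illustration, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

x = pd.Series([1, 2, 3], index=list("abc"))
y = pd.Series([2, 3, 3], index=list("bca"))

# Same set of labels, but Index.equals also takes order into account.
same_labels = set(x.index) == set(y.index)
same_index = x.index.equals(y.index)


def raises_value_error(func):
    """Hypothetical helper: return True if calling func() raises ValueError."""
    try:
        func()
    except ValueError:
        return True
    return False


# The == operator refuses to compare differently labeled Series ...
series_raises = raises_value_error(lambda: x == y)

# ... and the same holds for DataFrames built from them.
df_x = x.to_frame("v")
df_y = y.to_frame("v")
frame_raises = raises_value_error(lambda: df_x == df_y)

print(same_labels, same_index, series_raises, frame_raises)
```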
Digging through the Git blame for the relevant lines eventually turns up some relevant commits and issue tracker threads. For example, Series.__eq__ used to completely ignore the RHS's index, and in a comment on a bug report about that behavior, Pandas author Wes McKinney says the following:
This is actually a feature / deliberate choice and not a bug-- it's related to #652. Back in January I changed the comparison methods to do auto-alignment, but found that it led to a large amount of bugs / breakage for users and, in particular, many NumPy functions (which regularly do things like arr[1:] == arr[:-1]; example: np.unique) stopped working.
This gets back to the issue that Series isn't quite ndarray-like enough and should probably not be a subclass of ndarray.
So, I haven't got a good answer for you except for that; auto-alignment would be ideal but I don't think I can do it unless I make Series not a subclass of ndarray. I think this is probably a good idea but not likely to happen until 0.9 or 0.10 (several months down the road).
This was then changed to the current behavior in pandas 0.19.0. Quoting the "what's new" page:
Following Series operators have been changed to make all operators consistent, including DataFrame (GH1134, GH4581, GH13538)
- Series comparison operators now raise ValueError when index are different.
- Series logical operators align both index of left and right hand side.
This made the Series behavior match that of DataFrame, which already rejected mismatched indices in comparisons.
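The contrast described in the release note can be sketched with x and y from the question. This is only an illustration, assuming a pandas version with the behavior described above (0.19.0 or later): a comparison operator such as > raises on the mismatched indexes, while the logical operator & aligns its two boolean operands by label before combining them.

```python
import pandas as pd

x = pd.Series([1, 2, 3], index=list("abc"))
y = pd.Series([2, 3, 3], index=list("bca"))

# Comparison operators raise when the indexes are not identical.
try:
    x > y
    comparison_raised = False
except ValueError:
    comparison_raised = True

# Logical operators align on labels first: (x > 1) is indexed a, b, c
# and (y > 2) is indexed b, c, a, yet the values are paired by label.
both = (x > 1) & (y > 2)

print(comparison_raised)
print(both.to_dict())
```

Here only label c satisfies both conditions (x is 3 and y is 3), so it is the only True entry in the result.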
In summary, making the comparison operators align indices automatically turned out to break too much stuff, so this was the best alternative.
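One practical difference between the two workarounds from the question shows up when the label sets differ, not just their order. The sketch below uses a hypothetical Series z (not part of the original question) that shares only labels b and c with x. As far as I can tell, eq aligns on the union of the labels and treats a label missing from either side as not equal, while reindex_like keeps only the labels of x.

```python
import pandas as pd

x = pd.Series([1, 2, 3], index=list("abc"))
z = pd.Series([2, 3, 4], index=list("bcd"))  # hypothetical data for illustration

# Flexible method: aligns on the union of labels (a, b, c, d).
# Labels missing from either side become NaN and compare as False.
flex = x.eq(z)

# Operator after explicit reindexing: the result keeps only x's labels.
reordered = x == z.reindex_like(x)

print(flex.to_dict())
print(reordered.to_dict())
```

Which of the two is appropriate depends on whether labels that exist only in the right-hand Series should appear in the result.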