I had a question about equality comparison with numpy and arrays of strings. Say I define the following array:
import numpy as np
x = np.array(['yes', 'no', 'maybe'])
Then I can test for equality with a single string, and it performs an element-wise comparison (following, I think, the broadcasting rules described here: http://docs.scipy.org/doc/numpy-1.10.1/user/basics.broadcasting.html):
'yes' == x
#op : array([ True, False, False], dtype=bool)
x == 'yes'
#op : array([ True, False, False], dtype=bool)
However, if I compare with unicode strings, I get different behaviour. The comparison is element-wise only when the array is on the left-hand side. When the unicode string is on the left-hand side, only a single comparison is made.
x == u'yes'
#op : array([ True, False, False], dtype=bool)
u'yes' == x
#op : False
I can't find details of this behaviour in the numpy docs. Could someone explain, or point me to documentation on, why comparison with unicode strings behaves differently?
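The behaviour in the question comes from Python 2, where str and unicode are separate types. Here is a minimal check, assuming Python 3 (where every str is a unicode string) and NumPy 1.x or 2.x. It should show both operand orders falling through to numpy's element-wise comparison. numpy.char.equal is included as an order-independent alternative. The array contents are simply the ones from the question.

```python
import numpy as np

x = np.array(['yes', 'no', 'maybe'])

# In Python 3, str.__eq__ returns NotImplemented for an ndarray argument,
# so both spellings end up calling x.__eq__('yes').
left = 'yes' == x
right = x == 'yes'

# numpy.char.equal compares element-wise regardless of argument order
# (it also strips trailing whitespace before comparing).
explicit = np.char.equal('yes', x)

print(left)
print(right)
print(explicit)
```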
To perform an element-wise comparison of two string arrays using a comparison operator, use the numpy.char.compare_chararrays() function. Its first two arguments, a1 and a2, are the input string arrays to be compared. The third parameter, cmp, is the comparison operator given as a string (for example "=="). The fourth parameter, rstrip, removes the spaces at the end of the strings before the comparison when it is True.
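Here is a short sketch of numpy.char.compare_chararrays. It compares the same pair of arrays with rstrip set to True and then to False. The arrays a and b are made-up examples. The trailing space in 'no ' shows what rstrip changes.

```python
import numpy as np

a = np.array(['yes', 'no ', 'maybe'])
b = np.array(['yes', 'no', 'may'])

# cmp is one of "<", "<=", "==", ">=", ">", "!=".
# rstrip=True: trailing whitespace is stripped before comparing,
# so 'no ' and 'no' compare equal.
stripped = np.char.compare_chararrays(a, b, '==', True)

# rstrip=False: the trailing space in 'no ' is significant.
exact = np.char.compare_chararrays(a, b, '==', False)

print(stripped)
print(exact)
```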
The numpy.char module provides a set of vectorized string operations for arrays of type numpy.str_ or numpy.bytes_. All of them are based on the string methods in the Python standard library. For example, numpy.char.add() returns the element-wise string concatenation of two arrays of str or unicode. numpy.char.multiply() returns (a * i), that is, string multiple concatenation, element-wise.
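Here is a minimal illustration of the two numpy.char functions just described. The input arrays are arbitrary examples.

```python
import numpy as np

words = np.array(['yes', 'no'])
suffixes = np.array(['!', '?'])

# Element-wise concatenation of two string arrays.
joined = np.char.add(words, suffixes)

# Element-wise repetition: each string is concatenated with itself i times.
repeated = np.char.multiply(words, 2)

print(joined)
print(repeated)
```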
numpy.char.count(): this function returns the number of non-overlapping occurrences of a substring in each element of the given array. numpy.char.rfind(): this function returns, for each element, the highest index at which the substring is found, or -1 if it is not found.
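Here is a brief sketch of the two functions above applied to a small, made-up array. The results are computed per element, so the outputs have the same shape as the input.

```python
import numpy as np

fruit = np.array(['banana', 'apple', 'cherry'])

# Number of non-overlapping occurrences of 'a' in each element.
counts = np.char.count(fruit, 'a')

# Highest index of 'an' in each element, or -1 when it is absent.
last_an = np.char.rfind(fruit, 'an')

print(counts)
print(last_an)
```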
The relevant piece of information is this part of Python's coercion rules:
For objects x and y, first x.__op__(y) is tried. If this is not implemented or returns NotImplemented, y.__rop__(x) is tried.
Using your numpy array x, when the left-hand side is a str ('yes' == x):
'yes'.__eq__(x) returns NotImplemented, so x.__eq__('yes') is tried, resulting in numpy's element-wise comparison.
However, when the left-hand side is a unicode (u'yes' == x):
u'yes'.__eq__(x) simply returns False.
The reason for the different __eq__ behaviours is that str.__eq__() simply returns NotImplemented if its argument is not a str type, whereas unicode.__eq__() first tries to convert its argument to a unicode, and only returns NotImplemented if that conversion fails. In this case, the numpy array is convertible to a unicode: u'yes' == x is essentially u'yes' == unicode(x).
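The dispatch described in this answer can be reproduced without numpy in a short sketch. It runs under Python 3, so the hypothetical classes StrLike and UnicodeLike stand in for Python 2's str and unicode, and the hypothetical FakeArray stands in for a numpy array. Only the __eq__ behaviour described above is modelled. Python 3's str() is assumed as the analogue of Python 2's unicode().

```python
class FakeArray:
    """Hypothetical stand-in for a numpy array: __eq__ compares element-wise."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        # Unwrap the toy string types so each element is compared with a plain str.
        target = getattr(other, "value", other)
        return [item == target for item in self.items]

    def __str__(self):
        return "[" + " ".join(self.items) + "]"


class StrLike:
    """Mimics Python 2 str.__eq__: refuses arguments of any other type."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, StrLike):
            # Returning NotImplemented makes Python try other.__eq__(self).
            return NotImplemented
        return self.value == other.value


class UnicodeLike:
    """Mimics Python 2 unicode.__eq__: converts the argument first."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        try:
            converted = str(other)  # stands in for unicode(other)
        except Exception:
            return NotImplemented
        # A single scalar comparison against the whole converted object.
        return self.value == converted


arr = FakeArray(['yes', 'no', 'maybe'])
print(StrLike('yes') == arr)      # → [True, False, False] (via arr.__eq__)
print(UnicodeLike('yes') == arr)  # → False ('yes' vs '[yes no maybe]')
print(arr == UnicodeLike('yes'))  # → [True, False, False] (arr.__eq__ runs first)
```

Because UnicodeLike.__eq__ returns a real value instead of NotImplemented, Python never consults FakeArray.__eq__ in the second comparison. That is the asymmetry seen with u'yes' == x.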