Unicode elementwise string comparison in numpy

I had a question about equality comparison with numpy and arrays of strings. Say I define the following array:

import numpy as np
x = np.array(['yes', 'no', 'maybe'])

Then I can test for equality against another string, and numpy compares that single string element-wise with each entry (following, I think, the broadcasting rules here: http://docs.scipy.org/doc/numpy-1.10.1/user/basics.broadcasting.html):

'yes' == x
#op : array([ True, False, False], dtype=bool)

x == 'yes'
#op : array([ True, False, False], dtype=bool)

However, if I compare with unicode strings, the behaviour depends on operand order: the comparison is element-wise when the array is on the left, but only a single comparison is made when the string is on the left.

x == u'yes'
#op : array([ True, False, False], dtype=bool)

u'yes' == x
#op : False

I can't find details of this behaviour in the numpy docs. Could someone explain, or point me to details of, why comparison with unicode strings behaves differently?

The relevant piece of information is this part of Python's coercion rules:

For objects x and y, first x.__op__(y) is tried. If this is not implemented or returns NotImplemented, y.__rop__(x) is tried.

Using your numpy array x, when the left-hand side is a str ('yes' == x):

'yes'.__eq__(x) returns NotImplemented and
therefore resolves to x.__eq__('yes') – resulting in numpy's element-wise comparison.

However, when the left-hand side is a unicode (u'yes' == x):

u'yes'.__eq__(x) simply returns False.

The reason for the different __eq__ behaviours is that str.__eq__() simply returns NotImplemented if its argument is not a str type, whereas unicode.__eq__() first tries to convert its argument to a unicode, and only returns NotImplemented if that conversion fails. In this case, the numpy array is convertible to a unicode: u'yes' == x is essentially u'yes' == unicode(x).
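
The dispatch described above can be reproduced without numpy or Python 2. What follows is only a sketch: `ElementwiseList`, `DeferringText`, and `DecidingText` are hypothetical stand-ins (they are not numpy or built-in types) for the array, for a `str` whose `__eq__` returns `NotImplemented`, and for a `unicode` whose `__eq__` answers `False` by itself.

```python
class ElementwiseList:
    """Hypothetical stand-in for a numpy array: == compares every item."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        # Unwrap the toy text types below; plain values pass through as-is.
        text = getattr(other, "text", other)
        return [item == text for item in self.items]


class DeferringText:
    """Mimics the str behaviour described above: defers on foreign types."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, DeferringText):
            return NotImplemented  # lets Python try the other operand's __eq__
        return self.text == other.text


class DecidingText:
    """Mimics the unicode outcome described above: answers False itself."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, DecidingText):
            return False  # a definite answer, so the other operand is never asked
        return self.text == other.text


x = ElementwiseList(["yes", "no", "maybe"])

# Calling __eq__ directly shows the raw value before Python's fallback kicks in.
direct_call = DeferringText("yes").__eq__(x)

deferring_result = DeferringText("yes") == x  # falls back to x.__eq__(...)
deciding_result = DecidingText("yes") == x    # left operand decides alone
reflected_result = x == DecidingText("yes")   # x is on the left, so x decides

print(direct_call)       # NotImplemented
print(deferring_result)  # [True, False, False]
print(deciding_result)   # False
print(reflected_result)  # [True, False, False]
```

In this sketch the only difference between the two text classes is what `__eq__` returns for an unfamiliar operand, and that alone changes whether the list ever gets a chance to compare element by element.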

Tags: python, arrays, unicode, python-2.x, numpy

jay--bee

1 Answers

一二三
