 

Comparing numpy array of dtype object

My question is: why?

aa[0]
array([[405, 162, 414, 0,
        array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],
      dtype=object),
        0, 0, 0]], dtype=object)

aaa
array([[405, 162, 414, 0,
        array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],
      dtype=object),
        0, 0, 0]], dtype=object)

np.array_equal(aaa,aa[0])
False

Those arrays are completely identical.

My minimal example doesn't reproduce this:

be=np.array([1],dtype=object)

be
array([1], dtype=object)

ce=np.array([1],dtype=object)

ce
array([1], dtype=object)

np.array_equal(be,ce)
True

Nor does this one:

ce=np.array([np.array([1]),'5'],dtype=object)

be=np.array([np.array([1]),'5'],dtype=object)

np.array_equal(be,ce)
True

However, to reproduce my problem try this:

be=np.array([[405, 162, 414, 0, np.array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],dtype=object),0, 0, 0]], dtype=object)

ce=np.array([[405, 162, 414, 0, np.array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],dtype=object),0, 0, 0]], dtype=object)

np.array_equal(be,ce)
False

np.array_equal(be[0],ce[0])
False

I have no idea why these are not equal. And, as a bonus question: how do I compare them?

I need an efficient way to check if aaa is in the stack aa.

I'm not using aaa in aa because it emits "DeprecationWarning: elementwise == comparison failed; this will raise an error in the future." and, in case anyone is wondering, because it returns False anyway.


What else have I tried?

np.equal(be,ce)
*** ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

np.all(be,ce)
*** TypeError: only integer scalar arrays can be converted to a scalar index

all(be,ce)
*** TypeError: all() takes exactly one argument (2 given)

all(be==ce)
*** TypeError: 'bool' object is not iterable

np.where(be==ce)
(array([], dtype=int64),)

And these, which I couldn't get to run in the interactive console, all evaluate to False, some of them with the deprecation warning:

import numpy as np

ce=np.array([[405, 162, 414, 0, np.array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],dtype=object),0, 0, 0]], dtype=object)

be=np.array([[405, 162, 414, 0, np.array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],dtype=object),0, 0, 0]], dtype=object)

print(np.any([bee in ce for bee in be]))

print(np.any([bee==cee for bee in be for cee in ce]))

print(np.all([bee in ce for bee in be]))

print(np.all([bee==cee for bee in be for cee in ce]))

And, of course, other questions tell me this should work...

asked Oct 09 '18 by DonQuiKong




3 Answers

To make an element-wise comparison between the arrays, you can use numpy.equal() with the keyword argument dtype=object (written as np.object below; that alias was deprecated in NumPy 1.20 and removed in 1.24), as in:

In [60]: np.equal(be, ce, dtype=np.object)
Out[60]: 
array([[True, True, True, True,
        array([ True,  True,  True,  True,  True]), True, True, True]],
      dtype=object)

P.S. Checked with NumPy 1.15.2 and Python 3.6.6.

Edit:

From the NumPy 1.15 release notes:

https://docs.scipy.org/doc/numpy-1.15.1/release.html#comparison-ufuncs-accept-dtype-object-overriding-the-default-bool

Comparison ufuncs accept dtype=object, overriding the default bool

This allows object arrays of symbolic types, which override == and 
other operators to return expressions, to be compared elementwise with 
np.equal(a, b, dtype=object).
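A minimal sketch of that approach, assuming a NumPy version where comparison ufuncs accept dtype=object (1.15 or later, per the note above). It uses plain object instead of the removed np.object alias, and all_true is a hypothetical helper (not a NumPy function) that walks the object array returned by np.equal and requires every entry, nested or not, to be truthy.

```python
import numpy as np

def make_row():
    # Rebuild the question's row from scratch, so be and ce share no objects.
    inner = np.array([[1, 9, 2], 18, (405, 18, 207), 64, 'Universal'], dtype=object)
    return np.array([[405, 162, 414, 0, inner, 0, 0, 0]], dtype=object)

be, ce = make_row(), make_row()

# With dtype=object, np.equal keeps whatever each element-wise == returns,
# so the nested arrays come back as arrays of bools instead of failing.
result = np.equal(be, ce, dtype=object)

def all_true(x):
    """Hypothetical helper: True only if every (possibly nested) entry is truthy."""
    if isinstance(x, np.ndarray):
        return all(all_true(v) for v in x.flat)
    return bool(x)

print(result.shape)
print(all_true(result))
```

The recursion only descends into ndarrays; anything else is reduced with bool(), which is enough for the output of np.equal here.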
answered Oct 24 '22 by kmario23


To complement @kmario23's answer, what about collapsing the nested result into a single bool with a recursive helper like this:

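import numpy as np

def wrpr(bools):
    """Collapse a (possibly nested) array of comparison results into one bool."""
    if not isinstance(bools, (np.ndarray, list, tuple)):
        return bool(bools)  # a plain scalar result, nothing to flatten
    try:
        # Try to flatten one level of nesting into a single array.
        fltn_bools = np.hstack(bools)
    except ValueError:  # pieces hstack cannot join: reduce each one recursively
        fltn_bools = np.array([wrpr(a) for a in bools])
    # A product of booleans is truthy only if every factor is truthy.
    ints = fltn_bools.prod()
    if isinstance(ints, np.ndarray):
        # A nested array survived the product; keep reducing it.
        return wrpr(ints)
    return bool(ints)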

And finally:

>>> wrpr(np.equal(ce, be, dtype=np.object))
True

Checked with NumPy 1.15.1 under both Python 3.6.5 and Python 2.7.13.


But still, as commented here:

NumPy is designed for rigid multidimensional grids of numbers. Trying to get anything but a rigid multidimensional grid is going to be painful. (@user2357112, Jul 31 '17 at 23:10)

and/or

Moral of the story: Don't use dtype=object arrays. They are stunted Python lists, with worse performance characteristics, and numpy is not designed to handle the case of sequence-like containers within these object arrays. (@juanpa.arrivillaga, Jul 31 '17 at 23:38)
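Taking that advice at face value, one workaround is to leave NumPy out of the comparison entirely: convert the nested object arrays back into plain Python lists and let Python's own == do the work. to_plain and make_row are hypothetical helpers written for this sketch, and the "stack" is assumed to be a plain Python list of rows. It is a linear scan, so it is simple rather than fast.

```python
import numpy as np

def to_plain(x):
    """Hypothetical helper: recursively turn ndarrays into nested Python lists."""
    if isinstance(x, np.ndarray):
        # tolist() leaves objects stored in object arrays untouched,
        # so recurse to convert any arrays nested inside them.
        return to_plain(x.tolist())
    if isinstance(x, list):
        return [to_plain(v) for v in x]
    if isinstance(x, tuple):
        return tuple(to_plain(v) for v in x)
    return x

def make_row(tag='Universal'):
    # Hypothetical builder for rows shaped like the question's data.
    inner = np.array([[1, 9, 2], 18, (405, 18, 207), 64, tag], dtype=object)
    return np.array([[405, 162, 414, 0, inner, 0, 0, 0]], dtype=object)

aaa = make_row()
aa = [make_row('Other'), make_row()]  # assumed: the stack as a plain list

target = to_plain(aaa)
print(any(to_plain(item) == target for item in aa))
```

Once everything is plain lists, tuples and scalars, Python's == compares nested containers element by element, so no ambiguous-truth-value errors can occur.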

answered Oct 24 '22 by keepAlive


The behavior you are seeing is, in a way, documented here:

Deprecations

...

Object array equality comparisons

In the future object array comparisons both == and np.equal will not make use of identity checks anymore. For example:

>>> a = np.array([np.array([1, 2, 3]), 1])
>>> b = np.array([np.array([1, 2, 3]), 1])
>>> a == b

will consistently return False (and in the future an error) even if the array in a and b was the same object.

The equality operator == will in the future raise errors like np.equal if broadcasting or element comparisons, etc. fails.

Comparison with arr == None will in the future do an elementwise comparison instead of just returning False. Code should be using arr is None.

All of these changes will give Deprecation- or FutureWarnings at this time.

So far, so clear. Or is it?

We can see from @kmario23's answer that, as of version 1.15.2, these changes are not fully implemented yet.

To make matters worse, consider this:

>>> A = np.array([None, a])
>>> A1 = np.array([None, a])
>>> At = np.array([None, a[:2]])
>>> 
>>> A==A1
False
>>> A==At
array([ True, False])
>>> 

It looks like the current behavior is more a coincidence than the result of careful planning.

I suspect it all comes down to whether an exception is raised during element-wise comparison, cf. here and here.

If two corresponding elements of the containing arrays are arrays themselves and of compatible shapes as in A==A1, their comparison yields an array of bools. Trying to cast this to a scalar bool raises an exception. Currently, exceptions are caught and a scalar False is returned.

In the A==At example an exception is raised when the last two elements are compared because their shapes don't broadcast. This is caught and the comparison for this element returns a scalar False which is why comparison of the containing arrays returns a "normal" array of bools.
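The core of that explanation can be checked in isolation: comparing two equal arrays yields an array of bools, and asking for the truth value of a multi-element array raises a ValueError. This small sketch uses plain integer arrays rather than the question's data.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 2, 3])

result = x == y  # element-wise: one bool per position, all True here
print(result)

ambiguous = False
try:
    # Roughly what happens when a single True/False is needed for this element.
    bool(result)
except ValueError as err:
    ambiguous = True
    print(err)  # "The truth value of an array with more than one element is ambiguous..."
```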

What about the workarounds suggested by @kmario23 and @Kanak? Do they work?

Well, yes ...

>>> np.equal(A, A1, dtype=object)
array([True, array([ True,  True,  True])], dtype=object)
>>> wrpr(np.equal(A, A1, dtype=object))
True

... and no.

>>> AA = np.array([None, A])
>>> AA1 = np.array([None, A1])
>>> np.equal(AA, AA1, dtype=object)
array([True, False], dtype=object)
>>> wrpr(np.equal(AA, AA1, dtype=object))
False
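If you need a comparison that does not depend on how deep the nesting goes, a fully recursive check is one option. deep_equal below is a hypothetical helper, not part of NumPy: it requires matching shapes, then walks the elements and recurses into nested arrays, lists and tuples. The arrays from this answer are rebuilt with an explicit dtype=object, which recent NumPy versions require for ragged input.

```python
import numpy as np

def deep_equal(x, y):
    """Hypothetical helper: structural equality for nested object arrays."""
    if isinstance(x, np.ndarray) or isinstance(y, np.ndarray):
        # Both sides must be arrays of the same shape.
        if not (isinstance(x, np.ndarray) and isinstance(y, np.ndarray)):
            return False
        if x.shape != y.shape:
            return False
        return all(deep_equal(u, v) for u, v in zip(x.flat, y.flat))
    if isinstance(x, (list, tuple)) or isinstance(y, (list, tuple)):
        # Containers must match in type and length, then element by element.
        if type(x) is not type(y) or len(x) != len(y):
            return False
        return all(deep_equal(u, v) for u, v in zip(x, y))
    # Plain scalars and strings: ordinary == gives a single bool here.
    return bool(x == y)

a = np.array([np.array([1, 2, 3]), 1], dtype=object)
A = np.array([None, a], dtype=object)
A1 = np.array([None, a], dtype=object)
AA = np.array([None, A], dtype=object)
AA1 = np.array([None, A1], dtype=object)

print(deep_equal(AA, AA1))
```

Because every array is unpacked before == is applied, no element comparison ever produces an array, which sidesteps the exception-swallowing behavior described above.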
answered Oct 24 '22 by Paul Panzer