I've recently run into issues when creating NumPy object arrays using e.g.
a = np.array([c], dtype=object)
where c is an instance of some complicated class, and in some cases NumPy tries to access some methods of that class. However, doing:
a = np.empty((1,), dtype=object)
a[0] = c
solves the issue. I'm curious as to what the difference is between these two internally. Why, in the first case, might NumPy try to access some attributes or methods of c?
EDIT: For the record, here is example code that demonstrates the issue:
import numpy as np

class Thing(object):
    def __getitem__(self, item):
        print("in getitem")
    def __len__(self):
        return 1

a = np.array([Thing()], dtype='object')
This prints "in getitem" twice. Basically, if __len__ is present in the class, one can run into this unexpected behavior.
In the first case, a = np.array([c], dtype=object), NumPy knows nothing about the shape of the intended array.
For example, when you define
d = range(10)
a = np.array([d])
you expect NumPy to determine the shape based on the length of d.
Similarly, in your case NumPy will check whether len(c) is defined and, if it is, access the elements of c via c[i].
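The probing described above can be imitated in plain Python. This is only a toy model of the idea, not NumPy's actual implementation (the real logic lives in NumPy's C code and differs between versions). The names guess_shape, Probe and Plain are made up for this illustration.

```python
def guess_shape(obj):
    """Toy shape discovery: recurse while objects answer len() and obj[0].

    Hypothetical helper for illustration only; not how NumPy is implemented.
    """
    if isinstance(obj, (str, bytes)):
        return ()  # treat strings as scalars so the recursion terminates
    try:
        n = len(obj)  # this is where __len__ gets called
    except TypeError:
        return ()  # no __len__: treat the object as a scalar element
    if n == 0:
        return (0,)
    try:
        first = obj[0]  # this is where __getitem__ gets called
    except (TypeError, IndexError, KeyError):
        return ()  # has a length but cannot be indexed: scalar
    return (n,) + guess_shape(first)


class Probe(object):
    """Hypothetical class that records which special methods are touched."""

    def __init__(self):
        self.calls = []

    def __len__(self):
        self.calls.append("__len__")
        return 1

    def __getitem__(self, item):
        self.calls.append("__getitem__")
        return 0


class Plain(object):
    """No __len__, so the toy model never looks inside it."""


probe = Probe()
shape_with_len = guess_shape([probe])
shape_without_len = guess_shape([Plain()])
print(shape_with_len, probe.calls)
print(shape_without_len)
```

In this model the list holding a Probe is seen as two-dimensional, and both special methods are recorded. The list holding a Plain instance stays one-dimensional, because the object is never inspected beyond the failed len() call.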
You can see the effect by defining a class such as
class X(object):
    def __len__(self): return 10
    def __getitem__(self, i): return "x" * i
Then
print(numpy.array([X()], dtype=object))
produces
[[ x xx xxx xxxx xxxxx xxxxxx xxxxxxx xxxxxxxx xxxxxxxxx]]
In contrast, in your second case
a = np.empty((1,), dtype=object)
a[0] = c
the shape of a has already been determined, so NumPy can assign the object directly.
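If you need this pattern often, the allocate-then-assign approach can be wrapped in a small helper. The following is a sketch, not an official NumPy recipe. The names object_vector and Recorder are hypothetical, and the Recorder class exists only to check that none of its special methods get called.

```python
import numpy as np


def object_vector(items):
    """Build a 1-D object array whose elements are exactly `items`.

    Hypothetical helper: allocate first so the shape is fixed, then assign
    element by element instead of letting np.array() inspect the objects.
    """
    items = list(items)
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # integer index into a 1-D object array stores the reference
    return out


class Recorder(object):
    """Hypothetical sequence-like class that logs every probe."""

    def __init__(self):
        self.calls = []

    def __len__(self):
        self.calls.append("__len__")
        return 1

    def __getitem__(self, item):
        self.calls.append("__getitem__")
        if item != 0:
            raise IndexError(item)
        return None


c = Recorder()
a = object_vector([c])
print(a.shape, a[0] is c, c.calls)
```

The helper only ever produces a vector, which is the case where direct assignment avoids the method accesses.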
However, this is true only because a is a vector. If it had been defined with a different shape, method accesses would still occur. The following, for example, will still call __getitem__ on the class:
a = numpy.empty((1, 10), dtype=object)
a[0] = X()
print(a)
returns
[[ x xx xxx xxxx xxxxx xxxxxx xxxxxxx xxxxxxxx xxxxxxxxx]]