What does dtype=object mean while creating a numpy array?

I was experimenting with numpy arrays and created a numpy array of strings:

import numpy as np
ar1 = np.array(['avinash', 'jay'])

As I have read from their official guide, operations on a numpy array are propagated to individual elements. So I did this:

ar1 * 2 

But then I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-22-aaac6331c572> in <module>()
----> 1 ar1 * 2

TypeError: unsupported operand type(s) for *: 'numpy.ndarray' and 'int'

But when I pass dtype=object while creating the array:

ar1 = np.array(['avinash', 'jay'], dtype=object)

I am able to do all operations.

Can anyone tell me why this is happening?

Avinash Pandey asked Apr 26 '15 12:04



1 Answer

NumPy arrays are stored as contiguous blocks of memory. They usually have a single datatype (e.g. integers, floats or fixed-length strings) and then the bits in memory are interpreted as values with that datatype.
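To make the fixed-length point concrete, here is a small sketch (not from the original answer) that inspects the array from the question. The replacement name 'jayakrishnan' is only an illustrative value, and the item size shown assumes NumPy's usual 4 bytes per unicode character.

```python
import numpy as np

ar1 = np.array(['avinash', 'jay'])

# Fixed-width unicode dtype: every element gets room for 7 characters,
# the length of the longest string at creation time.
print(ar1.dtype.kind)  # 'U'
print(ar1.itemsize)    # 28 (7 characters x 4 bytes each)

# A longer string does not fit in the fixed slot and is silently truncated.
ar1[1] = 'jayakrishnan'  # illustrative value, not from the question
truncated = str(ar1[1])
print(truncated)       # 'jayakri'
```

The truncation is a direct consequence of the contiguous layout: each element owns exactly the same number of bytes.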

Creating an array with dtype=object is different. The memory taken by the array is now filled with pointers to Python objects which are stored elsewhere in memory (much like a Python list is really just a list of pointers to objects, not the objects themselves).
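A quick way to see the pointer behaviour is to check what comes back out of an object array. This is a sketch rather than part of the original answer; the variable names `names`, `obj`, `same_object` and `stored` are made up for the illustration.

```python
import numpy as np

names = ['avinash', 'jay']
obj = np.array(names, dtype=object)

print(obj.dtype)      # object
print(type(obj[0]))   # <class 'str'>, an ordinary Python string

# The array holds a reference to the very same Python object as the list.
same_object = obj[0] is names[0]
print(same_object)    # True

# Because only a pointer is stored, a longer string is not truncated.
obj[1] = 'a much longer string'
stored = obj[1]
print(stored)         # 'a much longer string'
```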

Arithmetic operators such as * don't work with arrays such as ar1, which have a fixed-length string datatype such as string_ (there are special functions instead - see below). NumPy just treats the bits in memory as characters, so the * operator has no meaning here. However, the line

np.array(['avinash','jay'], dtype=object) * 2 

works because now the array is an array of (pointers to) Python strings. The * operator is well defined for these Python string objects. New Python strings are created in memory and a new object array with references to the new strings is returned.
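As a rough check of that description, the sketch below (my own illustration, with hypothetical variable names) runs the multiplication and confirms that the result is a new object array while the original is left untouched.

```python
import numpy as np

obj = np.array(['avinash', 'jay'], dtype=object)

# The * operator is applied to each Python string in turn.
doubled = obj * 2

print(doubled.tolist())  # ['avinashavinash', 'jayjay']
print(doubled.dtype)     # object
print(obj.tolist())      # ['avinash', 'jay'], the original is unchanged
```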


If you have an array with string_ or unicode_ dtype and want to repeat each string, you can use np.char.multiply:

In [52]: np.char.multiply(ar1, 2)
Out[52]: array(['avinashavinash', 'jayjay'], dtype='<U14')

NumPy has many other vectorised string methods too.
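For instance, a few of the other functions in np.char can be used the same way. This is only a small sample chosen for illustration, assuming the fixed-width ar1 from the question.

```python
import numpy as np

ar1 = np.array(['avinash', 'jay'])

# Upper-case every element.
upper = np.char.upper(ar1)
print(upper.tolist())     # ['AVINASH', 'JAY']

# Element-wise concatenation; the scalar '!' is broadcast to each element.
suffixed = np.char.add(ar1, '!')
print(suffixed.tolist())  # ['avinash!', 'jay!']

# Length of each string.
lengths = np.char.str_len(ar1)
print(lengths.tolist())   # [7, 3]
```

These functions return new arrays whose dtype is wide enough for the results, so nothing is truncated.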

Alex Riley answered Sep 23 '22 02:09
