
2D numpy array does not give an error when indexing with strings containing digits

When I create a one dimensional array in numpy and use a string (containing digits) to index it, I get an error as expected:

>>> import numpy as np
>>> a = np.arange(15)
>>> a['10']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: field named 10 not found.

However, when I create a two-dimensional array and use two strings for indexing, it gives no error and returns the element as if the strings were converted to integers first:

>>> b = np.arange(15).reshape(3,5)
>>> b
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> b[1, 2]
7
>>> b['1', '2']
7

What's going on? Why don't I get an error in the two dimensional case?

titusjan asked Oct 31 '12 12:10



2 Answers

Disclaimer: this answer is bound to be incomplete.

I think what you're seeing is a consequence of fancy (sequence) indexing. Since strings are actually sequences, numpy takes the values of the string one character at a time and converts them to "intp" objects (presumably using something like Python's int function), which then gives you your array index.

This also explains the 1D case:

class Foo(object):
    def __getitem__(self, idx):
        # Show exactly which object Python passes in as the index
        print(idx)

a = Foo()
a[12]       # prints 12
a[12, 12]   # prints (12, 12)

Note that in the second case a tuple is passed whereas in the first case an integer is passed.
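
The same probe can be repeated with the string indices from the question. This is only a sketch: `ShowIndex` is a hypothetical helper class, not part of numpy, and it returns the index object instead of printing it so the result can be inspected.

```python
class ShowIndex(object):
    """Hypothetical helper: hands back whatever object was used as the index."""
    def __getitem__(self, idx):
        return idx

probe = ShowIndex()
single = probe['10']      # one string: __getitem__ receives the str itself
pair = probe['1', '2']    # two strings: __getitem__ receives a tuple of str

print(type(single).__name__, repr(single))
print(type(pair).__name__, repr(pair))
```

So numpy sees a bare string in the 1D case and a tuple of strings in the 2D case, which is why the two can end up on different code paths.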


The piece of this that I still don't understand is demonstrated by this test:

import numpy as np
a = np.arange(156).reshape(13, 12)
print(a[12, 3] == a['12', 3])   #True -- I would have thought False for this one...
print(a['12', 3] == a[('1', '2'), 3])  #False -- I would have guessed True for this..
assert( a[tuple('12'),3] == a[(1,2),3] )  #This passes, as expected

Feel free to try to explain this one to me in comments. :) The discrepancy might be that numpy deliberately leaves strings alone when converting to a sequence of intp objects in order to more smoothly handle record arrays...

mgilson answered Nov 15 '22 18:11



Just to add: the first case (a single string) probably has to do with support for recarrays, which use strings as field names.
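
For illustration, here is a small sketch of string indexing on an array that really does have named fields. The field names `x` and `y` are made up for the example, and the exact exception type raised for the plain array depends on the numpy version (the question shows a ValueError; newer releases may raise IndexError instead), so both are caught.

```python
import numpy as np

# Structured array with two named fields (names are illustrative only)
points = np.zeros(3, dtype=[('x', np.int64), ('y', np.float64)])
points['x'] = [1, 2, 3]
x_values = points['x']    # a string index selects the field named 'x'
print(x_values)

# A plain integer array has no fields, so a single string index fails
plain = np.arange(15)
try:
    plain['10']
    plain_error = None
except (IndexError, ValueError) as exc:
    plain_error = type(exc).__name__
print(plain_error)
```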

Please do not rely on the second case. Numpy is extremely permissive about indexing with non-arrays: if the index is not an array (and not a slice and not None), it will simply try to convert it into an integer array, which is well defined for these strings. However, this is not by design; it's because too much software relies on this behaviour (at least partially) to actually change it. And quite honestly, while it makes some sense for floats that someone forgot to cast, it really doesn't for strings.
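
If the indices arrive as strings (from a file or user input, say), a safer pattern is to convert them yourself before indexing. A minimal sketch, with illustrative variable names:

```python
import numpy as np

b = np.arange(15).reshape(3, 5)

row_text, col_text = '1', '2'                 # indices that arrived as text
row, col = int(row_text), int(col_text)       # explicit conversion; bad input raises ValueError here
value = b[row, col]
print(value)
```

That way the behaviour does not depend on how a particular numpy version treats string indices.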


Some more details for @mgilson. Considering that all of this is off-label usage, it really boils down to implementation details. For example, a single string is currently special-cased for recarrays even if the array is not a recarray, but a tuple of strings is only special-cased when the array actually is a recarray.

Now, a list of strings is somewhat special-cased, since lists are not tuples but act like one most of the time. This may be a small bug: because numpy finds a sequence inside the list, it triggers fancy indexing but "forgets" to convert it to an array. I would generally use tuples to denote multiple axes, though.
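
To show why tuples are the clearer choice for multiple axes, here is a short sketch with plain integer indices (not strings), reusing the array from the question:

```python
import numpy as np

b = np.arange(15).reshape(3, 5)

element = b[(1, 2)]   # tuple: one index per axis, same as b[1, 2]
rows = b[[1, 2]]      # list: fancy indexing along the first axis (rows 1 and 2)

print(element)
print(rows.shape)
```

The tuple picks out a single element, while the list selects whole rows.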

seberg answered Nov 15 '22 19:11
