When I create a one dimensional array in numpy and use a string (containing digits) to index it, I get an error as expected:
>>> import numpy as np
>>> a = np.arange(15)
>>> a['10']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: field named 10 not found.
However, when I create a two dimensional array and use two strings for indexing, it gives no error and returns the element as if the strings were converted to integers first:
>>> b = np.arange(15).reshape(3,5)
>>> b
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> b[1, 2]
7
>>> b['1', '2']
7
What's going on? Why don't I get an error in the two dimensional case?
Disclaimer: this answer is bound to be incomplete.
I think what you're seeing is a consequence of fancy sequence indexing. Since strings are actually sequences, you're getting the values of the string one character at a time and converting them to "intp" objects (which presumably just uses Python's int function), which then gives you your array index.
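To make the "strings are sequences" part concrete, here is a small plain-Python sketch. It only illustrates the hypothesis above; it says nothing about what NumPy does internally, and the variable names are made up for the example.

```python
# Illustration only: plain Python, no NumPy internals involved.
digits = '12'

# A string is a sequence, so iterating it yields one character at a time.
chars = tuple(digits)

# int() accepts digit strings, so each character converts cleanly.
as_ints = [int(c) for c in chars]

# The string as a whole also converts, but to a different value.
whole = int(digits)

print(chars, as_ints, whole)
```

Note that character-by-character conversion and whole-string conversion give different indices, so which one is applied matters.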
This also explains the 1D case:
class Foo(object):
    def __getitem__(self, idx):
        print(idx)  # show exactly what the indexing expression passes in

a = Foo()
a[12]      # prints 12
a[12,12]   # prints (12, 12)
Note that in the second case a tuple is passed, whereas in the first case an integer is passed.
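The same thing can be checked with string keys, which is closer to the question. This is a minimal sketch using a hypothetical `KeyRecorder` class (not part of NumPy) that hands back whatever key `__getitem__` receives:

```python
class KeyRecorder(object):
    """Hypothetical helper: returns whatever key __getitem__ receives."""
    def __getitem__(self, idx):
        return idx

r = KeyRecorder()
single = r['10']      # one string key: the str itself is passed
pair = r['1', '2']    # two string keys: a tuple of str is passed

print(type(single).__name__, type(pair).__name__)
```

So in the 1D case the array sees a bare string, while in the 2D case it sees a tuple whose elements happen to be strings, and the two can easily take different code paths.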
The piece of this that I still don't understand is demonstrated by this test:
import numpy as np
a = np.arange(156).reshape(13,12)
print(a[12,3] == a['12',3])  # True -- I would have thought False for this one...
print(a['12',3] == a[('1','2'),3])  # False -- I would have guessed True for this..
# .all() so the check also works if the comparison returns an array
assert (a[tuple('12'),3] == a[(1,2),3]).all()  # This passes, as expected
Feel free to try to explain this one to me in comments. :) The discrepancy might be that numpy deliberately leaves strings alone when converting to a sequence of intp objects in order to more smoothly handle record arrays...
Just to add, note that the first case (a single string) probably has to do with support for recarrays, which use strings as field names.
Please do not rely on the second case. Numpy is extremely liberal about indexing with non-arrays: if the index is a non-array (and not a slice and not None), it will simply try to convert it into an integer array, which is well defined for these strings. However, this is not by design; it's because too much software relies on this behaviour (at least partially) to actually change it. And quite honestly, while this may somewhat make sense for floats that someone forgot to cast, it really doesn't for strings.
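If the indices really do arrive as strings, the safe route is to convert them yourself before indexing. A minimal sketch, assuming the strings hold plain decimal digits (`row_s`, `col_s` and `rows` are hypothetical names for the example):

```python
import numpy as np

b = np.arange(15).reshape(3, 5)

# Hypothetical input: indices that arrive as strings, e.g. parsed from text.
row_s, col_s = '1', '2'

# Convert explicitly instead of relying on NumPy to coerce the strings.
row, col = int(row_s), int(col_s)
value = b[row, col]

# For several indices at once, cast a string array to an integer array first.
rows = np.array(['0', '2']).astype(np.intp)
selected = b[rows, col]

print(value, selected)
```

This does not depend on how a particular NumPy version treats string indices, and as far as I know newer releases reject them outright.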
Some more details for @mgilson: considering that all of this is off-label usage, it really boils down to implementation details. For example, a single string is currently special-cased for recarrays even if the array is not a recarray, but a tuple of strings is only special-cased when the array actually is a recarray.
Now a list of strings is somewhat special-cased, since lists are not tuples but act like one most of the time. This may be a small bug... Because it finds a sequence inside the list, it triggers fancy indexing, but "forgets" to convert it to an array. Though I would generally use tuples to denote multiple axes.
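With plain integer indices, where the behaviour is well defined, the tuple-versus-list distinction is easy to see. A short sketch (it deliberately avoids strings, so it does not reproduce the edge case described above):

```python
import numpy as np

b = np.arange(15).reshape(3, 5)

# A tuple supplies one index per axis: row 1, column 2.
from_tuple = b[(1, 2)]

# A list of integers is a single fancy index along the first axis: rows 1 and 2.
from_list = b[[1, 2]]

print(from_tuple)
print(from_list.shape)
```

Sticking to tuples for multiple axes and to lists or arrays for fancy indexing keeps the intent unambiguous.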