I'm using numpy.argmax to find the first index at which True occurs in a vector of bools. Invoking it on a pandas.Series gives me the Series index label rather than the element position.
I found a subtle bug in my code that surfaced when the vector was all False: argmax returns index 0 in this case, which seems dangerous because 0 is also the result when True is in the first element. What is the design rationale for this return value?
>>> numpy.argmax([False,False,False])
0
>>> numpy.argmax([True, False, True])
0
>>> s = pandas.Series( [ False, False, False ] , index=[3,6,9] )
>>> numpy.argmax(s)
3
>>> s1 = pandas.Series( [ True, False, False ] , index=[3,6,9] )
>>> numpy.argmax(s1)
3
From the source code:
In case of multiple occurrences of the maximum values, the indices
corresponding to the first occurrence are returned.
When the vector is all False, the maximum value is False (which compares as 0), so the index of the first occurrence of that maximum, i.e. 0, is returned.
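To make the ambiguity concrete, here is a minimal sketch (the variable names are illustrative, not from the original post) showing that argmax alone cannot distinguish the two cases, while a prior any() check can:

```python
import numpy as np

all_false = np.array([False, False, False])
true_first = np.array([True, False, True])

# bool values are ordered: False compares as 0 and True as 1.
assert False < True and int(False) == 0

# Both calls return 0, so the result alone cannot tell the cases apart.
pos_all_false = int(np.argmax(all_false))
pos_true_first = int(np.argmax(true_first))

# Checking any() first separates "no True at all" from "True at position 0".
has_true_all_false = bool(all_false.any())
has_true_first = bool(true_first.any())
```

Testing any() before trusting the argmax result is one way to guard against the all-False case.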
So at the end of the day it was a misinterpretation of argmax (which is a straightforward function), forgetting that False and True are values that have an order. I was blind to this because I was using argmax as a tool for finding a specific element (the index of any True element) and expected it to behave like a common find function, with the usual conventions of returning an empty list [], -1 for an index, or even None when the element does not exist.
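I wound up coding my ultimate solution as follows:
import pandas

def first_true_index(listOfBools):
    s = pandas.Series(listOfBools)
    idx = s.argmax()
    # argmax also yields the first index when every element is False
    if idx == s.index[0] and not s[idx]:
        return -1
    return idx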
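As an alternative, a find-style helper can be sketched with numpy.flatnonzero, which returns the positions of all nonzero (True) elements. The helper names first_true_position and first_true_label are hypothetical, and the -1 and None defaults are assumptions that follow the conventions mentioned above:

```python
import numpy as np
import pandas as pd

def first_true_position(values, default=-1):
    """Return the position of the first True in values, or default if none."""
    hits = np.flatnonzero(np.asarray(values, dtype=bool))
    return int(hits[0]) if hits.size else default

def first_true_label(series, default=None):
    """Return the index label of the first True in a Series, or default."""
    pos = first_true_position(series.to_numpy())
    return series.index[pos] if pos >= 0 else default
```

Unlike argmax, this approach also handles an empty input without raising, and it keeps the positional result separate from the Series label.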