Finding the index of the first row matching a condition in pandas

Tags:

python

pandas

I understand I can do something like this:

df[df['data'] > 3].index.tolist()

and then take the first element of the list.

But I need this inside a loop with many iterations over a very large DataFrame. I want to find the first match and stop right there, instead of wasting time collecting every match only to discard all but the first.

Is there a way to do this with pandas? Manually iterating through the rows is crazy slow, and splitting the DataFrame into chunks and searching each one doesn't help much (possibly because it makes copies; I'm not sure).
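For reference, a chunked search does not have to copy data: basic slicing of a NumPy array returns a view. The sketch below assumes a numeric Series; the function name first_index_chunked and the default chunk_size are hypothetical choices. Each chunk is searched with a vectorized comparison, and the scan stops at the first chunk that contains a match.

```python
import numpy as np
import pandas as pd

def first_index_chunked(s: pd.Series, threshold, chunk_size: int = 65_536):
    """Return the label of the first element of s greater than threshold, or None.

    Hypothetical helper: scans the underlying NumPy array chunk by chunk.
    """
    values = s.to_numpy()
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]  # basic slice: a view, not a copy
        hits = np.flatnonzero(chunk > threshold)  # positions of matches within this chunk
        if hits.size:
            return s.index[start + hits[0]]       # translate position back to a label
    return None                                   # no element matched

s = pd.Series([10, 11, 12, 14, 15, 16, 18])
print(first_index_chunked(s, 14, chunk_size=3))  # 4
```

The comparison still allocates a temporary boolean array per chunk, so chunk_size trades per-chunk overhead against how much work is done past the first match.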

Edit: here's an example:

import pandas as pd

data = {'data': [10, 11, 12, 14, 15, 16, 18]}   # this is over 1M entries in practice
df = pd.DataFrame.from_dict(data)
df.index[df['data'] > 14].tolist()[0]

This returns 4, as expected.

What I want is a fast way to stop execution as soon as one row matches the condition.

Thomas asked Oct 30 '25 00:10


1 Answer

idxmax

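This still builds the full boolean Series before idxmax runs, so it does not stop at the first match. If no row matches, idxmax returns the first index label (every value is False), so check for that case.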

df['data'].gt(3).idxmax()
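Because idxmax cannot tell "the first row matched" apart from "nothing matched", a guarded version is a minimal sketch worth having. The helper name first_label_idxmax is hypothetical; it uses only Series.gt, Series.any and Series.idxmax.

```python
import pandas as pd

def first_label_idxmax(s: pd.Series, threshold):
    """Return the label of the first element of s greater than threshold, or None."""
    mask = s.gt(threshold)   # full boolean Series (no short-circuit)
    if not mask.any():       # all False (or empty): idxmax would return the first label or raise
        return None
    return mask.idxmax()     # label of the first True

s = pd.Series([10, 11, 12, 14, 15, 16, 18])
print(first_label_idxmax(s, 14))   # 4
print(first_label_idxmax(s, 100))  # None
```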

argmax

df.index[(df['data'].to_numpy() > 3).argmax()]
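NumPy's argmax returns a position, not a label, which is why the line above indexes df.index with it. The sketch below makes that explicit on a DataFrame with a non-default index; the helper name first_label_argmax is hypothetical, and it also guards the empty and no-match cases, where argmax raises or returns 0.

```python
import pandas as pd

def first_label_argmax(df: pd.DataFrame, column: str, threshold):
    """Return the index label of the first row where df[column] > threshold, or None."""
    mask = df[column].to_numpy() > threshold  # NumPy boolean array, one entry per row
    if mask.size == 0:                        # argmax() raises on an empty array
        return None
    pos = mask.argmax()                       # position (not label) of the first True
    if not mask[pos]:                         # all False: argmax() returned 0 anyway
        return None
    return df.index[pos]                      # translate the position into a label

df = pd.DataFrame({'data': [10, 11, 12, 14, 15, 16, 18]}, index=list('abcdefg'))
print(first_label_argmax(df, 'data', 14))  # position 4 -> label 'e'
```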

explicit function

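def find(s):
    # Plain Python loop over (label, value) pairs; returns at the first match
    for i, v in s.items():  # Series.iteritems() was removed in pandas 2.0
        if v > 3:
            return i
    return None  # no value satisfied the condition

find(df['data'])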
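The same early exit can be written with next() over a generator expression, taking the condition as a callable. This is a sketch; first_label_where and its predicate parameter are hypothetical names. It is still a per-element Python loop, so it is only fast when the match occurs early.

```python
import pandas as pd

def first_label_where(s: pd.Series, predicate):
    """Return the label of the first element for which predicate(value) is true, else None."""
    # next() consumes the generator only up to the first match, so the scan stops early.
    return next((label for label, value in s.items() if predicate(value)), None)

s = pd.Series([10, 11, 12, 14, 15, 16, 18])
print(first_label_where(s, lambda v: v > 14))  # 4
```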

Numba

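from numba import njit

@njit
def find(index, values, threshold):
    # Compiled loop: return the index label paired with the first value above threshold
    # (index must be a numeric array; object-dtype labels are not supported here)
    for label, value in zip(index, values):
        if value > threshold:
            return label
    # Falls through and returns None if nothing matches

find(df.index.to_numpy(), df['data'].to_numpy(), 3)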
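One possible variant, sketched under stated assumptions: search only the values array, return a position with -1 as a "not found" sentinel so the compiled function always returns a plain integer, and map the position to a label in pandas afterwards, which also works for non-numeric index labels. The name first_position_gt is hypothetical, and the plain-Python fallback when numba is not installed is an added convenience, not something numba provides.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:  # assumption: fall back to an uncompiled loop if numba is missing
    def njit(func):
        return func

@njit
def first_position_gt(values, threshold):
    """Return the position of the first element > threshold, or -1 if there is none."""
    for i in range(values.shape[0]):
        if values[i] > threshold:
            return i  # stop at the first match
    return -1

df = pd.DataFrame({'data': [10, 11, 12, 14, 15, 16, 18]})
pos = first_position_gt(df['data'].to_numpy(), 14)
label = df.index[pos] if pos >= 0 else None  # translate position into a label
print(label)  # 4
```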
piRSquared answered Oct 31 '25 14:10

