I understand I can do something like this:
df[df['data'] > 3].index.tolist()
and take the first element of the list.
However, I need to use this inside a loop with a lot of iterations over a very large dataframe. I want to get the first match and stop right there, instead of wasting time collecting every match and then discarding all of them but the first.
Is there a way to do this with pandas? Manually iterating through the rows is extremely slow, and splitting the dataframe into chunks and searching each one doesn't help much (possibly because it makes copies, I'm not sure).
edit: here's an example
import pandas as pd
data = {'data': [10, 11, 12, 14, 15, 16, 18]}  # this is over 1M entries in practice
df = pd.DataFrame.from_dict(data)
df.index[df['data'] > 14].tolist()[0]
This returns 4, as expected.
What I want is a fast way to stop execution the moment one row matches the condition.
idxmax
Still evaluates a boolean series prior to evaluating idxmax
df['data'].gt(3).idxmax()
argmax
df.index[(df['data'].to_numpy() > 3).argmax()]
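One caveat with both one-liners above: when no row satisfies the condition, the boolean mask is all False and idxmax/argmax still return the first position, which looks like a match. The sketch below adds a guard for that case. The helper name first_index_where_gt and its None return value are my own choices for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd

def first_index_where_gt(df, column, threshold):
    """Return the index label of the first row where df[column] > threshold, or None."""
    mask = df[column].to_numpy() > threshold
    if mask.size == 0:
        return None               # argmax raises on an empty array
    pos = mask.argmax()           # position of the first True, or 0 if there is none
    if not mask[pos]:
        return None               # all False: nothing matched
    return df.index[pos]

df = pd.DataFrame({'data': [10, 11, 12, 14, 15, 16, 18]})
result_hit = first_index_where_gt(df, 'data', 14)
result_miss = first_index_where_gt(df, 'data', 100)
```

Like the one-liners, this still builds the full boolean mask first, so it does not short-circuit.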
def find(s):
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for i, v in s.items():
        if v > 3:
            return i
find(df['data'])
from numba import njit
@njit
def find(a, b, c):
    for x, y in zip(a, b):
        if y > c:
            return x
find(df.index.to_numpy(), df['data'].to_numpy(), 3)
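If numba is not an option, a chunked scan over the underlying NumPy arrays is a possible middle ground. This is only a sketch: the function name find_first_chunked and the default chunk_size are assumptions, and the best chunk size depends on the data. Slicing a NumPy array gives a view rather than a copy, so the only per-chunk allocation is the small boolean result of the comparison.

```python
import numpy as np
import pandas as pd

def find_first_chunked(index, values, threshold, chunk_size=65536):
    """Scan values chunk by chunk; return the index label of the first value > threshold."""
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]   # a view, no data copied
        hits = np.flatnonzero(chunk > threshold)   # matching positions within this chunk
        if hits.size:
            return index[start + hits[0]]          # stop at the first chunk with a match
    return None                                    # no value matched

df = pd.DataFrame({'data': [10, 11, 12, 14, 15, 16, 18]})
index = df.index.to_numpy()
values = df['data'].to_numpy()
chunked_hit = find_first_chunked(index, values, 14, chunk_size=2)
chunked_miss = find_first_chunked(index, values, 100, chunk_size=2)
```

The scan stops after the first chunk that contains a match, so the cost depends on how early the match appears rather than on the total length.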