I'm trying to find the index of the last True value in a pandas boolean Series. My current code looks something like the below. Is there a faster or cleaner way of doing this?
import numpy as np
import pandas as pd
import string
index = np.random.choice(list(string.ascii_lowercase), size=1000)
df = pd.DataFrame(np.random.randn(1000, 2), index=index)
s = pd.Series(np.random.choice([True, False], size=1000), index=index)
last_true_idx_s = s.index[s][-1]
last_true_idx_df = df[s].iloc[-1].name
A pandas Series is a one-dimensional ndarray with axis labels. The labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index.
The iloc attribute enables purely integer-location based indexing, that is, selection by position over the given Series object.
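You can use idxmax on the reversed Series, which at the time was the same as argmax in Andy Hayden's answer (in current pandas, Series.argmax returns an integer position rather than a label):
print(s[::-1].idxmax())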
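As a quick check that the reversed idxmax and the boolean-mask lookup from the question agree, here is a minimal sketch on a small hand-made Series. The data and the names `small` and `all_false` are made up for illustration. It also shows a caveat: when the Series contains no True at all, idxmax still returns a label, so it may be worth checking `s.any()` first.

```python
import pandas as pd

# Hypothetical toy data; the labels repeat, as they do in the question.
small = pd.Series([True, False, True, False], index=list("abca"))

# Reversing puts the last True first, and idxmax returns the label
# of the first occurrence of the maximum (True).
last_via_idxmax = small[::-1].idxmax()

# Boolean-mask lookup from the question.
last_via_mask = small.index[small][-1]

print(last_via_idxmax, last_via_mask)

# Caveat: with no True at all, every value ties, so idxmax returns the
# first label of the reversed Series, i.e. the last label of the original.
all_false = pd.Series([False, False], index=list("xy"))
fallback_label = all_false[::-1].idxmax()
print(fallback_label)
```

In the all-False case the mask lookup raises an IndexError instead, which some callers may prefer to a silently wrong label.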
Comparing:
These timings depend heavily on the size of s as well as on the number (and position) of True values.
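Because of that dependence, it may be worth re-measuring on data shaped like your own. The following is only a sketch of such a harness using the standard timeit module; `make_series` and `time_lookup` are hypothetical helper names, and the sizes and positions are arbitrary choices, not the ones used for the timings below.

```python
import timeit

import numpy as np
import pandas as pd


def make_series(size, last_true_pos):
    """Hypothetical helper: boolean Series with a single True at last_true_pos."""
    values = np.zeros(size, dtype=bool)
    values[last_true_pos] = True
    return pd.Series(values, index=np.arange(size))


def time_lookup(func, s, number=100):
    """Average seconds per call of func(s) over `number` calls."""
    return timeit.timeit(lambda: func(s), number=number) / number


# Two of the approaches from this thread.
candidates = {
    "mask": lambda s: s.index[s][-1],
    "reversed idxmax": lambda s: s[::-1].idxmax(),
}

# Vary both the size and where the True value sits.
for size in (1_000, 100_000):
    for pos in (0, size - 1):
        s = make_series(size, pos)
        for name, func in candidates.items():
            print(size, pos, name, time_lookup(func, s))
```

The absolute numbers will differ by machine and pandas version, so only the relative ordering on your own data is meaningful.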
In [2]: %timeit s.index[s][-1]
The slowest run took 6.92 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 35 µs per loop
In [3]: %timeit s[::-1].argmax()
The slowest run took 6.67 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 126 µs per loop
In [4]: %timeit s[::-1].idxmax()
The slowest run took 6.55 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 127 µs per loop
In [5]: %timeit s[s==True].last_valid_index()
The slowest run took 8.10 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 261 µs per loop
In [6]: %timeit (s[s==True].index.tolist()[-1])
The slowest run took 6.11 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 239 µs per loop
In [7]: %timeit (s[s==True].index[-1])
The slowest run took 5.75 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 227 µs per loop
EDIT:
Another solution:
print(s[s==True].index[-1])
EDIT1: The solution
(s[s==True].index.tolist()[-1])
was in a deleted answer.