I have an indexed pandas dataframe. By searching through its index, I find a row of interest. How do I find out the iloc of this row?
Example:
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df
A B C D
2000-01-01 -0.077564 0.310565 1.112333 1.023472
2000-01-02 -0.377221 -0.303613 -1.593735 1.354357
2000-01-03 1.023574 -0.139773 0.736999 1.417595
2000-01-04 -0.191934 0.319612 0.606402 0.392500
2000-01-05 -0.281087 -0.273864 0.154266 0.374022
2000-01-06 -1.953963 1.429507 1.730493 0.109981
2000-01-07 0.894756 -0.315175 -0.028260 -1.232693
2000-01-08 -0.032872 -0.237807 0.705088 0.978011
window_stop_row = df[df.index < '2000-01-04'].iloc[-1]
window_stop_row
Timestamp('2000-01-08 00:00:00', offset='D')
#which is the iloc of window_stop_row?
DataFrame. iloc[] is an index-based to select rows and/or columns in pandas. It accepts a single index, multiple indexes from the list, indexes by a range, and many more. One of the main advantages of DataFrame is its ease of use.
iloc. Purely integer-location based indexing for selection by position. .iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. Allowed inputs are: An integer, e.g. 5 .
You want the .name
attribute and pass this to get_loc
:
In [131]:
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df
Out[131]:
A B C D
2000-01-01 0.095234 -1.000863 0.899732 -1.742152
2000-01-02 -0.517544 -1.274137 1.734024 -1.369487
2000-01-03 0.134112 1.964386 -0.120282 0.573676
2000-01-04 -0.737499 -0.581444 0.528500 -0.737697
2000-01-05 -1.777800 0.795093 0.120681 0.524045
2000-01-06 -0.048432 -0.751365 -0.760417 -0.181658
2000-01-07 -0.570800 0.248608 -1.428998 -0.662014
2000-01-08 -0.147326 0.717392 3.138620 1.208639
In [133]:
window_stop_row = df[df.index < '2000-01-04'].iloc[-1]
window_stop_row.name
Out[133]:
Timestamp('2000-01-03 00:00:00', offset='D')
In [134]:
df.index.get_loc(window_stop_row.name)
Out[134]:
2
get_loc
returns the ordinal position of the label in your index which is what you want:
In [135]:
df.iloc[df.index.get_loc(window_stop_row.name)]
Out[135]:
A 0.134112
B 1.964386
C -0.120282
D 0.573676
Name: 2000-01-03 00:00:00, dtype: float64
if you just want to search the index then so long as it is sorted then you can use searchsorted
:
In [142]:
df.index.searchsorted('2000-01-04') - 1
Out[142]:
2
While pandas.Index.get_loc()
will only work if you have a single key, the following paradigm will also work getting the iloc
of multiple elements:
np.argwhere(condition).flatten() # array of all iloc where condition is True
In your case, picking the latest element where df.index < '2000-01-04'
:
np.argwhere(df.index < '2000-01-04').flatten()[-1] # returns 2
IIUC you could call index for your case:
In [53]: df[df.index < '2000-01-04'].index[-1]
Out[53]: Timestamp('2000-01-03 00:00:00', offset='D')
EDIT
I think @EdChums answer is what you want. Alternatively you could filter your dataframe with values which you get, then use all
to find the row with that values and then pass it to the index
:
In [67]: df == window_stop_row
Out[67]:
A B C D
2000-01-01 False False False False
2000-01-02 False False False False
2000-01-03 True True True True
2000-01-04 False False False False
2000-01-05 False False False False
2000-01-06 False False False False
2000-01-07 False False False False
2000-01-08 False False False False
In [68]: (df == window_stop_row).all(axis=1)
Out[68]:
2000-01-01 False
2000-01-02 False
2000-01-03 True
2000-01-04 False
2000-01-05 False
2000-01-06 False
2000-01-07 False
2000-01-08 False
Freq: D, dtype: bool
In [69]: df.index[(df == window_stop_row).all(axis=1)]
Out[69]: DatetimeIndex(['2000-01-03'], dtype='datetime64[ns]', freq='D')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With