Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do i find the iloc of a row in pandas dataframe?

I have an indexed pandas dataframe. By searching through its index, I find a row of interest. How do I find out the iloc of this row?

Example:

dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df
                   A         B         C         D
2000-01-01 -0.077564  0.310565  1.112333  1.023472
2000-01-02 -0.377221 -0.303613 -1.593735  1.354357
2000-01-03  1.023574 -0.139773  0.736999  1.417595
2000-01-04 -0.191934  0.319612  0.606402  0.392500
2000-01-05 -0.281087 -0.273864  0.154266  0.374022
2000-01-06 -1.953963  1.429507  1.730493  0.109981
2000-01-07  0.894756 -0.315175 -0.028260 -1.232693
2000-01-08 -0.032872 -0.237807  0.705088  0.978011

window_stop_row = df[df.index < '2000-01-04'].iloc[-1]
window_stop_row
Timestamp('2000-01-08 00:00:00', offset='D')
#which is the iloc of window_stop_row?
like image 498
lmsasu Avatar asked Jan 20 '16 10:01

lmsasu


People also ask

Does ILOC select rows or columns?

DataFrame. iloc[] is an index-based to select rows and/or columns in pandas. It accepts a single index, multiple indexes from the list, indexes by a range, and many more. One of the main advantages of DataFrame is its ease of use.

What is ILOC in pandas DataFrame?

iloc. Purely integer-location based indexing for selection by position. .iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. Allowed inputs are: An integer, e.g. 5 .


3 Answers

You want the .name attribute and pass this to get_loc:

In [131]:
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df

Out[131]:
                   A         B         C         D
2000-01-01  0.095234 -1.000863  0.899732 -1.742152
2000-01-02 -0.517544 -1.274137  1.734024 -1.369487
2000-01-03  0.134112  1.964386 -0.120282  0.573676
2000-01-04 -0.737499 -0.581444  0.528500 -0.737697
2000-01-05 -1.777800  0.795093  0.120681  0.524045
2000-01-06 -0.048432 -0.751365 -0.760417 -0.181658
2000-01-07 -0.570800  0.248608 -1.428998 -0.662014
2000-01-08 -0.147326  0.717392  3.138620  1.208639

In [133]:    
window_stop_row = df[df.index < '2000-01-04'].iloc[-1]
window_stop_row.name

Out[133]:
Timestamp('2000-01-03 00:00:00', offset='D')

In [134]:
df.index.get_loc(window_stop_row.name)

Out[134]:
2

get_loc returns the ordinal position of the label in your index which is what you want:

In [135]:    
df.iloc[df.index.get_loc(window_stop_row.name)]

Out[135]:
A    0.134112
B    1.964386
C   -0.120282
D    0.573676
Name: 2000-01-03 00:00:00, dtype: float64

if you just want to search the index then so long as it is sorted then you can use searchsorted:

In [142]:
df.index.searchsorted('2000-01-04') - 1

Out[142]:
2
like image 67
EdChum Avatar answered Oct 19 '22 19:10

EdChum


While pandas.Index.get_loc() will only work if you have a single key, the following paradigm will also work getting the iloc of multiple elements:

np.argwhere(condition).flatten()   # array of all iloc where condition is True

In your case, picking the latest element where df.index < '2000-01-04':

np.argwhere(df.index < '2000-01-04').flatten()[-1]  # returns 2
like image 40
ascripter Avatar answered Oct 19 '22 19:10

ascripter


IIUC you could call index for your case:

In [53]: df[df.index < '2000-01-04'].index[-1]
Out[53]: Timestamp('2000-01-03 00:00:00', offset='D') 

EDIT

I think @EdChums answer is what you want. Alternatively you could filter your dataframe with values which you get, then use all to find the row with that values and then pass it to the index:

In [67]: df == window_stop_row
Out[67]:
                A      B      C      D
2000-01-01  False  False  False  False
2000-01-02  False  False  False  False
2000-01-03   True   True   True   True
2000-01-04  False  False  False  False
2000-01-05  False  False  False  False
2000-01-06  False  False  False  False
2000-01-07  False  False  False  False
2000-01-08  False  False  False  False

In [68]: (df == window_stop_row).all(axis=1)
Out[68]:
2000-01-01    False
2000-01-02    False
2000-01-03     True
2000-01-04    False
2000-01-05    False
2000-01-06    False
2000-01-07    False
2000-01-08    False
Freq: D, dtype: bool

In [69]: df.index[(df == window_stop_row).all(axis=1)]
Out[69]: DatetimeIndex(['2000-01-03'], dtype='datetime64[ns]', freq='D')
like image 1
Anton Protopopov Avatar answered Oct 19 '22 20:10

Anton Protopopov