
Get previous and next index values in DataFrame should they exist

Suppose I have a DataFrame

import numpy as np
import pandas as pd
df = pd.DataFrame(dict(vals=np.random.randint(0, 10, 10)),
                  index=pd.date_range('20170401', '20170410'))

>>> df
            vals
2017-04-01     9
2017-04-02     8
2017-04-03     4
2017-04-04     5
2017-04-05     9
2017-04-06     9
2017-04-07     5
2017-04-08     3
2017-04-09     3
2017-04-10     1

and a particular date which I know is in my index but do not know the position of, for example

cur_dt = df.index[np.random.randint(0, df.index.size)]

>>> cur_dt
Timestamp('2017-04-05 00:00:00', freq='D')

Given cur_dt, I want to determine what the previous and next values in my index are. Should cur_dt be the first (last) value in my index, then the previous (next) element should be cur_dt itself.

To recap, my question is, what is the easiest way to find the previous and next value in my index (or my current value itself if it is an endpoint) given my current value?


My current approach seems rather roundabout, which is my motivation for asking.

cur_iloc = df.index.get_loc(cur_dt)
prev = cur_dt if cur_iloc == 0 else df.index[cur_iloc-1]
next = cur_dt if cur_iloc == df.index.size-1 else df.index[cur_iloc+1]

>>> prev
Timestamp('2017-04-04 00:00:00', freq='D')
>>> next
Timestamp('2017-04-06 00:00:00', freq='D')

If there's no more straightforward way after all then my apologies. I'm imagining being able to just "shift" my index from my current value once forwards and once backwards (with some nice treatment for endpoints), but am not sure if this is possible.
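
The "shift" idea described above can be sketched as follows. This is only an illustration under stated assumptions: the index labels are unique, and the names labels, prev_labels and next_labels are hypothetical, not part of the original question. It turns the index into a Series, shifts the values by one position in each direction, and fills the gap at each endpoint with the label itself.

```python
import numpy as np
import pandas as pd

# Stand-in data shaped like the question's DataFrame (values are arbitrary).
df = pd.DataFrame({"vals": np.arange(10)},
                  index=pd.date_range("2017-04-01", "2017-04-10"))

# A Series whose values are the index labels themselves.
labels = df.index.to_series()

# Shift the values by one row; the endpoint that has no neighbour becomes NaT
# and is filled with its own label.
prev_labels = labels.shift(1).fillna(labels)
next_labels = labels.shift(-1).fillna(labels)

cur_dt = pd.Timestamp("2017-04-05")
print(prev_labels.loc[cur_dt], next_labels.loc[cur_dt])
```

Building the two Series costs a pass over the whole index, so this probably only pays off when many lookups are needed against the same index.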

Eric Hansen asked Apr 18 '17 21:04


2 Answers

Assuming the index is sorted, you can use numpy.searchsorted:

Source data:

In [185]: df
Out[185]:
            vals
2017-04-01     5
2017-04-02     3
2017-04-03     9
2017-04-04     8
2017-04-05     1
2017-04-06     0
2017-04-07     4
2017-04-08     5
2017-04-09     1
2017-04-10     8

In [186]: cur_dt
Out[186]: Timestamp('2017-04-02 00:00:00', freq='D')

Solution:

In [187]: idx = np.searchsorted(df.index, cur_dt)

In [188]: df.index[max(0, idx-1)]
Out[188]: Timestamp('2017-04-01 00:00:00', freq='D')

In [189]: df.index[min(idx+1, len(df)-1)]
Out[189]: Timestamp('2017-04-03 00:00:00', freq='D')
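
The searchsorted approach can be wrapped in a small helper. This is a minimal sketch: the function name prev_next is hypothetical, and it assumes the index is sorted in ascending order and that cur is actually one of its labels. If cur is absent, searchsorted returns an insertion point instead, so the "next" label would be off by one.

```python
import pandas as pd

def prev_next(index, cur):
    """Return (previous, next) labels around cur, clamped at the endpoints.

    Assumes index is sorted ascending and cur is one of its labels.
    """
    pos = index.searchsorted(cur)        # position of cur in the sorted index
    lo = max(pos - 1, 0)                 # clamp at the first position
    hi = min(pos + 1, len(index) - 1)    # clamp at the last position
    return index[lo], index[hi]

idx = pd.date_range("2017-04-01", "2017-04-10")
print(prev_next(idx, pd.Timestamp("2017-04-05")))  # a middle label
print(prev_next(idx, idx[0]))                      # first label: previous is itself
print(prev_next(idx, idx[-1]))                     # last label: next is itself
```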
MaxU - stop WAR against UA answered Oct 03 '22 08:10


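Reset the index so that the dates become an ordinary column and the new index holds integer positions, locate cur_dt with a boolean mask, then clamp the neighbouring positions to the valid range:

df = df.reset_index()  # the unnamed DatetimeIndex becomes a column called 'index'
cur_dt_index = df.index[df['index'] == cur_dt][0]  # integer position of cur_dt
previous = max(cur_dt_index - 1, 0)
next = min(cur_dt_index + 1, df.shape[0] - 1)  # the last valid position is shape[0] - 1

The corresponding dates are then df['index'][previous] and df['index'][next].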
Grr answered Oct 03 '22 10:10