Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get number of rows before and after a certain index value in pandas

Tags:

Let's say I have the following:

In [1]: import pandas as pd         import numpy as np         df = pd.DataFrame(data=np.random.rand(11),index=pd.date_range('2015-04-20','2015-04-30'),columns=['A']) Out[1]:                 A 2015-04-20  0.694983 2015-04-21  0.393851 2015-04-22  0.690138 2015-04-23  0.674222 2015-04-24  0.763175 2015-04-25  0.761917 2015-04-26  0.999274 2015-04-27  0.907871 2015-04-28  0.464818 2015-04-29  0.005733 2015-04-30  0.806351 

I have some complicated method that identifies a single index as being interesting, for example '2015-04-25'. I can retrieve the row with that index using:

In [2]: df.loc['2015-04-25'] Out[2]:  A    0.761917 Name: 2015-04-25 00:00:00, dtype: float64 

What would be the nicest way to obtain a number of n rows before and/or after that index value?

What I would like to do is something like:

In[3]: df.getRowsBeforeLoc('2015-04-25',3) Out[3]: 2015-04-22  0.690138 2015-04-23  0.674222 2015-04-24  0.763175 2015-04-25  0.761917 

Or equivalently:

In[3]: df.getRowsAfterLoc('2015-04-25',3) Out[3]: 2015-04-25  0.761917 2015-04-26  0.999274 2015-04-27  0.907871 2015-04-28  0.464818 

(I don't have a strong opinion on whether or not the row that corresponds to the target index value itself is included.)

like image 361
jkokorian Avatar asked Apr 23 '15 09:04

jkokorian


People also ask

How do you extract the first 10 rows in pandas?

You can use df. head() to get the first N rows in Pandas DataFrame. Alternatively, you can specify a negative number within the brackets to get all the rows, excluding the last N rows.

How do you get the first 5 rows in pandas?

DataFrame. head(n) to get the first n rows of the DataFrame. It takes one optional argument n (number of rows you want to get from the start). By default n = 5, it return first 5 rows if value of n is not passed to the method.

Can I use ILOC and loc together?

loc and iloc are interchangeable when labels are 0-based integers.


1 Answers

loc supports slicing the beg/end point is included in the range:

In [363]:  df.loc[:'2015-04-25'] Out[363]:                    A 2015-04-25  0.141787 2015-04-26  0.598237 2015-04-27  0.106461 2015-04-28  0.297159 2015-04-29  0.058392 2015-04-30  0.621325 In [364]:  df.loc['2015-04-25':] Out[364]:                    A 2015-04-25  0.141787 2015-04-26  0.598237 2015-04-27  0.106461 2015-04-28  0.297159 2015-04-29  0.058392 2015-04-30  0.621325 

To get either first/last (n) rows use head/tail:

In [378]:  df.loc[:'2015-04-25'].head(3) Out[378]:                    A 2015-04-20  0.827699 2015-04-21  0.901140 2015-04-22  0.427304  In [377]:  df.loc[:'2015-04-25'].tail(3) Out[377]:                    A 2015-04-23  0.002189 2015-04-24  0.041965 2015-04-25  0.141787 

update

To get the row before/after a specifc value we can use get_loc on the index to return an integer position and then use this with iloc to get the previous/next row:

In [388]:  df.index.get_loc('2015-04-25') Out[388]: 5 In [391]:  df.iloc[df.index.get_loc('2015-04-25')-1] Out[391]: A    0.041965 Name: 2015-04-24 00:00:00, dtype: float64 In [392]:  df.iloc[df.index.get_loc('2015-04-25')+1] Out[392]: A    0.598237 Name: 2015-04-26 00:00:00, dtype: float64 
like image 135
EdChum Avatar answered Sep 28 '22 09:09

EdChum