Let's say I have the following:
In [1]: import pandas as pd import numpy as np df = pd.DataFrame(data=np.random.rand(11),index=pd.date_range('2015-04-20','2015-04-30'),columns=['A']) Out[1]: A 2015-04-20 0.694983 2015-04-21 0.393851 2015-04-22 0.690138 2015-04-23 0.674222 2015-04-24 0.763175 2015-04-25 0.761917 2015-04-26 0.999274 2015-04-27 0.907871 2015-04-28 0.464818 2015-04-29 0.005733 2015-04-30 0.806351
I have some complicated method that identifies a single index as being interesting, for example '2015-04-25'. I can retrieve the row with that index using:
In [2]: df.loc['2015-04-25'] Out[2]: A 0.761917 Name: 2015-04-25 00:00:00, dtype: float64
What would be the nicest way to obtain a number of n rows before and/or after that index value?
What I would like to do is something like:
In[3]: df.getRowsBeforeLoc('2015-04-25',3) Out[3]: 2015-04-22 0.690138 2015-04-23 0.674222 2015-04-24 0.763175 2015-04-25 0.761917
Or equivalently:
In[3]: df.getRowsAfterLoc('2015-04-25',3) Out[3]: 2015-04-25 0.761917 2015-04-26 0.999274 2015-04-27 0.907871 2015-04-28 0.464818
(I don't have a strong opinion on whether or not the row that corresponds to the target index value itself is included.)
You can use df. head() to get the first N rows in Pandas DataFrame. Alternatively, you can specify a negative number within the brackets to get all the rows, excluding the last N rows.
DataFrame. head(n) to get the first n rows of the DataFrame. It takes one optional argument n (number of rows you want to get from the start). By default n = 5, it return first 5 rows if value of n is not passed to the method.
loc and iloc are interchangeable when labels are 0-based integers.
loc
supports slicing the beg/end point is included in the range:
In [363]: df.loc[:'2015-04-25'] Out[363]: A 2015-04-25 0.141787 2015-04-26 0.598237 2015-04-27 0.106461 2015-04-28 0.297159 2015-04-29 0.058392 2015-04-30 0.621325 In [364]: df.loc['2015-04-25':] Out[364]: A 2015-04-25 0.141787 2015-04-26 0.598237 2015-04-27 0.106461 2015-04-28 0.297159 2015-04-29 0.058392 2015-04-30 0.621325
To get either first/last (n) rows use head
/tail
:
In [378]: df.loc[:'2015-04-25'].head(3) Out[378]: A 2015-04-20 0.827699 2015-04-21 0.901140 2015-04-22 0.427304 In [377]: df.loc[:'2015-04-25'].tail(3) Out[377]: A 2015-04-23 0.002189 2015-04-24 0.041965 2015-04-25 0.141787
update
To get the row before/after a specifc value we can use get_loc
on the index to return an integer position and then use this with iloc
to get the previous/next row:
In [388]: df.index.get_loc('2015-04-25') Out[388]: 5 In [391]: df.iloc[df.index.get_loc('2015-04-25')-1] Out[391]: A 0.041965 Name: 2015-04-24 00:00:00, dtype: float64 In [392]: df.iloc[df.index.get_loc('2015-04-25')+1] Out[392]: A 0.598237 Name: 2015-04-26 00:00:00, dtype: float64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With