
Pandas date index loc between dates throws KeyError when edge date is not in dataframe

I can't understand why I'm getting KeyError: Timestamp('...') when using loc on a date index.

Given this df (dtypes are datetime64[ns], int, int; DATE1 is the index):

            DATE1    VALUE2  VALUE3
2021-08-20 00:00:00      11     424
2021-08-21 00:00:00      22     424
2021-08-22 00:00:00      33     424
2021-08-23 00:00:00      44     242

I'm trying to use loc on index like this:

from datetime import date

start_date = date(2021, 8, 20)
end_date = date(2021, 8, 23)
df = df.loc[start_date:end_date]

and this is working fine. I'm getting 4 records. However when I do this:

start_date = date(2021, 8, 20)
end_date = date(2021, 8, 24)  # end_date is later than any date in the dataframe
df = df.loc[start_date:end_date]

I'm getting KeyError: Timestamp('2021-08-24 00:00:00'). Could someone point out how to resolve this?

Kalik asked Oct 26 '25 12:10


1 Answer

In order to use label-based slices with bounds outside of index range, the index must be monotonically increasing or decreasing.

From pandas docs:

If the index of a Series or DataFrame is monotonically increasing or decreasing, then the bounds of a label-based slice can be outside the range of the index, much like slice indexing a normal Python list. Monotonicity of an index can be tested with the is_monotonic_increasing() and is_monotonic_decreasing() attributes.

On the other hand, if the index is not monotonic, then both slice bounds must be unique members of the index.
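The behaviour described in the docs can be reproduced with a small sketch. The data below is hypothetical: it reuses the values from the question but puts the rows deliberately out of order, on the assumption that the real dataframe was unsorted. Note that is_monotonic_increasing and is_monotonic_decreasing are read as properties, without parentheses.

```python
import pandas as pd

# Hypothetical data mirroring the question, with rows deliberately out of order.
df = pd.DataFrame(
    {"VALUE2": [33, 11, 44, 22], "VALUE3": [424, 424, 242, 424]},
    index=pd.to_datetime(["2021-08-22", "2021-08-20", "2021-08-23", "2021-08-21"]),
)
df.index.name = "DATE1"

# Both checks are properties of the index, not methods.
print(df.index.is_monotonic_increasing)  # False
print(df.index.is_monotonic_decreasing)  # False

# The end bound is not a member of the index, and the index is not monotonic,
# so the label-based slice raises KeyError.
raised = False
try:
    df.loc[pd.Timestamp("2021-08-20"):pd.Timestamp("2021-08-24")]
except KeyError as err:
    raised = True
    print("KeyError:", err)
```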

You can sort the index with df.sort_index() (it returns a new DataFrame, so assign the result back, e.g. df = df.sort_index()); after that, out-of-bounds slices should work.
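A minimal sketch of that fix, again on hypothetical out-of-order data built from the question's values. The boolean mask at the end is not part of the answer above; it is shown only as an alternative that does not depend on index order.

```python
import pandas as pd

# Hypothetical data mirroring the question, with rows deliberately out of order.
df = pd.DataFrame(
    {"VALUE2": [33, 11, 44, 22], "VALUE3": [424, 424, 242, 424]},
    index=pd.to_datetime(["2021-08-22", "2021-08-20", "2021-08-23", "2021-08-21"]),
)
df.index.name = "DATE1"

start_date = pd.Timestamp("2021-08-20")
end_date = pd.Timestamp("2021-08-24")  # later than any date in the index

# sort_index() returns a new, sorted DataFrame; the original is left unchanged.
df_sorted = df.sort_index()

# With a monotonic index, the out-of-range end bound is accepted.
result = df_sorted.loc[start_date:end_date]
print(len(result))  # 4

# Alternative (not from the answer): a boolean mask works on an unsorted index too.
mask = (df.index >= start_date) & (df.index <= end_date)
masked = df.loc[mask]
```

Sorting is usually the better choice when many slices are taken from the same dataframe; the mask is convenient for a one-off filter where the original row order should be kept.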

nikniknik answered Oct 29 '25 09:10


