When slicing a dataframe using loc,
df.loc[start:end]
both start and end are included. Is there an easy way to exclude the end when using loc?
Thanks, so iloc wouldn't include the last one?
This works assuming end is indeed part of your index. With datetimes it can backfire.
The easiest I can think of is df.loc[start:end].iloc[:-1], which chops off the last row.
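A quick sketch of why this trick depends on end being an index label, as the comment on the question warns. The hourly timestamps and the column name x are illustrative assumptions, not from the thread: when end falls between two labels, loc has already stopped before it, so iloc[:-1] throws away a row that belongs in the half-open range.

```python
import pandas as pd

# Illustrative data: six hourly timestamps, 2020-01-01 00:00 to 05:00
index = pd.DatetimeIndex(
    [pd.Timestamp("2020-01-01") + pd.Timedelta(hours=h) for h in range(6)]
)
df = pd.DataFrame({"x": list(range(6))}, index=index)

# end is an exact index label: loc includes it, iloc[:-1] drops it
in_index = df.loc["2020-01-01 01:00":"2020-01-01 03:00"].iloc[:-1]
print(in_index["x"].tolist())  # [1, 2]

# end is NOT an index label: loc already stops at 03:00, so iloc[:-1]
# also drops 03:00, which lies before end and should have been kept
not_in_index = df.loc["2020-01-01 01:00":"2020-01-01 03:30"].iloc[:-1]
print(not_in_index["x"].tolist())  # [1, 2], not [1, 2, 3]
```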
loc includes both the start and the end. A less ideal workaround is to get the index positions and use iloc to slice the DataFrame (assuming the index has no duplicates):
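import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

# iloc excludes the stop position, so the row labelled 'c' is dropped
df.iloc[df.index.get_loc('a'):df.index.get_loc('c')]
#    A
# a  1
# b  2

# loc includes the end label
df.loc['a':'c']
#    A
# a  1
# b  2
# c  3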
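A variant of the same position-based idea that also works when end is not a label swaps get_loc for Index.searchsorted, which returns the position where a value would be inserted to keep the index sorted. This is a different technique from the answer above and it assumes the index is sorted ascending; loc_half_open is a hypothetical helper name used only for illustration.

```python
import pandas as pd

def loc_half_open(frame, start, end):
    """Hypothetical helper: rows with start <= label < end.

    Assumes frame.index is sorted ascending; neither start nor end
    has to be an existing label.
    """
    # side="left" places the insertion point before any equal label
    lo = frame.index.searchsorted(start, side="left")  # first label >= start
    hi = frame.index.searchsorted(end, side="left")    # first label >= end
    return frame.iloc[lo:hi]  # iloc excludes the stop position hi

df = pd.DataFrame({'A': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])
print(loc_half_open(df, 'a', 'c'))   # rows a and b
print(loc_half_open(df, 'b', 'cc'))  # 'cc' is not a label: rows b and c
```

On an unsorted index the positions returned by searchsorted are meaningless, so sort first (df.sort_index()) or use a boolean mask instead.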
None of the answers addresses the situation where end is not part of the index.
The more general solution is simply to compare the index to start and end; that way you can make either bound inclusive or exclusive.
df[(df.index >= start) & (df.index < end)]
For instance:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(
...     {
...         "x": np.arange(48),
...         "y": np.arange(48) * 2,
...     },
...     index=pd.date_range("2020-01-01 00:00:00", freq="1H", periods=48),
... )
>>> start = "2020-01-01 14:00"
>>> end = "2020-01-01 19:30"  # this is not in the index
>>> df[(df.index >= start) & (df.index < end)]
                      x   y
2020-01-01 14:00:00  14  28
2020-01-01 15:00:00  15  30
2020-01-01 16:00:00  16  32
2020-01-01 17:00:00  17  34
2020-01-01 18:00:00  18  36
2020-01-01 19:00:00  19  38
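The comparison approach can be wrapped in a small helper so each bound's inclusiveness is an explicit choice. This is a minimal sketch: slice_between and its include_start/include_end flags are hypothetical names, not a pandas API, and it assumes the index supports >, >=, < and <= against start and end (a DatetimeIndex accepts date strings here, as in the example above). Unlike slicing, a boolean mask does not need the index to be sorted.

```python
import pandas as pd

def slice_between(frame, start, end, include_start=True, include_end=False):
    """Hypothetical helper: rows whose index lies between start and end.

    Neither bound has to be an index label; each bound can be made
    inclusive or exclusive independently (default: [start, end)).
    """
    idx = frame.index
    lower = idx >= start if include_start else idx > start
    upper = idx <= end if include_end else idx < end
    return frame[lower & upper]  # boolean mask selects the matching rows

# Illustrative data: 48 hourly timestamps starting 2020-01-01 00:00
index = pd.DatetimeIndex(
    [pd.Timestamp("2020-01-01") + pd.Timedelta(hours=h) for h in range(48)]
)
df = pd.DataFrame({"x": list(range(48))}, index=index)

half_open = slice_between(df, "2020-01-01 14:00", "2020-01-01 19:00")
print(half_open["x"].tolist())  # [14, 15, 16, 17, 18]

closed = slice_between(df, "2020-01-01 14:00", "2020-01-01 19:00", include_end=True)
print(closed["x"].tolist())  # [14, 15, 16, 17, 18, 19]
```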