
Pandas slicing excluding the end

When slicing a dataframe using loc,

df.loc[start:end]

both start and end are included. Is there an easy way to exclude the end when using loc?

zcadqe asked Aug 05 '17 15:08


3 Answers

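The easiest way I can think of is df.loc[start:end].iloc[:-1].

It chops off the last row. This assumes end is actually a label in the index; if it is not, the chained call drops a row that should have been kept.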
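
As a minimal sketch of this approach (the frame and labels below are made up for illustration), the chained call returns the rows from start up to, but not including, end:

```python
import pandas as pd

# Hypothetical example data; the labels 'a'..'d' are illustrative only.
df = pd.DataFrame({"A": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])

start, end = "a", "c"

# .loc includes both endpoints, so trim the final row with .iloc[:-1].
result = df.loc[start:end].iloc[:-1]
print(result)
```

Here result holds rows 'a' and 'b' only. The trim is purely positional, so it is only correct when end really is the last label selected by .loc.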

WillZ answered Oct 23 '22 09:10


loc includes both the start and the end. One less-than-ideal workaround is to look up the index positions and slice with iloc (assuming the index has no duplicate labels):

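import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

# iloc slicing excludes the stop position, so row 'c' is left out
df.iloc[df.index.get_loc('a'):df.index.get_loc('c')]

#   A
#a  1
#b  2

# loc slicing includes both endpoints
df.loc['a':'c']

#   A
#a  1
#b  2
#c  3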
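
The uniqueness assumption matters because Index.get_loc only returns a single integer position when the label is unique. The sketch below uses made-up indexes to show what comes back otherwise, and wraps the positional approach in a guard. The helper name half_open_by_position is hypothetical and not part of pandas.

```python
import pandas as pd

unique_idx = pd.Index(["a", "b", "c", "d"])
dup_idx = pd.Index(["a", "b", "b", "c"])  # sorted, but 'b' appears twice

pos_unique = unique_idx.get_loc("c")  # a single integer position
pos_dup = dup_idx.get_loc("b")        # a slice covering both 'b' rows

print(pos_unique)
print(pos_dup)


def half_open_by_position(df, start, end):
    """Rows from label `start` up to, but not including, label `end`.

    Hypothetical helper: refuses to run on an index with duplicate labels,
    because get_loc would then not return a plain integer.
    """
    if not df.index.is_unique:
        raise ValueError("index labels must be unique for this approach")
    return df.iloc[df.index.get_loc(start):df.index.get_loc(end)]


df = pd.DataFrame({"A": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])
subset = half_open_by_position(df, "a", "c")
print(subset)
```

Both labels must also exist in the index, since get_loc raises KeyError for a missing label.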

Psidom answered Oct 23 '22 10:10


None of the other answers addresses the situation where end is not part of the index. A more general solution is to compare the index against start and end directly; that way you can make either bound inclusive or exclusive.

df[(df.index >= start) & (df.index < end)]

For instance:

>>> import pandas as pd
>>> import numpy as np

>>> df = pd.DataFrame(
...     {
...         "x": np.arange(48),
...         "y": np.arange(48) * 2,
...     },
...     index=pd.date_range("2020-01-01 00:00:00", freq="1H", periods=48)
... )

>>> start = "2020-01-01 14:00"
>>> end = "2020-01-01 19:30" # this is not in the index

>>> df[(df.index >= start) & (df.index < end)]

                      x   y
2020-01-01 14:00:00  14  28
2020-01-01 15:00:00  15  30
2020-01-01 16:00:00  16  32
2020-01-01 17:00:00  17  34
2020-01-01 18:00:00  18  36
2020-01-01 19:00:00  19  38
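
To make the inclusive/exclusive choice explicit, the comparison can be wrapped in a small function. This is only a sketch: the name slice_by_index and its include_start / include_end parameters are hypothetical, not a pandas API, and the example passes freq as a Timedelta simply to avoid depending on a particular frequency alias.

```python
import pandas as pd


def slice_by_index(df, start, end, include_start=True, include_end=False):
    """Select rows whose index lies between start and end.

    Hypothetical helper: each bound can be made inclusive or exclusive.
    Neither start nor end needs to be present in the index.
    """
    idx = df.index
    lower = idx >= start if include_start else idx > start
    upper = idx <= end if include_end else idx < end
    return df[lower & upper]


df = pd.DataFrame(
    {"x": range(48)},
    index=pd.date_range("2020-01-01", periods=48, freq=pd.Timedelta(hours=1)),
)

start = pd.Timestamp("2020-01-01 14:00")
end = pd.Timestamp("2020-01-01 19:00")  # present in the index this time

half_open = slice_by_index(df, start, end)                # 14:00 through 18:00
closed = slice_by_index(df, start, end, include_end=True)  # 14:00 through 19:00
print(half_open)
print(closed)
```

Because the mask is built from plain comparisons, it does not require the index to be sorted or unique, at the cost of scanning the whole index.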

Giorgio Balestrieri answered Oct 23 '22 08:10