Why is .loc slicing in pandas inclusive of stop, contrary to typical python slicing?

Question

I am slicing a pandas DataFrame and getting unexpected slices with .loc, at least compared to numpy and ordinary Python slicing. See the example below.

>>> import pandas as pd
>>> a = pd.DataFrame([[0,1,2],[3,4,5],[4,5,6],[9,10,11],[34,2,1]])
>>> a
    0   1   2
0   0   1   2
1   3   4   5
2   4   5   6
3   9  10  11
4  34   2   1
>>> a.loc[1:3, :]
   0   1   2
1  3   4   5
2  4   5   6
3  9  10  11
>>> a.values[1:3, :]
array([[3, 4, 5],
       [4, 5, 6]])

Interestingly, this only happens with .loc, not .iloc.

>>> a.iloc[1:3, :]
   0  1  2
1  3  4  5
2  4  5  6

Thus, .loc appears to be inclusive of the terminating index, but numpy and .iloc are not.

Judging by the comments, this is not a bug and we are well warned. But why is it the case?

ALollz · Accepted Answer

Remember that .loc is primarily label-based indexing. The decision to include the stop endpoint becomes far more obvious when working with a non-RangeIndex:

df = pd.DataFrame([1,2,3,4], index=list('achz'))
#   0
#a  1
#c  2
#h  3
#z  4

If I want to select all rows between 'a' and 'h' (inclusive), I only know about 'a' and 'h'. To be consistent with other Python slicing, you'd also need to know which label follows 'h', which in this case is 'z' but could have been anything.
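As a rough illustration of that point (a sketch only, reusing the df from above), compare the inclusive label slice with what a half-open convention would force you to do, namely look up the label that comes after 'h':

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4], index=list("achz"))

# Inclusive label slice: the two endpoint labels are all you need to know.
inclusive = df.loc["a":"h"]

# Under a half-open convention you would first have to find out
# which label follows 'h' in this particular index.
pos_h = df.index.get_loc("h")        # integer position of 'h'
next_label = df.index[pos_h + 1]     # the label after 'h'

# The same rows expressed positionally (stop excluded).
half_open = df.iloc[df.index.get_loc("a"):pos_h + 1]

print(inclusive.index.tolist())      # ['a', 'c', 'h']
print(next_label)                    # z
```

Both selections return the same rows, but only the .loc version can be written without inspecting the index first.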

There's also a somewhat hidden section of the documentation that explains this design choice: "Endpoints are inclusive".

JE_Muc · Answer

In addition to the point made in the docs: pandas slice indexing with .loc is not based on cell position. It is value-based indexing (the pandas docs call it "label based", but for numerical data I prefer the term "value based"), whereas .iloc is traditional numpy-style positional indexing.
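A small sketch of the difference, assuming a made-up Series whose integer labels deliberately do not match the positions:

```python
import pandas as pd

# Hypothetical data: labels 10..40 sit at positions 0..3.
s = pd.Series(["a", "b", "c", "d"], index=[10, 20, 30, 40])

by_label = s.loc[10:30]      # labels 10, 20, 30 (stop label included)
by_position = s.iloc[1:3]    # positions 1 and 2 (stop position excluded)

print(by_label.tolist())     # ['a', 'b', 'c']
print(by_position.tolist())  # ['b', 'c']
```

With the default RangeIndex the labels and positions happen to coincide, which is why the two accessors are easy to confuse in the question's example.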

Furthermore, value-based indexing is right-inclusive, whereas positional indexing is not. Just try the following:

a = pd.DataFrame([[0,1,2],[3,4,5],[4,5,6],[9,10,11],[34,2,1]])
a.index = [0, 1, 2, 3.1, 4]  # add a float index

# value based slicing: returns all rows with labels from 1 up to and including 3.1
a.loc[1:3.1]
# Out:
#      0   1   2
# 1.0  3   4   5
# 2.0  4   5   6
# 3.1  9  10  11

# position based slicing: raises an error, since only integers are allowed
a.iloc[1:3.1]
# Out (exact wording depends on the pandas version):
# TypeError: cannot do slice indexing on <class 'pandas.core.indexes.numeric.Float64Index'> with these indexers [3.1] of <class 'float'>

To answer your question explicitly, why is it right-inclusive?
When values/labels are used as indices, it is, at least in my opinion, intuitive that the last one is included. As far as I know, this is a deliberate design decision about how the function is meant to work.
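One more sketch that, to my understanding, supports the "value based" reading: on a sorted index the slice bounds do not even have to be existing labels. Here 0.5 and 3.5 are arbitrary values chosen for illustration, and .loc returns every row whose label falls between them:

```python
import pandas as pd

a = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5], [4, 5, 6], [9, 10, 11], [34, 2, 1]],
    index=[0, 1, 2, 3.1, 4],  # sorted float index, as in the example above
)

# Neither 0.5 nor 3.5 is a label in the index. Because the index is
# sorted, .loc selects all rows with 0.5 <= label <= 3.5.
between = a.loc[0.5:3.5]

print(between.index.tolist())  # [1.0, 2.0, 3.1]
```

Note that this relies on the index being monotonic; on an unsorted index, slicing with a label that is not present raises a KeyError.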

Tags: python, pandas

jtorca
