 

Why do series[0] and series[0:1] yield different results?

Tags:

python

pandas

So, df['date'] returns:

0        2018-03-01
1        2018-03-01
2        2018-03-01
3        2018-03-01
4        2018-03-01
            ...    
469796   2018-06-20
469797   2018-06-20
469798   2018-06-27
469799   2018-06-27
469800   2018-12-06
Name: date, Length: 469801, dtype: datetime64[ns]

And, df['date'].sort_values() returns:

137241   2018-01-01
378320   2018-01-01
247339   2018-01-01
34333    2018-01-01
387971   2018-01-01
            ...    
109278   2018-12-06
384324   2018-12-06
384325   2018-12-06
109282   2018-12-06
469800   2018-12-06
Name: date, Length: 469801, dtype: datetime64[ns]
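Note that sort_values reorders the rows but carries each value's original index label along with it; it does not renumber the result from 0. The following minimal sketch uses a handful of hypothetical dates chosen to mirror the output above (they are not the real data). It shows that after sorting, label 0 still refers to 2018-03-01, while the first row by position is 2018-01-01:

```python
import pandas as pd

# Hypothetical miniature of df['date']: label 0 holds 2018-03-01,
# and the earliest date (2018-01-01) sits at label 2.
dates = pd.Series(
    pd.to_datetime(["2018-03-01", "2018-12-06", "2018-01-01", "2018-06-20"]),
    name="date",
)

sorted_dates = dates.sort_values()

# Rows are reordered, but each row keeps its original label.
print(sorted_dates.index.tolist())  # [2, 0, 3, 1]
print(sorted_dates.iloc[0])         # 2018-01-01 00:00:00 (first row by position)
```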

Now df['date'].sort_values()[0] "ignores sorting" and returns:

Timestamp('2018-03-01 00:00:00')

Whereas df['date'].sort_values()[0:1] actually returns:

137241   2018-01-01
Name: date, dtype: datetime64[ns]

Why the apparently inconsistent behaviour? As @cs95 correctly pointed out, they return a scalar and a Series respectively, which is fine. What confuses me is the value: the first is 2018-03-01, while the second is 2018-01-01.
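Here is a small reproduction of the two lookups side by side. It again uses hypothetical stand-in data rather than the real df['date'], and it depends on the sorted Series having an integer index, as in the question:

```python
import pandas as pd

# Hypothetical stand-in for df['date']: integer labels 0..3, unsorted dates.
dates = pd.Series(
    pd.to_datetime(["2018-03-01", "2018-12-06", "2018-01-01", "2018-06-20"]),
    name="date",
)
s = dates.sort_values()  # index is now [2, 0, 3, 1]

scalar = s[0]    # integer key on an integer index: looked up by label 0
sliced = s[0:1]  # integer slice: taken by position (the first row)

print(scalar)           # 2018-03-01 00:00:00
print(sliced.index[0])  # 2
print(sliced.iloc[0])   # 2018-01-01 00:00:00
```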

Thanks in advance.


Somewhat similar to: why sort_values() is different from sort_values().values

gmagno asked Nov 20 '25 11:11


1 Answer

On a Series with an integer index, pandas interprets a scalar integer key differently from an integer slice. Consider a simpler example:

import pandas as pd

df = pd.DataFrame({'col1': [5, 4, 3, 2, 1]}).sample(frac=1)  # random shuffle; your row order will differ
df
   col1
4     1
1     4
0     5
3     2
2     3

Also note the result of sort_values:

df['col1'].sort_values()
4    1
3    2
2    3
1    4
0    5

When you call df['col1'].sort_values()[0], you get the value whose index label is 0, not the first row. Because the index holds integers, a scalar integer key is treated as a label, so this is equivalent to calling loc:

df['col1'].sort_values()[0]     # just gets the value indexed by that key
# 5

df['col1'].sort_values().loc[0]
# 5
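Because sample(frac=1) shuffles randomly, the sketch below rebuilds the same shuffled frame explicitly, copying the index order [4, 1, 0, 3, 2] from the output above, so the result is reproducible. It then checks that [0] and .loc[0] both look up the label 0:

```python
import pandas as pd

# Same shuffled frame as above, built explicitly instead of via sample().
df = pd.DataFrame({"col1": [1, 4, 5, 2, 3]}, index=[4, 1, 0, 3, 2])
s = df["col1"].sort_values()

print(s.index.tolist())  # [4, 3, 2, 1, 0]
print(s[0], s.loc[0])    # 5 5  (both look up the label 0, which is the last row)
```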

When you slice with integers, however, the bounds are treated as positions rather than labels, so this is equivalent to calling iloc:

df['col1'].sort_values()[0:1]   # just gets the first row  
4    1
Name: col1, dtype: int64


df['col1'].sort_values().iloc[0:1]
4    1
Name: col1, dtype: int64
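The same explicitly built frame can be used to confirm that the plain integer slice matches .iloc. For contrast, it also shows a label-based slice through .loc, which runs from one label to another and includes both endpoints:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 4, 5, 2, 3]}, index=[4, 1, 0, 3, 2])
s = df["col1"].sort_values()  # index order: [4, 3, 2, 1, 0]

head = s[0:1]                    # integer slice: positional, like iloc
print(head.equals(s.iloc[0:1]))  # True
print(head.index.tolist())       # [4]

both = s.loc[4:3]     # label slice: from label 4 through label 3, inclusive
print(both.tolist())  # [1, 2]
```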

If you want a scalar lookup that works by position, like the slice does, use iloc, or iat for a single value:

df['col1'].sort_values().iloc[0]
# 1

df['col1'].sort_values().iat[0]
# 1
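Applied to the question's date column, here is a sketch of two ways to get the earliest date by position. It uses the same hypothetical dates as before. The first option is iloc/iat on the sorted Series. The second is Series.sort_values(ignore_index=True), available since pandas 1.0, which relabels the result 0 to n-1. With that option, s[0] is still a label lookup, but label 0 now happens to be the first sorted row:

```python
import pandas as pd

dates = pd.Series(
    pd.to_datetime(["2018-03-01", "2018-12-06", "2018-01-01", "2018-06-20"]),
    name="date",
)

# Option 1: keep the original labels, access by position.
earliest = dates.sort_values().iloc[0]
also_earliest = dates.sort_values().iat[0]

# Option 2: discard the original labels so label 0 is the first sorted row.
relabeled = dates.sort_values(ignore_index=True)

print(earliest, also_earliest)  # 2018-01-01 00:00:00 2018-01-01 00:00:00
print(relabeled[0])             # 2018-01-01 00:00:00
```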
cs95 answered Nov 22 '25 01:11


