
Pandas value_counts(sort=False) with large series doesn't work

Tags: pandas

By default, Series.value_counts sorts by count, in descending order:

In [192]: pd.Series([3,0,2,0,0,1,0,0,0,1,1,0,1,0,2,2,2,2,2,0,0,2]).value_counts()
Out[192]: 
0    10
2     7
1     4
3     1
dtype: int64

If I pass sort=False, it appears to sort by the value (the index key) instead:

In [193]: pd.Series([3,0,2,0,0,1,0,0,0,1,1,0,1,0,2,2,2,2,2,0,0,2]).value_counts(sort=False)
Out[193]: 
0    10
1     4
2     7
3     1
dtype: int64

However, when I increase the length of the series, the result goes back to being ordered by count:

In [194]: pd.Series([3,0,2,0,0,1,0,0,0,1,1,0,1,0,2,2,2,2,2,0,0,2]*100).value_counts(sort=False)
Out[194]: 
0    1000
2     700
1     400
3     100
dtype: int64

Any ideas what's going on here?

maxymoo asked Nov 04 '25 12:11


1 Answer

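This is the expected behavior. You asked .value_counts() not to sort the result, so it doesn't. The counts are computed with a hash table, so if you don't sort, you get them back in whatever order the hash table produces, which is arbitrary; the key-ordered output in your small example should be treated as a coincidence. Below I emulate what sort=True actually does, which is simply a sort_values(ascending=False) applied to the unsorted counts.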

In [39]: pd.Series([3,0,2,0,0,1,0,0,0,1,1,0,1,0,2,2,2,2,2,0,0,2]).value_counts(sort=False).sort_values(ascending=False)
Out[39]: 
0    10
2     7
1     4
3     1
dtype: int64

In [40]: pd.Series([3,0,2,0,0,1,0,0,0,1,1,0,1,0,2,2,2,2,2,0,0,2]*100).value_counts(sort=False).sort_values(ascending=False)
Out[40]: 
0    1000
2     700
1     400
3     100
dtype: int64
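
If a particular order matters, one option is to ask for it explicitly rather than rely on whatever sort=False happens to return. The following is a minimal sketch using the data from the question; the variable names (data, counts, by_key, by_count) are illustrative and not part of pandas:

```python
import pandas as pd

# Same values as in the question, repeated 100 times.
data = [3, 0, 2, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 2, 2, 2, 2, 2, 0, 0, 2]
s = pd.Series(data * 100)

# Unsorted counts: do not rely on the order of this result.
counts = s.value_counts(sort=False)

# Order by the value (the index key), ascending.
by_key = counts.sort_index()

# Order by the count, descending (what sort=True gives by default).
by_count = counts.sort_values(ascending=False)

print(by_key)
print(by_count)
```

Sorting the index or the values afterwards gives the same result regardless of the series length, so the output no longer depends on the internal ordering.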

Jeff answered Nov 06 '25 03:11


