
Why does value_counts not show all values present?

Tags: python, pandas

I am using pandas 0.18.1 on a large dataframe. I am confused by the behaviour of value_counts(). This is my code:

print(df.phase.value_counts())
def normalise_phase(x):
    print(x)
    return int(str(x).split('/')[0])
df['phase_normalised'] = df['phase'].apply(normalise_phase)

This prints the following:

2      35092
3      26248
1      24646
4      22189
1/2     8295
2/3     4219
0       1829
dtype: int64
1
nan

Two questions:

  • Why is nan printing as an output of normalise_phase, when nan is not listed as a value in value_counts?
  • Why does value_counts show dtype as int64 if it has string values like 1/2 and nan in it too?
Richard asked Sep 07 '25 18:09


1 Answer

value_counts() excludes NaN by default; pass dropna=False for NaNs to be tallied (see the docs). The int64 is the dtype of the returned Series, which holds the counts of the values. The values themselves are the index of that Series, and the dtype of the index is object, as you can check:

import numpy as np
import pandas as pd

ser = pd.Series([1, '1/2', '1/2', 3, np.nan, 5])

ser.value_counts(dropna=False)
Out: 
1/2    2
5      1
3      1
1      1
NaN    1
dtype: int64

ser.value_counts(dropna=False).index
Out: Index(['1/2', 5, 3, 1, nan], dtype='object')
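
Applied to the question's situation, this might look like the sketch below. It assumes a small made-up phase column standing in for the real dataframe, and it returns floats from normalise_phase so that a missing value can stay NaN instead of raising in int(). That choice is an assumption made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's column: strings, an int and one missing value.
df = pd.DataFrame({"phase": ["2", "1/2", 3, np.nan, "2/3", "2"]})

# dropna=False makes the missing value appear in the tally.
counts = df["phase"].value_counts(dropna=False)
print(counts)

# The counts are integers; the distinct values live in the index.
print(counts.dtype)        # dtype of the counts
print(counts.index.dtype)  # dtype of the index that holds the values


def normalise_phase(x):
    """Return the first phase number as a float, or NaN when x is missing."""
    if pd.isnull(x):
        return np.nan
    return float(str(x).split("/")[0])


df["phase_normalised"] = df["phase"].apply(normalise_phase)
print(df)
```

Checking df["phase"].isnull().sum() beforehand is another quick way to confirm that missing values exist even when value_counts() does not list them.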
