I am using pandas 0.18.1 on a large dataframe. I am confused by the behaviour of value_counts(). This is my code:
print(df.phase.value_counts())

def normalise_phase(x):
    print(x)
    return int(str(x).split('/')[0])

df['phase_normalised'] = df['phase'].apply(normalise_phase)
This prints the following:
2      35092
3      26248
1      24646
4      22189
1/2     8295
2/3     4219
0       1829
dtype: int64
1
nan
Two questions:
Why is nan printed as an output of normalise_phase, when nan is not listed as a value in value_counts?
Why does value_counts show dtype as int64 if it has string values like 1/2 and nan in it too?

You need to pass dropna=False for NaNs to be tallied (see the docs).
int64 is the dtype of the Series that value_counts returns, which holds the counts. The values themselves form the index of that Series, and the dtype of the index is object, as you can check:
import numpy as np
import pandas as pd

ser = pd.Series([1, '1/2', '1/2', 3, np.nan, 5])
ser.value_counts(dropna=False)
Out:
1/2    2
5      1
3      1
1      1
NaN    1
dtype: int64
ser.value_counts(dropna=False).index
Out: Index(['1/2', 5, 3, 1, nan], dtype='object')
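As a side note, int(str(x).split('/')[0]) will raise a ValueError when x is NaN, because str(nan) is 'nan'. Below is a minimal sketch of one way to handle that, assuming you want missing phases to stay missing rather than be mapped to some number. The toy DataFrame is made up for illustration and only stands in for your real df.

```python
import numpy as np
import pandas as pd

# Toy data standing in for df['phase']; the values are illustrative only.
df = pd.DataFrame({'phase': ['2', '3', '1/2', '2/3', np.nan, '1']})

def normalise_phase(x):
    # Leave missing values missing instead of calling int('nan'),
    # which raises ValueError.
    if pd.isnull(x):
        return np.nan
    # Keep only the part before the slash, e.g. '1/2' -> 1.
    return int(str(x).split('/')[0])

df['phase_normalised'] = df['phase'].apply(normalise_phase)

# The new column is float64 because NaN cannot be stored in an int64 column.
# dropna=False makes the NaN row show up in the tally.
counts = df['phase_normalised'].value_counts(dropna=False)
print(counts)
```

If you would rather drop the rows with a missing phase, calling df.dropna(subset=['phase']) before the apply is another option.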