I am doing a value_counts() over a column of integers that represent categorical values.
I have a dict that maps the numbers to strings that correspond to the category names.
I want to find the best way to have the index show the corresponding names, as I am not happy with my 4-line solution.
df = pd.DataFrame({"weather": [1,2,1,3]})
df
>>>
   weather
0        1
1        2
2        1
3        3
weather_correspondance_dict = {1:"sunny", 2:"rainy", 3:"cloudy"}
Here is how I currently solve the problem:
df_vc = df.weather.value_counts()
index = df_vc.index.map(lambda x: weather_correspondance_dict[x] )
df_vc.index = index
df_vc
>>>
sunny     2
cloudy    1
rainy     1
dtype: int64
I am not happy with this solution because it is very tedious. Is there a best practice for this situation?
This is my solution:
>>> weather_correspondance_dict = {1:"sunny", 2:"rainy", 3:"cloudy"}
>>> df["weather"].value_counts().rename(index=weather_correspondance_dict)
sunny     2
cloudy    1
rainy     1
Name: weather, dtype: int64
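One detail worth knowing about the rename approach is how it treats labels that are missing from the dict. The sketch below is self-contained and adds a hypothetical code 4 (not in the original data) purely to illustrate this: labels found in the dict are replaced, while unmapped labels are left as they are.

```python
import pandas as pd

# The extra value 4 is an assumption added for illustration; it has no dict entry.
df = pd.DataFrame({"weather": [1, 2, 1, 3, 4]})
weather_correspondance_dict = {1: "sunny", 2: "rainy", 3: "cloudy"}

# rename(index=...) relabels the index of the counts in a single step.
counts = df["weather"].value_counts().rename(index=weather_correspondance_dict)

# Mapped labels become names; the unmapped label 4 stays unchanged.
print(counts)
```

If unmapped codes should be flagged rather than silently kept, mapping the column before counting may be a better fit.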
Here's a simpler solution:
weathers = ['sunny', 'rainy', 'cloudy']
weathers_dict = dict(enumerate(weathers, 1))
df_vc = df['weather'].value_counts()
df_vc.index = df_vc.index.map(weathers_dict.get)
Explanation
Use dict with enumerate to construct a dictionary mapping integers to the weather types in the list.
Use dict.get with pd.Index.map. Unlike pd.Series.map, older pandas versions did not accept a dictionary here, so a callable is passed instead. Recent pandas versions also accept a dictionary directly.
Alternatively, you can apply your map to weather before using pd.Series.value_counts. This way, you do not need to update the index of your result.
df['weather'] = df['weather'].map(weathers_dict)
df_vc = df['weather'].value_counts()
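The snippet above overwrites the weather column. If the integer codes are still needed later, a minimal variant (a sketch, reusing the names from this answer) chains map and value_counts without assigning back to the DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"weather": [1, 2, 1, 3]})
weathers_dict = dict(enumerate(["sunny", "rainy", "cloudy"], 1))

# Map codes to names on the fly; df["weather"] itself is not modified.
df_vc = df["weather"].map(weathers_dict).value_counts()
print(df_vc)
```

Note that Series.map turns codes missing from the dict into NaN, and value_counts drops NaN by default, so unmapped codes would not appear in the result unless dropna=False is passed.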