I can convert categorical columns to their category codes, but how do I get an accurate picture of the mapping? Example:
import pandas as pd
df_labels = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcab')})
df_labels['col2'] = df_labels['col2'].astype('category')
df_labels looks like this:
   col1 col2
0     1    a
1     2    b
2     3    c
3     4    a
4     5    b
How do I get an accurate mapping of the category codes to the categories? The Stack Overflow answer below says to enumerate the categories. However, I'm not sure whether enumeration is how cat.codes generated the integer values. Is there a more accurate way?
Get mapping of categorical variables in pandas
>>> dict( enumerate(df.five.cat.categories) )
{0: 'bad', 1: 'good'}
What is a reliable way to get the mapping in the format above?
I use:
{category: code for code, category in enumerate(df_labels.col2.cat.categories)}
# {'a': 0, 'b': 1, 'c': 2}
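A quick way to check that enumerating is accurate: the values in cat.codes are positions into cat.categories, so mapping the codes back through the enumerated dict should reproduce the original column. This is only a sketch; the reordered sample data ('cbacb') and the variable names are my own assumptions, chosen so that the order of appearance differs from the category order.

```python
import pandas as pd

# Hypothetical sample data: values appear in the order c, b, a rather than a, b, c
df_labels = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('cbacb')})
df_labels['col2'] = df_labels['col2'].astype('category')

col = df_labels['col2']

# code -> category, built by enumerating the categories
code_to_category = dict(enumerate(col.cat.categories))

# Translate every code back to a label and compare with the original values
rebuilt = col.cat.codes.map(code_to_category)
round_trip_ok = list(rebuilt) == list(col)

print(code_to_category)  # {0: 'a', 1: 'b', 2: 'c'}
print(round_trip_ok)     # True
```

If the round trip holds on your own data as well, the enumerated mapping matches what cat.codes produced.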
Edited answer (removed cat.categories and changed list to dict):
>>> dict(zip(df_labels.col2.cat.codes, df_labels.col2))
{0: 'a', 1: 'b', 2: 'c'}
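One thing to be aware of with zipping the codes against the column: it only sees values that actually occur in the data. The sketch below (the sample Series with a missing value and an unused category 'c' is an assumption of mine, not from the question) compares it with enumerating the categories.

```python
import pandas as pd

# Hypothetical data: one missing value, and a declared category 'c' that never occurs
s = pd.Series(['a', 'b', None, 'a'],
              dtype=pd.CategoricalDtype(categories=['a', 'b', 'c']))

# Mapping built from the observed (code, value) pairs
observed = {int(code): value for code, value in zip(s.cat.codes, s)}

# Mapping built from the declared categories
declared = dict(enumerate(s.cat.categories))

print(sorted(observed))  # [-1, 0, 1]
print(declared)          # {0: 'a', 1: 'b', 2: 'c'}
```

Here the observed mapping has an entry for code -1 (the missing value) and no entry for 'c', while the declared mapping covers every category and nothing else. Which one you want depends on the use case.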
The original answer which some of the comments are referring to:
>>> list(zip(df_labels.col2.cat.codes, df_labels.col2.cat.categories))
[(0, 'a'), (1, 'b'), (2, 'c')]
As the comments note, the original answer works in this example only because the first three values happened to be [a,b,c]; it would fail if they were instead [c,b,a] or [b,c,a].
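To make that failure concrete, here is a small sketch with the values starting [c,b,a] (sample data assumed for illustration). Zipping the per-row codes against the categories pairs them up by position, which gives the wrong labels:

```python
import pandas as pd

# Hypothetical column whose first three values are c, b, a
s = pd.Series(list('cbacb')).astype('category')

# Original approach: pairs the i-th row's code with the i-th category
wrong = [(int(code), cat) for code, cat in zip(s.cat.codes, s.cat.categories)]

# Enumerating the categories does not depend on the row order
right = dict(enumerate(s.cat.categories))

print(wrong)  # [(2, 'a'), (1, 'b'), (0, 'c')]
print(right)  # {0: 'a', 1: 'b', 2: 'c'}
```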