
pandas get mapping of categories to integer value

Tags: python, pandas

I can transform categorical columns to their categorical codes, but how do I get an accurate picture of the mapping? Example:

import pandas as pd
df_labels = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcab')})
df_labels['col2'] = df_labels['col2'].astype('category')

df_labels looks like this:

   col1 col2
0     1    a
1     2    b
2     3    c
3     4    a
4     5    b

How do I get an accurate mapping of the category codes to the categories? The Stack Overflow answer below says to enumerate the categories. However, I'm not sure whether enumerating is how cat.codes generated the integer values. Is there a more accurate way?

Get mapping of categorical variables in pandas

>>> dict( enumerate(df.five.cat.categories) )

{0: 'bad', 1: 'good'}

What is a good way to get the mapping in the above format, but accurately?

jxn asked Feb 13 '17 23:02


2 Answers

I use:

{category: code for code, category in enumerate(df_labels.col2.cat.categories)}

# {'a': 0, 'b': 1, 'c': 2}
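On the worry in the question: as far as I know, each value in cat.codes is simply the position of that row's value in cat.categories, so enumerating the categories should reproduce the mapping. A minimal sketch to check this on the question's own data (the names code_to_category, decoded and matches are just for illustration):

```python
import pandas as pd

df_labels = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcab')})
df_labels['col2'] = df_labels['col2'].astype('category')

# Enumerate the categories: position in cat.categories -> category label.
code_to_category = dict(enumerate(df_labels['col2'].cat.categories))

# Round-trip check: translate every row's code back through the dict
# and compare with the original column values.
decoded = df_labels['col2'].cat.codes.map(code_to_category)
matches = decoded.tolist() == df_labels['col2'].tolist()

print(code_to_category)  # {0: 'a', 1: 'b', 2: 'c'}
print(matches)           # True
```

One caveat: missing values get the code -1, which has no entry in cat.categories, so it will not show up in a mapping built this way.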
pomber answered Sep 28 '22 07:09


Edited answer (removed cat.categories and changed list to dict):

>>> dict(zip(df_labels.col2.cat.codes, df_labels.col2))

{0: 'a', 1: 'b', 2: 'c'}

The original answer which some of the comments are referring to:

>>> list(zip(df_labels.col2.cat.codes, df_labels.col2.cat.categories))

[(0, 'a'), (1, 'b'), (2, 'c')]

As the comments note, the original answer works in this example only because the first three values happened to be [a, b, c]; it would give the wrong pairs if they were instead [c, b, a] or [b, c, a].
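To make the failure concrete, here is a small sketch with a hypothetical Series whose values start c, b, a (the variable names are only for illustration). Zipping the codes with cat.categories pairs a per-row sequence with a per-category one, so the pairs come out wrong, while zipping the codes with the values themselves stays aligned row by row:

```python
import pandas as pd

# Hypothetical data: same categories as in the question, different row order.
s = pd.Series(list('cbacb')).astype('category')

codes = s.cat.codes.tolist()        # one code per row
categories = list(s.cat.categories)  # one entry per category, sorted

# Original answer: codes (per row) zipped with categories (per category).
wrong = list(zip(codes, categories))

# Edited answer: codes zipped with the row values they belong to.
right = dict(zip(codes, s.tolist()))

print(wrong)  # [(2, 'a'), (1, 'b'), (0, 'c')]
print(right)  # {2: 'c', 1: 'b', 0: 'a'}
```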

Zeugma answered Sep 28 '22 06:09