
How to know the labels assigned by astype('category').cat.codes?

I have the following dataframe called language

      lang         level
0  english  intermediate
1  spanish  intermediate
2  spanish         basic
3  english         basic
4  english      advanced
5  spanish  intermediate
6  spanish         basic
7  spanish      advanced

I categorized each of my variables into numbers by using

language.lang.astype('category').cat.codes

and

language.level.astype('category').cat.codes

respectively, obtaining the following data frame:

   lang  level
0     0      1
1     1      1
2     1      0
3     0      0
4     0      2
5     1      1
6     1      0
7     1      2

Now I would like to know whether there is a way to find out which original value corresponds to each code. For example, I'd like to know that the value 0 in the lang column corresponds to english, and so on.

Is there any function that allows me to get back this information?

Marisa asked Jun 29 '18 12:06




1 Answer

You can generate a dictionary that maps each code to its category:

c = language.lang.astype('category')

d = dict(enumerate(c.cat.categories))
print(d)
{0: 'english', 1: 'spanish'}
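As a self-contained sketch of the same idea, the snippet below rebuilds the example frame from the question and produces one lookup dictionary per column. The helper name `code_to_label` is hypothetical and not part of pandas; it only wraps the `dict(enumerate(...))` step shown above.

```python
import pandas as pd

# Example data copied from the question.
language = pd.DataFrame({
    "lang": ["english", "spanish", "spanish", "english",
             "english", "spanish", "spanish", "spanish"],
    "level": ["intermediate", "intermediate", "basic", "basic",
              "advanced", "intermediate", "basic", "advanced"],
})


def code_to_label(series):
    """Hypothetical helper: map each integer code to its category label."""
    cat = series.astype("category")
    # The position of a label in .cat.categories is its code.
    return dict(enumerate(cat.cat.categories))


lang_labels = code_to_label(language["lang"])
level_labels = code_to_label(language["level"])

print(lang_labels)   # {0: 'english', 1: 'spanish'}
print(level_labels)  # {0: 'advanced', 1: 'basic', 2: 'intermediate'}
```

For string columns pandas sorts the categories, so the codes follow alphabetical order of the labels. That is why `level` comes out as advanced 0, basic 1, intermediate 2 here; the level codes printed in the question look illustrative rather than actual pandas output.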

Then, if necessary, you can map the codes back to the original values:

language['code'] = language.lang.astype('category').cat.codes

language['level_back'] = language['code'].map(d)
print(language)
      lang         level  code level_back
0  english  intermediate     0    english
1  spanish  intermediate     1    spanish
2  spanish         basic     1    spanish
3  english         basic     0    english
4  english      advanced     0    english
5  spanish  intermediate     1    spanish
6  spanish         basic     1    spanish
7  spanish      advanced     1    spanish
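If you would rather not build a dictionary, one possible alternative is to decode the codes with `pd.Categorical.from_codes`, which takes the codes plus the category labels. The sketch below uses a small made-up series (not the frame from the question) that includes a missing value, since pandas encodes missing entries as code -1.

```python
import pandas as pd

# Made-up sample with one missing value, for illustration only.
lang = pd.Series(["english", "spanish", None, "english"])

c = lang.astype("category")
codes = c.cat.codes            # integer codes; a missing value becomes -1
categories = c.cat.categories  # Index(['english', 'spanish'])

# Rebuild the labels from codes + categories; -1 is decoded as NaN.
decoded = pd.Categorical.from_codes(codes, categories=categories)
restored = pd.Series(decoded, index=lang.index)

print(codes.tolist())
print(restored.tolist())
```

Mapping with a dictionary behaves similarly for missing data: the code -1 has no key in the dictionary, so `.map` returns NaN for that row.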
jezrael answered Sep 19 '22 12:09
