pandas equivalent of Stata's encode

Question

I'm looking for a way to replicate the behaviour of encode in Stata, which converts a categorical string column into a numeric column.

import pandas as pd

x = pd.DataFrame({'cat':['A','A','B'], 'val':[10,20,30]})
x = x.set_index('cat')

Which results in:

     val
cat     
A     10
A     20
B     30

I'd like to convert the cat column from strings to integers, mapping each unique string to an (arbitrary) integer 1-to-1. It would result in:

Or, just as good:

Any suggestions?

Many thanks as always, Rob

unutbu · Accepted Answer

You could use pd.factorize:

import pandas as pd

x = pd.DataFrame({'cat':('A','A','B'), 'val':(10,20,30)})
labels, levels = pd.factorize(x['cat'])
x['cat'] = labels
x = x.set_index('cat')
print(x)

yields

     val
cat     
0     10
0     20
1     30

You could add 1 to labels if you wish to replicate Stata's behaviour:

x['cat'] = labels+1
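As a rough sketch of what pd.factorize hands back (the names codes, uniques, stata_codes and recovered are only illustrative, not part of the answer above): the first return value holds the integer codes and the second holds the unique values, so the original strings can be recovered by indexing one with the other.

```python
import pandas as pd

x = pd.DataFrame({'cat': ['A', 'A', 'B'], 'val': [10, 20, 30]})

# factorize returns (integer codes, unique values in order of first appearance)
codes, uniques = pd.factorize(x['cat'])

# Shift by one so the codes start at 1 instead of 0
stata_codes = codes + 1

# Indexing the unique values with the 0-based codes gives back the strings
recovered = uniques[codes]

print(stata_codes)
print(list(recovered))
```

Note that factorize numbers values in order of first appearance by default. As far as I know, Stata's encode assigns codes in alphabetical order of the strings, so passing sort=True to pd.factorize should be the closer match when the data are not already sorted.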

JohnE · Answer

Stata's encode command takes a string variable and creates a new integer variable whose value labels map back to the original strings. The direct analog in pandas is now the categorical dtype, which became a full-fledged part of pandas in version 0.15 (released after this question was originally asked and answered).

See the pandas documentation on categorical data.

To demonstrate for this example, the Stata command would be something like:

encode cat, generate(cat2)

whereas the pandas command would be:

x['cat2'] = x['cat'].astype('category')

  cat  val cat2
0   A   10    A
1   A   20    A
2   B   30    B

Just as Stata does with encode, the data are stored as integers, but display as strings in the default output.

You can verify this by using the categorical accessor cat to see the underlying integer. (And for that reason you probably don't want to use 'cat' as a column name.)

x['cat2'].cat.codes

0    0
1    0
2    1
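To make the parallel with Stata's value labels a bit more concrete, here is a minimal sketch that pulls out both the codes and the code-to-string mapping. The names label_map and cat_code are hypothetical, chosen only for this illustration, and the +1 shift is there just to mimic Stata's 1-based numbering.

```python
import pandas as pd

x = pd.DataFrame({'cat': ['A', 'A', 'B'], 'val': [10, 20, 30]})
x['cat2'] = x['cat'].astype('category')

# 0-based integer codes stored behind the categorical column
codes = x['cat2'].cat.codes

# The categories play the role of Stata's value labels;
# label_map (hypothetical name) maps 1-based code -> original string
label_map = dict(enumerate(x['cat2'].cat.categories, start=1))

# 1-based integer column, analogous to what encode would store
x['cat_code'] = codes + 1

print(label_map)
print(x)
```

One caveat: missing values get the code -1 in cat.codes, so after the +1 shift they would show up as 0 rather than as a missing value.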

Tags:

python

pandas

stata

LondonRob
