Pandas: convert categories to numbers

Suppose I have a DataFrame of countries that looks like this:

cc | temp
US | 37.0
CA | 12.0
US | 35.0
AU | 20.0

I know that there is a pd.get_dummies function to convert the countries to 'one-hot encodings'. However, I want to convert them to indices instead, so that I get cc_index = [1,2,1,3].

I'm assuming that there is a faster way than using get_dummies along with a NumPy where call, as shown below:

[np.where(x)[0][0] for x in pd.get_dummies(df.cc).values]

This is somewhat easier to do in R using 'factors' so I'm hoping pandas has something similar.

sachinruk asked Jun 29 '16 01:06


1 Answer

First, change the type of the column:

df.cc = pd.Categorical(df.cc) 

Now the data look similar but are stored categorically. To capture the category codes:

df['code'] = df.cc.cat.codes 

Now you have:

   cc  temp  code
0  US  37.0     2
1  CA  12.0     1
2  US  35.0     2
3  AU  20.0     0
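
For reference, the steps above can be put together as one runnable sketch. The sample data is rebuilt from the question; note that the codes are positions in the sorted category list (AU, CA, US), which is why US gets 2 rather than 1.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"cc": ["US", "CA", "US", "AU"],
                   "temp": [37.0, 12.0, 35.0, 20.0]})

# Store the column categorically; the inferred categories are sorted.
df["cc"] = pd.Categorical(df["cc"])

# Each code is the position of the row's value in df["cc"].cat.categories.
df["code"] = df["cc"].cat.codes

print(df)
print(list(df["cc"].cat.categories))
```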

If you don't want to modify your DataFrame but simply get the codes:

df.cc.astype('category').cat.codes 
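
The category codes are 0-based and follow sorted category order, so they do not reproduce the [1,2,1,3] asked for in the question. If order of first appearance matters, one alternative (not part of the approach above) is pd.factorize, which numbers values as they are first seen. A minimal sketch, assuming the sample data from the question and that 1-based indices are wanted:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"cc": ["US", "CA", "US", "AU"],
                   "temp": [37.0, 12.0, 35.0, 20.0]})

# factorize returns 0-based codes in order of first appearance,
# plus the unique values those codes refer to.
codes, uniques = pd.factorize(df["cc"])

# Shift by one to get the 1-based indices from the question.
cc_index = (codes + 1).tolist()

print(cc_index)
print(list(uniques))
```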

Or use the categorical column as an index:

df2 = pd.DataFrame(df.temp)
df2.index = pd.CategoricalIndex(df.cc)
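
As a rough illustration of what the categorical index gives you, the sketch below rebuilds the sample data from the question, selects rows by country label, and reads the integer codes straight from the index. The variable names us_temps and index_codes are only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"cc": ["US", "CA", "US", "AU"],
                   "temp": [37.0, 12.0, 35.0, 20.0]})

# Keep only the temperatures and index them by country category.
df2 = pd.DataFrame(df["temp"])
df2.index = pd.CategoricalIndex(df["cc"])

# Label-based selection returns every row for that country.
us_temps = df2.loc["US", "temp"].tolist()

# The index carries the same integer codes as .cat.codes on a column.
index_codes = df2.index.codes.tolist()

print(us_temps)
print(index_codes)
```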

John Zwinck answered Sep 19 '22 04:09