Convert categorical data in pandas dataframe

People also ask

How do you convert categorical data to numerical pandas?

To convert a categorical column to its numerical codes, use dataframe['c'].cat.codes. You can also select all columns of a given dtype automatically with select_dtypes.

How do you convert categorical data?

Use LabelEncoder from the scikit-learn library (sklearn.preprocessing) and call its fit_transform() method to convert categorical data to numerical data.

How do you convert columns to categorical pandas?

The astype() method casts a pandas object to a specified dtype, and it can convert any suitable existing column to the categorical type. DataFrame.astype() comes in handy when you want to cast a particular column from one data type to another.

How do pandas handle categorical data?

The basic strategy is to convert each category value into a new column and assign a 1 or 0 (True/False) value to that column. This has the benefit of not implying an order or magnitude among the categories. Many libraries support one-hot encoding, but the simplest option is the pandas get_dummies() function.
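As a minimal sketch of the one-hot strategy described above, assuming a small made-up dataframe (the names col1 and col2 are illustrative only): dtype=int is passed so the new columns hold 0/1 integers, since newer pandas releases otherwise produce booleans.

```python
import pandas as pd

# Hypothetical example data; 'col2' is the categorical column to encode.
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcab')})

# One new 0/1 column per distinct value of 'col2'; 'col1' is left untouched.
encoded = pd.get_dummies(df, columns=['col2'], dtype=int)

print(encoded)
```

Each row has exactly one 1 among the new col2_* columns, marking which category that row belonged to.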

First, the easiest way to convert a categorical column to its numerical codes is dataframe['c'].cat.codes.
Further, you can automatically select all columns of a certain dtype in a dataframe using select_dtypes. That way, you can apply the operation above to several automatically selected columns at once.

First, make an example dataframe:

In [74]: import pandas as pd

In [75]: df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcab'), 'col3': list('ababb')})

In [76]: df['col2'] = df['col2'].astype('category')

In [77]: df['col3'] = df['col3'].astype('category')

In [78]: df.dtypes
Out[78]:
col1       int64
col2    category
col3    category
dtype: object

Then use select_dtypes to pick out the categorical columns and apply .cat.codes to each of them, which gives the following result:

In [80]: cat_columns = df.select_dtypes(['category']).columns

In [81]: cat_columns
Out[81]: Index(['col2', 'col3'], dtype='object')

In [83]: df[cat_columns] = df[cat_columns].apply(lambda x: x.cat.codes)

In [84]: df
Out[84]:
   col1  col2  col3
0     1     0     0
1     2     1     1
2     3     2     0
3     4     0     1
4     5     1     1
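Two details are worth keeping in mind with this approach: the codes alone do not tell you which label they stood for, and missing values receive the code -1. A minimal sketch, assuming a made-up dataframe with one missing entry (the name code_to_label is hypothetical):

```python
import pandas as pd

# Hypothetical data with one missing value in the categorical column.
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', None, 'b']})
df['col2'] = df['col2'].astype('category')

# Save the code -> label lookup before the labels are overwritten.
code_to_label = dict(enumerate(df['col2'].cat.categories))

# Same pattern as above: replace every categorical column by its codes.
cat_columns = df.select_dtypes(['category']).columns
df[cat_columns] = df[cat_columns].apply(lambda x: x.cat.codes)

print(df)
print(code_to_label)
```

The row with the missing value ends up with -1 in col2, so it may need separate handling before the codes are passed to a model.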

This works for me:

import pandas
pandas.factorize(['B', 'C', 'D', 'B'])[0]

Output:

array([0, 1, 2, 0])
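factorize returns two things, the integer codes and the unique labels, so the second element can be kept to translate codes back. A small sketch of that, with illustrative variable names; sort=True is an optional argument that numbers the labels in sorted order rather than in order of first appearance:

```python
import pandas as pd

labels = ['B', 'C', 'D', 'B']

# codes[i] is the position of labels[i] within uniques.
codes, uniques = pd.factorize(labels)

# Translate the codes back into the original labels.
restored = [uniques[i] for i in codes]

# With sort=True the codes follow the sorted order of the labels.
sorted_codes, sorted_uniques = pd.factorize(['D', 'B', 'C', 'B'], sort=True)

print(codes, list(uniques), restored)
print(sorted_codes, list(sorted_uniques))
```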

If your only concern is that you are creating an extra column and deleting it later, just don't create a new column in the first place.

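import pandas as pd
dataframe = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': list('abcab'), 'col3': list('ababb')})
# Categorical.from_array is deprecated and no longer available in recent pandas releases
dataframe.col3 = pd.Categorical.from_array(dataframe.col3).codes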

That is all. Since Categorical.from_array is deprecated, use Categorical directly:

dataframe.col3 = pd.Categorical(dataframe.col3).codes
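By default the codes follow the sorted order of the labels. If the categories have a meaningful order of their own, pd.Categorical accepts an explicit categories list. A hedged sketch with made-up labels ('low', 'medium', 'high' are illustrative, not from the question):

```python
import pandas as pd

sizes = pd.Series(['low', 'high', 'medium', 'low'])

# The position of each label in `categories` becomes its code.
ordered = pd.Categorical(sizes, categories=['low', 'medium', 'high'], ordered=True)
codes = ordered.codes

# A value that is not listed in `categories` is treated as missing (code -1).
unknown_codes = pd.Categorical(['low', 'unknown'], categories=['low', 'medium', 'high']).codes

print(codes)
print(unknown_codes)
```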

If you also need the mapping from codes back to labels, there is an even better way to do the same thing:

# Run this on the original string column, not on a column already converted to codes
dataframe.col3, mapping_index = dataframe.col3.factorize()

Check the result:

print(dataframe)
# col3 only contains 'a' and 'b', so look up one of those labels
print(mapping_index.get_loc("b"))
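Put together as a self-contained sketch (starting from a fresh dataframe with the original string column; the names codes, code_for_b and restored are illustrative), the round trip from labels to codes and back might look like this:

```python
import pandas as pd

dataframe = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                          'col2': list('abcab'),
                          'col3': list('ababb')})

# factorize returns the integer codes and an Index of the unique labels.
codes, mapping_index = dataframe['col3'].factorize()
dataframe['col3'] = codes

# Label -> code lookup.
code_for_b = mapping_index.get_loc('b')

# Code -> label lookup for the whole column.
restored = mapping_index.take(dataframe['col3'].to_numpy())

print(dataframe)
print(code_for_b)
print(list(restored))
```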

Here, multiple columns need to be converted, so one approach I used is:

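for col_name in df.columns:
    # Only touch string / object columns; numeric columns are left as they are
    if df[col_name].dtype == 'object':
        df[col_name] = df[col_name].astype('category')
        # Replace the labels with their integer category codes
        df[col_name] = df[col_name].cat.codes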

This converts every string (object dtype) column to categorical and then replaces each of those columns with its category codes.
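The loop above changes df in place and discards the labels. As one possible variation (a sketch only; the helper name encode_text_columns and the lookups dictionary are hypothetical, not part of pandas), the same idea can be wrapped in a function that works on a copy and keeps a code-to-label lookup per column. The dtype check uses the pandas.api.types helpers, on the assumption that text columns may show up either as object or as a dedicated string dtype depending on the pandas version:

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def encode_text_columns(df):
    """Return a copy with text columns replaced by codes, plus code -> label lookups."""
    encoded = df.copy()
    lookups = {}
    for col_name in encoded.columns:
        column = encoded[col_name]
        if is_object_dtype(column) or is_string_dtype(column):
            as_category = column.astype('category')
            lookups[col_name] = dict(enumerate(as_category.cat.categories))
            encoded[col_name] = as_category.cat.codes
    return encoded, lookups


df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': list('abcab'),
                   'col3': list('ababb')})
encoded, lookups = encode_text_columns(df)

print(encoded)
print(lookups)
```

The original df keeps its string columns, which can be convenient when the labels are still needed for display.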
