Pandas: get_dummies vs categorical

I have a dataset which has a few columns with categorical data.

I've been using the Categorical function to replace categorical values with numerical ones.

data[column] = pd.Categorical(data[column]).codes

I recently came across the pandas.get_dummies function. Are these interchangeable? Is there an advantage to using one over the other?

875

asked Mar 23 '15 22:03

sapo_cosmico

1 Answer

Why are you converting the categorical data to integers? I don't believe you save memory, if that is your goal.

df = pd.DataFrame({'cat': pd.Categorical(['a', 'a', 'a', 'b', 'b', 'c'])})
df2 = pd.DataFrame({'cat': [1, 1, 1, 2, 2, 3]})

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 0 to 5
Data columns (total 1 columns):
cat    6 non-null category
dtypes: category(1)
memory usage: 78.0 bytes

>>> df2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 0 to 5
Data columns (total 1 columns):
cat    6 non-null int64
dtypes: int64(1)
memory usage: 96.0 bytes
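
The six-row frames above are small enough that fixed overhead dominates the numbers. As a rough sketch of the same comparison on a longer column (the 30,000-row size and the `nbytes` helper are assumptions made for illustration, not part of the original answer), one could measure the column data directly:

```python
import pandas as pd

# Hypothetical larger column: three labels repeated 10,000 times each.
labels = ["a", "b", "c"] * 10_000

as_object = pd.Series(labels, dtype=object)             # plain Python strings
as_category = as_object.astype("category")              # categorical dtype
as_int64 = as_category.cat.codes.astype("int64")        # codes widened to int64


def nbytes(series):
    """Bytes used by the column's data, excluding the index."""
    return int(series.memory_usage(index=False, deep=True))


sizes = {
    "object": nbytes(as_object),
    "category": nbytes(as_category),
    "int64": nbytes(as_int64),
}
print(sizes)
```

With only a few distinct values, the categorical column stores small integer codes plus one copy of each label, so it should come out smaller than the int64 column here. Exact byte counts depend on the pandas version and platform.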

The categorical codes are just integer values for the unique items in the given category. By contrast, get_dummies returns a new column for each unique item. The value in the column indicates whether or not the record has that attribute.

>>> pd.get_dummies(df, dtype=int)
   cat_a  cat_b  cat_c
0      1      0      0
1      1      0      0
2      1      0      0
3      0      1      0
4      0      1      0
5      0      0      1
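
To put the two encodings side by side, here is a minimal sketch built on the same six-value column. The `prefix` and `dtype` arguments are choices made for this example. Recent pandas versions return boolean indicator columns unless `dtype` is given.

```python
import pandas as pd

df = pd.DataFrame({"cat": pd.Categorical(["a", "a", "a", "b", "b", "c"])})

# Integer codes: one column, one integer per category (0-based).
codes = df["cat"].cat.codes

# Dummy variables: one 0/1 indicator column per category.
dummies = pd.get_dummies(df["cat"], prefix="cat", dtype=int)

# Each row has exactly one indicator set, and the position of that
# indicator is the same as the row's integer code.
recovered = dummies.to_numpy().argmax(axis=1)

print(codes.tolist())
print(list(dummies.columns))
print(recovered.tolist())
```

The two forms carry the same information, but they are not interchangeable as model inputs: a single code column implies an ordering and spacing between categories (c > b > a), while indicator columns do not.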

To get the codes directly, you can use:

df['codes'] = df['cat'].cat.codes
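
If the codes are kept, it is usually worth keeping the mapping back to the labels as well. The following is a sketch, assuming the same example frame; the names `code_to_label`, `restored`, and `with_missing` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"cat": pd.Categorical(["a", "a", "a", "b", "b", "c"])})
df["codes"] = df["cat"].cat.codes

# Codes are positions in the categories index, so enumerating the
# categories gives the code -> label lookup.
code_to_label = dict(enumerate(df["cat"].cat.categories))

# Rebuild the categorical column from the codes and the categories.
restored = pd.Categorical.from_codes(
    df["codes"], categories=df["cat"].cat.categories
)

# Missing values have no category and are encoded as -1.
with_missing = pd.Series(["a", None, "b"], dtype="category").cat.codes

print(code_to_label)
print(list(restored))
print(with_missing.tolist())
```

The -1 code for missing values is easy to overlook when the codes are passed on to other tools as if they were ordinary category indices.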

139

answered Sep 24 '22 23:09

Alexander

Tags: python, pandas, categorical-data, dummy-data