Transform multiple categorical columns

Tags: python, python-3.x, pandas, scikit-learn, categorical-data

In my dataset I have two categorical columns that I would like to encode as numbers. Both columns contain countries, and some countries appear in both columns. I would like the same country to get the same number in col1 and col2.

My data looks somewhat like:

import pandas as pd

d = {'col1': ['NL', 'BE', 'FR', 'BE'], 'col2': ['BE', 'NL', 'ES', 'ES']}
df = pd.DataFrame(data=d)
df

Currently I am transforming the data like this:

from sklearn.preprocessing import LabelEncoder
df.apply(LabelEncoder().fit_transform)

However, because each column is fitted separately, this makes no distinction between FR and ES: both are encoded as 1. Is there another simple way to get the following output?

o = {'col1': [2,0,1,0], 'col2': [0,2,4,4]}
output = pd.DataFrame(data=o)
output
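Here is a minimal check of where the collision comes from, rebuilding the same df (the name `per_column` is only illustrative). `df.apply` calls `fit_transform` once per column, so col1 is numbered against BE, FR, NL and col2 against BE, ES, NL, each sorted alphabetically. FR and ES are each second in their own list, so both become 1.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'col1': ['NL', 'BE', 'FR', 'BE'],
                   'col2': ['BE', 'NL', 'ES', 'ES']})

# apply() refits the encoder on each column separately, so every column
# gets its own alphabetical numbering (per_column is an illustrative name)
per_column = df.apply(LabelEncoder().fit_transform)

# col1 is numbered against BE, FR, NL and col2 against BE, ES, NL:
# FR and ES are both second in their own list, so both become 1
print(per_column)
```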

asked Nov 12 '19 15:11

Tox

2 Answers

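Here is one way: stack both columns into a single Series, so the categories (sorted alphabetically) are computed from all values at once, then unstack back into two columns: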

df.stack().astype('category').cat.codes.unstack()
Out[190]: 
   col1  col2
0     3     0
1     0     3
2     2     1
3     0     1
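The same alphabetical codes can also be produced without stacking, by giving every column one explicit, shared category list. A sketch, assuming all values are non-missing strings (`categories` and `codes` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['NL', 'BE', 'FR', 'BE'],
                   'col2': ['BE', 'NL', 'ES', 'ES']})

# One sorted category list built from every cell of the frame
# (assumes no missing values: sorted() cannot compare NaN with strings)
categories = sorted(pd.unique(df.to_numpy().ravel()))

# Encode each column against that same list, so a country receives
# the same code in whichever column it appears
codes = df.apply(lambda col: pd.Categorical(col, categories=categories).codes)
print(codes)
```

Any value that is not in `categories` would be coded as -1, which makes unexpected countries easy to spot.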
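
Or, using factorize, which numbers the countries in order of first appearance instead of alphabetically:
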
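s = df.stack()               # one Series holding the values of both columns
s[:] = s.factorize()[0]      # replace each country with its code (order of first appearance)
s.unstack()                  # back to the two-column layout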
Out[196]: 
   col1  col2
0     0     1
1     1     0
2     2     3
3     1     3
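If you also need to map the codes back to countries, `pd.factorize` on the flattened values returns the lookup table as well. A minimal sketch (`codes`, `uniques`, `encoded` and `decoded` are illustrative names). `ravel()` reads the frame row by row, which is the same order `df.stack()` uses, so the numbering should match the factorize output above.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['NL', 'BE', 'FR', 'BE'],
                   'col2': ['BE', 'NL', 'ES', 'ES']})

# Flatten row by row and number countries in order of first appearance;
# uniques[i] is the country that received code i
codes, uniques = pd.factorize(df.to_numpy().ravel())

# Reshape the flat codes back into the original two columns
encoded = pd.DataFrame(codes.reshape(df.shape), index=df.index, columns=df.columns)

# Indexing uniques with the codes recovers the original countries
decoded = pd.DataFrame(uniques[codes].reshape(df.shape),
                       index=df.index, columns=df.columns)
print(encoded)
```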

answered Oct 21 '22 00:10

BENY

You can fit the LabelEncoder on the unique values from both columns first, and then transform each column with that same fitted encoder.

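le = LabelEncoder()
# fit once on the countries from both columns, so each country gets a single shared code
le.fit(pd.concat([df.col1, df.col2]).unique())  # or np.unique(df.values.reshape(-1,1)), after import numpy as np

# transform each column with the same fitted encoder
df.apply(le.transform)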
Out[28]: 
   col1  col2
0     3     0
1     0     3
2     2     1
3     0     1
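scikit-learn's documentation describes `LabelEncoder` as intended for target values (`y`) rather than input features. For feature columns, `OrdinalEncoder` accepts an explicit `categories` list per column, and passing the same list for both columns gives the same shared numbering. A sketch, assuming no missing values (`countries`, `enc`, `encoded` and `decoded` are illustrative names):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({'col1': ['NL', 'BE', 'FR', 'BE'],
                   'col2': ['BE', 'NL', 'ES', 'ES']})

# One shared, sorted category list taken from both columns
countries = sorted(pd.concat([df.col1, df.col2]).unique())

# The same list for every column makes the codes agree across columns;
# fit_transform returns floats, so cast to int for readability
enc = OrdinalEncoder(categories=[countries] * df.shape[1])
encoded = pd.DataFrame(enc.fit_transform(df).astype(int),
                       index=df.index, columns=df.columns)

# inverse_transform maps the codes back to country names (a NumPy array)
decoded = enc.inverse_transform(encoded)
print(encoded)
```

With the default `handle_unknown='error'`, transforming a country that is not in `countries` raises an error instead of silently assigning it a code.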

answered Oct 20 '22 23:10

Michael Gardner
