In my dataset I have two categorical columns which I would like to numerate. The two columns both contain countries, some overlap (appear in both columns). I would like to give the same number in column1 and column2 for the same country.
My data looks somewhat like:
import pandas as pd
d = {'col1': ['NL', 'BE', 'FR', 'BE'], 'col2': ['BE', 'NL', 'ES', 'ES']}
df = pd.DataFrame(data=d)
df
Currenty I am transforming the data like:
from sklearn.preprocessing import LabelEncoder
df.apply(LabelEncoder().fit_transform)
However this makes no distinction between FR and ES. Is there another simple way to come to the following output?
o = {'col1': [2,0,1,0], 'col2': [0,2,4,4]}
output = pd.DataFrame(data=o)
output
To convert your categorical variables to dummy variables in Python you c an use Pandas get_dummies() method. For example, if you have the categorical variable “Gender” in your dataframe called “df” you can use the following code to make dummy variables: df_dc = pd. get_dummies(df, columns=['Gender']) .
To convert category variables to dummy variables in tidyverse, use the spread() method. To do so, use the spread() function with three arguments: key, which is the column to convert into categorical values, in this case, “Reporting Airline”; value, which is the value you want to set the key to (in this case “dummy”);
For checking values we use the pandas select_dtypes function which is used for selecting the data types of variable. Now we have to drop those numerical columns from the dataset and we will use this categorical variable for our use. We only use 3-4 categorical columns from the dataset for applying one-hot encoding.
Here is one way
df.stack().astype('category').cat.codes.unstack()
Out[190]:
col1 col2
0 3 0
1 0 3
2 2 1
3 0 1
Or
s=df.stack()
s[:]=s.factorize()[0]
s.unstack()
Out[196]:
col1 col2
0 0 1
1 1 0
2 2 3
3 1 3
You can fit the LabelEncoder() with the unique values in your dataframe first and then transform.
le = LabelEncoder()
le.fit(pd.concat([df.col1, df.col2]).unique()) # or np.unique(df.values.reshape(-1,1))
df.apply(le.transform)
Out[28]:
col1 col2
0 3 0
1 0 3
2 2 1
3 0 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With