I'm trying to do some one-hot encoding with pandas. The input data looks like this:
import pandas as pd
# input data
df = pd.DataFrame({
"pid" : [1, 1, 2, 3],
"code" : ["a", "b", "b", "c"],
"time" : [0, 0, 1, 0]
})
# two of the columns are categories
df["code"] = df.code.astype("category")
df["time"] = df.time.astype("category")
And I want to one-hot-encode the combination of the columns code
and time
in a manner that produces 6 columns corresponding to all combinations of the categories in each column. In this small example that would be a_0, a_1, b_0, b_1, c_0, and c_1.
# i've tried doing this:
df["code_time"] = df.code.astype("str") + df.time.astype("str")
del df["code"]
del df["time"]
pd.get_dummies(df, columns=["code_time"])
But that wont give me the a_1 and c_1 combinations since they do not appear in the data. Is there any straightforward method to produce my desired output:
df_out = pd.DataFrame({
"pid" : [1, 2, 3]
"a_0" : [1, 0, 0]
"a_1" : [0, 0, 0]
"b_0" : [1, 0, 0]
"b_1" : [0, 1, 0]
"c_0" : [0, 0, 1]
"c_1" : [0, 0, 0]
})
A straightforward solution is to create the full product:
from itertools import product
code_times = [f'{x}_{y}' for x,y in product(df['code'].unique(), df['time'].unique())]
data = df.code.astype("str") + '_'+ df.time.astype("str")
# use crosstab for get_dummies() + sum by id
pd.crosstab(df['pid'], data).reindex(code_times, fill_value=0, axis=1)
Output:
col_0 a_0 a_1 b_0 b_1 c_0 c_1
pid
1 1 0 1 0 0 0
2 0 0 0 1 0 0
3 0 0 0 0 1 0
You can try pd.MultiIndex.from_product
with reindex after series.str.get_dummies()
followed by groupby
on pid
and max.
idx = pd.MultiIndex.from_product((df['code'].unique()
,df['time'].astype(str).unique())).map('_'.join)
out = (df['code'].str.cat(df['time'].astype(str),'_').str.get_dummies()
.reindex(idx,axis=1,fill_value=0).groupby(df['pid']).max().reset_index())
print(out)
pid a_0 a_1 b_0 b_1 c_0 c_1
0 1 1 0 1 0 0 0
1 2 0 0 0 1 0 0
2 3 0 0 0 0 1 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With