In Pandas there is the get_dummies function, which one-hot encodes a categorical variable. Now I want to do label smoothing as described in section 7.5.1 of the Deep Learning book:
Label smoothing regularizes a model based on a softmax with k output values by replacing the hard 0 and 1 classification targets with targets of ϵ/k and 1 - (k-1)/k * ϵ, respectively.
What would be the most efficient and/or elegant way to do label smoothing on a Pandas DataFrame?
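For illustration, assume the one-hot columns come from something like the following (the example labels here are made up):

import pandas as pd

# hypothetical categorical column with the four levels a, b, c, d
labels = pd.Series(['a', 'b', 'd', 'a', 'b', 'c'])

# one-hot encode; cast to int so we get 0/1 columns (newer pandas returns booleans)
df = pd.get_dummies(labels).astype(int)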
First, let's use a much simpler but equivalent formulation (here ϵ denotes how much probability mass you move away from the true label and spread uniformly over all remaining ones; it is the quoted rule with ϵ rescaled by (k-1)/k):
1 -> 1 - ϵ
0 -> ϵ / (k-1)
You can exploit a nice mathematical property of the above: since each dummy entry x is either 0 or 1, all you have to do is apply
x -> x * (1 - ϵ) + (1-x) * ϵ / (k-1)
Thus, if your dummy columns are a, b, c and d, just do
indices = ['a', 'b', 'c', 'd']
eps = 0.1
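# column-wise: x -> x * (1 - eps) + (1 - x) * eps / (k - 1), with k = len(indices)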
df[indices] = df[indices] * (1 - eps) + (1-df[indices]) * eps / (len(indices) - 1)
which for
>>> df
a b c d
0 1 0 0 0
1 0 1 0 0
2 0 0 0 1
3 1 0 0 0
4 0 1 0 0
5 0 0 1 0
returns
a b c d
0 0.900000 0.033333 0.033333 0.033333
1 0.033333 0.900000 0.033333 0.033333
2 0.033333 0.033333 0.033333 0.900000
3 0.900000 0.033333 0.033333 0.033333
4 0.033333 0.900000 0.033333 0.033333
5 0.033333 0.033333 0.900000 0.033333
as expected.
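If you need this in more than one place, a small helper is enough (the name smooth_labels is made up here); apply it to the original 0/1 frame, not the already-smoothed one, and note that every smoothed row still sums to 1:

import pandas as pd

def smooth_labels(dummies, eps):
    # dummies: DataFrame of 0/1 indicator columns, one column per class
    k = dummies.shape[1]
    return dummies * (1 - eps) + (1 - dummies) * eps / (k - 1)

# e.g. on a freshly one-hot-encoded frame
onehot = pd.get_dummies(pd.Series(['a', 'b', 'd', 'a', 'b', 'c'])).astype(float)
smoothed = smooth_labels(onehot, eps=0.1)
assert ((smoothed.sum(axis=1) - 1).abs() < 1e-9).all()   # rows still sum to 1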