Is there a quick way to one-hot encode lists of vectors (with different lengths) in R, preferably using the tidyverse?
For example:
vals <- list(a=c(1), b=c(2,3), c=c(1,2))
The desired result is a wide data frame:
  1 2 3
a 1 0 0
b 0 1 1
c 1 1 0
Thanks!
One-hot encoding keeps a machine learning model from assuming that larger numbers are more important. For example, the value 8 is bigger than the value 1, but that does not make category 8 more important than category 1. The same is true for words: 'laughter' is not more important than 'laugh'.
One-hot encoding is a common way of preprocessing categorical features for machine learning models. It creates a new binary feature for each possible category and, for each sample, sets the feature of that sample's category to 1 and all the others to 0.
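For a single categorical column, base R can already do this with model.matrix(). The sketch below is only an illustration: the data frame and its colour column are hypothetical, and dropping the intercept ("- 1") is what makes every level get its own 0/1 column.

```r
# Hypothetical example data: one categorical feature with three levels
df <- data.frame(colour = factor(c("red", "green", "blue", "green")))

# "~ colour - 1" removes the intercept, so each level becomes its own
# binary column (levels are sorted alphabetically: blue, green, red)
onehot <- model.matrix(~ colour - 1, data = df)

print(onehot)
```

Each row then contains exactly one 1, in the column of that row's category.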
For basic one-hot encoding with pandas, you pass your DataFrame to the get_dummies function. It returns a new DataFrame with one column for every level of the categorical column, holding 1 or 0 to mark whether that level is present in a given observation.
We can enframe the list, unnest it into separate rows, create a dummy column, and convert the data to wide format using pivot_wider.
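library(tidyverse)
enframe(vals) %>%                  # columns: name, value (a list-column)
  unnest(value) %>%                # one row per (name, element) pair
  mutate(temp = 1) %>%             # indicator for each pair that is present
  pivot_wider(names_from = value, values_from = temp,
              values_fill = list(temp = 0))  # missing pairs become 0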
# name `1` `2` `3`
# <chr> <dbl> <dbl> <dbl>
#1 a 1 0 0
#2 b 0 1 1
#3 c 1 1 0
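If one of the input vectors repeats an element (for example c(1, 1)), the pipeline above hands pivot_wider two values for the same cell, so it warns and returns list-columns. A minimal sketch that guards against this by dropping duplicate (name, element) pairs first, assuming the dplyr, tidyr (1.0 or later) and tibble packages; one_hot_list is a hypothetical helper name, not a function from any package:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical helper: one-hot encode a named list of vectors into a wide
# tibble with one 0/1 column per distinct element
one_hot_list <- function(x) {
  enframe(x) %>%
    unnest(value) %>%
    distinct() %>%                     # a repeated element counts only once
    mutate(temp = 1) %>%
    pivot_wider(names_from = value, values_from = temp,
                values_fill = list(temp = 0))
}

vals <- list(a = c(1), b = c(2, 3), c = c(1, 2))
one_hot_list(vals)
```

Columns appear in order of first appearance, as in the original pipeline.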
One base R option could be:
t(table(stack(vals)))
   values
ind 1 2 3
  a 1 0 0
  b 0 1 1
  c 1 1 0
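Note that table() counts occurrences, so an element repeated within one vector would show up as 2 rather than 1, and the result is a table object rather than a data frame. A small base R sketch that caps the counts at 1 and returns an ordinary data frame (the intermediate variable names are only illustrative):

```r
vals <- list(a = c(1), b = c(2, 3), c = c(1, 2))

# stack() turns the named list into a long data frame with columns values, ind
long <- stack(vals)

# table() counts (values, ind) pairs; t() puts the list names on the rows
counts <- t(table(long))

# Cap at 1 so repeated elements still give a 0/1 flag, keeping the dimnames
onehot <- matrix(as.integer(counts > 0), nrow = nrow(counts),
                 dimnames = dimnames(counts))

# Convert to a plain data frame in the same wide layout
df <- as.data.frame(onehot)
df
```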