
One hot encode list of vectors

Tags: r, tidyverse

Is there a quick way to one-hot encode a list of vectors (with different lengths) in R, preferably using the tidyverse?

For example:

vals <- list(a=c(1), b=c(2,3), c=c(1,2))

The desired result is a wide data frame:

   1   2   3
a  1   0   0
b  0   1   1
c  1   1   0

Thanks!

José Luiz Ferreira asked Nov 19 '19 08:11



2 Answers

We can enframe the list, unnest it so that each value gets its own row, add a dummy indicator column, and then convert the data to wide format with pivot_wider.

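library(tidyverse)

# vals is the list from the question
enframe(vals) %>%            # list -> tibble with columns name, value
  unnest(value) %>%          # one row per (name, value) pair
  mutate(temp = 1) %>%       # indicator that the pair is present
  pivot_wider(names_from = value, values_from = temp, values_fill = list(temp = 0))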

#  name    `1`   `2`   `3`
#  <chr> <dbl> <dbl> <dbl>
#1 a         1     0     0
#2 b         0     1     1
#3 c         1     1     0
Ronak Shah answered Oct 12 '22 11:10
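
If a vector in the list can contain repeated values, pivot_wider will find more than one row per cell and fall back to list-columns with a warning. A minimal sketch of one way to guard against that, assuming a helper name (one_hot_list) that is hypothetical and a distinct() step that is not part of the answer above:

```r
library(tidyverse)

# Hypothetical helper, not from the original answer: one-hot encode a
# named list of vectors, tolerating repeated values inside a vector.
one_hot_list <- function(x) {
  enframe(x) %>%
    unnest(value) %>%
    distinct() %>%           # added assumption: drop repeated (name, value) pairs
    mutate(temp = 1) %>%
    pivot_wider(names_from = value, values_from = temp,
                values_fill = list(temp = 0))
}

# b contains 3 twice; the result still has a single 0/1 column per value
res <- one_hot_list(list(a = c(1), b = c(2, 3, 3), c = c(1, 2)))
res
```

The columns appear in order of first occurrence in the data, so they may need reordering if a sorted layout matters.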


One base R option could be:

t(table(stack(vals)))

   values
ind 1 2 3
  a 1 0 0
  b 0 1 1
  c 1 1 0
tmfmnk answered Oct 12 '22 13:10
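
The base R result above is a table object rather than a data frame. If a wide data frame is needed, as the question asks, one possible follow-up is as.data.frame.matrix; this is a sketch under that assumption and the variable name wide is only illustrative:

```r
# The list from the question
vals <- list(a = c(1), b = c(2, 3), c = c(1, 2))

# stack() gives a long data frame (values, ind); table() cross-tabulates it;
# t() puts the list names in rows; as.data.frame.matrix() keeps the wide shape
wide <- as.data.frame.matrix(t(table(stack(vals))))
wide
#   1 2 3
# a 1 0 0
# b 0 1 1
# c 1 1 0
```

Note that the entries are counts, so a value repeated within one vector would show up as 2 rather than 1.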