One-hot encode a list of vectors

Tags: r, tidyverse

Is there a quick way to one-hot encode a list of vectors (of different lengths) in R, preferably using the tidyverse?

For example:

vals <- list(a=c(1), b=c(2,3), c=c(1,2))

The desired result is a wide data frame:

   1   2   3
a  1   0   0
b  0   1   1
c  1   1   0

Thanks!

asked Nov 19 '19 08:11

José Luiz Ferreira

2 Answers

We can enframe the list into a two-column tibble, unnest it so that each value gets its own row, add a dummy column of 1s, and then reshape the data to wide format with pivot_wider, filling the missing name/value combinations with 0.

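library(tidyverse)

enframe(vals) %>%       # tibble with columns `name` and `value` (a list-column)
  unnest(value) %>%     # one row per name/value pair
  mutate(temp = 1) %>%  # dummy indicator column
  pivot_wider(names_from = value, values_from = temp, values_fill = list(temp = 0))  # absent pairs become 0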

#  name    `1`   `2`   `3`
#  <chr> <dbl> <dbl> <dbl>
#1 a         1     0     0
#2 b         0     1     1
#3 c         1     1     0
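
If a vector can contain the same value more than once, pivot_wider sees several rows for one name/value pair and warns that the values are not uniquely identified, returning list-columns instead of plain 0/1 cells. Below is a minimal sketch of one way to guard against that, assuming tidyr 1.1.0 or later (where values_fill accepts a scalar). The distinct() step, the vals_dup example data, and the long / one_hot names are additions for illustration and are not part of the original answer:

```r
library(tidyverse)

# Hypothetical input: like the question's `vals`, but `a` repeats the value 1
vals_dup <- list(a = c(1, 1), b = c(2, 3), c = c(1, 2))

# Long format: one row per name/value pair (duplicates included)
long <- enframe(vals_dup) %>%
  unnest(value)

one_hot <- long %>%
  distinct() %>%          # assumption: a repeated value should still count as a single 1
  mutate(temp = 1) %>%    # dummy indicator column
  pivot_wider(names_from = value, values_from = temp, values_fill = 0)

one_hot
```

With the duplicate removed, the result has the same shape as the table requested in the question: a name column plus one 0/1 column per distinct value.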

answered Oct 12 '22 11:10

Ronak Shah

One base R option could be to stack the list into a two-column data frame, cross-tabulate it, and transpose the result:

t(table(stack(vals)))

   values
ind 1 2 3
  a 1 0 0
  b 0 1 1
  c 1 1 0
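
The result above is a table object rather than a data frame. If a data frame is needed, as in the question, one possible sketch is to convert it with as.data.frame.matrix(). The tab and df names are only for illustration:

```r
vals <- list(a = c(1), b = c(2, 3), c = c(1, 2))

# Cross-tabulate values against list names, then transpose so names become rows
tab <- t(table(stack(vals)))

# Convert the 2-D table cell by cell into a data frame (row names a, b, c)
df <- as.data.frame.matrix(tab)
df
```

Note that table() counts occurrences, so a value repeated within one vector would give a count above 1 rather than a 0/1 indicator.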

answered Oct 12 '22 13:10

tmfmnk
