Turn column containing list into dummies

I have a data frame with a column that holds a space-separated list of years, and I would like to turn it into dummy variables, one per year.

Consider the following toy data:

raw <- data.frame(textcol = c("case1", "case2", "case3"), years=c('1996 1997 1998','1997 1999 2000', '1996 1998 2000'))


  textcol          years
1   case1 1996 1997 1998
2   case2 1997 1999 2000
3   case3 1996 1998 2000

I would now like to transform the data frame into this:

  textcol `1996` `1997` `1998` `1999` `2000` 
1   case1      1      1      1      0      0
2   case2      0      1      0      1      1
3   case3      1      0      1      0      1

I tried using separate() and str_split() to no avail. Can someone point me to the right approach?

Ivo asked Oct 13 '25 11:10


2 Answers

Use separate_rows to put each year in its own row, then call table to cross-tabulate textcol against years. (Append %>% as.data.frame.matrix to the pipeline if you want a data frame rather than a table; the case names then become its row names.)

library(tidyr)

tab <- raw %>% separate_rows(years) %>% table

giving:

tab
##        years
## textcol 1996 1997 1998 1999 2000
##   case1    1    1    1    0    0
##   case2    0    1    0    1    1
##   case3    1    0    1    0    1
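
The as.data.frame.matrix variant mentioned above could look roughly like the sketch below. It assumes the toy data from the question; the names wide and the final step that moves the row names back into a textcol column are illustrative additions, not part of the original answer.

```r
library(tidyr)

# Toy data from the question
raw <- data.frame(
  textcol = c("case1", "case2", "case3"),
  years = c("1996 1997 1998", "1997 1999 2000", "1996 1998 2000")
)

# Same pipeline as above, with the table converted to a data frame
wide <- raw %>%
  separate_rows(years) %>%
  table() %>%
  as.data.frame.matrix()

# The case names are now row names; move them back into a regular column
# (check.names = FALSE keeps the year column names as "1996", "1997", ...)
wide <- data.frame(textcol = rownames(wide), wide,
                   check.names = FALSE, row.names = NULL)
wide
```

The year columns hold integer counts, so a year listed twice in the same row would show up as 2 rather than 1.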

We can also display this as a graph. Convert tab to an igraph graph, g. Then create a custom layout, lay, that keeps the vertices in their original order, because igraph's usual bipartite layout reorders them to minimize edge crossings. Finally, plot it.

library(igraph)

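# Build a bipartite graph: rows of tab (cases) on one side, columns (years) on the other.
# Newer igraph releases name this function graph_from_biadjacency_matrix().
g <- graph_from_incidence_matrix(tab)
# Start from the default bipartite layout, then sort the x-coordinates within
# each level (V2) so the vertices are drawn in their original order.
lay <- with(as.data.frame(layout_as_bipartite(g)), 
  cbind(ave(V1, V2, FUN = sort), V2))
plot(g, layout = lay, vertex.size = 2)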

(screenshot of the plotted graph)

G. Grothendieck answered Oct 15 '25 00:10


Use separate_rows with pivot_wider:

library(tidyverse)
raw %>% 
  separate_rows(years) %>% 
  mutate(value = 1) %>% 
  pivot_wider(id_cols = textcol, names_from = years, values_from = value, values_fill = 0)

# A tibble: 3 x 6
  textcol `1996` `1997` `1998` `1999` `2000`
  <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 case1        1      1      1      0      0
2 case2        0      1      0      1      1
3 case3        1      0      1      0      1
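
The question mentions trying str_split(). For comparison only, here is a minimal base R sketch of that splitting idea using strsplit(); it is not taken from either answer, it assumes the toy data from the question, and the names year_list, all_years, dummies and result are made up for illustration.

```r
# Toy data from the question
raw <- data.frame(
  textcol = c("case1", "case2", "case3"),
  years = c("1996 1997 1998", "1997 1999 2000", "1996 1998 2000")
)

# Split each row's string into a character vector of years
# (as.character() guards against the column being a factor in older R)
year_list <- strsplit(as.character(raw$years), " ", fixed = TRUE)

# All distinct years, sorted, become the dummy columns
all_years <- sort(unique(unlist(year_list)))

# One row per case: 1 if the year appears in that row's list, otherwise 0
dummies <- do.call(rbind, lapply(year_list, function(y) {
  as.integer(all_years %in% y)
}))
colnames(dummies) <- all_years

# check.names = FALSE keeps the column names as "1996", "1997", ...
result <- data.frame(textcol = raw$textcol, dummies, check.names = FALSE)
result
```

Because %in% only tests membership, a year repeated within one row still gives 1 here, and no extra packages are needed.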

Maël answered Oct 15 '25 01:10