Unnest list and concatenate in R

Tags: r, tidyverse

I want to flatten a list column in a tibble so that each row's strings are concatenated into a single comma-separated string. Example data:

library(tidyverse)

df <- tibble(person = c("Alice", "Bob", "Mary"),
             score = list(c("Red", "Green", "Blue"), c("Orange", "Green", "Yellow"), "Blue"))
df

# A tibble: 3 x 2
  person score    
  <chr>  <list>   
1 Alice  <chr [3]>
2 Bob    <chr [3]>
3 Mary   <chr [1]>

Expected output:

tibble(person = c("Alice", "Bob", "Mary"),
       score = c("Red, Green, Blue", "Orange, Green, Yellow", "Blue" ))

# A tibble: 3 x 2
  person score                
  <chr>  <chr>                
1 Alice  Red, Green, Blue     
2 Bob    Orange, Green, Yellow
3 Mary   Blue   

I suspect there's a neat tidyverse solution to this, but I couldn't find one after extensive searching, probably because I'm using the wrong search terms (unnest/concatenate). A tidyverse solution would be preferred. Thank you.

Simon asked Dec 18 '22 13:12


2 Answers

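You can apply toString() to each element of the list column with purrr::map_chr(), which returns a character vector with one comma-separated string per row: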

library(dplyr)
library(purrr)

df %>%
  mutate(score = map_chr(score, toString))

# A tibble: 3 x 2
  person score                
  <chr>  <chr>                
1 Alice  Red, Green, Blue     
2 Bob    Orange, Green, Yellow
3 Mary   Blue                
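For context, toString(x) is a base R shortcut for paste(x, collapse = ", "), so the same pattern works with any separator. The following is a minimal sketch under that assumption; the " | " separator and the names res and base_res are only illustrative and not part of the original answer.

```r
library(tibble)
library(dplyr)
library(purrr)

df <- tibble(
  person = c("Alice", "Bob", "Mary"),
  score  = list(c("Red", "Green", "Blue"), c("Orange", "Green", "Yellow"), "Blue")
)

# toString() collapses a vector with ", " as the separator
toString(c("Red", "Green", "Blue"))
# "Red, Green, Blue"

# Same idea with a custom separator (illustrative choice)
res <- df %>%
  mutate(score = map_chr(score, ~ paste(.x, collapse = " | ")))

# Base R equivalent of map_chr(score, toString), without purrr
base_res <- vapply(df$score, toString, character(1))
```

map_chr() and vapply(..., character(1)) both fail loudly if a function returns something other than a single string, which is a useful safety check here.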

If you have multiple list columns you can do:

df <- tibble(person = c("Alice", "Bob", "Mary"), 
       score1 = list(c("Red", "Green", "Blue"), c("Orange", "Green", "Yellow"), "Blue"),
       score2 = rev(list(c("Red", "Green", "Blue"), c("Orange", "Green", "Yellow"), "Blue")))

df %>%
  mutate_if(is.list, ~ map_chr(.x, toString))

# A tibble: 3 x 3
  person score1                score2               
  <chr>  <chr>                 <chr>                
1 Alice  Red, Green, Blue      Blue                 
2 Bob    Orange, Green, Yellow Orange, Green, Yellow
3 Mary   Blue                  Red, Green, Blue     
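In dplyr 1.0.0 and later, the scoped verbs such as mutate_if() are superseded by across(). Assuming a recent dplyr is installed, the multi-column case could be sketched as follows (the name res is only illustrative):

```r
library(tibble)
library(dplyr)
library(purrr)

df <- tibble(
  person = c("Alice", "Bob", "Mary"),
  score1 = list(c("Red", "Green", "Blue"), c("Orange", "Green", "Yellow"), "Blue"),
  score2 = rev(list(c("Red", "Green", "Blue"), c("Orange", "Green", "Yellow"), "Blue"))
)

# Collapse every list column into a comma-separated character column
res <- df %>%
  mutate(across(where(is.list), ~ map_chr(.x, toString)))
```

mutate_if() still works, so this is a matter of following current dplyr conventions rather than a behavioral fix.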
Ritchie Sacramento answered Dec 31 '22 02:12


A simple way would be to unnest the data in long format and collapse it by group.

library(dplyr)

df %>%
  tidyr::unnest(cols = score) %>%
  group_by(person) %>%
  summarise(score = toString(score))

# person score                
#  <chr>  <chr>                
#1 Alice  Red, Green, Blue     
#2 Bob    Orange, Green, Yellow
#3 Mary   Blue        
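Two side effects of the unnest-and-summarise route may matter on real data: group_by() returns groups sorted by key, and unnest() drops rows whose list element is empty unless keep_empty = TRUE is set. The sketch below uses a made-up tibble (df2, with a hypothetical person "Zed" who has no scores) to compare it with the per-row approach.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical data: unsorted names and one empty list element
df2 <- tibble(
  person = c("Mary", "Alice", "Zed"),
  score  = list("Blue", c("Red", "Green"), character(0))
)

# Long-format route: rows come back sorted by person, and Zed is dropped
long_res <- df2 %>%
  unnest(cols = score) %>%
  group_by(person) %>%
  summarise(score = toString(score))

# Per-row route: original order and row count are preserved
row_res <- df2 %>%
  mutate(score = map_chr(score, toString))
```

Here long_res has two rows (Alice, then Mary), while row_res keeps all three rows in their original order, with an empty string for Zed. The long-format route would also merge rows if the same person appeared more than once.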

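Another option is rowwise(), which evaluates toString() one row at a time. Add ungroup() at the end so the result is no longer a rowwise data frame:

df %>% rowwise() %>% mutate(score = toString(score)) %>% ungroup()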
Ronak Shah answered Dec 31 '22 03:12