 

R - How to both unlist and concatenate

Using lapply, I've fed a vector of inputs into a function that, for each input, returns a list of two vectors: possible nth-grams and their probabilities. I end up with a list of lists (lol) with this structure:

> str(lol)
List of 3
 $ :List of 2
  ..$ np1  : chr [1:7] "a" "years" "the" "my" ...
  ..$ probs: num [1:7] 0.1481 0.1357 0.0841 0.0698 0.0522 ...
 $ :List of 2
  ..$ np1  : chr [1:167] "the" "a" "my" "years" ...
  ..$ probs: num [1:167] 0.2745 0.0924 0.0605 0.0437 0.0334 ...
 $ :List of 2
  ..$ np1  : chr [1:9493] "the" "a" "my" "this" ...
  ..$ probs: num [1:9493] 0.267 0.0777 0.0239 0.0169 0.0158 ...

What I'm aiming for instead is a single list in which all the $np1 vectors are concatenated, and likewise all the $probs vectors. I tried using unlist(..., recursive = F) to get the list of two vectors, and it gets me closer to what I'm looking for than unlist without the recursive flag does:

> str(unlist(lapply(inputs.list, function(x){...}), recursive = F))
List of 6
 $ np1  : chr [1:7] "a" "years" "the" "my" ...
 $ probs: num [1:7] 0.1481 0.1357 0.0841 0.0698 0.0522 ...
 $ np1  : chr [1:167] "the" "a" "my" "years" ...
 $ probs: num [1:167] 0.2745 0.0924 0.0605 0.0437 0.0334 ...
 $ np1  : chr [1:9493] "the" "a" "my" "this" ...
 $ probs: num [1:9493] 0.267 0.0777 0.0239 0.0169 0.0158 ...

But not quite there...

Is there a method that would help me further consolidate the flattened list into a list of only two vectors, as described?

Here is a reproducible example to work with:

example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in"      = list(np1 = c("the", "a", "my", "this"),
                   probs = c(0.267, 0.0777, 0.0239, 0.0169))
)
> str(example1)
List of 2
 $ time in:List of 2
  ..$ np1  : chr [1:4] "the" "a" "my" "years"
  ..$ probs: num [1:4] 0.2745 0.0924 0.0605 0.0437
 $ in     :List of 2
  ..$ np1  : chr [1:4] "the" "a" "my" "this"
  ..$ probs: num [1:4] 0.267 0.0777 0.0239 0.0169
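For contrast, here is a minimal sketch (not part of the original post; it redefines example1 so it runs on its own) of why recursive = FALSE gets closer. A full unlist() collapses everything into one atomic vector, so the numeric probabilities are coerced to character, while one level of flattening keeps each np1 and probs vector intact.

```r
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in"      = list(np1 = c("the", "a", "my", "this"),
                   probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

# Full flattening: a single named character vector of length 16;
# the probabilities become strings such as "0.2745".
full <- unlist(example1)
str(full)

# One level of flattening: a list of 4 vectors whose names combine the
# outer and inner names ("time in.np1", "time in.probs", ...).
one_level <- unlist(example1, recursive = FALSE)
names(one_level)
```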
Conner M. asked May 11 '19 23:05


3 Answers

Two lists can be combined in your desired way with Map, as in

Map(c, example1[[1]], example1[[2]])
# $np1
# [1] "the"   "a"     "my"    "years" "the"   "a"     "my"    "this" 
#
# $probs
# [1] 0.2745 0.0924 0.0605 0.0437 0.2670 0.0777 0.0239 0.0169
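To make the mechanics explicit, a small sketch (toy inputs a and b are made up for illustration): Map(c, a, b) calls c(a[[i]], b[[i]]) for each position i, and is a thin wrapper around mapply(..., SIMPLIFY = FALSE). One consequence worth knowing is that elements are paired by position, not by name.

```r
a <- list(np1 = c("the", "a"), probs = c(0.2745, 0.0924))
b <- list(np1 = c("my", "this"), probs = c(0.0605, 0.0169))

# Position-wise concatenation; result names come from the first list.
via_map <- Map(c, a, b)
via_mapply <- mapply(c, a, b, SIMPLIFY = FALSE)
identical(via_map, via_mapply)  # TRUE

# If b's fields were stored in a different order, np1 would be combined
# with probs, and c() would coerce the numbers to character.
swapped <- Map(c, a, b[c("probs", "np1")])
swapped$np1
```

So this approach assumes every inner list stores np1 and probs in the same order.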

So, to merge the whole list of lists, we may do

Reduce(function(...) Map(c, ...), example1[c(1, 1, 2)])
# $np1
#  [1] "the"   "a"     "my"    "years" "the"   "a"     "my"    "years" "the"   "a"     "my"    "this" 
#
# $probs
#  [1] 0.2745 0.0924 0.0605 0.0437 0.2745 0.0924 0.0605 0.0437 0.2670 0.0777 0.0239 0.0169

where I deliberately made the input of length 3 (the first element appears twice) to demonstrate that this works for more than two lists. In your case, we need

Reduce(function(...) Map(c, ...), lol)
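A possible variation (my own suggestion, not part of the answer above) is to hand every inner list to Map in a single do.call(), which is equivalent to Map(c, lol[[1]], lol[[2]], ...). Each field is then concatenated once, instead of the accumulated vectors being copied again at every Reduce step. unname() is used so the outer names ("time in", "in") are not passed to c() as argument names, where they would turn into element names.

```r
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in"      = list(np1 = c("the", "a", "my", "this"),
                   probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

# Pairwise fold, as in the answer.
via_reduce <- Reduce(function(...) Map(c, ...), example1)

# Single call: Map(f = c, <inner list 1>, <inner list 2>, ...).
via_do_call <- do.call(Map, c(list(f = c), unname(example1)))

identical(via_reduce, via_do_call)  # TRUE
```

For lol, the call would be do.call(Map, c(list(f = c), unname(lol))); whether it is noticeably faster on real data is untested here.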
Julius Vainora answered Nov 16 '22 13:11


Here's a solution using purrr:

library(tidyverse)

transpose(example1) %>% map(flatten) %>% map(unlist)

Output:

$np1
[1] "the"   "a"     "my"    "years" "the"   "a"     "my"    "this" 

$probs
[1] 0.2745 0.0924 0.0605 0.0437 0.2670 0.0777 0.0239 0.0169
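For comparison, a base-R sketch of the same transpose-then-unlist idea, without the purrr dependency. It assumes every inner list has the same field names as the first one; unlike Map(), it picks fields by name rather than by position.

```r
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in"      = list(np1 = c("the", "a", "my", "this"),
                   probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

# Field names taken from the first inner list (assumed shared by all).
fields <- names(example1[[1]])

# For each field, pull it out of every inner list and concatenate,
# dropping the "time in1"-style names that unlist() would otherwise add.
result <- lapply(setNames(fields, fields), function(field) {
  unlist(lapply(example1, `[[`, field), use.names = FALSE)
})
str(result)
```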
andrew_reece answered Nov 16 '22 13:11


Here is an unlist() solution that is similar to what you were working on. It relies on the vectors you are interested in always alternating (i.e., it is always np1 and then probs). Good luck, and let me know if it doesn't work for you!

unlist_ed <- unlist(example1, recursive = FALSE)

list(
  np1 = unlist(unlist_ed[c(TRUE, FALSE)]),
  probs = unlist(unlist_ed[c(FALSE, TRUE)])
)

$np1
time in.np11 time in.np12 time in.np13 time in.np14      in.np11      in.np12      in.np13      in.np14 
       "the"          "a"         "my"      "years"        "the"          "a"         "my"       "this" 

$probs
time in.probs1 time in.probs2 time in.probs3 time in.probs4      in.probs1      in.probs2      in.probs3 
        0.2745         0.0924         0.0605         0.0437         0.2670         0.0777         0.0239 
     in.probs4 
        0.0169 
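If the element names in that output (time in.np11 and so on) are not wanted, unlist() accepts use.names = FALSE. A minimal variant of the code above, still assuming that np1 and probs strictly alternate:

```r
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in"      = list(np1 = c("the", "a", "my", "this"),
                   probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

unlist_ed <- unlist(example1, recursive = FALSE)

# Odd positions hold np1 vectors, even positions hold probs vectors.
result <- list(
  np1   = unlist(unlist_ed[c(TRUE, FALSE)], use.names = FALSE),
  probs = unlist(unlist_ed[c(FALSE, TRUE)], use.names = FALSE)
)
str(result)
```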

EDIT: I thought of another solution that relies on the vector names being the same across the inner lists, and it is much faster (not that speed is the goal). Wanted to update!

dplyr::bind_rows(example1)
# A tibble: 8 x 2
  np1    probs
  <chr>  <dbl>
1 the   0.274 
2 a     0.0924
3 my    0.0605
4 years 0.0437
5 the   0.267 
6 a     0.0777
7 my    0.0239
8 this  0.0169
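If a dplyr dependency is undesirable, a rough base-R counterpart (a sketch, not included in the benchmark below) stacks one data frame per inner list and turns the columns back into a list of two vectors. Like bind_rows(), rbind() on data frames matches columns by name.

```r
example1 <- list(
  "time in" = list(np1 = c("the", "a", "my", "years"),
                   probs = c(0.2745, 0.0924, 0.0605, 0.0437)),
  "in"      = list(np1 = c("the", "a", "my", "this"),
                   probs = c(0.267, 0.0777, 0.0239, 0.0169))
)

# One two-column data frame per inner list, stacked row-wise.
stacked <- do.call(rbind, lapply(example1, as.data.frame, stringsAsFactors = FALSE))

# as.list() drops the row names and returns the columns as a named list.
result <- as.list(stacked)
str(result)
```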

Not a perfect benchmark:

library(purrr)  # o3 needs transpose(), flatten(), map() and %>%

# Scale the example up: each vector repeated 1e4 times, the list repeated 100 times.
example1 <- rapply(example1, function(x) rep(x, 1e4), how = "list")
example1 <- rep(example1, 100)

microbenchmark::microbenchmark(
  o1 = {
    Reduce(function(...) Map(c, ...), example1)
  },
  o2 = {
    unlist_ed <- unlist(example1, recursive = FALSE)

    list(
      np1 = unlist(unlist_ed[c(TRUE, FALSE)]),
      probs = unlist(unlist_ed[c(FALSE, TRUE)])
    )
  },
  o3 = {
    transpose(example1) %>% map(flatten) %>% map(unlist)
  },
  o4 = {
    binded <- dplyr::bind_rows(example1)

    list(np1 = binded$np1,
         probs = binded$probs)
  },
  times = 1
)

Unit: milliseconds
 expr        min         lq       mean     median         uq        max neval
   o1 5022.25495 5022.25495 5022.25495 5022.25495 5022.25495 5022.25495     1
   o2 5146.75265 5146.75265 5146.75265 5146.75265 5146.75265 5146.75265     1
   o3 2491.21422 2491.21422 2491.21422 2491.21422 2491.21422 2491.21422     1
   o4   83.32919   83.32919   83.32919   83.32919   83.32919   83.32919     1
Andrew answered Nov 16 '22 13:11