Equalizing the lengths of all the lists within a list?

Tags:

list

r

I have a list of lists, and I want the sub-lists to all have the same length, i.e. to pad them with NAs where needed so that they all reach the length of the longest list.

Mock example

list1 <- list(1, 2, 3)
list2 <- list(1, 2, 3, 4, 5)
list3 <- list(1, 2, 3, 4, 5, 6)

list_lists <- list(list1, list2, list3)

My best attempt so far:

max_length <- max(unlist(lapply (list_lists, FUN = length))) 
    # returns the length of the longest list

list_lists <- lapply (list_lists, function (x) length (x) <- max_length)

Problem: it replaces each of my sub-lists with an integer equal to max_length:

list_lists[[1]]
# [1] 6

Can someone help?

asked Apr 14 '17 at 16:04 by francoiskroll


4 Answers

In lists, NULL seems more appropriate than NA as a placeholder, and it can be added with vector('list', n):

list_lists <- list(list(1, 2, 3),
                   list(1, 2, 3, 4, 5),
                   list(1, 2, 3, 4, 5, 6))


list_lists2 <- Map(function(x, y){c(x, vector('list', length = y))}, 
                   list_lists, 
                   max(lengths(list_lists)) - lengths(list_lists))

str(list_lists2)
#> List of 3
#>  $ :List of 6
#>   ..$ : num 1
#>   ..$ : num 2
#>   ..$ : num 3
#>   ..$ : NULL
#>   ..$ : NULL
#>   ..$ : NULL
#>  $ :List of 6
#>   ..$ : num 1
#>   ..$ : num 2
#>   ..$ : num 3
#>   ..$ : num 4
#>   ..$ : num 5
#>   ..$ : NULL
#>  $ :List of 6
#>   ..$ : num 1
#>   ..$ : num 2
#>   ..$ : num 3
#>   ..$ : num 4
#>   ..$ : num 5
#>   ..$ : num 6
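
The two building blocks in this answer can be checked on their own: lengths() returns the length of every element in a single call, and vector("list", n) creates a list of n NULL elements (an empty list when n is 0, which is why the longest sub-list passes through unchanged). A short sketch on the question's data:

```r
list_lists <- list(list(1, 2, 3),
                   list(1, 2, 3, 4, 5),
                   list(1, 2, 3, 4, 5, 6))

# Length of each sub-list, computed in one vectorized call (an integer vector)
lens <- lengths(list_lists)
print(lens)

# Number of padding elements each sub-list needs: 3, 1 and 0
pad <- max(lens) - lens
print(pad)

# vector("list", n) is a list of n NULLs; with n = 0 it is an empty list,
# so appending it with c() leaves the original list unchanged
str(vector("list", 2))
print(identical(c(list(1, 2), vector("list", 0)), list(1, 2)))
```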

If you really want NAs, just replace vector('list', length = y) with rep(NA, y):

list_lists3 <- Map(function(x, y){c(x, rep(NA, y))}, 
                   list_lists, 
                   max(lengths(list_lists)) - lengths(list_lists))

str(list_lists3)
#> List of 3
#>  $ :List of 6
#>   ..$ : num 1
#>   ..$ : num 2
#>   ..$ : num 3
#>   ..$ : logi NA
#>   ..$ : logi NA
#>   ..$ : logi NA
#>  $ :List of 6
#>   ..$ : num 1
#>   ..$ : num 2
#>   ..$ : num 3
#>   ..$ : num 4
#>   ..$ : num 5
#>   ..$ : logi NA
#>  $ :List of 6
#>   ..$ : num 1
#>   ..$ : num 2
#>   ..$ : num 3
#>   ..$ : num 4
#>   ..$ : num 5
#>   ..$ : num 6

Note that in the latter the padding elements are logical NA while the original elements are numeric, so the types won't match unless you use NA_real_ or otherwise coerce NA to the type of x's elements.
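
A sketch of the typed variant, assuming every element of the sub-lists is a double (true for the question's data, where 1, 2, ... are doubles). The only change from the rep() version above is NA_real_ in place of NA; c() splits the atomic padding vector into individual list elements:

```r
list_lists <- list(list(1, 2, 3),
                   list(1, 2, 3, 4, 5),
                   list(1, 2, 3, 4, 5, 6))

# Same Map() approach as above, but padding with a double NA (NA_real_)
# instead of a logical NA, so all elements share the type "double"
list_lists4 <- Map(function(x, y) c(x, rep(NA_real_, y)),
                   list_lists,
                   max(lengths(list_lists)) - lengths(list_lists))

# Type of every element in the first (padded) sub-list
print(vapply(list_lists4[[1]], typeof, character(1)))
```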

answered Nov 20 '22 at 02:11 by alistaire


Try this (where ls is your list):

lapply(lapply(sapply(ls, unlist), "length<-", max(lengths(ls))), as.list)
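
The one-liner works in three steps: unlist() flattens each sub-list into an atomic vector, `length<-` extends it (atomic vectors are padded with NA), and as.list() turns it back into a list. One caveat: sapply() simplifies its result to a matrix when every sub-list already has the same length, and the outer lapply() then iterates over the individual cells. The sketch below uses lapply() for the first step to avoid that. The helper name pad_via_unlist is hypothetical, and it assumes each sub-list holds atomic scalars of one type, since unlist() coerces everything to a common type.

```r
# Hypothetical helper: the one-liner above, written step by step.
# Assumes each sub-list holds atomic scalars, because unlist() coerces
# all of a sub-list's elements to a single type.
pad_via_unlist <- function(lst) {
  n <- max(lengths(lst))
  flat <- lapply(lst, unlist)            # list(1, 2, 3) -> c(1, 2, 3)
  padded <- lapply(flat, `length<-`, n)  # atomic vectors are extended with NA
  lapply(padded, as.list)                # back to a list of lists
}

list_lists <- list(list(1, 2, 3),
                   list(1, 2, 3, 4, 5),
                   list(1, 2, 3, 4, 5, 6))
str(pad_via_unlist(list_lists)[[1]])

# lapply() never simplifies, so input whose sub-lists are already
# equal in length comes back unchanged
same_len <- list(list(1, 2), list(3, 4))
print(identical(pad_via_unlist(same_len), same_len))
```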

answered Nov 20 '22 at 02:11 by 989


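Here is your code, fixed. The anonymous function must return x; as written, it returns the value of the assignment length(x) <- max_length, which is just max_length. I also used vectors instead of lists, for clarity.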

list1 <- c(1, 2, 3)
list2 <- c(1, 2, 3, 4, 5)
list3 <- c(1, 2, 3, 4, 5, 6)

list_lists <- list(list1, list2, list3)

max_length <- max(unlist(lapply (list_lists, FUN = length))) 

list_lists <- lapply (list_lists, function (x) {length (x) <- max_length;x})

# [[1]]
# [1]  1  2  3 NA NA NA
# 
# [[2]]
# [1]  1  2  3  4  5 NA
# 
# [[3]]
# [1] 1 2 3 4 5 6

With the original lists (built with list() rather than c()), length<- pads with NULL instead of NA, and the result is:

# [[1]]
# [[1]][[1]]
# [1] 1
# 
# [[1]][[2]]
# [1] 2
# 
# [[1]][[3]]
# [1] 3
# 
# [[1]][[4]]
# NULL
# 
# [[1]][[5]]
# NULL
# 
# [[1]][[6]]
# NULL
# 
# 
# [[2]]
# [[2]][[1]]
# [1] 1
# 
# [[2]][[2]]
# [1] 2
# 
# [[2]][[3]]
# [1] 3
# 
# [[2]][[4]]
# [1] 4
# 
# [[2]][[5]]
# [1] 5
# 
# [[2]][[6]]
# NULL
# 
# 
# [[3]]
# [[3]][[1]]
# [1] 1
# 
# [[3]][[2]]
# [1] 2
# 
# [[3]][[3]]
# [1] 3
# 
# [[3]][[4]]
# [1] 4
# 
# [[3]][[5]]
# [1] 5
# 
# [[3]][[6]]
# [1] 6
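
Why the original attempt returned 6: in R an assignment such as length(x) <- max_length is itself an expression whose value is its right-hand side (returned invisibly). A function whose body is only that assignment therefore returns max_length, and the modified local copy of x is discarded. A minimal demonstration; the names f_bad and f_fixed are made up for illustration:

```r
# Body is only the assignment: the function returns the assigned value, 6
f_bad <- function(x) length(x) <- 6

# Assignment followed by x: the function returns the padded copy of x
f_fixed <- function(x) {
  length(x) <- 6
  x
}

print(f_bad(list(1, 2, 3)))   # the value is returned invisibly, hence print()
str(f_fixed(list(1, 2, 3)))   # a list of 6 whose elements 4 to 6 are NULL
str(f_fixed(c(1, 2, 3)))      # a numeric vector of 6 padded with NA
```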

answered Nov 20 '22 at 03:11 by Andrey Shabalin


Try this:

funJoeOld <- function(ls) {
    list_length <- sapply(ls, length)
    max_length <- max(list_length)

    lapply(seq_along(ls), function(x) {
        if (list_length[x] < max_length) {
            c(ls[[x]], lapply(1:(max_length - list_length[x]), function(y) NA))
        } else {
            ls[[x]]
        }
    })
}

funJoeOld(list_lists)[[1]]
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] NA

[[5]]
[1] NA

[[6]]
[1] NA


Edit

I just wanted to illustrate how much difference using the right tools in R makes. Although my solution gives correct results, it is very inefficient. By replacing sapply(ls, length) with lengths(ls), and lapply(1:z, function(y) NA) with as.list(rep(NA, z)), we obtain a large speed-up (roughly 12x by median time in the benchmarks below). Observe:

funJoeNew <- function(ls) {
    list_length <- lengths(ls)
    max_length <- max(list_length)

    lapply(seq_along(ls), function(x) {
        if (list_length[x] < max_length) {
            c(ls[[x]], as.list(rep(NA, max_length - list_length[x])))
        } else {
            ls[[x]]
        }
    })
}

funAlistaire <- function(ls) {
    Map(function(x, y){c(x, rep(NA, y))}, 
        ls, 
        max(lengths(ls)) - lengths(ls))
}

fun989 <- function(ls) {
    lapply(lapply(sapply(ls, unlist), "length<-", max(lengths(ls))), as.list)
}
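
The two substitutions described in the edit are value-for-value equivalent, so the gain comes only from doing less work: presumably because lengths() computes every length in one internal call and rep() builds the padding in one step, instead of calling an R function once per element. A quick check on small made-up inputs (z and ls_demo are illustrative only):

```r
z <- 3
ls_demo <- list(list(1, 2, 3), list(1, 2, 3, 4, 5))

# Old and new ways of building the NA padding give identical lists
print(identical(lapply(1:z, function(y) NA), as.list(rep(NA, z))))

# Old and new ways of computing the sub-list lengths give identical integer vectors
print(identical(sapply(ls_demo, length), lengths(ls_demo)))
```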

Compare equality

set.seed(123)
samp_list <- lapply(sample(1000, replace = TRUE), function(x) {lapply(1:x, identity)})

## have to unlist as the NAs in 989 are of the integer
## variety and the NAs in Joe/Alistaire are logical
identical(sapply(fun989(samp_list), unlist), sapply(funJoeNew(samp_list), unlist))
[1] TRUE

identical(funJoeNew(samp_list), funAlistaire(samp_list))
[1] TRUE
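
The comment about NA types above can be seen on a two-element example: padding a list with NA appends a logical NA, while the unlist-then-pad route coerces the NA to the integer type of the data. identical() distinguishes the two, but after unlist() both collapse to the same integer vector. The objects a and b are illustrative only:

```r
a <- list(1L, NA)          # NA appended as a list element: stays logical
b <- as.list(c(1L, NA))    # NA combined into an integer vector first: NA_integer_

print(c(typeof(a[[2]]), typeof(b[[2]])))   # "logical" vs "integer"
print(identical(a, b))                     # FALSE: element types differ
print(identical(unlist(a), unlist(b)))     # TRUE: both become integer vectors
```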

Benchmarks

library(microbenchmark)

microbenchmark(funJoeOld(samp_list), funJoeNew(samp_list), fun989(samp_list),
               funAlistaire(samp_list), times = 30, unit = "relative")
Unit: relative
                   expr       min        lq      mean    median        uq       max neval cld
   funJoeOld(samp_list) 21.825878 23.269846 17.434447 20.803035 18.851403 4.8056784    30   c
   funJoeNew(samp_list)  1.827741  1.841071  2.253294  1.667047  1.780324 2.4659653    30 ab 
      fun989(samp_list)  3.108230  3.563780  3.170320  3.790048  3.888632 0.9890681    30  b 
funAlistaire(samp_list)  1.000000  1.000000  1.000000  1.000000  1.000000 1.0000000    30 a  

There are two takeaways here:

  1. Having a good understanding of the apply family of functions makes for concise and efficient code (as can be seen in @alistaire's and @989's solutions).
  2. Understanding the nuances of base R in general (for example, vectorized helpers such as lengths()) can have considerable consequences for performance.
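
Combining both takeaways, one possible reusable helper is sketched below. The name pad_lists and its fill argument are hypothetical and come from none of the answers: fill = NULL reproduces the NULL padding from the first answer, and any other value is repeated as list elements, so passing NA_real_ keeps element types consistent for numeric data. It assumes lst is non-empty, since max() of an empty vector is -Inf.

```r
# Hypothetical helper: pad every sub-list of lst to the longest length.
pad_lists <- function(lst, fill = NA) {
  n <- max(lengths(lst))                 # target length, computed once
  lapply(lst, function(x) {
    k <- n - length(x)                   # number of padding elements needed
    if (is.null(fill)) {
      c(x, vector("list", k))            # k NULL elements
    } else {
      c(x, rep(list(fill), k))           # k copies of fill
    }
  })
}

list_lists <- list(list(1, 2, 3),
                   list(1, 2, 3, 4, 5),
                   list(1, 2, 3, 4, 5, 6))
str(pad_lists(list_lists, fill = NA_real_)[[1]])
str(pad_lists(list_lists, fill = NULL)[[2]])
```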

answered Nov 20 '22 at 02:11 by Joseph Wood