I have a list of lists and I want the sub-lists to all have the same length, i.e. to pad them with NAs if needed so they all reach the length of the longest list.
Mock example
list1 <- list(1, 2, 3)
list2 <- list(1, 2, 3, 4, 5)
list3 <- list(1, 2, 3, 4, 5, 6)
list_lists <- list(list1, list2, list3)
My best attempt so far:
max_length <- max(unlist(lapply (list_lists, FUN = length)))
# returns the length of the longest list
list_lists <- lapply (list_lists, function (x) length (x) <- max_length)
Problem: it replaces each of my sub-lists with an integer equal to max_length:
list_lists [[1]]
[1] 6
Can someone help?
In lists, NULL would seem more appropriate than NA, and it can be added with vector:
list_lists <- list(list(1, 2, 3),
list(1, 2, 3, 4, 5),
list(1, 2, 3, 4, 5, 6))
list_lists2 <- Map(function(x, y){c(x, vector('list', length = y))},
list_lists,
max(lengths(list_lists)) - lengths(list_lists))
str(list_lists2)
#> List of 3
#> $ :List of 6
#> ..$ : num 1
#> ..$ : num 2
#> ..$ : num 3
#> ..$ : NULL
#> ..$ : NULL
#> ..$ : NULL
#> $ :List of 6
#> ..$ : num 1
#> ..$ : num 2
#> ..$ : num 3
#> ..$ : num 4
#> ..$ : num 5
#> ..$ : NULL
#> $ :List of 6
#> ..$ : num 1
#> ..$ : num 2
#> ..$ : num 3
#> ..$ : num 4
#> ..$ : num 5
#> ..$ : num 6
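For reference, vector('list', n) builds a list of n NULL elements, which is why c() appends empty slots in the Map call above. A small sketch checking that on the question's data (the names empty_slots, pad and padded are illustrative, not from the answer):

```r
# vector("list", n) is a list holding n NULL elements
empty_slots <- vector("list", 2)
print(identical(empty_slots, list(NULL, NULL)))

list_lists <- list(list(1, 2, 3),
                   list(1, 2, 3, 4, 5),
                   list(1, 2, 3, 4, 5, 6))

# Number of NULL slots each sub-list needs to reach the longest length
pad <- max(lengths(list_lists)) - lengths(list_lists)
print(pad)

# Append that many NULLs; a pad of 0 appends an empty list, leaving x unchanged
padded <- Map(function(x, y) c(x, vector("list", length = y)), list_lists, pad)
print(lengths(padded))
```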
If you really want NAs, just change vector to rep:
list_lists3 <- Map(function(x, y){c(x, rep(NA, y))},
list_lists,
max(lengths(list_lists)) - lengths(list_lists))
str(list_lists3)
#> List of 3
#> $ :List of 6
#> ..$ : num 1
#> ..$ : num 2
#> ..$ : num 3
#> ..$ : logi NA
#> ..$ : logi NA
#> ..$ : logi NA
#> $ :List of 6
#> ..$ : num 1
#> ..$ : num 2
#> ..$ : num 3
#> ..$ : num 4
#> ..$ : num 5
#> ..$ : logi NA
#> $ :List of 6
#> ..$ : num 1
#> ..$ : num 2
#> ..$ : num 3
#> ..$ : num 4
#> ..$ : num 5
#> ..$ : num 6
Note that in the latter the types won't match up (the padding is logical NA while the data are numeric) unless you specify NA_real_ or coerce NA to match the type of the elements of x.
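A minimal sketch of the NA_real_ variant, assuming the data are doubles as in the question (list_lists4 is an illustrative name). c() of a list and an atomic vector yields a list, so each NA_real_ becomes its own element:

```r
list_lists <- list(list(1, 2, 3),
                   list(1, 2, 3, 4, 5),
                   list(1, 2, 3, 4, 5, 6))
pad <- max(lengths(list_lists)) - lengths(list_lists)

# Pad with NA_real_ (a double NA) so the padding matches the numeric elements
list_lists4 <- Map(function(x, y) c(x, rep(NA_real_, y)), list_lists, pad)

# Every element of the first sub-list, including the padding, is now numeric
print(vapply(list_lists4[[1]], is.numeric, logical(1)))
str(list_lists4[[1]])
```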
Try this (where ls is your list of lists):
lapply(lapply(lapply(ls, unlist), "length<-", max(lengths(ls))), as.list)
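One caveat worth checking (my reading of sapply's simplification rules, not part of the original answer): when every sub-list already has the same length, sapply(ls, unlist) simplifies its result to a matrix, and the outer lapply then walks over single matrix cells. Using lapply in the inner step avoids that. A sketch, where pad_989 is a hypothetical wrapper name:

```r
# Hypothetical wrapper around the one-liner, with lapply() in the inner step
# so the intermediate result always stays a list of vectors
pad_989 <- function(ls) {
  lapply(lapply(lapply(ls, unlist), "length<-", max(lengths(ls))), as.list)
}

same_length <- list(list(1, 2), list(3, 4))

# sapply() simplifies equal-length results to a 2 x 2 matrix
print(is.matrix(sapply(same_length, unlist)))

# The lapply() version leaves already-equal sub-lists unchanged
print(identical(pad_989(same_length), same_length))

# Uneven input is padded with NA; `length<-` keeps the vector's type (double here)
print(pad_989(list(list(1, 2, 3), list(1, 2, 3, 4, 5))))
```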
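Here is your code fixed. The function should return x, not the value of the assignment length(x) <- max_length: in R an assignment evaluates to its right-hand side, which is why every sub-list became max_length. Also, I used vectors rather than lists, for clarity.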
list1 <- c(1, 2, 3)
list2 <- c(1, 2, 3, 4, 5)
list3 <- c(1, 2, 3, 4, 5, 6)
list_lists <- list(list1, list2, list3)
max_length <- max(unlist(lapply (list_lists, FUN = length)))
list_lists <- lapply (list_lists, function (x) {length (x) <- max_length;x})
# [[1]]
# [1] 1 2 3 NA NA NA
#
# [[2]]
# [1] 1 2 3 4 5 NA
#
# [[3]]
# [1] 1 2 3 4 5 6
For the original lists, length<- pads with NULL instead, and the result is:
# [[1]]
# [[1]][[1]]
# [1] 1
#
# [[1]][[2]]
# [1] 2
#
# [[1]][[3]]
# [1] 3
#
# [[1]][[4]]
# NULL
#
# [[1]][[5]]
# NULL
#
# [[1]][[6]]
# NULL
#
#
# [[2]]
# [[2]][[1]]
# [1] 1
#
# [[2]][[2]]
# [1] 2
#
# [[2]][[3]]
# [1] 3
#
# [[2]][[4]]
# [1] 4
#
# [[2]][[5]]
# [1] 5
#
# [[2]][[6]]
# NULL
#
#
# [[3]]
# [[3]][[1]]
# [1] 1
#
# [[3]][[2]]
# [1] 2
#
# [[3]][[3]]
# [1] 3
#
# [[3]][[4]]
# [1] 4
#
# [[3]][[5]]
# [1] 5
#
# [[3]][[6]]
# [1] 6
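A minimal sketch isolating the two behaviours (the function names broken and fixed are hypothetical, and max_length is passed as an argument rather than taken from the enclosing environment as in the question):

```r
# The question's version: the value of an assignment is its right-hand side,
# so this function returns max_length, not the padded x
broken <- function(x, max_length) length(x) <- max_length

# The fixed version: modify the local copy of x, then return it explicitly
fixed <- function(x, max_length) {
  length(x) <- max_length
  x
}

print(broken(c(1, 2, 3), 6))     # the value is returned invisibly, so print() it
print(fixed(c(1, 2, 3), 6))      # atomic vectors are padded with NA
print(fixed(list(1, 2, 3), 6))   # lists are padded with NULL
```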
Try this:
funJoeOld <- function(ls) {
list_length <- sapply(ls, length)
max_length <- max(list_length)
lapply(seq_along(ls), function(x) {
if (list_length[x] < max_length) {
c(ls[[x]], lapply(1:(max_length - list_length[x]), function(y) NA))
} else {
ls[[x]]
}
})
}
funJoeOld(list_lists)[[1]]
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] NA
[[5]]
[1] NA
[[6]]
[1] NA
I just wanted to illustrate how much difference using the right tools in R makes. Although my solution above gives correct results, it is very inefficient. By replacing sapply(ls, length) with lengths(ls), and lapply(1:z, function(y) NA) with as.list(rep(NA, z)), we obtain more than a 10x speed-up (about 12x in median time in the benchmark below). Observe:
funJoeNew <- function(ls) {
list_length <- lengths(ls)
max_length <- max(list_length)
lapply(seq_along(ls), function(x) {
if (list_length[x] < max_length) {
c(ls[[x]], as.list(rep(NA, max_length - list_length[x])))
} else {
ls[[x]]
}
})
}
funAlistaire <- function(ls) {
Map(function(x, y){c(x, rep(NA, y))},
ls,
max(lengths(ls)) - lengths(ls))
}
fun989 <- function(ls) {
lapply(lapply(sapply(ls, unlist), "length<-", max(lengths(ls))), as.list)
}
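A quick check (a sketch on a small hand-made list, not the original benchmark data) that the two substitutions described above are drop-in replacements:

```r
x <- list(list(1, 2, 3),
          list(1, 2, 3, 4, 5),
          list(1, 2, 3, 4, 5, 6))

# lengths(x) is the vectorised equivalent of sapply(x, length)
print(identical(lengths(x), sapply(x, length)))

# as.list(rep(NA, z)) builds the same list of logical NAs as the lapply() loop
z <- 4
print(identical(as.list(rep(NA, z)), lapply(1:z, function(y) NA)))
```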
Compare equality
set.seed(123)
samp_list <- lapply(sample(1000, replace = TRUE), function(x) {lapply(1:x, identity)})
## have to unlist as the NAs in 989 are of the integer
## variety and the NAs in Joe/Alistaire are logical
identical(sapply(fun989(samp_list), unlist), sapply(funJoeNew(samp_list), unlist))
[1] TRUE
identical(funJoeNew(samp_list), funAlistaire(samp_list))
[1] TRUE
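To illustrate the comment about NA types (a small sketch, not part of the original benchmark): a logical NA and an integer NA are different to identical(), but unlist() coerces both to a common integer vector.

```r
# Padding as in funJoeNew / funAlistaire: a logical NA
a <- list(1L, NA)
# Padding as produced by `length<-` on an integer vector: an integer NA
b <- list(1L, NA_integer_)

print(identical(a, b))

# unlist() promotes the logical NA to integer, so the two now compare equal
print(identical(unlist(a), unlist(b)))
```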
Benchmarks
library(microbenchmark)
microbenchmark(funJoeOld(samp_list), funJoeNew(samp_list), fun989(samp_list),
               funAlistaire(samp_list), times = 30, unit = "relative")
Unit: relative
                expr       min        lq      mean    median        uq       max neval cld
funJoeOld(samp_list) 21.825878 23.269846 17.434447 20.803035 18.851403 4.8056784    30   c
funJoeNew(samp_list)  1.827741  1.841071  2.253294  1.667047  1.780324 2.4659653    30  ab
   fun989(samp_list)  3.108230  3.563780  3.170320  3.790048  3.888632 0.9890681    30   b
   funAli(samp_list)  1.000000  1.000000  1.000000  1.000000  1.000000 1.0000000    30   a
There are two takeaways here:
1. A good grasp of the apply family of functions makes for concise and efficient code (as can be seen in @alistaire's and @989's solutions).
2. A good grasp of base R in general can make a considerable difference to performance.