Say I have a nested list of vectors.
lst1 <- list(`A`=c(a=1,b=1), `B`=c(a=1), `C`=c(b=1), `D`=c(a=1,b=1,c=1))
lst2 <- list(`A`=c(b=1), `B`=c(a=1,b=1), `C`=c(a=1,c=1), `D`=c(a=1,c=1))
lstX <- list(lst1, lst2)
As seen, each vector A, B, C, D occurs twice, with a, b, c present in different frequencies.
What would be the most efficient way to flatten the lists so that a, b, c are summed or averaged over A, B, C, D across the nested lists, as shown below? The real list has several hundred thousand nested lists.
#summed
    a   b   c
A   1   2   NA
B   2   1   NA
C   1   1   1
D   2   1   2
#averaged
    a   b   c
A   0.5 1   NA
B   1   0.5 NA
C   0.5 0.5 0.5
D   1   0.5 1
Here's a simple base R solution (it returns 0 instead of NA; not sure if that's good enough):
temp <- unlist(lstX)
res <- data.frame(do.call(rbind, strsplit(names(temp), "\\.")), value = temp)
Sums
xtabs(value ~ X1 + X2, res)
#    X2
# X1  a b c
#   A 1 2 0
#   B 2 1 0
#   C 1 1 1
#   D 2 1 2
Means
xtabs(value ~ X1 + X2, res) / length(lstX)
#    X2
# X1    a   b   c
#   A 0.5 1.0 0.0
#   B 1.0 0.5 0.0
#   C 0.5 0.5 0.5
#   D 1.0 0.5 1.0
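If the zeros are a problem, one possible base R variant is to cross-tabulate with tapply() instead of xtabs(), since tapply() leaves cells with no observations as NA. This is only a sketch on the example data from the question: the column names `grp` and `key` are illustrative and not part of the answer above, and it assumes that none of the names contain a "." themselves.

```r
# Example data from the question
lst1 <- list(A = c(a = 1, b = 1), B = c(a = 1), C = c(b = 1), D = c(a = 1, b = 1, c = 1))
lst2 <- list(A = c(b = 1), B = c(a = 1, b = 1), C = c(a = 1, c = 1), D = c(a = 1, c = 1))
lstX <- list(lst1, lst2)

# unlist() builds names such as "A.a", "A.b", "B.a", ...
temp <- unlist(lstX)
parts <- do.call(rbind, strsplit(names(temp), ".", fixed = TRUE))

# `grp` and `key` are hypothetical column names chosen for this sketch
res <- data.frame(grp = parts[, 1], key = parts[, 2], value = unname(temp))

# tapply() returns NA for combinations that never occur
sums  <- tapply(res$value, list(res$grp, res$key), sum)
means <- sums / length(lstX)
```

Here `sums` and `means` are plain matrices with A to D as row names and a to c as column names, so a single cell can be read with, for example, `sums["A", "b"]`.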
Alternatively, a more flexible data.table solution:
library(data.table) #V1.9.6+
temp <- unlist(lstX)
res <- data.table(names(temp))[, tstrsplit(V1, "\\.")][, value := temp]
Sums
dcast(res, V1 ~ V2, sum, value.var = "value", fill = NA)
#    V1 a b  c
# 1:  A 1 2 NA
# 2:  B 2 1 NA
# 3:  C 1 1  1
# 4:  D 2 1  2
Means
dcast(res, V1 ~ V2, function(x) sum(x)/length(lstX), value.var = "value", fill = NA)
#    V1   a   b   c
# 1:  A 0.5 1.0  NA
# 2:  B 1.0 0.5  NA
# 3:  C 0.5 0.5 0.5
# 4:  D 1.0 0.5 1.0
In general, you can use pretty much any function as the aggregation function in dcast.
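As a rough illustration of swapping in other aggregation functions, the sketch below reuses the same data with `length` (how many nested lists contain each pair) and `max`. The result names `counts` and `maxima` are made up for this example, and the choice of `fill` values is an assumption about what is wanted for missing combinations.

```r
library(data.table)

# Example data from the question
lst1 <- list(A = c(a = 1, b = 1), B = c(a = 1), C = c(b = 1), D = c(a = 1, b = 1, c = 1))
lst2 <- list(A = c(b = 1), B = c(a = 1, b = 1), C = c(a = 1, c = 1), D = c(a = 1, c = 1))
lstX <- list(lst1, lst2)

# Same long-format table as in the answer above
temp <- unlist(lstX)
res <- data.table(names(temp))[, tstrsplit(V1, ".", fixed = TRUE)][, value := temp]

# Number of nested lists in which each (V1, V2) pair occurs; absent pairs become 0
counts <- dcast(res, V1 ~ V2, fun.aggregate = length, value.var = "value", fill = 0L)

# Largest value per pair; absent pairs stay NA
maxima <- dcast(res, V1 ~ V2, fun.aggregate = max, value.var = "value", fill = NA)
```

Because every value in the example is 1, `counts` matches the sums shown earlier except that missing pairs are 0 rather than NA.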