
Flatten nested list by averaging vectors

Say I have a nested list of vectors.

lst1 <- list(`A`=c(a=1,b=1), `B`=c(a=1), `C`=c(b=1), `D`=c(a=1,b=1,c=1))
lst2 <- list(`A`=c(b=1), `B`=c(a=1,b=1), `C`=c(a=1,c=1), `D`=c(a=1,c=1))
lstX <- list(lst1, lst2)

As seen, each of the vectors A, B, C, D occurs twice, with the elements a, b, c present at different frequencies.

What would be the most efficient way to flatten the lists so that a, b, c are summed, or averaged, over A, B, C, D across the nested lists, as shown below? The real list has several hundred thousand nested lists.

#summed
  a b  c
A 1 2 NA
B 2 1 NA
C 1 1  1
D 2 1  2

#averaged
    a   b   c
A 0.5 1.0  NA
B 1.0 0.5  NA
C 0.5 0.5 0.5
D 1.0 0.5 1.0
asked Nov 24 '15 12:11 by jO.



1 Answer

Here's a simple base R solution (it returns 0 instead of NA for missing combinations; not sure if that's good enough):

temp <- unlist(lstX)
res <- data.frame(do.call(rbind, strsplit(names(temp), "\\.")), value = temp)

Sums

xtabs(value ~ X1 + X2, res)
#    X2
# X1  a b c
# A   1 2 0
# B   2 1 0
# C   1 1 1
# D   2 1 2

Means

xtabs(value ~ X1 + X2, res) / length(lstX)
#    X2
# X1  a   b   c
# A 0.5 1.0 0.0
# B 1.0 0.5 0.0
# C 0.5 0.5 0.5
# D 1.0 0.5 1.0
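
If NA (rather than 0) is needed for combinations that never occur, as in the desired output of the question, one base R option is tapply, which leaves empty cells as NA by default. This is only a sketch that swaps tapply in for xtabs; it is not part of the original answer, and the names sums and means are made up for illustration.

```r
# Rebuild the example data from the question
lst1 <- list(A = c(a = 1, b = 1), B = c(a = 1), C = c(b = 1), D = c(a = 1, b = 1, c = 1))
lst2 <- list(A = c(b = 1), B = c(a = 1, b = 1), C = c(a = 1, c = 1), D = c(a = 1, c = 1))
lstX <- list(lst1, lst2)

# Same long-format table as above: X1 = vector name, X2 = element name
temp <- unlist(lstX)
res <- data.frame(do.call(rbind, strsplit(names(temp), "\\.")), value = temp)

# tapply fills cells with no observations with NA instead of 0
sums <- with(res, tapply(value, list(X1, X2), sum))

# Average over the number of nested lists; NA cells stay NA
means <- sums / length(lstX)

print(sums)
print(means)
```

Dividing by length(lstX) treats a missing element as contributing 0 to the average, which matches the averaged table in the question.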

Alternatively, a more flexible data.table solution:

library(data.table) # requires v1.9.6+
temp <- unlist(lstX)
res <- data.table(names(temp))[, tstrsplit(V1, "\\.")][, value := temp]

Sums

dcast(res, V1 ~ V2, sum, value.var = "value", fill = NA)
#    V1 a b  c
# 1:  A 1 2 NA
# 2:  B 2 1 NA
# 3:  C 1 1  1
# 4:  D 2 1  2

Means

dcast(res, V1 ~ V2, function(x) sum(x)/length(lstX), value.var = "value", fill = NA)
#    V1   a   b   c
# 1:  A 0.5 1.0  NA
# 2:  B 1.0 0.5  NA
# 3:  C 0.5 0.5 0.5
# 4:  D 1.0 0.5 1.0

In general, you can use pretty much any aggregation function with dcast.
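
To illustrate the point about other functions, here is a small sketch that passes length and mean as the aggregation function. The result names counts and present_mean are hypothetical. Note that mean here averages only over the lists where an element is present, so it differs from dividing by length(lstX) as done above.

```r
library(data.table)

# Rebuild the example data from the question
lst1 <- list(A = c(a = 1, b = 1), B = c(a = 1), C = c(b = 1), D = c(a = 1, b = 1, c = 1))
lst2 <- list(A = c(b = 1), B = c(a = 1, b = 1), C = c(a = 1, c = 1), D = c(a = 1, c = 1))
lstX <- list(lst1, lst2)

# Long format: V1 = vector name, V2 = element name
temp <- unlist(lstX)
res <- data.table(names(temp))[, tstrsplit(V1, "\\.")][, value := temp]

# Count in how many nested lists each (vector, element) pair occurs
counts <- dcast(res, V1 ~ V2, length, value.var = "value")

# Mean over the occurrences that exist; missing pairs are filled with NA
present_mean <- dcast(res, V1 ~ V2, mean, value.var = "value", fill = NA_real_)

print(counts)
print(present_mean)
```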

answered Oct 31 '22 12:10 by David Arenburg