
Data Table to nested list

Tags: r, data.table

I would like to convert:

library(data.table)    
n <- 12
DT <- data.table(
  level1 = rep(paste0("Manu", 1:2), each = n / 2),
  level2 = rep(paste0("Dept", 1:4), each = n / 4),
  level3 = rep(paste0("Store", 1:n))
)
> DT
    level1 level2  level3
 1:  Manu1  Dept1  Store1
 2:  Manu1  Dept1  Store2
 3:  Manu1  Dept1  Store3
 4:  Manu1  Dept2  Store4
 5:  Manu1  Dept2  Store5
 6:  Manu1  Dept2  Store6
 7:  Manu2  Dept3  Store7
 8:  Manu2  Dept3  Store8
 9:  Manu2  Dept3  Store9
10:  Manu2  Dept4 Store10
11:  Manu2  Dept4 Store11
12:  Manu2  Dept4 Store12

To this:

goal <- list(
  Manu1 = list(
    Dept1 = paste0("Store", 1:(n / 4)),
    Dept2 = paste0("Store", (n/4 + 1):(n / 2))
  ),
  Manu2 = list(
    Dept3 = paste0("Store", (n/2 + 1):(3 * n / 4)),
    Dept4 = paste0("Store", (3 * n / 4 + 1):n)
  )
)
> goal
$Manu1
$Manu1$Dept1
[1] "Store1" "Store2" "Store3"

$Manu1$Dept2
[1] "Store4" "Store5" "Store6"


$Manu2
$Manu2$Dept3
[1] "Store7" "Store8" "Store9"

$Manu2$Dept4
[1] "Store10" "Store11" "Store12"

What's the data.table way to do this?

asked May 19 '16 at 22:05 by mlegge


3 Answers

Borrowing from @eddi's comment (which requires updating data.table to 1.9.8+):

s = split(DT, by = c('level1', 'level2'), keep.by = FALSE, flatten = FALSE)
rapply(relist(DT[['level3']], s), unname, how="replace")

$Manu1
$Manu1$Dept1
[1] "Store1" "Store2" "Store3"

$Manu1$Dept2
[1] "Store4" "Store5" "Store6"


$Manu2
$Manu2$Dept3
[1] "Store7" "Store8" "Store9"

$Manu2$Dept4
[1] "Store10" "Store11" "Store12"

Computationally, this looks pretty wasteful (iterating over the tree structure three times), but at least it should extend to deeper nesting than two levels (thanks to split.data.table in 1.9.8+).
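
For comparison, the same nesting can be sketched in base R with a recursive split, which also extends to any number of levels. This swaps in a different technique from the split/relist call above: `nest_by` is a hypothetical helper written for illustration, not a data.table function, and it is shown here on a plain data.frame. Note that groups come out in sorted order of the key values, which happens to match the row order in this example.

```r
# Hypothetical helper (not part of data.table): nest the `leaf` column
# under the grouping columns named in `keys`, outermost key first.
nest_by <- function(df, keys, leaf) {
  # No keys left: return the leaf values for this group.
  if (length(keys) == 0L) return(df[[leaf]])
  # Split on the first key, then recurse into each piece with the rest.
  groups <- split(df, df[[keys[1L]]], drop = TRUE)
  lapply(groups, nest_by, keys = keys[-1L], leaf = leaf)
}

# Same shape as the question's data, built as a plain data.frame.
df <- data.frame(
  level1 = rep(paste0("Manu", 1:2), each = 6),
  level2 = rep(paste0("Dept", 1:4), each = 3),
  level3 = paste0("Store", 1:12),
  stringsAsFactors = FALSE
)

res <- nest_by(df, c("level1", "level2"), "level3")
res$Manu2$Dept4  # "Store10" "Store11" "Store12"
```

Each level costs one `split` plus one `lapply`, so the tree is built in a single pass instead of being built and then rewritten.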

answered Nov 07 '22 at 20:11 by Frank


You could control the target environment more strictly with assign and friends instead of the superassignment operator <<- (which here writes to a global l), but here's a quick and dirty way of doing it:

l = list()

DT[, {l[[level1]][[level2]] <<- c(level3); NULL}, by = .(level1, level2)]

l
#$Manu1
#$Manu1$Dept1
#[1] "Store1" "Store2" "Store3"
#
#$Manu1$Dept2
#[1] "Store4" "Store5" "Store6"
#
#
#$Manu2
#$Manu2$Dept3
#[1] "Store7" "Store8" "Store9"
#
#$Manu2$Dept4
#[1] "Store10" "Store11" "Store12"
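
One way to keep the side effect out of the global environment, sketched under the assumption that wrapping the call in a function is acceptable: `<<-` then finds `l` in the function's own frame. `nest_stores` is a hypothetical name, and the `is.null` guard is an addition that is not in the snippet above.

```r
library(data.table)

# Hypothetical wrapper: `l` lives in this function's frame, so `<<-`
# updates it there instead of creating or overwriting a global `l`.
nest_stores <- function(DT) {
  l <- list()
  DT[, {
    # Added guard: make sure the outer slot is a list before filling it.
    # Depending on the R version, assigning a length-1 value into a NULL
    # slot can otherwise produce an atomic vector rather than a list.
    if (is.null(l[[level1]])) l[[level1]] <<- list()
    l[[level1]][[level2]] <<- level3
    NULL
  }, by = .(level1, level2)]
  l
}

n <- 12
DT <- data.table(
  level1 = rep(paste0("Manu", 1:2), each = n / 2),
  level2 = rep(paste0("Dept", 1:4), each = n / 4),
  level3 = paste0("Store", 1:n)
)

res <- nest_stores(DT)
res$Manu1$Dept2  # "Store4" "Store5" "Store6"
```

The grouping logic is unchanged; only the scope of the accumulator differs.
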

answered Nov 07 '22 at 20:11 by eddi


You can do this with the dlply function from the plyr package:

library(plyr)
res <- dlply(DT, .(level1), function(dt) {
  dlply(dt, .(level2), function(dt) {return (unique(dt$level3))})
})

answered Nov 07 '22 at 19:11 by Bulat