I would suppose that this should be a common problem, yet I couldn't find a solution for it:
Let's assume a deeply nested list, such as:
my_list <- list(
"first_node" = list(
"group_a" = list(
"E001" = 1:5,
"E002" = list(
"F001" = 6:10,
"F002" = 11:15
)
),
"group_b" = list(
"XY01" = list(
"Z1" = LETTERS[1:5],
"Z2" = LETTERS[6:10],
"Z3" = list(
"ZZ1" = LETTERS[1],
"ZZ2" = LETTERS[2],
"ZZ3" = LETTERS[3]
)
),
"YZ" = LETTERS[11:15]
),
"group_c" = list(
"QQQQ" = list(
"RRRR" = 200:300
)
)
),
"second_node" = list(
"group_d" = list(
"L1" = 99:101,
"L2" = 12
)
)
)
I want to retrieve elements by their name, that might be located at an unknown level of depth in that list. Importantly, I only want that specific element and it's children, not the parents.
For example, searching my_list
for "XY01"
should yield:
XY01 = list(
"Z1" = LETTERS[1:5],
"Z2" = LETTERS[6:10],
"Z3" = list(
"ZZ1" = LETTERS[1],
"ZZ2" = LETTERS[2],
"ZZ3" = LETTERS[3]
)
)
> str(XY01)
List of 3
$ Z1: chr [1:5] "A" "B" "C" "D" ...
$ Z2: chr [1:5] "F" "G" "H" "I" ...
$ Z3:List of 3
..$ ZZ1: chr "A"
..$ ZZ2: chr "B"
..$ ZZ3: chr "C"
Initially I want to use rapply()
to do the job, but it seems that I wouldn't be able to access names()
for the current iteration. My second attempt was writing a custom recursive function:
recursive_extract <- function(haystack, needle){
lapply(names(haystack), function(x){
if (needle %in% names(haystack[[x]])) {
return(haystack[[needle]])
} else {
recursive_extract(haystack[[x]], needle)
}
}) %>% setNames(names(haystack))
}
...which also seems problematic, since lapply()
will always give back the same object, even if NULL
is returned, so the parental structure follows along.
I've been looking into the purrr
and rlist
-packages for a convenient function, but it seems that most of them don't support recursion (?).
After extracting the desired element, I would ideally want to choose how many child-levels to return. For instance:
desired_func(haystack, needle, get_depth = 1)
for the previous example would result in:
XY01 = list(
"Z1" = LETTERS[1:5],
"Z2" = LETTERS[6:10]
)
> str(XY01)
List of 2
$ Z1: chr [1:5] "A" "B" "C" "D" ...
$ Z2: chr [1:5] "F" "G" "H" "I" ...
Very grateful for help! :)
Here's a function that will return the first match if found
find_name <- function(haystack, needle) {
if (hasName(haystack, needle)) {
haystack[[needle]]
} else if (is.list(haystack)) {
for (obj in haystack) {
ret <- Recall(obj, needle)
if (!is.null(ret)) return(ret)
}
} else {
NULL
}
}
find_name(my_list, "XY01")
We avoid lapply
so the loop can break early if found.
The list pruning is really a separate issue. Better to attack that with a different function. This should work
list_prune <- function(list, depth=1) {
if (!is.list(list)) return(list)
if (depth>1) {
lapply(list, list_prune, depth = depth-1)
} else {
Filter(function(x) !is.list(x), list)
}
}
Then you could do
list_prune(find_name(my_list, "XY01"), 1)
or with pipes
find_name(my_list, "XY01") %>% list_prune(1)
We can also use rrapply
in the rrapply
-package (an extension of base-rapply
).
First, look up the position of XY01
in the nested list:
library(rrapply)
(XY01_pos <- rrapply(my_list,
classes = "list",
condition = function(x, .xname) .xname == "XY01",
f = function(x, .xpos) .xpos,
how = "flatten")[[1]])
#> [1] 1 2 1
Here, we make use of the .xname
and .xpos
arguments which evaluate to the name and position of the list element under evaluation. how = "flatten"
returns a flattened version of the pruned list, in this case containing only the position of XY01
.
Second, return the sublist by ordinary subsetting of the nested list:
str(my_list[[XY01_pos]])
#> List of 3
#> $ Z1: chr [1:5] "A" "B" "C" "D" ...
#> $ Z2: chr [1:5] "F" "G" "H" "I" ...
#> $ Z3:List of 3
#> ..$ ZZ1: chr "A"
#> ..$ ZZ2: chr "B"
#> ..$ ZZ3: chr "C"
For the bonus challenge we can make another call to rrapply
applied to the sublist my_list[[XY01]]
returning a pruned list containing only nodes that have a depth smaller or equal to a pre-specified maximum depth:
maxdepth <- 1
rrapply(my_list[[XY01_pos]], condition = function(x, .xpos) length(.xpos) <= maxdepth, how = "prune")
#> $Z1
#> [1] "A" "B" "C" "D" "E"
#>
#> $Z2
#> [1] "F" "G" "H" "I" "J"
Here length(.xpos)
evaluates to the depth of the list element under evaluation, so we return only nodes that satisfy length(.xpos) <= maxdepth
.
NB: we could also directly return the children of XY01
with a single call to rrapply
by setting how = "prune"
:
str(rrapply(my_list, classes = "list", condition = function(x, .xname) .xname == "XY01", how = "prune"))
#> List of 1
#> $ first_node:List of 1
#> ..$ group_b:List of 1
#> .. ..$ XY01:List of 3
#> .. .. ..$ Z1: chr [1:5] "A" "B" "C" "D" ...
#> .. .. ..$ Z2: chr [1:5] "F" "G" "H" "I" ...
#> .. .. ..$ Z3:List of 3
#> .. .. .. ..$ ZZ1: chr "A"
#> .. .. .. ..$ ZZ2: chr "B"
#> .. .. .. ..$ ZZ3: chr "C"
But this would contain the complete node paths starting from the root instead of starting from the XY01
node.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With