Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: Find object by name in deeply nested list

Tags:

r

nested-lists

Problem

I would suppose that this should be a common problem, yet I couldn't find a solution for it:

Let's assume a deeply nested list, such as:

my_list <- list(
  "first_node" = list(
    "group_a" = list(
      "E001" = 1:5,
      "E002" = list(
        "F001" = 6:10,
        "F002" = 11:15
      )
    ),
    "group_b" = list(
      "XY01" = list(
        "Z1" = LETTERS[1:5],
        "Z2" = LETTERS[6:10],
        "Z3" = list(
          "ZZ1" = LETTERS[1],
          "ZZ2" = LETTERS[2],
          "ZZ3" = LETTERS[3]
        )
      ),
      "YZ" = LETTERS[11:15]
    ),
    "group_c" = list(
      "QQQQ" = list(
        "RRRR" = 200:300
      )
    )
  ),
  "second_node" = list(
    "group_d" = list(
      "L1" = 99:101,
      "L2" = 12
    )
  )
)

Desired Output

I want to retrieve elements by their name, that might be located at an unknown level of depth in that list. Importantly, I only want that specific element and it's children, not the parents.

For example, searching my_list for "XY01" should yield:

XY01 = list(
  "Z1" = LETTERS[1:5],
  "Z2" = LETTERS[6:10],
  "Z3" = list(
    "ZZ1" = LETTERS[1],
    "ZZ2" = LETTERS[2],
    "ZZ3" = LETTERS[3]
  )
)

> str(XY01)
List of 3
 $ Z1: chr [1:5] "A" "B" "C" "D" ...
 $ Z2: chr [1:5] "F" "G" "H" "I" ...
 $ Z3:List of 3
  ..$ ZZ1: chr "A"
  ..$ ZZ2: chr "B"
  ..$ ZZ3: chr "C"

Previous attempts

Initially I want to use rapply() to do the job, but it seems that I wouldn't be able to access names() for the current iteration. My second attempt was writing a custom recursive function:

recursive_extract <- function(haystack, needle){

    lapply(names(haystack), function(x){
      if (needle %in% names(haystack[[x]])) {
        return(haystack[[needle]])
      } else {
        recursive_extract(haystack[[x]], needle)
      }
    }) %>% setNames(names(haystack))
}

...which also seems problematic, since lapply() will always give back the same object, even if NULL is returned, so the parental structure follows along.

I've been looking into the purrr and rlist-packages for a convenient function, but it seems that most of them don't support recursion (?).

Bonus Challenge

After extracting the desired element, I would ideally want to choose how many child-levels to return. For instance: desired_func(haystack, needle, get_depth = 1) for the previous example would result in:

XY01 = list(
  "Z1" = LETTERS[1:5],
  "Z2" = LETTERS[6:10]
)

> str(XY01)
List of 2
 $ Z1: chr [1:5] "A" "B" "C" "D" ...
 $ Z2: chr [1:5] "F" "G" "H" "I" ...

Very grateful for help! :)

like image 581
Comfort Eagle Avatar asked Oct 15 '19 17:10

Comfort Eagle


2 Answers

Here's a function that will return the first match if found

find_name <- function(haystack, needle) {
 if (hasName(haystack, needle)) {
   haystack[[needle]]
 } else if (is.list(haystack)) {
   for (obj in haystack) {
     ret <- Recall(obj, needle)
     if (!is.null(ret)) return(ret)
   }
 } else {
   NULL
 }
}

find_name(my_list, "XY01")

We avoid lapply so the loop can break early if found.

The list pruning is really a separate issue. Better to attack that with a different function. This should work

list_prune <- function(list, depth=1) {
  if (!is.list(list)) return(list)
  if (depth>1) {
    lapply(list, list_prune, depth = depth-1)
  } else  {
    Filter(function(x) !is.list(x), list)
  }
}

Then you could do

list_prune(find_name(my_list, "XY01"), 1)

or with pipes

find_name(my_list, "XY01") %>% list_prune(1)
like image 194
MrFlick Avatar answered Nov 07 '22 04:11

MrFlick


We can also use rrapply in the rrapply-package (an extension of base-rapply).

First, look up the position of XY01 in the nested list:

library(rrapply)

(XY01_pos <- rrapply(my_list, 
                    classes = "list",
                    condition = function(x, .xname) .xname == "XY01", 
                    f = function(x, .xpos) .xpos, 
                    how = "flatten")[[1]])
#> [1] 1 2 1

Here, we make use of the .xname and .xpos arguments which evaluate to the name and position of the list element under evaluation. how = "flatten" returns a flattened version of the pruned list, in this case containing only the position of XY01.

Second, return the sublist by ordinary subsetting of the nested list:

str(my_list[[XY01_pos]])
#> List of 3
#>  $ Z1: chr [1:5] "A" "B" "C" "D" ...
#>  $ Z2: chr [1:5] "F" "G" "H" "I" ...
#>  $ Z3:List of 3
#>   ..$ ZZ1: chr "A"
#>   ..$ ZZ2: chr "B"
#>   ..$ ZZ3: chr "C"

For the bonus challenge we can make another call to rrapply applied to the sublist my_list[[XY01]] returning a pruned list containing only nodes that have a depth smaller or equal to a pre-specified maximum depth:

maxdepth <- 1
rrapply(my_list[[XY01_pos]], condition = function(x, .xpos) length(.xpos) <= maxdepth, how = "prune")
#> $Z1
#> [1] "A" "B" "C" "D" "E"
#> 
#> $Z2
#> [1] "F" "G" "H" "I" "J"

Here length(.xpos) evaluates to the depth of the list element under evaluation, so we return only nodes that satisfy length(.xpos) <= maxdepth.


NB: we could also directly return the children of XY01 with a single call to rrapply by setting how = "prune":

str(rrapply(my_list, classes = "list", condition = function(x, .xname) .xname == "XY01", how = "prune"))
#> List of 1
#>  $ first_node:List of 1
#>   ..$ group_b:List of 1
#>   .. ..$ XY01:List of 3
#>   .. .. ..$ Z1: chr [1:5] "A" "B" "C" "D" ...
#>   .. .. ..$ Z2: chr [1:5] "F" "G" "H" "I" ...
#>   .. .. ..$ Z3:List of 3
#>   .. .. .. ..$ ZZ1: chr "A"
#>   .. .. .. ..$ ZZ2: chr "B"
#>   .. .. .. ..$ ZZ3: chr "C"

But this would contain the complete node paths starting from the root instead of starting from the XY01 node.

like image 38
Joris C. Avatar answered Nov 07 '22 03:11

Joris C.