I have a vector of JSON data in R, and I extract the information with lapply:
list <- lapply(temp, fromJSON)
The structure of the first element of this list looks like this:
str(list[[1]])
List of 4
$ boundedBy :List of 2
..$ type : chr "Polygon"
..$ coordinates:List of 1
.. ..$ :List of 5
.. .. ..$ : num [1:2] 89328 208707
.. .. ..$ : num [1:2] 89333 208707
.. .. ..$ : num [1:2] 89333 208713
.. .. ..$ : num [1:2] 89328 208713
.. .. ..$ : num [1:2] 89328 208707
$ hnrlbl : NULL
$ opndatum : chr "2011-05-30"
$ oidn : chr "2954841"
This works for the first element: list[[1]]$hnrlbl, but how do I do this at once for the whole list? Something like list[[.]]$hnrlbl?
In this case you could just use list.map from the rlist package:
mylist <- lapply(temp, fromJSON)
library(rlist)
list.map(mylist, hnrlbl)
http://cran.r-project.org/web/packages/rlist/vignettes/Mapping.html
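For a self-contained illustration, here is list.map on a small stand-in list (the records below are made up, since the original temp data isn't shown; only the field names match the question):

library(rlist)
# Hypothetical stand-in for lapply(temp, fromJSON): two records with the question's field names
mylist <- list(
  list(hnrlbl = "12A", opndatum = "2011-05-30", oidn = "2954841"),
  list(hnrlbl = "7",   opndatum = "2011-06-02", oidn = "2954842")
)
list.map(mylist, hnrlbl)
# [[1]]
# [1] "12A"
#
# [[2]]
# [1] "7"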
I have a helper function that's useful for these scenarios:
pluck <- function(x, name, type) {
  if (missing(type)) {
    lapply(x, .subset2, name)
  } else {
    vapply(x, .subset2, name, FUN.VALUE = type)
  }
}
(This was inspired by underscore & Winston Chang. .subset2() is an internal version of [[: it's faster, but it doesn't do S3 dispatch, which means that x needs to be a plain list.)
With this function, solving your problem is easy:
x <- list(
  a = list(x = rnorm(10), y = letters[1:10], z = "OK"),
  b = list(x = rnorm(10), y = letters[11:20], z = "notOK")
)
# List of results
str(pluck(x, "z"))
#> List of 2
#> $ a: chr "OK"
#> $ b: chr "notOK"
# Vector of results
str(pluck(x, "z", character(1)))
#> Named chr [1:2] "OK" "notOK"
#> - attr(*, "names")= chr [1:2] "a" "b"
(You can also select by position: pluck(x, 2, character(10)).)
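Applied to the question's data (assuming mylist <- lapply(temp, fromJSON) as above), the untyped list form is the safer call here, because hnrlbl can be NULL and the vapply() branch would then fail its length/type check:

# One hnrlbl per JSON record; NULL values stay NULL
str(pluck(mylist, "hnrlbl"))
# pluck(mylist, "hnrlbl", character(1)) only works if every record has a character hnrlbl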
This method is also quite fast:
x_big <- rep(x, 1000)
# myselect() is the unlist()-based selector from the answer below, included for comparison
myselect <- function(x, name) {
  tmp <- unlist(x, recursive = FALSE)
  id <- grep(paste0("\\.", name, "$"), names(tmp))
  tmp[id]
}
library(microbenchmark)
options(digits = 2)
microbenchmark(
  sapply(x_big, function(i) i$z),
  myselect(x_big, "z"),
  pluck(x_big, "z", character(1))
)
#> Unit: microseconds
#> expr min lq median uq max neval
#> sapply(x_big, function(i) i$z) 2771 2886 2972 3124 5903 100
#> myselect(x_big, "z") 2250 2330 2366 2401 3551 100
#> pluck(x_big, "z", character(1)) 717 786 825 889 1731 100
After a couple of hours of looking for the cleanest method, we ended up with:
kadaster_building_temp$hnrlbl <- sapply(list, function(x) x$hnrlbl)
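A small variant of the same idea (not in the original post), for what it's worth:

# Equivalent shorthand: pass "[[" as the function and the field name as its extra argument.
# Note: "[[" errors if a record lacks hnrlbl entirely, whereas $ would return NULL.
kadaster_building_temp$hnrlbl <- sapply(list, "[[", "hnrlbl")

Like the anonymous-function version, this comes back as a list rather than a vector whenever some hnrlbl values are NULL.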
Warning: because it relies on regular expressions, the solution below might fail under some conditions (depending on the names you use in your lists). If speed is not a concern, either list.map or the sapply solution is more robust.
You can gain quite some speed by using unlist() here and looking for the names. Take the following function myselect:
myselect <- function(x, name) {
  tmp <- unlist(x, recursive = FALSE)
  id <- grep(paste0("(^|\\.)", name, "$"), names(tmp))
  tmp[id]
}
This one does about the same, but in a vectorized way. By using the argument recursive = FALSE, you flatten the nested list into a flat list (all elements become part of the same list). Then you use the naming convention applied by unlist() to look for all the elements with the exact name you want to select; hence the call to paste0 to construct a regular expression that avoids partial name matches. Simple subsetting then returns a list with the wanted elements. If you want this to be a vector, you can simply use unlist() on the result.
Note that I presume you have a list of lists, so you only want to flatten one level. For more complicated nesting, this obviously won't work in its current form.
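To make the naming convention concrete, here is a tiny made-up nested list, the names that unlist(..., recursive = FALSE) produces for it, and what myselect() then picks out:

nested <- list(a = list(x = 1, z = "OK"), b = list(x = 2, z = "notOK"))
names(unlist(nested, recursive = FALSE))
# [1] "a.x" "a.z" "b.x" "b.z"
myselect(nested, "z")
# $a.z
# [1] "OK"
#
# $b.z
# [1] "notOK"
unlist(myselect(nested, "z"))  # named character vector: a.z = "OK", b.z = "notOK"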
Example and Benchmarking
The speed gain obviously depends on the structure of the list, but it can be 50-fold or more. Take the following (very basic) example:
aList <- list(
  a = list(x = rnorm(10), y = letters[1:10], z = "OK"),
  b = list(x = rnorm(10), y = letters[11:20], z = "notOK")
)
Benchmarking this gives:
require(rbenchmark)
benchmark(
  sapply(aList, function(i) i$z),
  myselect(aList, "z"),
  columns = c("test", "elapsed", "relative"),
  replications = 10000
)
test elapsed relative
2 myselect(aList, "z") 0.24 1.000
1 sapply(aList, function(i) i$z) 0.39 1.625
With larger objects, the improvement can be substantial. Using this on a list I happened to have in my workspace (dput is not an option here...):
> benchmark(
+ sapply(StatN0_1,function(i)i$SP),
+ myselect(StatN0_1,"SP"),
+ columns=c("test","elapsed","relative"),
+ replications=100
+ )
test elapsed relative
2 myselect(StatN0_1, "SP") 0.02 1.0
1 sapply(StatN0_1, function(i) i$SP) 1.13 56.5