I have a vector of JSON data in R, and I extract the information with lapply:
list <- lapply(temp, fromJSON)
The structure of the first element of this list looks like this:
str(list[[1]])
List of 4
$ boundedBy :List of 2
..$ type : chr "Polygon"
..$ coordinates:List of 1
.. ..$ :List of 5
.. .. ..$ : num [1:2] 89328 208707
.. .. ..$ : num [1:2] 89333 208707
.. .. ..$ : num [1:2] 89333 208713
.. .. ..$ : num [1:2] 89328 208713
.. .. ..$ : num [1:2] 89328 208707
$ hnrlbl : NULL
$ opndatum : chr "2011-05-30"
$ oidn : chr "2954841"
This works for the first element: list[[1]]$hnrlbl, but how do I do this at once for the whole list? Something like list[[.]]$hnrlbl?
In this case you could just use list.map from the rlist package:
mylist <- lapply(temp, fromJSON)
library(rlist)
list.map(mylist, hnrlbl)
http://cran.r-project.org/web/packages/rlist/vignettes/Mapping.html
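For a self-contained illustration, here is list.map on a small stand-in list (the records below are made up, since the original temp data isn't shown; only the field names match the question):

library(rlist)
# Hypothetical stand-in for lapply(temp, fromJSON): two records with the question's field names
mylist <- list(
  list(hnrlbl = "12A", opndatum = "2011-05-30", oidn = "2954841"),
  list(hnrlbl = "7",   opndatum = "2011-06-02", oidn = "2954842")
)
list.map(mylist, hnrlbl)
# [[1]]
# [1] "12A"
#
# [[2]]
# [1] "7"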
I have a helper function that's useful for these scenarios:
pluck <- function(x, name, type) {
  if (missing(type)) {
    lapply(x, .subset2, name)
  } else {
    vapply(x, .subset2, name, FUN.VALUE = type)
  }
}
(This was inspired by underscore & Winston Chang. .subset2() is an internal version of [[: it's faster, but it doesn't do S3 dispatch, which means that x needs to be a plain list.)
With this function, solving your problem is easy:
x <- list(
  a = list(x = rnorm(10), y = letters[1:10], z = "OK"),
  b = list(x = rnorm(10), y = letters[11:20], z = "notOK")
)
# List of results
str(pluck(x, "z"))
#> List of 2
#> $ a: chr "OK"
#> $ b: chr "notOK"
# Vector of results
str(pluck(x, "z", character(1)))
#> Named chr [1:2] "OK" "notOK"
#> - attr(*, "names")= chr [1:2] "a" "b"
(You can also select by position: pluck(x, 2, character(10)).)
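Applied to the question's data (assuming mylist <- lapply(temp, fromJSON) as above), the untyped list form is the safer call here, because hnrlbl can be NULL and the vapply() branch would then fail its length/type check:

# One hnrlbl per JSON record; NULL values stay NULL
str(pluck(mylist, "hnrlbl"))
# pluck(mylist, "hnrlbl", character(1)) only works if every record has a character hnrlbl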
This method is also quite fast:
x_big <- rep(x, 1000)
# myselect() is the unlist()-based selector from the answer below, included for comparison
myselect <- function(x, name) {
  tmp <- unlist(x, recursive = FALSE)
  id <- grep(paste0("\\.", name, "$"), names(tmp))
  tmp[id]
}
library(microbenchmark)
options(digits = 2)
microbenchmark(
  sapply(x_big, function(i) i$z),
  myselect(x_big, "z"),
  pluck(x_big, "z", character(1))
)
#> Unit: microseconds
#> expr min lq median uq max neval
#> sapply(x_big, function(i) i$z) 2771 2886 2972 3124 5903 100
#> myselect(x_big, "z") 2250 2330 2366 2401 3551 100
#> pluck(x_big, "z", character(1)) 717 786 825 889 1731 100
After a couple of hours of looking for the cleanest method, we ended up with:
kadaster_building_temp$hnrlbl <- sapply(list, function(x) x$hnrlbl)
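A small variant of the same idea (not in the original post), for what it's worth:

# Equivalent shorthand: pass "[[" as the function and the field name as its extra argument.
# Note: "[[" errors if a record lacks hnrlbl entirely, whereas $ would return NULL.
kadaster_building_temp$hnrlbl <- sapply(list, "[[", "hnrlbl")

Like the anonymous-function version, this comes back as a list rather than a vector whenever some hnrlbl values are NULL.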
Warning: because it relies on regular expressions, the solution below might fail under some conditions (depending on the names you use in your lists). If speed is not a concern, either list.map or the sapply solution is more robust.
You can gain quite some speed by using unlist() here and looking for the names. Take the following function myselect:
myselect <- function(x, name) {
  tmp <- unlist(x, recursive = FALSE)
  id <- grep(paste0("(^|\\.)", name, "$"), names(tmp))
  tmp[id]
}
This one does about the same, but in a vectorized way. By using the argument recursive = FALSE, you flatten the nested list into a flat list (all elements become part of the same list). Then you use the naming convention applied by unlist() to look for all the elements with the exact name you want to select; hence the call to paste0 to construct a regular expression that avoids partial name matches. Simple subsetting then returns a list with the wanted elements. If you want this to be a vector, you can simply use unlist() on the result.
Note that I presume you have a list of lists, so you only want to flatten one level. For more complicated nesting, this obviously won't work in its current form.
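To make the naming convention concrete, here is a tiny made-up nested list, the names that unlist(..., recursive = FALSE) produces for it, and what myselect() then picks out:

nested <- list(a = list(x = 1, z = "OK"), b = list(x = 2, z = "notOK"))
names(unlist(nested, recursive = FALSE))
# [1] "a.x" "a.z" "b.x" "b.z"
myselect(nested, "z")
# $a.z
# [1] "OK"
#
# $b.z
# [1] "notOK"
unlist(myselect(nested, "z"))  # named character vector: a.z = "OK", b.z = "notOK"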
Example and Benchmarking
The speed gain obviously depends on the structure of the list, but it can be 50-fold or more. Take the following (very basic) example:
aList <- list(
  a = list(x = rnorm(10), y = letters[1:10], z = "OK"),
  b = list(x = rnorm(10), y = letters[11:20], z = "notOK")
)
Benchmarking this gives:
require(rbenchmark)
benchmark(
  sapply(aList, function(i) i$z),
  myselect(aList, "z"),
  columns = c("test", "elapsed", "relative"),
  replications = 10000
)
test elapsed relative
2 myselect(aList, "z") 0.24 1.000
1 sapply(aList, function(i) i$z) 0.39 1.625
With larger objects, the improvement can be substantial. Using this on a list I happened to have in my workspace (dput is not an option here...):
> benchmark(
+ sapply(StatN0_1,function(i)i$SP),
+ myselect(StatN0_1,"SP"),
+ columns=c("test","elapsed","relative"),
+ replications=100
+ )
test elapsed relative
2 myselect(StatN0_1, "SP") 0.02 1.0
1 sapply(StatN0_1, function(i) i$SP) 1.13 56.5