When reading records from a database cursor, it is often not known in advance how many rows are coming. This makes it impossible to pre-allocate a list of the right size to store these objects.
What is an efficient way of storing all records in a list when the total size is unknown? The basic list type is slow, because appending an element can cause R to copy the entire list:
x <- list()
for(i in 1:1e5){
  x[[i]] <- list(foo = rnorm(3), bar = TRUE)
}
An environment is more efficient, but it is a hash map rather than an ordered sequence. So we need to convert each index into a zero-padded string, and later sort the keys so the values come back in insertion order, which seems suboptimal:
env <- new.env()
for(i in 1:1e5){
  env[[sprintf("%09d", i)]] <- list(foo = rnorm(3), bar = TRUE)
}
x <- lapply(sort(ls(env)), get, envir = env, inherits = FALSE)
A pairlist is supposed to be a linked list in R; however, R seems to coerce it into a regular list whenever an element is appended from within R code.
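That coercion is easy to check (a minimal sketch; is.pairlist simply reports the underlying type):

pl <- pairlist(a = 1, b = 2)
is.pairlist(pl)  # TRUE
pl[[3]] <- 3     # append one element from R code
is.pairlist(pl)  # FALSE: the subassignment returned a regular list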
Growing the list one element at a time is indeed slow:
> x <- list()
> for(i in 1:1e5){x[[i]]=list(foo=rnorm(3),bar=TRUE)}
I gave up waiting. But this is fast, almost instantaneous:
> x <- list()
> length(x)=1e5
> for(i in 1:1e5){x[[i]]=list(foo=rnorm(3),bar=TRUE)}
So I reckon the way to go is to extend the list in large blocks (say 1e5 elements at a time), and prune it back once you hit the last record and know the final length:
> length(x)=2e5 # extend by another 1e5
> for(i in 1:1e5){x[[i+1e5]]=list(foo=rnorm(3),bar=TRUE)}
> length(x)=3e5 # and again... but this time only 100 more elements:
> for(i in 1:100){x[[i+2e5]]=list(foo=rnorm(3),bar=TRUE)}
> length(x) = 2e5 + 100
Another approach is to double the list's length each time you run out of room; that way the number of resize operations only grows logarithmically with the final size.
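A minimal sketch of the doubling strategy (the fixed n_records and the dummy record are just stand-ins; in real code the loop would run while the cursor still has rows):

n_records <- 12345          # pretend this is unknown until the cursor is exhausted
x <- vector("list", 16)     # start with a small capacity
n <- 0                      # number of slots actually filled
for (i in seq_len(n_records)) {
  n <- n + 1
  if (n > length(x)) {
    length(x) <- 2 * length(x)  # double the capacity whenever the list fills up
  }
  x[[n]] <- list(foo = rnorm(3), bar = TRUE)
}
length(x) <- n              # prune back to the final size

Because each doubling copies the list once, every element is moved only a handful of times on average, which is why this stays fast even for large results.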