Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient linked list (ordered set) in R

Tags:

r

When reading records from a database cursor it is often unknown in advance how many rows are coming up. This makes it impossible to pre-allocate a list of the right size to store these objects.

What is an efficient way of storing all records in a list, when we the total size is unknown? The basic list type is slow because it will copy the entire list every time an element is appended:

x <- list()
for(i in 1:1e5){
  x[[i]] <- list("foo" = rnorm(3), bar = TRUE)
}

An environment is more efficient, but it is a map rather than an ordered set. So we need to convert the index into a string and then later sort the keys to retrieve the values, which seems suboptimal:

env <- new.env()
for(i in 1:1e5){
  env[[sprintf("%09d", i)]] <- list("foo" = rnorm(3), bar = TRUE)
}
x <- lapply(sort(ls(env)), get, env, inherits = FALSE)

A pairlist is supposed to be a linked list in R, however R seems to coerse it into a regular list whenever an element is appended from within R.

like image 224
Jeroen Ooms Avatar asked Apr 05 '15 19:04

Jeroen Ooms


1 Answers

This is slow:

 > x <- list()
 > for(i in 1:1e5){x[[i]]=list(foo=rnorm(3),bar=TRUE)}

I gave up waiting. But this is fast, almost instantaneous:

 > x <- list()
 > length(x)=1e5
 > for(i in 1:1e5){x[[i]]=list(foo=rnorm(3),bar=TRUE)}

So I reckon the way to go is to extend the list length by 10000 every time, and prune it back when you get to the last element and know the final output.

 > length(x)=2e5  # extend by another 1e5
 > for(i in 1:1e5){x[[i+1e5]]=list(foo=rnorm(3),bar=TRUE)}
 > length(x)=3e5  # and again... but this time only 100 more elts:
 > for(i in 1:100){x[[i+2e5]]=list(foo=rnorm(3),bar=TRUE)}
 > length(x) = 2e5 + 100

Another approach is to double the list size each time you need more elements.

like image 64
Spacedman Avatar answered Oct 17 '22 05:10

Spacedman