Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is preallocation useful for list?

Tags:

r

I understand it is useful to preallocate a vector or a matrix, because they are always stored in a contiguous memory block.

However, in terms of a list, it can contains elements of difference length and modes. So my first guess is that a list might merely contains pointers to the true address of its elements. Am I correct? A related question here What is the internal implementation of lists? It says the list is essentially an array, but it does't cover how the elements are stored in a list while size of each element may change.

Example 1: If a list contains a,b,c,d,e, and when myList$a<-1:1000000, is the list modified in place (which means only a is updated) or the whole list are copied and updated?

Example 2

> system.time( { myList <- list()
+                myList$a <- 1:10000000
+                myList$b <- 1:10000100
+                myList$c <- 1:10000200 
+                myList$d <- 1:10000300})
   user  system elapsed 
   0.01    0.02    0.03

> system.time({ myList2<-list(a=1:10000000,b=1:10000100,c=1:10000200,d=1:10000300) })
   user  system elapsed 
   0.00    0.03    0.03 

would myList be very inefficient compared to myList2 due to no preallocation? Or no noticeable performance difference at all no matter how large a,b,c,d are because the first one copies only pointers a few times more?

Here comes to preallocation. How does it look like for a list? Does it only preallocate the memory for the pointers? If that's the case, I don't see any use since there won't be much data copying for pointers anyway.

> system.time( { myList <- vector("list", 4)
+                myList[[1]] <- 1:10000000
+                myList[[2]] <- 1:10000100
+                myList[[3]]  <- 1:10000200 
+                myList[[4]] <- 1:10000300
+                names(myList) <- letters[1:4]})
   user  system elapsed 
   0.01    0.02    0.03 
like image 528
colinfang Avatar asked Aug 25 '13 00:08

colinfang


People also ask

What are benefits to Preallocation Matlab?

If you preallocate a 1-by-1,000,000 block of memory for x and initialize it to zero, then the code runs much faster because there is no need to repeatedly reallocate memory for the growing data structure.

Should I preallocate array Python?

You don't need to preallocate anything. Behind the scenes, the list type will periodically allocate more space than it needs for its immediate use to amortize the cost of resizing the underlying array across multiple updates.

What does it mean to preallocate an array?

Pre-allocating Space Thus, if you know (or have a good guess) at how big the array needs to be in the first place, you can "pre-allocate" it, meaning pre-size the array.

What is Preallocate memory?

"Pre-allocated memory" means that a program should allocate all the required memory blocks once after startup (using the new operator, as usual), rather than allocate memory multiple times during execution and leave memory which is no longer needed for the garbage collector to free.


1 Answers

This is too long for a comment -- but not a complete answer.

Named lists are treated differently to unnamed lists in terms of copying when modified.

Copying (for large objects)

This is a complex issue. See http://radfordneal.wordpress.com/2013/07/02/fixing-rs-named-problems-in-pqr/ for a reasonable explanation and note that some of these changes are being implemented in the development version of R. Also see http://r.789695.n4.nabble.com/Confused-about-NAMED-td4103326.html for more understanding of the underlying complexities

Here are some timings for various possibilities of preallocation

# preallocated contents so timing is list related only
.a <- seq_len(1e6); .b <- seq_len((1e6 + 1))
.c <- seq_len((1e6 + 2)); .d <- seq_len((1e6 + 3))



f1 <- function(){
# using `$<-` empty list
  x <- list()
  x$a <- .a
  x$b <- .b
  x$c <- .c 
  x$d <- .d
  x
}


f2 <- function(){
  # using `[[<-` on empty list
  x <- list()
  x[['a']] <- .a
  x[['b']] <- .b
  x[['c']] <- .c
  x[['d']] <- .d
  x
}


f3 <- function(){
  # using `[[<-` on empty list, naming later
  x <- list()
  x[[1]] <- .a
  x[[2]] <- .b
  x[[3]] <- .c
  x[[4]] <- .d
  names(x) <- letters[1:4]
  x
}

f4 <- function(){ 
  # just define the list
  x <- list(a = .a, b = .b,
            c = .c, d = .d)
}


f5 <- function(){ 
  # create a list of length 4, then fill and name
  x <- vector(mode = 'list', length = 4)
  x[[1]] <- .a
  x[[2]] <- .b
  x[[3]] <- .c
  x[[4]] <- .d
  names(x) <- letters[1:4]
  x
}

# f6 <- function(){
#  # this doesn't work!
#  # it creates a list of length 8
#  x <- vector(mode = 'list', length = 4)
#  x[['a']] <- .a
#  x[['b']] <- .b
#  x[['c']] <- .c
#  x[['d']] <- .d
#  x
# }


f7 <-function(){
# pre allocate list, name then fill
x <- vector(mode = 'list', length = 4)
  names(x) <- letters[1:4]
  x[[1]] <- .a
  x[[2]] <- .b
  x[[3]] <- .c
  x[[4]] <- .d
  x
}

f8 <- function(){
# preallocate correct length and then name
# and fill by name
  x <- vector(mode = 'list', length = 4)
  names(x) <- letters[1:4]
  x[['a']] <- .a
  x[['b']] <- .b
  x[['c']] <- .c
  x[['d']] <- .d
  x
}

library(microbenchmark)
microbenchmark(f1(),f2(),f3(),f4(),f5(),f7(),f8(),times=100)


microbenchmark(f1(),f2(),f3(),f4(),f5(),f7(),f8(),times=100)
# Unit: microseconds
#   expr      min       lq   median       uq       max neval
#   f1()    6.038   11.169   12.980   14.791    34.110   100
#   f2() 2528.881 4387.962 4707.014 6472.823 74586.266   100
#   f3() 2532.805 4537.376 4714.258 5353.722 74903.206   100
#   f4() 2475.756 4531.489 4721.503 6331.860 74460.395   100
#   f5() 2508.959 4512.474 4759.535 6673.551  7966.668   100
#   f7() 2545.181 4477.761 4709.127 6372.610  7964.856   100
#   f8() 2508.053 4467.799 4669.131 6181.993 74236.726   100
#
#  All results are identical.
all(identical(f1(),f2()),identical(f1(),f3()),
    identical(f1(),f4()),identical(f1(),f5()),
    identical(f1(),f7()),identical(f1(),f8()))

# [1] TRUE

All results are identical.

Clearly using $<-` on an empty list is the clear winner. This goes against my thinking of what should be quickest

like image 91
mnel Avatar answered Oct 31 '22 08:10

mnel