
Example in Advanced R of modifying a list

I can't seem to understand the following example in Advanced R.

x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))

y <- as.list(x)
cat(tracemem(y), "\n")
#> <0x7f80c5c3de20>

for (i in 1:5) {
  y[[i]] <- y[[i]] - medians[[i]]
}
#> tracemem[0x7f80c5c3de20 -> 0x7f80c48de210]: 

I don't understand why a copy would be made in this case, since "If an object has a single name bound to it, R will modify it in place" and the object referenced by y indeed has only a single name y bound to it.

lyh970817 asked May 16 '20 22:05


1 Answer

While the comments about RStudio references are probably correct, the main issue appears to be that the book is out of date.

The last commit to the source for that page was on 2019-06-25, which predates the release of R v4.0.0 (2020-04-24).

The R changelog lists the following change under v4.0.0:

Reference counting is now used instead of the NAMED mechanism for determining when objects can be safely mutated in base C code. This reduces the need for copying in some cases and should allow further optimizations in the future. It should help make the internal code easier to maintain.
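
As a quick way to tell which behaviour to expect in a given session, the version boundary from the changelog can be checked directly. This is only a sketch: `uses_reference_counting` is a hypothetical helper name, and it assumes a default build of R in which every version from 4.0.0 onward uses reference counting.

```r
# Hypothetical helper: TRUE when the running (or supplied) R version is at or
# past the 4.0.0 boundary named in the changelog entry quoted above.
# Assumption: a default build; this does not inspect R's internals.
uses_reference_counting <- function(version = getRversion()) {
  version >= "4.0.0"
}

uses_reference_counting()                           # depends on your session
uses_reference_counting(numeric_version("3.6.3"))   # FALSE
uses_reference_counting(numeric_version("4.0.0"))   # TRUE
```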

R v3.6.3

Indeed, if you run the example code under R v3.6.3 (the version just prior to v4.0.0):

#> R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
#> Copyright (C) 2020 The R Foundation for Statistical Computing
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> 
#> R is free software and comes with ABSOLUTELY NO WARRANTY.
#> You are welcome to redistribute it under certain conditions.
#> Type 'license()' or 'licence()' for distribution details.
#> 
#>   Natural language support but running in an English locale
#> 
#> R is a collaborative project with many contributors.
#> Type 'contributors()' for more information and
#> 'citation()' on how to cite R or R packages in publications.
#> 
#> Type 'demo()' for some demos, 'help()' for on-line help, or
#> 'help.start()' for an HTML browser interface to help.
#> Type 'q()' to quit R.

x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))

for (i in seq_along(medians)) {
  x[[i]] <- x[[i]] - medians[[i]]
}

cat(tracemem(x), "\n")
#> <000000002457F7D0> 

for (i in 1:5) {
  x[[i]] <- x[[i]] - medians[[i]]
}
#> tracemem[0x000000002457f7d0 -> 0x0000000024697c90]: 
#> tracemem[0x0000000024697c90 -> 0x0000000024697c20]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697c20 -> 0x0000000024697bb0]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697bb0 -> 0x0000000024697b40]: 
#> tracemem[0x0000000024697b40 -> 0x0000000024697ad0]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697ad0 -> 0x0000000024697a60]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697a60 -> 0x00000000246979f0]: 
#> tracemem[0x00000000246979f0 -> 0x0000000024697980]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697980 -> 0x0000000024697910]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697910 -> 0x00000000246978a0]: 
#> tracemem[0x00000000246978a0 -> 0x0000000024697830]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697830 -> 0x00000000246977c0]: [[<-.data.frame [[<- 
#> tracemem[0x00000000246977c0 -> 0x0000000024697750]: 
#> tracemem[0x0000000024697750 -> 0x00000000246976e0]: [[<-.data.frame [[<- 
#> tracemem[0x00000000246976e0 -> 0x0000000024697670]: [[<-.data.frame [[<- 

untracemem(x)

y <- as.list(x)
cat(tracemem(y), "\n")
#> <0000000024697600> 
 
for (i in 1:5) {
  y[[i]] <- y[[i]] - medians[[i]]
}
#> tracemem[0x0000000024697600 -> 0x00000000247ec708]:

untracemem(y)

The data frame loop makes 15 copies (three per iteration) and the list loop makes one, matching the book.

R v4.0.0

However, if we run the same example code under R v4.0.0:

#> R version 4.0.0 (2020-04-24) -- "Arbor Day"
#> Copyright (C) 2020 The R Foundation for Statistical Computing
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> 
#> R is free software and comes with ABSOLUTELY NO WARRANTY.
#> You are welcome to redistribute it under certain conditions.
#> Type 'license()' or 'licence()' for distribution details.
#> 
#>   Natural language support but running in an English locale
#> 
#> R is a collaborative project with many contributors.
#> Type 'contributors()' for more information and
#> 'citation()' on how to cite R or R packages in publications.
#> 
#> Type 'demo()' for some demos, 'help()' for on-line help, or
#> 'help.start()' for an HTML browser interface to help.
#> Type 'q()' to quit R.

x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))

for (i in seq_along(medians)) {
  x[[i]] <- x[[i]] - medians[[i]]
}

cat(tracemem(x), "\n")
#> <00000000236B0C50> 

for (i in 1:5) {
  x[[i]] <- x[[i]] - medians[[i]]
}
#> tracemem[0x00000000236b0c50 -> 0x00000000237a7a90]: 
#> tracemem[0x00000000237a7a90 -> 0x00000000237a7a20]: [[<-.data.frame [[<- 
#> tracemem[0x00000000237a7a20 -> 0x00000000237a79b0]: 
#> tracemem[0x00000000237a79b0 -> 0x00000000237a7940]: [[<-.data.frame [[<- 
#> tracemem[0x00000000237a7940 -> 0x00000000237a78d0]: 
#> tracemem[0x00000000237a78d0 -> 0x00000000237a7860]: [[<-.data.frame [[<- 
#> tracemem[0x00000000237a7860 -> 0x00000000237a77f0]: 
#> tracemem[0x00000000237a77f0 -> 0x00000000237a7780]: [[<-.data.frame [[<- 
#> tracemem[0x00000000237a7780 -> 0x00000000237a7710]: 
#> tracemem[0x00000000237a7710 -> 0x00000000237a76a0]: [[<-.data.frame [[<- 

untracemem(x)

y <- as.list(x)
cat(tracemem(y), "\n")
#> <00000000237A7630> 

for (i in 1:5) {
  y[[i]] <- y[[i]] - medians[[i]]
}

untracemem(y)

Here the change visibly reduces the number of copies. The data frame loop now makes 10 copies instead of 15 (two per iteration instead of three), and the list is no longer copied at all.
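
To compare versions without reading the trace by eye, the tracemem lines can be captured and counted. The sketch below uses base R only; `n_copies`, `trace_lines` and `can_trace` are names introduced here for illustration. It assumes that tracemem reports are written to standard output, where `capture.output()` can collect them, and that R was built with memory profiling. Because the loop is evaluated inside `capture.output()`, the count may not match the top-level runs shown above.

```r
x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))
y <- as.list(x)

# tracemem() is only available when R was built with memory profiling.
can_trace <- capabilities("profmem")

# Run the list loop from the question while collecting anything printed.
trace_lines <- capture.output({
  if (can_trace) tracemem(y)
  for (i in seq_along(y)) {
    y[[i]] <- y[[i]] - medians[[i]]
  }
  if (can_trace) untracemem(y)
})

# Each reported copy is one line starting with "tracemem[".
n_copies <- if (can_trace) sum(grepl("^tracemem\\[", trace_lines)) else NA_integer_
n_copies
```

In the runs above, the list loop reported one copy under R v3.6.3 and none under R v4.0.0.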

To answer the question directly: under the NAMED mechanism, R made a copy of the list even though it was not needed. The switch to reference counting in R v4.0.0 avoids that copy, so the list is now modified in place as expected.

the-mad-statter answered Oct 08 '22 05:10