Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using := in data.table with paste()

Tags:

r

data.table

I have started using data.table for a large population model. So far, I have been impressed because using the data.table structure decreases my simulation run times by about 30%. I am trying to further optimize my code and have included a simplified example. My two questions are:

  1. Is is possible to use the := operator with this code?
  2. Would using the := operator be quicker (although, if I am able to answer my first question, I should be able to answer my question 2!)?

I am using R version 3.1.2 on a machine running Windows 7 with data.table version 1.9.4.

Here is my reproducible example:

library(data.table)

## Create  example table and set initial conditions
nYears = 10
exampleTable = data.table(Site = paste("Site", 1:3))
exampleTable[ , growthRate := c(1.1, 1.2, 1.3), ]
exampleTable[ , c(paste("popYears", 0:nYears, sep = "")) := 0, ]

exampleTable[ , "popYears0" := c(10, 12, 13)] # set the initial population size

for(yearIndex in 0:(nYears - 1)){
    exampleTable[[paste("popYears", yearIndex + 1, sep = "")]] <- 
    exampleTable[[paste("popYears", yearIndex, sep = "")]] * 
    exampleTable[, growthRate]
}

I am trying to do something like:

for(yearIndex in 0:(nYears - 1)){
    exampleTable[ , paste("popYears", yearIndex + 1, sep = "") := 
    paste("popYears", yearIndex, sep = "") * growthRate, ] 
}

However, this does not work because the paste does not work with the data.table, for example:

exampleTable[ , paste("popYears", yearIndex + 1, sep = "")]
# [1] "popYears10"

I have looked through the data.table documentation. Section 2.9 of the FAQ uses cat, but this produces a null output.

exampleTable[ , cat(paste("popYears", yearIndex + 1, sep = ""))]
# [1] popYears10NULL

Also, I tried searching Google and rseek.org, but didn't find anything. If am missing an obvious search term, I would appreciate a search tip. I have always found searching for R operators to be hard because search engines don't like symbols (e.g., ":=") and "R" can be vague.

like image 971
Richard Erickson Avatar asked Jan 05 '15 18:01

Richard Erickson


1 Answers

## Start with 1st three columns of example data
dt <- exampleTable[,1:3]

## Run for 1st five years
nYears <- 5
for(ii in seq_len(nYears)-1) {
    y0 <- as.symbol(paste0("popYears", ii))
    y1 <- paste0("popYears", ii+1)
    dt[, (y1) := eval(y0)*growthRate]
}

## Check that it worked
dt
#     Site growthRate popYears0 popYears1 popYears2 popYears3 popYears4 popYears5
#1: Site 1        1.1        10      11.0     12.10    13.310   14.6410  16.10510
#2: Site 2        1.2        12      14.4     17.28    20.736   24.8832  29.85984
#3: Site 3        1.3        13      16.9     21.97    28.561   37.1293  48.26809

Edit:

Because the possibility of speeding this up using set() keeps coming up in the comments, I'll throw this additional option out there.

nYears <- 5

## Things that only need to be calculated once can be taken out of the loop
r <- dt[["growthRate"]]
yy <- paste0("popYears", seq_len(nYears+1)-1)

## A loop using set() and data.table's nice compact syntax
for(ii in seq_len(nYears)) {
    set(dt, , yy[ii+1], r*dt[[yy[ii]]])
}

## Check results
dt
#     Site growthRate popYears0 popYears1 popYears2 popYears3 popYears4 popYears5
#1: Site 1        1.1        10      11.0     12.10    13.310   14.6410  16.10510
#2: Site 2        1.2        12      14.4     17.28    20.736   24.8832  29.85984
#3: Site 3        1.3        13      16.9     21.97    28.561   37.1293  48.26809
like image 185
Josh O'Brien Avatar answered Sep 19 '22 07:09

Josh O'Brien