Challenge: recoding a data.frame() — make it faster

Question

Recoding is a common practice for survey data, but the most obvious routes take more time than they should.

The fastest code that accomplishes the same task on the sample data below, as measured by system.time() on my machine, wins.

## Sample data
dat <- cbind(rep(1:5,50000),rep(5:1,50000),rep(c(1,2,4,5,3),50000))
dat <- cbind(dat,dat,dat,dat,dat,dat,dat,dat,dat,dat,dat,dat)
dat <- as.data.frame(dat)
re.codes <- c("This","That","And","The","Other")

Code to optimize.

for(x in 1:ncol(dat)) { 
    dat[,x] <- factor(dat[,x], labels=re.codes)
    }

Current system.time():

   user  system elapsed 
   4.40    0.10    4.49

Hint: dat <- lapply(1:ncol(dat), function(x) dat[,x] <- factor(dat[,x], labels=re.codes)) is not any faster.
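For reference, here is a minimal sketch of an lapply() variant that keeps the result a data.frame by assigning into dat[] instead of overwriting dat with a list. It runs on a much smaller copy of the sample data; the names small, loop.out and apply.out are made up for illustration. It makes no claim about speed, since every column still goes through factor(), which is presumably where the time is spent.

```r
# Hypothetical small version of the sample data (not the benchmark size)
small <- as.data.frame(cbind(rep(1:5, 10), rep(5:1, 10), rep(c(1, 2, 4, 5, 3), 10)))
re.codes <- c("This", "That", "And", "The", "Other")

# Baseline: the loop from the question
loop.out <- small
for (x in seq_len(ncol(loop.out))) {
  loop.out[, x] <- factor(loop.out[, x], labels = re.codes)
}

# lapply() form; assigning into 'apply.out[]' keeps the data.frame
# class and the column names instead of returning a bare list
apply.out <- small
apply.out[] <- lapply(apply.out, factor, labels = re.codes)

# Both routes should give the same columns
same.columns <- identical(as.list(loop.out), as.list(apply.out))
```

Comparing with as.list() checks the columns and their names only, so the comparison does not depend on how row names are stored.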

Joshua Ulrich · Accepted Answer

Combining @DWin's answer with my answer to "Most efficient list to data.frame method?":

system.time({
  dat3 <- list()
  # define attributes once outside of loop
  attrib <- list(class="factor", levels=re.codes)
  for (i in names(dat)) {              # loop over each column in 'dat'
    dat3[[i]] <- as.integer(dat[[i]])  # convert column to integer
    attributes(dat3[[i]]) <- attrib    # assign factor attributes
  }
  # convert 'dat3' into a data.frame. We can do it like this because:
  # 1) we know 'dat' and 'dat3' have the same number of rows and columns
  # 2) we want 'dat3' to have the same colnames as 'dat'
  # 3) we don't care if 'dat3' has different rownames than 'dat'
  attributes(dat3) <- list(row.names=c(NA_integer_,nrow(dat)),
    class="data.frame", names=names(dat))
})
identical(dat2, dat3)  # 'dat2' is from @DWin's answer
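Since 'dat2' is not shown here, the following is a rough sketch of how the attribute-based result could be checked against plain factor() on a small copy of the sample data. The names small, ref and fast are hypothetical. The shortcut appears to rely on an assumption worth stating: every value must already be an integer code between 1 and length(re.codes), because setting the attributes directly skips the matching and validation that factor() performs.

```r
# Hypothetical small version of the sample data (not the benchmark size)
small <- as.data.frame(cbind(rep(1:5, 10), rep(5:1, 10), rep(c(1, 2, 4, 5, 3), 10)))
re.codes <- c("This", "That", "And", "The", "Other")

# Assumption: all values are valid codes 1..length(re.codes)
stopifnot(all(unlist(small) %in% seq_along(re.codes)))

# Reference result via factor(), as in the question
ref <- small
ref[] <- lapply(ref, factor, labels = re.codes)

# Attribute-based result, following the approach in the answer
attrib <- list(class = "factor", levels = re.codes)
fast <- list()
for (i in names(small)) {
  fast[[i]] <- as.integer(small[[i]])  # values become the factor codes
  attributes(fast[[i]]) <- attrib      # turn the integer vector into a factor
}
attributes(fast) <- list(row.names = c(NA_integer_, nrow(small)),
                         class = "data.frame", names = names(small))

# Compare column by column, ignoring how row names are stored
same.columns <- identical(as.list(ref), as.list(fast))
same.shape <- identical(dim(ref), dim(fast))
```

If a column contained values outside that range, or codes that do not line up with the sorted unique values factor() would see, the two results could differ, so the guard at the top is worth keeping when adapting this to real survey data.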

Tags: dataframe, r

Brandon Bertelsen