Vectorize for loop over data frame in R

Tags: r, coding-style

I have a data frame with three columns: ref, target, distance. Each ref has a measured distance to the same set of targets and I would like to get a vector of minimum distances for each ref. Right now I am doing this with a for loop, but it seems like there ought to be a way to vectorize this.

Here's my code:

refs <- levels(data$ref)

result <- c()
for (ref in refs) {
    # Find the minimum distance for observations with the current ref
    # but be sure to protect against ref == target!
    best_dist <- min(data[data$ref == ref & data$target != ref,]$distance)
    result <- c(result, best_dist)
}

Am I doomed by having my data frame set up this way or is there a good way to vectorize this? Thanks for the help!

weitzner asked Feb 08 '13 00:02


1 Answer

Never grow an object within a loop using c, cbind, or rbind: the object is copied on every iteration. Instead, preallocate it to the correct size (or to some overestimate if the final size is not known in advance).
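As a minimal sketch of the preallocation advice, the loop from the question could fill a vector that is created once up front. The toy data frame below is invented for illustration; only the column names (ref, target, distance) come from the question.

```r
# Toy data, invented for illustration; column names follow the question.
data <- data.frame(
  ref      = factor(c("a", "a", "a", "b", "b", "b")),
  target   = factor(c("a", "b", "c", "a", "b", "c")),
  distance = c(0, 2, 5, 2, 0, 1)
)

refs <- levels(data$ref)

# Preallocate the result once, then fill it by position.
result <- numeric(length(refs))
names(result) <- refs

for (i in seq_along(refs)) {
  # Compare as character so that differing factor levels cannot cause an error.
  keep <- as.character(data$ref) == refs[i] &
          as.character(data$target) != refs[i]
  result[i] <- min(data$distance[keep])
}
```

Because result already has its final length, each iteration assigns into an existing slot instead of building a new, longer vector.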

That being said, a loop is not required here.

I like data.table for its memory efficiency and coding elegance.

 library(data.table)
 DT <- data.table(data)

 # Drop rows where ref equals target, then take the minimum distance per ref
 DT[ref != target, list(bestdist = min(distance)), by = ref]

If ref and target are factor columns with different levels (as suggested in the comment), then either give them identical levels or convert them to character:

 DT[as.character(ref) != as.character(target),  list(bestdist = min(distance)), by = ref] 
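
To see why the conversion matters, here is a small base R sketch using two made-up factors (the values are invented, not taken from the question's data). Comparing factors whose level sets differ raises an error, while comparing their character values works element by element.

```r
# Two factors with different level sets (values invented for illustration).
ref    <- factor(c("a", "b"))
target <- factor(c("a", "c"))

# Comparing them directly raises an error in base R; capture its message
# instead of stopping the script.
direct <- tryCatch(ref != target, error = function(e) conditionMessage(e))

# Comparing the underlying character values works element by element.
as_char <- as.character(ref) != as.character(target)
```

Here direct ends up holding the captured error message, and as_char is a logical vector that can be used as a row filter.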
mnel answered Oct 17 '22 15:10