I have a data frame with three columns: ref, target, distance. Each ref has a measured distance to the same set of targets and I would like to get a vector of minimum distances for each ref. Right now I am doing this with a for loop, but it seems like there ought to be a way to vectorize this.
Here's my code:
refs <- levels(data$ref)
result <- c()
for (ref in refs) {
  # Find the minimum distance for observations with the current ref
  # but be sure to protect against ref == target!
  best_dist <- min(data[data$ref == ref & data$target != ref,]$distance)
  result <- c(result, best_dist)
}
Am I doomed by having my data frame set up this way or is there a good way to vectorize this? Thanks for the help!
Never grow an object within a loop using c, cbind, or rbind. The object will be copied every time.
Instead preallocate to the correct size (or some overestimate if the result is fluid).
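As a minimal sketch of what preallocation could look like for the loop in the question: the example data below is made up for illustration, and indexing with seq_along is one possible way to fill the result, not the only one.

```r
# Hypothetical example data with the same three columns as in the question.
data <- data.frame(
  ref      = factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c")),
  target   = factor(c("a", "b", "c", "a", "b", "c", "a", "b", "c")),
  distance = c(0, 2, 5, 2, 0, 1, 5, 1, 0)
)

refs <- levels(data$ref)

# Allocate the full result vector once, instead of growing it with c().
result <- numeric(length(refs))
names(result) <- refs

for (i in seq_along(refs)) {
  # Rows for the current ref, excluding the row where ref == target.
  rows <- data$ref == refs[i] & data$target != refs[i]
  result[i] <- min(data$distance[rows])
}
```

Here result is filled in place, so its length never changes inside the loop.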
That being said, a loop is not required here.
I like data.tables for memory efficiency and coding elegance.
library(data.table)
DT <- data.table(data)
DT[ref != target, list(bestdist = min(distance)), by = ref]
If ref and target are factor columns with different levels (as suggested in the comment), then either give them identical levels or convert them to character:
DT[as.character(ref) != as.character(target), list(bestdist = min(distance)), by = ref]
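For the other option, giving both columns identical levels, a rough base R sketch might look like the following. The example data is hypothetical, and taking the union of both level sets is an assumption about what "identical levels" should mean here.

```r
# Hypothetical example data: ref and target are factors with different level sets.
data <- data.frame(
  ref      = factor(c("a", "a", "a", "b", "b", "b")),
  target   = factor(c("a", "b", "c", "a", "b", "c")),
  distance = c(0, 2, 5, 2, 0, 1)
)

# Rebuild both factors on a shared level set (the union of the two).
all_levels  <- union(levels(data$ref), levels(data$target))
data$ref    <- factor(data$ref, levels = all_levels)
data$target <- factor(data$target, levels = all_levels)

# With identical level sets, the factors can be compared directly.
keep <- data$ref != data$target
```

Once the levels match, a filter such as ref != target should work without converting to character first.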