I have two dataframes, much like these:
data = data.frame(data=cbind(1:12,rep(c(1,2),6),rep(c(1,2,3),4)))
colnames(data)=c('v','h','c')
lookup = data.frame(data=cbind(c(rep(1,3),rep(2,3)),rep(c(1,2,3),2),21:26))
colnames(lookup)=c('h','c','t')
I want to subtract lookup$t from data$v where the h and c columns match.
I thought something like this would work:
data$v-lookup$t[lookup$h==data$h&lookup$c==data$c]
but R doesn't magically know that I want to iterate implicitly over the rows of data.
I ended up doing this:
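myt <- numeric(nrow(data))  # preallocate instead of growing the vector
for (i in seq_len(nrow(data))) {
  # pick the t whose (h, c) pair matches row i of data
  myt[i] <- lookup$t[lookup$h == data$h[i] & lookup$c == data$c[i]]
}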
which works fine, but I'm hoping someone can suggest a more sensible way that doesn't involve a loop.
Sounds like you could merge and then do the math:
dataLookedUp <- merge(data, lookup)
dataLookedUp$newValue <- with(dataLookedUp, v - t )
On your real data, are the merge and the calculation faster than the loop?
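One thing to watch with merge(): by default it sorts the result on the key columns, so the rows may not come back in the original order of data. If the order matters, a rough base R sketch is to build a composite key from h and c and use match() to look up each row's position in lookup. This is just one possible approach, it assumes every (h, c) pair occurs exactly once in lookup, and the names idx and newValue are made up for illustration:

```r
data <- data.frame(v = 1:12, h = rep(c(1, 2), 6), c = rep(c(1, 2, 3), 4))
lookup <- data.frame(h = rep(c(1, 2), each = 3), c = rep(c(1, 2, 3), 2), t = 21:26)

# Build one composite key per row, then find each data row's position in lookup.
idx <- match(paste(data$h, data$c), paste(lookup$h, lookup$c))

# An NA in idx would mean a (h, c) pair that has no entry in lookup.
data$newValue <- data$v - lookup$t[idx]
```

Here data keeps its row order, and newValue lines up with v row by row.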
If data and/or lookup is really big, you might use the data.table package to create an index (a key) on h and c before the merge in order to speed it up.
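A minimal sketch of the keyed merge, assuming the data.table package is installed. The object names dt, lk and res are made up for illustration, and whether this actually beats the plain merge depends on the size of your data:

```r
library(data.table)

dt <- data.table(v = 1:12, h = rep(c(1, 2), 6), c = rep(c(1, 2, 3), 4))
lk <- data.table(h = rep(c(1, 2), each = 3), c = rep(c(1, 2, 3), 2), t = 21:26)

# Keying sorts each table by (h, c) and marks those columns as the key.
setkey(dt, h, c)
setkey(lk, h, c)

# merge() on two keyed data.tables joins on the shared key columns.
res <- merge(dt, lk)

# Add the difference as a new column, by reference.
res[, newValue := v - t]
```

Note that setkey() reorders the rows, so res comes back sorted by h and c rather than by v.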
An alternative that is (1) more familiar to those accustomed to SQL queries and (2) often faster than the standard merge is the sqldf package. (Note that on Mac OS X you'll probably want to install Tcl/Tk, on which sqldf depends.) As an added bonus, sqldf by default tries to restore column classes such as factors in the result, based on the input columns of the same name.
install.packages("sqldf")
library(sqldf)
data <- data.frame(v = 1:12, h = rep(c("one", "two"), 6), c = rep(c("one", "two", "three"), 4))
lookup <- data.frame(h = c(rep("one", 3), rep("two", 3)), c = rep(c("one", "two", "three"), 2), t = 21:26)
soln <- sqldf("select * from data inner join lookup using (h, c)")
soln <- transform(soln, v.minus.t = v - t)