I have a data frame called dat_new containing clinic visit data: hrn is a patient ID and dov is the date of visit (multiple visits per person). I also have a data frame called event with dated hospital admissions (multiple admissions per person).
What I want to do is, for each clinic visit, count the hospital admissions (by event code) that occurred on or before that clinic visit. Simple.
This works with ddply from plyr; it takes a bit of time but works well.
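library(plyr)
# one output row per (hrn, dov) group; inside summarise, hrn and dov are the
# current group's values, while event is looked up from the calling environment
temp <- ddply(dat_new, .(hrn, dov), summarise,
  dka2 = sum(event$event_code[which(event$hrn == hrn & event$doa <= dov)] == 2),
  dka3 = sum(event$event_code[which(event$hrn == hrn & event$doa <= dov)] == 3),
  dka8 = sum(event$event_code[which(event$hrn == hrn & event$doa <= dov)] == 8)
)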
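The ddply call effectively runs one lookup per visit. For comparison, here is a minimal base R sketch of that same per-visit count, assuming the column names above (hrn, dov, doa, event_code); the helper count_prior() and the toy data are hypothetical. Like the ddply version, it scans the whole event table once per visit, so it will slow down as both tables grow.

```r
# Hypothetical toy data using the column names from the question
dat_new <- data.frame(hrn = c("P1", "P1", "P2"),
                      dov = c(42000, 42010, 42005),
                      stringsAsFactors = FALSE)
event <- data.frame(hrn = c("P1", "P1", "P1", "P2"),
                    doa = c(41990, 42003, 42010, 42004),
                    event_code = c(2, 2, 3, 8),
                    stringsAsFactors = FALSE)

# count_prior() is a hypothetical helper (not part of plyr or dplyr):
# for each row of visits, count events with the given code for the same
# patient dated on or before the visit date
count_prior <- function(visits, events, code) {
  vapply(seq_len(nrow(visits)), function(i) {
    sum(events$hrn == visits$hrn[i] &
          events$doa <= visits$dov[i] &
          events$event_code == code)
  }, integer(1))
}

dat_new$dka2 <- count_prior(dat_new, event, 2)
dat_new$dka3 <- count_prior(dat_new, event, 3)
dat_new$dka8 <- count_prior(dat_new, event, 8)
dat_new
```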
Now, trying to rewrite it in dplyr, I get an error:
Error: binding not found: 'event_code'
I have it coded like this:
temp2 <- group_by(dat_new, hrn, dov)
temp3 <- summarise(temp2,
dka2 = sum(event$event_code[which(event$hrn==hrn & event$doa <= dov)]==2))
Obviously event_code is not in the temp2 data frame. Is it the case that dplyr cannot work with 'other' data frames when summarising? If there is a far better way to do the lookup/sum I'm doing, I'm all ears.
I did try this a few times, loading the packages in different orders in a vanilla R session, to rule out any namespace issues.
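On the question of a better way to do the lookup: one alternative (a swapped-in technique, not something plyr or dplyr does for you) is to sort each patient's admission dates once and use base R's findInterval(), which for a sorted vector returns how many elements are less than or equal to each query value. Each visit is then answered by a binary search instead of a scan of the whole event table. This is a minimal sketch assuming numeric dates and the column names above; the helper count_prior_sorted() and the toy data are hypothetical.

```r
# Hypothetical toy data using the column names from the question
dat_new <- data.frame(hrn = c("P1", "P1", "P2"),
                      dov = c(42000, 42010, 42005),
                      stringsAsFactors = FALSE)
event <- data.frame(hrn = c("P1", "P1", "P1", "P2"),
                    doa = c(41990, 42003, 42010, 42004),
                    event_code = c(2, 2, 3, 8),
                    stringsAsFactors = FALSE)

# count_prior_sorted() is a hypothetical helper: for each visit, count the
# patient's events of `code` dated on or before the visit date
count_prior_sorted <- function(visits, events, code) {
  ev <- events[events$event_code == code, , drop = FALSE]
  # one sorted vector of admission dates per patient
  dates <- lapply(split(ev$doa, as.character(ev$hrn)), sort)
  # row indices of the visits, grouped by patient
  vidx <- split(seq_len(nrow(visits)), as.character(visits$hrn))
  out <- integer(nrow(visits))  # patients with no such events keep 0
  for (p in intersect(names(vidx), names(dates))) {
    idx <- vidx[[p]]
    # findInterval() returns how many sorted dates are <= each visit date
    out[idx] <- findInterval(visits$dov[idx], dates[[p]])
  }
  out
}

for (code in c(2, 3, 8)) {
  dat_new[[paste0("dka", code)]] <- count_prior_sorted(dat_new, event, code)
}
dat_new
```

Sorting costs O(n log n) per patient once, after which each visit lookup is logarithmic, which is where the speed-up over the per-visit scan would come from on large tables.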
Thanks
EDIT - REPRODUCIBLE EXAMPLE
This is a quick-and-dirty example just to illustrate the issue. If we make a 'lookup' data.frame that has two rows for each car, each with an mpg of around 500, we can then go through the original data.frame, look up each car in the new data.frame, and sum its two mpgs. plyr gives the expected figures of around 1000; dplyr errors.
# add the model names as a column so they're easier to get at
mtcars$models <- row.names(mtcars)
# create a 'lookup' table
xtra <- data.frame(models = rep(row.names(mtcars),2),
newmpg = rnorm(2*nrow(mtcars),500,10)
)
xtra <- xtra[sample(row.names(xtra)), ]
library(plyr)
ddply(mtcars, .(models), summarise,
revisedmpg = sum(xtra$newmpg[models==xtra$models]) )
# great, one row per car, with both mpgs added together
library(dplyr)
temp2 <- group_by(mtcars, models)
temp3 <- summarise(temp2,
revisedmpg = sum(xtra$newmpg[models==xtra$models]) )
# error
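For the lookup-and-sum in this example, another option is to aggregate the lookup table first and then index the per-model totals by name, so no per-group lookup is needed at all. This is a minimal base R sketch using tapply(); it works on a copy named cars so mtcars itself is left untouched, and set.seed() is added only to make the random lookup values reproducible.

```r
set.seed(42)  # only so the random lookup values are reproducible
cars <- mtcars
cars$models <- row.names(cars)

# Lookup table: two rows per model, each with an mpg of around 500
xtra <- data.frame(models = rep(cars$models, 2),
                   newmpg = rnorm(2 * nrow(cars), 500, 10),
                   stringsAsFactors = FALSE)

# Sum newmpg per model once, then look the totals up by model name
totals <- tapply(xtra$newmpg, xtra$models, sum)
cars$revisedmpg <- as.vector(totals[cars$models])
head(cars[, c("models", "revisedmpg")])
```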
How about:
merge(mtcars, xtra, by = "models") %>% group_by(models) %>% summarise(revisedmpg = sum(newmpg))
EDIT: Sorry, I think this is what you want:
library(dplyr)
# from what I can tell of your data:
dat_new <- data.frame(hrn = c("P1", "P2"), dov = 42000)
event <- data.frame(hrn = sample(dat_new$hrn, 20, TRUE),
                    doa = 41990 + sample(1:20, 20),
                    event_code = sample(2:8, 20, TRUE))
# pair each visit with that patient's admissions, keep those on or
# before the visit date, then count each event code per visit
merge(dat_new, event, by = "hrn") %>%
  filter(doa <= dov) %>%
  group_by(hrn, dov) %>%
  summarise(dka2 = sum(event_code == 2),
            dka3 = sum(event_code == 3),
            dka8 = sum(event_code == 8))
Source: local data frame [2 x 5]
Groups: hrn
hrn dov dka2 dka3 dka8
1 P1 42000 2 1 0
2 P2 42000 1 0 1
And apologies: I'd mixed up doa and dov before the edit. You may need to tweak the merge(,by=c("x",..)) call depending on what else is in your tables.
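One thing to watch with the merge-then-filter approach: a visit with no admissions on or before it (or a patient with no admissions at all) has no rows left after filter(), so it disappears from the result instead of getting zero counts. A minimal base R sketch of one way to keep those visits, assuming the same column names: count on the merged rows with aggregate(), then left-join the counts back onto dat_new with merge(all.x = TRUE) and replace the resulting NAs with 0. The toy data is hypothetical, and the sketch assumes at least one admission survives the filter (aggregate() errors on zero rows).

```r
# Hypothetical toy data: P3 has no admissions at all
dat_new <- data.frame(hrn = c("P1", "P2", "P3"), dov = 42000,
                      stringsAsFactors = FALSE)
event <- data.frame(hrn = c("P1", "P1", "P2"),
                    doa = c(41995, 42005, 41990),
                    event_code = c(2, 3, 8),
                    stringsAsFactors = FALSE)

# Pair every visit with every admission for the same patient,
# then keep admissions on or before the visit date
m <- merge(dat_new, event, by = "hrn")
m <- m[m$doa <= m$dov, ]

# One 0/1 indicator column per event code, summed per visit
cols <- c("dka2", "dka3", "dka8")
for (code in c(2, 3, 8)) {
  m[[paste0("dka", code)]] <- as.integer(m$event_code == code)
}
counts <- aggregate(m[cols], by = m[c("hrn", "dov")], FUN = sum)

# Left join so visits with no prior admissions are kept, then fill 0
res <- merge(dat_new, counts, by = c("hrn", "dov"), all.x = TRUE)
for (v in cols) res[[v]][is.na(res[[v]])] <- 0L
res
```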