
Pulling data from a different data.frame with dplyr?

I have a data frame called dat_new. It is essentially clinic visit data, with hrn being a patient ID and dov the date of visit (multiple visits per person). I also have a data frame called event with dated hospital admissions (multiple admissions per person).

For each clinic visit, I want to count the hospital admissions of each event code that occurred on or before that clinic visit. Simple.

This works with ddply from plyr. It takes a bit of time but works well.

temp <- ddply(dat_new, .(hrn,dov), summarise,
              dka2 = sum(event$event_code[which(event$hrn==hrn & event$doa <= dov)]==2),
              dka3 = sum(event$event_code[which(event$hrn==hrn & event$doa <= dov)]==3),
              dka8 = sum(event$event_code[which(event$hrn==hrn & event$doa <= dov)]==8)
)

Now, trying to rewrite this in dplyr, I get an error:

Error: binding not found: 'event_code'

I have it coded like this:

temp2 <- group_by(dat_new, hrn, dov)
temp3 <- summarise(temp2,
                   dka2 = sum(event$event_code[which(event$hrn==hrn & event$doa <= dov)]==2))

Obviously event_code is not in the temp2 data frame. Is it the case that dplyr cannot work with 'other' data frames when summarising? If there is a far better way to do the lookup/sum I'm doing, I'm all ears.

I did try this a few times, loading the packages in different orders in a vanilla R session, to try to eliminate any namespace issues.

Thanks

EDIT - REPRODUCIBLE EXAMPLE

This is a quick and dirty example just to illustrate the issue. If we make a 'lookup' data.frame that has two rows for each car, each with an mpg around 500, we can then go through the original data.frame, look up each car in the new data.frame, and sum its two mpgs together. plyr gives the expected result, figures around 1000. dplyr errors.

# add the model names as a column so they're easier to get at
mtcars$models <- row.names(mtcars)

# create a 'lookup' table
xtra <- data.frame(models = rep(row.names(mtcars),2),
                    newmpg = rnorm(2*nrow(mtcars),500,10)
)
xtra <- xtra[sample(row.names(xtra)), ]

library(plyr)
ddply(mtcars, .(models), summarise,
        revisedmpg = sum(xtra$newmpg[models==xtra$models]) )
# great, one row per car, with both mpgs added together
library(dplyr)

temp2 <- group_by(mtcars, models)
temp3 <- summarise(temp2,
                   revisedmpg = sum(xtra$newmpg[models==xtra$models]) )
# error
nzcoops asked Nov 01 '22 06:11


1 Answer

How about:

library(dplyr)

# %>% replaces the %.% chaining operator of early dplyr releases
merge(mtcars, xtra, by = "models") %>%
  group_by(models) %>%
  summarise(revisedmpg = sum(newmpg))

EDIT: sorry, I think this is what you want:

# from what I can tell of your data:
dat_new <- data.frame(hrn = c("P1", "P2"), dov = 42000)
event <- data.frame(hrn = sample(dat_new$hrn, 20, TRUE),
                    doa = 41990 + sample(1:20, 20),
                    event_code = sample(2:8, 20, TRUE))

merge(dat_new, event, by = "hrn") %>%
  filter(doa <= dov) %>%                  # keep admissions on or before the visit
  group_by(hrn, dov) %>%
  summarise(dka2 = sum(event_code == 2),  # count admissions with each code
            dka3 = sum(event_code == 3),
            dka8 = sum(event_code == 8))

Source: local data frame [2 x 5]
Groups: hrn

  hrn   dov dka2 dka3 dka8
1  P1 42000    2    1    0
2  P2 42000    1    0    1

Apologies, I'd mixed up doa and dov before the edit. You may need to tweak the merge(,by=c("x",..)) call depending on what else is in your tables.
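
One caveat with merging and then filtering: a visit with no admission on or before it drops out of the result entirely, whereas the ddply version in the question would report zeros for it. Below is a minimal sketch of one way to keep those visits, assuming dplyr's left_join is acceptable in place of merge. The toy data is made up for illustration: patients P3 and P4 and all the dates are hypothetical, not taken from the question.

```r
library(dplyr)

# Hypothetical toy data: P3's only admission is after the visit,
# and P4 has no admissions at all.
dat_new <- data.frame(hrn = c("P1", "P2", "P3", "P4"), dov = 42000,
                      stringsAsFactors = FALSE)
event <- data.frame(
  hrn        = c("P1", "P1", "P1", "P2", "P2", "P3"),
  doa        = c(41995, 41998, 42005, 41991, 41999, 42010),
  event_code = c(2, 2, 3, 8, 2, 2),
  stringsAsFactors = FALSE
)

counts <- dat_new %>%
  left_join(event, by = "hrn") %>%   # keeps every visit, matched or not
  group_by(hrn, dov) %>%
  summarise(
    # !is.na(doa) guards visits that matched no admission rows at all
    dka2 = sum(!is.na(doa) & doa <= dov & event_code == 2),
    dka3 = sum(!is.na(doa) & doa <= dov & event_code == 3),
    dka8 = sum(!is.na(doa) & doa <= dov & event_code == 8)
  ) %>%
  ungroup()

print(counts)
```

Here the date condition moves inside sum() instead of a filter() step, so P3 and P4 stay in the result with all-zero counts, while P1 gets dka2 = 2 and P2 gets dka2 = 1 and dka8 = 1. With several visits per patient, recent dplyr versions may warn about a many-to-many join; that pairing of every visit with every admission is what this approach relies on.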

Troy answered Nov 08 '22 05:11