Calculating ratios by group with dplyr

Tags:

dplyr

Using the following data frame, I would like to group the data by replicate and group and then calculate the ratio of treatment (case) values to control values.

structure(list(group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L), .Label = c("case", "controls"), class = "factor"), treatment = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "EPA", class = "factor"), 
    replicate = structure(c(2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L), .Label = c("four", 
    "one", "three", "two"), class = "factor"), fatty_acid_family = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "saturated", class = "factor"), 
    fatty_acid = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "14:0", class = "factor"), 
    quant = c(6.16, 6.415, 4.02, 4.05, 4.62, 4.435, 3.755, 3.755
    )), .Names = c("group", "treatment", "replicate", "fatty_acid_family", 
"fatty_acid", "quant"), class = "data.frame", row.names = c(NA, 
-8L))

I have tried using dplyr as follows:

group_by(dataIn, replicate, group) %>% transmute(ratio = quant[group=="case"]/quant[group=="controls"])

but this results in Error: incompatible size (%d), expecting %d (the group size) or 1

Initially I thought this might be because I was trying to create 4 ratios from a data frame 8 rows deep, so I thought summarise might be the answer (collapsing each group to one ratio). That doesn't work either, so I suspect the gap is in my understanding.

group_by(dataIn, replicate, group) %>% summarise(ratio = quant[group=="case"]/quant[group=="controls"])

  replicate    group ratio
1      four     case    NA
2      four controls    NA
3       one     case    NA
4       one controls    NA
5     three     case    NA
6     three controls    NA
7       two     case    NA
8       two controls    NA

I would appreciate some advice on where I'm going wrong or even if this can be done with dplyr.

Thanks.


asked Feb 12 '15 20:02

duff

1 Answer

You can try:

group_by(dataIn, replicate) %>% 
    summarise(ratio = quant[group=="case"]/quant[group=="controls"])
#Source: local data frame [4 x 2]
#
#  replicate    ratio
#1      four 1.078562
#2       one 1.333333
#3     three 1.070573
#4       two 1.446449

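Because you grouped by both replicate and group, each group held only a case row or only a controls row, so the two values were never available in the same group. Grouping by replicate alone keeps both rows together.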
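To make the point about grouping concrete, here is a minimal sketch. It assumes a reduced copy of the question's data frame that keeps only the group, replicate and quant columns (stored as plain character vectors rather than factors), and the names cell and ratios are introduced here purely for illustration. It first shows that a single (replicate, group) cell has nothing on the controls side of the division, and then repeats the per-replicate calculation.

```r
library(dplyr)

# Reduced copy of the question's data: only the columns the ratio needs.
dataIn <- data.frame(
  group     = rep(c("case", "controls"), each = 4),
  replicate = rep(c("one", "two", "three", "four"), times = 2),
  quant     = c(6.16, 6.415, 4.02, 4.05, 4.62, 4.435, 3.755, 3.755)
)

# One (replicate, group) cell holds a single row, so only one side of the
# division has a value; the other side is a zero-length vector.
cell <- subset(dataIn, replicate == "one" & group == "case")
empty_side <- cell$quant[cell$group == "controls"]
length(empty_side)  # 0

# Grouping by replicate alone keeps the case and controls rows together,
# so both subsets have exactly one value and the ratio is well defined.
ratios <- dataIn %>%
  group_by(replicate) %>%
  summarise(ratio = quant[group == "case"] / quant[group == "controls"])
```

This relies on each replicate having exactly one case row and one controls row; with several rows per side the division would no longer return a single value per replicate.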


answered Nov 15 '22 08:11

talat
