
How to divide between groups of rows using dplyr?

Tags: dataframe, r, dplyr

I have this dataframe:

x <- data.frame(
    name = rep(letters[1:4], each = 2),
    condition = rep(c("A", "B"), times = 4),
    value = c(2,10,4,20,8,40,20,100)
) 
#   name condition value
# 1    a         A     2
# 2    a         B    10
# 3    b         A     4
# 4    b         B    20
# 5    c         A     8
# 6    c         B    40
# 7    d         A    20
# 8    d         B   100

I want to group by name and divide the value of the rows where condition == "B" by the value of the rows where condition == "A", to get this:

data.frame(
    name = letters[1:4],
    value = c(5,5,5,5)
)
#   name value
# 1    a     5
# 2    b     5
# 3    c     5
# 4    d     5

I know something like this can get me pretty close:

x$value[which(x$condition == "B")]/x$value[which(x$condition == "A")]

but I was wondering if there is an easy way to do this with dplyr. (My dataframe is a toy example; I got to it by chaining multiple group_by and summarise calls.)

asked May 25 '16 21:05 by nachocab



2 Answers

Using data.table: convert the 'data.frame' to a 'data.table' with setDT(x), then, grouped by 'name', divide the 'value' that corresponds to condition 'B' by the 'value' that corresponds to condition 'A'.

library(data.table)
setDT(x)[,.(value = value[condition=="B"]/value[condition=="A"]) , name]
#    name value
#1:    a     5
#2:    b     5
#3:    c     5
#4:    d     5

Or reshape from 'long' to 'wide' and divide the 'B' column by 'A'.

dcast(setDT(x), name~condition, value.var='value')[, .(name, value = B/A)]
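For readers who would rather stay in the tidyverse, the same long-to-wide idea can probably be sketched with tidyr's pivot_wider(). This is not part of the original answer; it assumes a tidyr version that provides pivot_wider() (1.0.0 or later) and that every name has exactly one "A" row and one "B" row. The object name ratios is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question
x <- data.frame(
    name = rep(letters[1:4], each = 2),
    condition = rep(c("A", "B"), times = 4),
    value = c(2, 10, 4, 20, 8, 40, 20, 100)
)

# Spread the two conditions into columns A and B (one row per name),
# then keep only name and the ratio B / A.
# Assumption: exactly one A and one B per name; duplicates would make
# pivot_wider() produce list-columns instead of plain numbers.
ratios <- x %>%
  pivot_wider(names_from = condition, values_from = value) %>%
  transmute(name, value = B / A)

print(ratios)
```

Working on the wide table keeps the division a plain column operation, which can be easier to read when more than two conditions are involved.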
answered Sep 21 '22 06:09 by akrun


Try:

library(dplyr)

# Within each name, divide the B value by the A value
x %>% 
  group_by(name) %>%
  summarise(value = value[condition == "B"] / value[condition == "A"])

Which gives:

#Source: local data frame [4 x 2]
#
#    name value
#  (fctr) (dbl)
#1      a     5
#2      b     5
#3      c     5
#4      d     5
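The summarise() call above relies on each name having exactly one "A" row and one "B" row. If that is not guaranteed, a more defensive variant might look like the sketch below. It is an assumption-laden illustration, not part of the original answer: the extra group "e" and the object names x_incomplete and ratios are hypothetical, and returning NA for incomplete groups is just one possible policy.

```r
library(dplyr)

# Same toy data as in the question
x <- data.frame(
    name = rep(letters[1:4], each = 2),
    condition = rep(c("A", "B"), times = 4),
    value = c(2, 10, 4, 20, 8, 40, 20, 100)
)

# Hypothetical extra group "e" that has an A row but no B row
x_incomplete <- rbind(
    x,
    data.frame(name = "e", condition = "A", value = 3)
)

# Compute B / A only when a group has exactly one row of each condition;
# otherwise return NA so that summarise() still yields one row per group.
ratios <- x_incomplete %>%
  group_by(name) %>%
  summarise(
    value = if (sum(condition == "A") == 1 && sum(condition == "B") == 1) {
      value[condition == "B"] / value[condition == "A"]
    } else {
      NA_real_
    }
  )

print(ratios)
```

With this guard, groups a to d still give 5, while the incomplete group e gets NA instead of an empty result.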
answered Sep 19 '22 06:09 by Steven Beaupré