I have this dataframe:
x <- data.frame(
name = rep(letters[1:4], each = 2),
condition = rep(c("A", "B"), times = 4),
value = c(2,10,4,20,8,40,20,100)
)
# name condition value
# 1 a A 2
# 2 a B 10
# 3 b A 4
# 4 b B 20
# 5 c A 8
# 6 c B 40
# 7 d A 20
# 8 d B 100
I want to group by name and divide the value of the row with condition == "B" by the value of the row with condition == "A", to get this:
data.frame(
name = letters[1:4],
value = c(5,5,5,5)
)
# name value
# 1 a 5
# 2 b 5
# 3 c 5
# 4 d 5
I know something like this can get me pretty close:
x$value[which(x$condition == "B")]/x$value[which(x$condition == "A")]
but I was wondering if there was an easy way to do this with dplyr (my dataframe is a toy example, and I got to it by chaining multiple group_by and summarise calls).
Using data.table, convert the 'data.frame' to a 'data.table' (setDT(x)). Then, grouped by 'name', divide the 'value' that corresponds to condition 'B' by the 'value' that corresponds to condition 'A'.
library(data.table)
setDT(x)[,.(value = value[condition=="B"]/value[condition=="A"]) , name]
# name value
#1: a 5
#2: b 5
#3: c 5
#4: d 5
Or reshape from 'long' to 'wide' and divide the 'B' column by 'A'.
dcast(setDT(x), name~condition, value.var='value')[, .(name, value = B/A)]
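Note that setDT(x) converts x by reference, so after either snippet above x itself is a data.table. If that side effect is unwanted, one option is to work on a copy made with as.data.table(). A minimal sketch, assuming the same toy data as in the question (the names dt and res are only illustrative):

```r
library(data.table)

# Toy data from the question
x <- data.frame(
  name = rep(letters[1:4], each = 2),
  condition = rep(c("A", "B"), times = 4),
  value = c(2, 10, 4, 20, 8, 40, 20, 100)
)

# as.data.table() returns a copy, so x stays a plain data.frame
dt <- as.data.table(x)

# Same grouped division as above, with the grouping spelled out via by =
res <- dt[, .(value = value[condition == "B"] / value[condition == "A"]), by = name]

is.data.table(x)  # x was not converted
res
```

This assumes each name has exactly one "A" row and one "B" row, as in the example.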
Try:
library(dplyr)
x %>%
group_by(name) %>%
summarise(value = value[condition == "B"] / value[condition == "A"])
Which gives:
#Source: local data frame [4 x 2]
#
# name value
# (fctr) (dbl)
#1 a 5
#2 b 5
#3 c 5
#4 d 5
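The long-to-wide idea from the data.table answer can also be written in the tidyverse. The sketch below is one possible variant, not the only way to do it; it assumes tidyr 1.0.0 or later (for pivot_wider()) and the same toy data as in the question, and the name res is only illustrative:

```r
library(dplyr)
library(tidyr)

# Toy data from the question
x <- data.frame(
  name = rep(letters[1:4], each = 2),
  condition = rep(c("A", "B"), times = 4),
  value = c(2, 10, 4, 20, 8, 40, 20, 100)
)

res <- x %>%
  # One row per name, with the conditions spread into columns A and B
  pivot_wider(names_from = condition, values_from = value) %>%
  # Keep only name and the ratio B / A
  transmute(name, value = B / A)

res
```

One difference from the group_by()/summarise() version: if a name is missing one of the two conditions, pivot_wider() fills that cell with NA, so the ratio for that name comes out as NA.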