I have a dataset like this:
date       value    class
1984-04-01 95.32384 A
1984-04-01 39.86818 B
1984-07-01 43.57983 A
1984-07-01 10.83754 B
Now I would like to group the data by date and subtract the value of class B from the value of class A. I looked into ddply, summarize, melt and aggregate but cannot quite get what I want. Is there an easy way to do this? Note that I have exactly two values per date, one of class A and one of class B. I could rearrange it into two data frames, order them by date and class, and merge them again, but I feel there is a more R-like way to do it.
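For reference, the split-and-merge route mentioned in the question is short in base R. This is a minimal sketch using the four rows shown above typed in directly; the names a, b and ab are illustrative only, and it assumes exactly one A row and one B row per date, as stated.

```r
# The question's four rows, typed in directly
df <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  value = c(95.32384, 39.86818, 43.57983, 10.83754),
  class = c("A", "B", "A", "B")
)

# One data frame per class, keeping only date and value
a <- df[df$class == "A", c("date", "value")]
b <- df[df$class == "B", c("date", "value")]

# Join the two halves on date; suffixes disambiguate the two value columns
ab <- merge(a, b, by = "date", suffixes = c(".A", ".B"))
ab$diff <- ab$value.A - ab$value.B
ab
```

merge() sorts by the key by default, so no separate ordering step is needed.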
The group_by() function from the dplyr package groups the rows of a data frame by the values of one or more columns, similar to the GROUP BY clause in SQL. Aggregate functions applied afterwards (for example with summarise()) are then computed separately for each group.
Group-by count in R using dplyr: use group_by() together with summarise() from the dplyr package to count the rows in each group. group_by() returns a grouped_df (a grouped data frame), and calling summarise() on it, e.g. summarise(n = n()), gives the count for each group.
How do you do a group-by sum in R? Use aggregate() from base R, or group_by() together with summarise() from dplyr, to group a data frame by a specific column and compute the sum of another column for each group.
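A minimal base-R sketch of the group-by sum and count described above, applied to the question's four rows. It uses aggregate() rather than dplyr so it runs without extra packages; the dplyr equivalent would be group_by(date) followed by summarise().

```r
# The question's four rows, typed in directly
df <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  value = c(95.32384, 39.86818, 43.57983, 10.83754),
  class = c("A", "B", "A", "B")
)

# Group-by sum: one row per date with the total of value
sums <- aggregate(value ~ date, data = df, FUN = sum)

# Group-by count: number of rows per date (length of each group's value)
counts <- aggregate(value ~ date, data = df, FUN = length)
names(counts)[2] <- "n"

sums
counts
```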
Assuming this data frame (generated as in Prasad's post but with a set.seed for reproducibility):
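# sample.kind = "Rounding" (R >= 3.6.0) reproduces the pre-3.6.0 sample()
# results that the outputs below were generated with; it triggers a warning.
set.seed(123, sample.kind = "Rounding")
DF <- data.frame(date  = rep(seq(as.Date('1984-04-01'),
                                 as.Date('1984-04-01') + 3, by = 1),
                             each = 2),
                 class = rep(c('A', 'B'), 4),
                 value = sample(1:8))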
Given that, here are seven solutions:
1) zoo can give us a one-line solution (not counting the library statement):
library(zoo)
z <- with(read.zoo(DF, split = 2), A - B)
giving this zoo series:
> z
1984-04-01 1984-04-02 1984-04-03 1984-04-04
-3 3 3 -5
Also note that as.data.frame(z) or data.frame(time = time(z), value = coredata(z)) gives a data frame; however, you may wish to leave it as a zoo object since it is a time series and other operations, e.g. plot(z), are more conveniently done on it in this form.
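The idea behind read.zoo(DF, split = 2), turning the long A/B rows into one column per class and then subtracting, can also be sketched in base R with reshape(). This is a swapped-in base-R technique, not part of the zoo answer; value.A and value.B are simply reshape()'s default names for the wide columns, and the data are the question's four rows.

```r
# The question's four rows, typed in directly
df <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  value = c(95.32384, 39.86818, 43.57983, 10.83754),
  class = c("A", "B", "A", "B")
)

# Long -> wide: one row per date, one value column per class
wide <- reshape(df, idvar = "date", timevar = "class", direction = "wide")

# With both classes side by side, the difference is plain column arithmetic
wide$diff <- wide$value.A - wide$value.B
wide[, c("date", "diff")]
```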
2) sqldf can also give a one-statement solution (aside from the library invocation):
> library(sqldf)
> sqldf("select date, sum(((class = 'A') - (class = 'B')) * value) as value
+ from DF group by date")
date value
1 1984-04-01 -3
2 1984-04-02 3
3 1984-04-03 3
4 1984-04-04 -5
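The sqldf statement above, like several of the solutions that follow, rests on one trick: (class == 'A') - (class == 'B') is 1 for A rows and -1 for B rows, so summing the signed values within a date gives A - B. A small base-R sketch of just that step, using rowsum() as the grouping function (a substitution, not used in the answer) and the question's four rows. It assumes exactly one A and one B per date; a date missing its B row would silently return just A.

```r
# The question's four rows, typed in directly
df <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  value = c(95.32384, 39.86818, 43.57983, 10.83754),
  class = c("A", "B", "A", "B")
)

# TRUE - FALSE = 1 for class A rows, FALSE - TRUE = -1 for class B rows
sgn <- (df$class == "A") - (df$class == "B")

# Sum the signed values per date, grouping on the character form of the date
res <- rowsum(sgn * df$value, as.character(df$date))
res
```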
3) tapply can be used as the basis of a solution inspired by the sqldf solution:
> with(DF, tapply(((class =="A") - (class == "B")) * value, date, sum))
1984-04-01 1984-04-02 1984-04-03 1984-04-04
-3 3 3 -5
4) aggregate can be used in the same way as sqldf and tapply above (although a slightly different solution also based on aggregate has already appeared):
> aggregate(((DF$class=="A") - (DF$class=="B")) * DF["value"], DF["date"], sum)
date value
1 1984-04-01 -3
2 1984-04-02 3
3 1984-04-03 3
4 1984-04-04 -5
5) summaryBy from the doBy package provides yet another solution, although it needs a transform to help it along:
> library(doBy)
> summaryBy(value ~ date, transform(DF, value = ((class == "A") - (class == "B")) * value), FUN = sum, keep.names = TRUE)
date value
1 1984-04-01 -3
2 1984-04-02 3
3 1984-04-03 3
4 1984-04-04 -5
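If the doBy dependency is unwanted, base R's formula method for aggregate() accepts the same transform()ed data. A sketch under the same assumptions (the question's four rows, one A and one B per date):

```r
# The question's four rows, typed in directly
df <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  value = c(95.32384, 39.86818, 43.57983, 10.83754),
  class = c("A", "B", "A", "B")
)

# Negate the class B values so that a plain per-date sum yields A - B
signed <- transform(df, value = ((class == "A") - (class == "B")) * value)

# Formula interface: sum value within each date
res <- aggregate(value ~ date, data = signed, FUN = sum)
res
```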
6) remix from the remix package can do it too, again with a transform, and it features particularly pretty output:
> library(remix)
> remix(value ~ date, transform(DF, value = ((class == "A") - (class == "B")) * value), sum)
value ~ date
============
+------+------------+-------+-----+
|                           | sum |
+======+============+=======+=====+
| date | 1984-04-01 | value | -3  |
+      +------------+-------+-----+
|      | 1984-04-02 | value | 3   |
+      +------------+-------+-----+
|      | 1984-04-03 | value | 3   |
+      +------------+-------+-----+
|      | 1984-04-04 | value | -5  |
+------+------------+-------+-----+
7) summary.formula in the Hmisc package also has pretty output:
> library(Hmisc)
> summary(value ~ date, data = transform(DF, value = ((class == "A") - (class == "B")) * value), fun = sum, overall = FALSE)
value N=8
+----+----------+-+-----+
| | |N|value|
+----+----------+-+-----+
|date|1984-04-01|2|-3 |
| |1984-04-02|2| 3 |
| |1984-04-03|2| 3 |
| |1984-04-04|2|-5 |
+----+----------+-+-----+