I want to aggregate a data frame by two columns so that each combination of the two IDs appears only once, regardless of their order. The value column should be combined with an aggregation function such as max() or sum().
Data:
itemID1 |itemID2 |value
---------|---------|-------
B0001 |B0001 |1
B0002 |B0001 |1
B0001 |B0002 |2
B0002 |B0002 |0
The result could be:
itemID1 |itemID2 |value
----------|----------|---------
B0001 |B0001 |1
B0001 |B0002 |3 #itemIDs could also be ordered in the other way
B0002 |B0002 |0
Up to now I have implemented this in SQL and run it through the sqldf library, but sqldf doesn't support WITH clauses.
Is there a way to aggregate data frames like this directly in R?
Here is a way in base R, but it duplicates the data, since I work on a copy to keep the original intact.
dat2 <- dat
# apply() over rows returns one column per input row, so transpose back with t()
dat2[1:2] <- t(apply(dat2[1:2], 1, sort))
aggregate(value ~ itemID1 + itemID2, dat2, sum)
# itemID1 itemID2 value
#1 B0001 B0001 1
#2 B0001 B0002 3
#3 B0002 B0002 0
Now you can rm(dat2) in order to tidy up.
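As an alternative to sorting each row, the two ID columns can be put into a canonical order with the vectorised pmin() and pmax(), which also work on character vectors. This is only a sketch, not part of the answer above: the names `lo`, `hi`, `pairs`, `res_sum` and `res_max` are made up for illustration, and the example data is rebuilt inline with plain character columns so the block runs on its own.

```r
# Example data, same values as in the question (character IDs, not factors)
dat <- data.frame(
  itemID1 = c("B0001", "B0002", "B0001", "B0002"),
  itemID2 = c("B0001", "B0001", "B0002", "B0002"),
  value   = c(1L, 1L, 2L, 0L),
  stringsAsFactors = FALSE
)

# Element-wise smaller and larger ID of each row; as.character() guards
# against factor columns, for which pmin()/pmax() are not meaningful
lo <- pmin(as.character(dat$itemID1), as.character(dat$itemID2))
hi <- pmax(as.character(dat$itemID1), as.character(dat$itemID2))

# Hypothetical helper frame holding the order-independent pairs
pairs <- data.frame(itemID1 = lo, itemID2 = hi, value = dat$value,
                    stringsAsFactors = FALSE)

# Aggregate once per unordered pair; swap in any summary function
res_sum <- aggregate(value ~ itemID1 + itemID2, pairs, sum)
res_max <- aggregate(value ~ itemID1 + itemID2, pairs, max)
```

With this data, `res_sum` should hold 3 for the B0001/B0002 pair and `res_max` should hold 2, since the two mirrored rows carry the values 1 and 2.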
DATA.
dat <-
structure(list(itemID1 = structure(c(1L, 2L, 1L, 2L), .Label = c("B0001",
"B0002"), class = "factor"), itemID2 = structure(c(1L, 1L, 2L,
2L), .Label = c("B0001", "B0002"), class = "factor"), value = c(1L,
1L, 2L, 0L)), .Names = c("itemID1", "itemID2", "value"), class = "data.frame", row.names = c(NA,
-4L))