
Aggregating regardless of the order of columns

Tags: dataframe, r

I want to aggregate a dataframe by two columns so that each combination of the two IDs appears only once, regardless of which column each ID is in. The value column should be combined with an aggregation function such as max() or sum().

Data:

itemID1  |itemID2  |value
---------|---------|-------
B0001    |B0001    |1
B0002    |B0001    |1
B0001    |B0002    |2
B0002    |B0002    |0

The result could be:

itemID1   |itemID2   |value
----------|----------|---------
B0001     |B0001     |1
B0001     |B0002     |3          #itemIDs could also be ordered in the other way
B0002     |B0002     |0

So far I have implemented this in SQL and run it through the sqldf library, but sqldf doesn't support WITH clauses.

Is there a way to aggregate dataframes like this directly in R?

vir.dz asked Dec 01 '22 15:12


1 Answer

Here is a base R solution. It duplicates the data, since I work on a copy to keep the original intact.

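dat2 <- dat
# Sort the two IDs within each row, so (B0002, B0001) becomes (B0001, B0002).
# apply() over rows returns one column per input row, hence the t().
dat2[1:2] <- t(apply(dat2[1:2], 1, sort))
# Sum the values for each (now order-independent) pair of IDs.
aggregate(value ~ itemID1 + itemID2, dat2, sum)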
#  itemID1 itemID2 value
#1   B0001   B0001     1
#2   B0001   B0002     3
#3   B0002   B0002     0

Afterwards you can rm(dat2) to tidy up.
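The pair of IDs can also be put into a fixed order without looping over rows. The following is only a sketch, assuming the IDs can be compared as character strings: pmin() and pmax() pick the alphabetically smaller and larger ID of each row. The name `key` is just an illustrative helper, and the example data is rebuilt inline so the snippet runs on its own.

```r
# Example data from the question, with character ID columns.
dat <- data.frame(
  itemID1 = c("B0001", "B0002", "B0001", "B0002"),
  itemID2 = c("B0001", "B0001", "B0002", "B0002"),
  value   = c(1L, 1L, 2L, 0L),
  stringsAsFactors = FALSE
)

# as.character() guards against factor columns, which pmin()/pmax()
# cannot compare unless the factors are ordered.
id1 <- as.character(dat$itemID1)
id2 <- as.character(dat$itemID2)

# Smaller ID goes first, larger ID second, so (B0002, B0001) and
# (B0001, B0002) end up with the same key.
key <- data.frame(
  itemID1 = pmin(id1, id2),
  itemID2 = pmax(id1, id2),
  value   = dat$value,
  stringsAsFactors = FALSE
)

# Swap sum for max (or another function) to change the aggregation.
res <- aggregate(value ~ itemID1 + itemID2, data = key, FUN = sum)
res
```

On this data it should give the same three rows as the aggregate() call above. For more than two ID columns the row-wise sort is probably the simpler route.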

Data:

dat <-
structure(list(itemID1 = structure(c(1L, 2L, 1L, 2L), .Label = c("B0001", 
"B0002"), class = "factor"), itemID2 = structure(c(1L, 1L, 2L, 
2L), .Label = c("B0001", "B0002"), class = "factor"), value = c(1L, 
1L, 2L, 0L)), .Names = c("itemID1", "itemID2", "value"), class = "data.frame", row.names = c(NA, 
-4L))
Rui Barradas answered Dec 04 '22 03:12