I have a data frame in R defined as follows:
data frame:
col 1 col 2 col 3 col4
200 AIG 8.5 12
800 AIG 8.1 20.1
500 A1B 20 50.5
800 A1B 12 30
120 A2M 1.6 8.5
dat <- structure(list(col1 = c(200, 800, 500, 800, 120), col2 = structure(c(3L,
3L, 1L, 1L, 2L), .Label = c("A1B", "A2M", "AIG"), class = "factor"),
col3 = c(8.5, 8.1, 20, 12, 1.6), col4 = c(12, 20.1, 50.5,
30, 8.5)), .Names = c("col1", "col2", "col3", "col4"), row.names = c(NA,
-5L), class = "data.frame")
Then I'd like to collapse the rows by id (in this case the unique ids are A1G, A1B, A2M).
Col 1, I'd like to collapse it by adding the rows with the same id.
Col 2, I'd like to collapse it to each unique id
Col 3, I'd like to collapse it as follows, take col1*col3, add them, and then divide them by the sum of col1.
I.e., the A1G new row value should be (8.5*20+8.1*80)/(80+20). Aka the weighted average of column 3 weighted by the values of col1.
Col 4, I'd like to take the maximum value.
The resulting data frame should look like:
column 1 column 2 column 3 column 4
800+200=1000 AIG (8.5*200+8.1*800)/1000=8.18 max(12,20.1)=20.1
800+500=1300 AIB (20*800+12*500)/1300=16.9 max(50.5, 30)=50.5
120 A2M 1.6 8.5
Any suggestions?
collapse: Collapse a character vectorCollapses a character vector of any length into a length 1 vector.
Summary This tutorial explains how to collapse data in R. Collapsing means using one or several grouping variables to find summary statistics — mean, median, etc. — for different categories in your data.
A solution in base:
dat2<-do.call(rbind,
by(dat,dat$col2, function(x)
with (x,
data.frame(
col1 = sum(col1),
col3 = sum(col1 * col3) / sum(col1),
col4 = max(col4)
)
)
)
)
dat2$col2<-rownames(dat2)
# col1 col3 col4 col2
# A1B 1300 15.07692 50.5 A1B
# A2M 120 1.60000 8.5 A2M
# AIG 1000 8.18000 20.1 AIG
Here is a data.table solution that will scale well for big data (speed and memory efficient)
library(data.table)
DT <- data.table(dat, key="col2")
DT[, list(col1=sum(col1),
col3=sum(col1 * col3) / sum(col1),
col4=max(col4)), by=col2]
# col2 col1 col3 col4
#1: A1B 1300 15.07692 50.5
#2: A2M 120 1.60000 8.5
#3: AIG 1000 8.18000 20.1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With