Collapse rows in a data frame using R

Tags:

r

I have a data frame in R defined as follows:

data frame:

col 1 col 2 col 3 col4
200    AIG   8.5   12
800    AIG   8.1   20.1
500    A1B   20   50.5
800    A1B   12   30
120    A2M   1.6   8.5

dat <- structure(list(col1 = c(200, 800, 500, 800, 120), col2 = structure(c(3L, 
    3L, 1L, 1L, 2L), .Label = c("A1B", "A2M", "AIG"), class = "factor"), 
        col3 = c(8.5, 8.1, 20, 12, 1.6), col4 = c(12, 20.1, 50.5, 
        30, 8.5)), .Names = c("col1", "col2", "col3", "col4"), row.names = c(NA, 
    -5L), class = "data.frame")

Then I'd like to collapse the rows by id (in this case the unique ids are A1G, A1B, A2M).
Col 1, I'd like to collapse it by adding the rows with the same id.
Col 2, I'd like to collapse it to each unique id
Col 3, I'd like to collapse it as follows, take col1*col3, add them, and then divide them by the sum of col1.
I.e., the A1G new row value should be (8.5*20+8.1*80)/(80+20). Aka the weighted average of column 3 weighted by the values of col1.
Col 4, I'd like to take the maximum value.

The resulting data frame should look like:

column 1 column 2 column 3 column 4
800+200=1000    AIG   (8.5*200+8.1*800)/1000=8.18   max(12,20.1)=20.1
800+500=1300    AIB   (20*800+12*500)/1300=16.9   max(50.5, 30)=50.5
120    A2M   1.6   8.5

Any suggestions?

964

asked Oct 14 '12 17:10

Dnaiel

2 Answers

A solution in base:

dat2<-do.call(rbind,
  by(dat,dat$col2, function(x) 
    with (x,
     data.frame(
       col1 = sum(col1),
       col3 = sum(col1 * col3) / sum(col1),
       col4 = max(col4)
     )
    )
  )
)
dat2$col2<-rownames(dat2)

#     col1     col3 col4 col2
# A1B 1300 15.07692 50.5  A1B
# A2M  120  1.60000  8.5  A2M
# AIG 1000  8.18000 20.1  AIG

143

answered Sep 18 '22 00:09

nograpes

Here is a data.table solution that will scale well for big data (speed and memory efficient)

library(data.table)
DT <- data.table(dat, key="col2")
DT[, list(col1=sum(col1), 
          col3=sum(col1 * col3) / sum(col1), 
          col4=max(col4)), by=col2]
#   col2 col1     col3 col4
#1:  A1B 1300 15.07692 50.5
#2:  A2M  120  1.60000  8.5
#3:  AIG 1000  8.18000 20.1

answered Sep 20 '22 00:09

GSee

Related questions
                            
                                Arrange within a group with dplyr
                            
                                Set units of difference between datetime objects
                            
                                Which is the best method to apply a script repetitively to n .csv files in R?
                            
                                How do I replace the string exactly using gsub()
                            
                                How can I plot to multiple devices at the same time?
                            
                                can I write an xts object using write.csv in R
                            
                                Sorting months in R
                            
                                R: Grid layout title
                            
                                How can I calculate the means of rows while excluding the zero values from rows in data frame
                            
                                Finding the index of an NA value in a vector [duplicate]
                            
                                Cumulative sum that resets when 0 is encountered
                            
                                Performing a dplyr full_join without a common variable to blend data frames
                            
                                Pivoting rows into columns
                            
                                How can a line be overlaid on a bar plot using ggplot2?
                            
                                Install RPostgreSQL on RHEL 6.5 libpq-fe.h Error
                            
                                Can't change params in Rmd documents
                            
                                How do I flip rows and columns in R
                            
                                Label lines in a plot
                            
                                UTF-8 file output in R
                            
                                Running R scripts from VBA

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With