How to aggregate duplicate rows with multiple columns in data frame [duplicate]

I have a data.frame that looks like this (but with many more columns and rows):

  Gene Cell1 Cell2 Cell3
1    A     2     7     8
2    A     5     2     9
3    B     2     7     8
4    C     1     4     3

I want to sum the rows that have the same value in Gene, in order to get something like this:

  Gene Cell1 Cell2 Cell3
1    A     7     9    17
2    B     2     7     8
3    C     1     4     3

Based on answers to previous questions, I tried using aggregate, but I could not work out how to get the result above. This is what I tried:

aggregate(df[,-1], list(df[,1]), FUN = sum)

Does anyone have an idea of what I'm doing wrong?

Asked by Euclides, Jan 04 '23 23:01


2 Answers

aggregate(df[,-1], list(Gene=df[,1]), FUN = sum)
#   Gene Cell1 Cell2 Cell3
# 1    A     7     9    17
# 2    B     2     7     8
# 3    C     1     4     3

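This gives the output you are looking for. Your original call already computes the correct sums; the only difference is that an unnamed grouping list makes aggregate() label that column Group.1. Naming the list element (Gene=df[,1]) keeps the column name Gene.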
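aggregate() also has a formula method, where `. ~ Gene` means "every other column, grouped by Gene", so the grouping column keeps its name automatically. A minimal sketch, rebuilding the example data from the question so it runs on its own:

```r
# Example data from the question
df <- data.frame(
  Gene  = c("A", "A", "B", "C"),
  Cell1 = c(2, 5, 2, 1),
  Cell2 = c(7, 2, 7, 4),
  Cell3 = c(8, 9, 8, 3)
)

# Formula interface: sum all remaining columns within each Gene
res <- aggregate(. ~ Gene, data = df, FUN = sum)
print(res)
#   Gene Cell1 Cell2 Cell3
# 1    A     7     9    17
# 2    B     2     7     8
# 3    C     1     4     3
```

Note that the formula method drops rows containing NA by default (na.action = na.omit), whereas the data-frame method used above passes NAs through to sum().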

Answered by lukeA, Jan 08 '23 05:01


Or with dplyr:

library(dplyr)
df %>%
  group_by(Gene) %>%
  summarise_all(sum) %>%   # sum every non-grouping column within each Gene
  data.frame() -> newdf    # convert back to a plain data.frame so newdf can be reused
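In dplyr 1.0.0 and later, summarise_all() is superseded by across(). A sketch of the equivalent pipeline, assuming dplyr >= 1.0.0 is installed and rebuilding the example data from the question:

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  Gene  = c("A", "A", "B", "C"),
  Cell1 = c(2, 5, 2, 1),
  Cell2 = c(7, 2, 7, 4),
  Cell3 = c(8, 9, 8, 3)
)

# Inside summarise(), across(everything(), sum) applies sum() to every
# column except the grouping variable Gene
newdf <- df %>%
  group_by(Gene) %>%
  summarise(across(everything(), sum)) %>%
  as.data.frame()
```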

Answered by jay.sf, Jan 08 '23 07:01