How to aggregate duplicate rows with multiple columns in data frame [duplicate]

Question

I have a data.frame that looks like this (but with many more columns and rows):

  Gene Cell1 Cell2 Cell3
1    A     2     7     8
2    A     5     2     9
3    B     2     7     8
4    C     1     4     3

I want to sum the rows that have the same value in Gene, in order to get something like this:

  Gene Cell1 Cell2 Cell3
1    A     7     9    17
2    B     2     7     8
3    C     1     4     3

Based on the answers to previous questions, I tried aggregate, but I could not work out how to get the result above. This is what I tried:

aggregate(df[,-1], list(df[,1]), FUN = sum)

Does anyone have an idea of what I'm doing wrong?

lukeA · Accepted Answer

aggregate(df[,-1], list(Gene=df[,1]), FUN = sum)
#   Gene Cell1 Cell2 Cell3
# 1    A     7     9    17
# 2    B     2     7     8
# 3    C     1     4     3

will give you the output you are looking for.
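For anyone who wants to check this, here is a small sketch that rebuilds the example data (values copied from the question; the object names unnamed, named and by_formula are only illustrative). As far as I can tell, the call in the question already computes the right sums; the only difference is that an unnamed grouping list makes aggregate label the key column Group.1 instead of Gene. The formula interface shown at the end is an alternative spelling, not part of the original answer.

```r
# Rebuild the example data from the question.
df <- data.frame(
  Gene  = c("A", "A", "B", "C"),
  Cell1 = c(2, 5, 2, 1),
  Cell2 = c(7, 2, 7, 4),
  Cell3 = c(8, 9, 8, 3)
)

# Unnamed grouping list (the call from the question):
# the sums are computed, but the key column is called "Group.1".
unnamed <- aggregate(df[, -1], list(df[, 1]), FUN = sum)
names(unnamed)

# Named grouping list (the call from this answer):
# the key column keeps the name "Gene".
named <- aggregate(df[, -1], list(Gene = df[, 1]), FUN = sum)
named

# Formula interface: "." stands for every column other than Gene.
by_formula <- aggregate(. ~ Gene, data = df, FUN = sum)
by_formula
```

One caveat with the formula interface: its default na.action is na.omit, so rows containing an NA in any column are dropped before summing. With the list interface you would instead pass na.rm = TRUE through to sum if needed.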

jay.sf · Answer

Or with dplyr:

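library(dplyr)
newdf <- df %>%
  group_by(Gene) %>%        # one group per distinct Gene value
  summarise_all(sum) %>%    # sum every non-grouping column within each group
  as.data.frame()           # convert the tibble back to a plain data.frame
# newdf can now be used further, if needed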
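Note that summarise_all is marked as superseded in dplyr 1.0.0 and later, where across() is the recommended replacement. A minimal sketch of the same pipeline in that style, assuming dplyr >= 1.0.0 is installed and reusing the example data from the question:

```r
library(dplyr)

# Example data copied from the question.
df <- data.frame(
  Gene  = c("A", "A", "B", "C"),
  Cell1 = c(2, 5, 2, 1),
  Cell2 = c(7, 2, 7, 4),
  Cell3 = c(8, 9, 8, 3)
)

newdf <- df %>%
  group_by(Gene) %>%
  # everything() here selects all non-grouping columns, so Gene is left alone
  summarise(across(everything(), sum)) %>%
  as.data.frame()

newdf
```

If some columns are not numeric, across(where(is.numeric), sum) restricts the sum to the numeric ones.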

Tags: dataframe, r, aggregate

Euclides
