
How do I find columns with the same name, add their values, and replace them with a single column holding the sum, using R?

Tags:

r

I have a data frame in which some consecutive columns share the same name. I need to find these columns, add their values row by row, drop one column, and replace the other with the sum. I don't know in advance which names are duplicated, so I may have to compare each column name with the following one to see whether they match.

Can someone help?

Thanks in advance.

Assu asked May 09 '11 14:05


2 Answers

> dfrm <- data.frame(a = 1:10, b= 1:10, cc= 1:10, dd=1:10, ee=1:10)
> names(dfrm) <- c("a", "a", "b", "b", "b")
> sapply(unique(names(dfrm)[duplicated(names(dfrm))]), 
      function(x) Reduce("+", dfrm[ , grep(x, names(dfrm))]) )
       a  b
 [1,]  2  3
 [2,]  4  6
 [3,]  6  9
 [4,]  8 12
 [5,] 10 15
 [6,] 12 18
 [7,] 14 21
 [8,] 16 24
 [9,] 18 27
[10,] 20 30

EDIT 2: Using rowSums simplifies the first sapply argument to just unique(names(dfrm)), at the cost of having to remember drop=FALSE in "[" (without it, a name that occurs only once yields a plain vector, which rowSums rejects):

sapply(unique(names(dfrm)), 
       function(x) rowSums( dfrm[ , grep(x, names(dfrm)), drop=FALSE]) )
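
One caveat with the calls above: grep(x, names(dfrm)) treats x as a regular expression, so a column named "a" would also pull in a column named "ab". A minimal sketch of the same rowSums approach with exact name comparison follows; the data frame dfrm2 and its column names are made up for illustration and are not from the question.

```r
# Hypothetical example data: "a" and "ab" are distinct names that a
# regular-expression match on "a" would merge into one group.
dfrm2 <- data.frame(1:3, 4:6, 7:9)
names(dfrm2) <- c("a", "a", "ab")

# Exact comparison instead of grep(): only identical names are grouped.
sums_exact <- sapply(unique(names(dfrm2)),
                     function(x) rowSums(dfrm2[, names(dfrm2) == x, drop = FALSE]))

# sapply() returns a matrix; convert back to a data frame if one is needed.
result <- as.data.frame(sums_exact)
```

Here result has one column per distinct name, so the single "ab" column is carried over unchanged rather than being added into "a".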

To deal with NAs (returning NA only when every value in a row is NA, and otherwise summing the non-missing values):

sapply(unique(names(dfrm)),
       function(x) apply(dfrm[grep(x, names(dfrm))], 1,
                         function(y) if (all(is.na(y))) NA else sum(y, na.rm = TRUE)))
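
To make the NA rule concrete, here is a small sketch on made-up data. The data frame dfrm_na and the helper name sum_or_na are assumptions introduced for illustration, and exact name comparison is used in place of grep.

```r
# Hypothetical data with missing values in two same-named columns.
dfrm_na <- data.frame(c(1, NA, NA), c(10, 20, NA), c(5, 5, 5))
names(dfrm_na) <- c("a", "a", "b")

# Helper (hypothetical name): NA if every value in the row is missing,
# otherwise the sum of the non-missing values.
sum_or_na <- function(y) if (all(is.na(y))) NA_real_ else sum(y, na.rm = TRUE)

# Apply the helper row by row to each group of same-named columns.
sums_na <- sapply(unique(names(dfrm_na)),
                  function(x) apply(dfrm_na[names(dfrm_na) == x], 1, sum_or_na))
```

In this sketch the second row of "a" sums to 20 (the single non-missing value), while the third row stays NA because both inputs are missing.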

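Edit note: addressed Tommy's counter-example by putting unique around the names(.)[.] construction. The earlier version applied unique to the logical vector from duplicated instead, which collapses it to at most c(FALSE, TRUE); that is then recycled as an index and picks every other name rather than the duplicated ones. The erroneous code was: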

sapply(names(dfrm)[unique(duplicated(names(dfrm)))], 
     function(x) Reduce("+", dfrm[ , grep(x, names(dfrm))]) )

IRTFM answered Sep 28 '22 20:09


Here is my one-liner:

# transpose data frame, sum by group = rowname, transpose back.
t(rowsum(t(dfrm), group = rownames(t(dfrm))))
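
The one-liner can be unpacked step by step as a sketch. It assumes the same example data as the first answer (five columns named "a", "a", "b", "b", "b"); the intermediate names m, summed and result are introduced here for illustration only.

```r
# Example data as in the first answer: names "a","a","b","b","b".
dfrm <- data.frame(1:10, 1:10, 1:10, 1:10, 1:10)
names(dfrm) <- c("a", "a", "b", "b", "b")

# t(dfrm) is a matrix whose row names are the (duplicated) column names.
m <- t(dfrm)

# rowsum() adds up the rows that share a group label; transpose back afterwards.
summed <- t(rowsum(m, group = rownames(m)))

# Optional: convert the matrix back to a data frame.
result <- as.data.frame(summed)
```

Note that rowsum sorts the groups by default (reorder = TRUE), so the output columns appear in sorted name order; passing reorder = FALSE keeps the order in which names first appear.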

Ramnath answered Sep 28 '22 18:09