I have a data frame where some consecutive columns have the same name. I need to search for these, add their values in for each row, drop one column and replace the other with their sum. without previously knowing which patterns are duplicated, possibly having to compare one column name with the following to see if there's a match.
Can someone help?
Thanks in advance.
> dfrm <- data.frame(a = 1:10, b= 1:10, cc= 1:10, dd=1:10, ee=1:10)
> names(dfrm) <- c("a", "a", "b", "b", "b")
> sapply(unique(names(dfrm)[duplicated(names(dfrm))]),
function(x) Reduce("+", dfrm[ , grep(x, names(dfrm))]) )
a b
[1,] 2 3
[2,] 4 6
[3,] 6 9
[4,] 8 12
[5,] 10 15
[6,] 12 18
[7,] 14 21
[8,] 16 24
[9,] 18 27
[10,] 20 30
EDIT 2: Using rowSums allows simplification of the first sapply argumentto just unique(names(dfrm))
at the expense of needing to remember to include drop=FALSE in "[":
sapply(unique(names(dfrm)),
function(x) rowSums( dfrm[ , grep(x, names(dfrm)), drop=FALSE]) )
To deal with NA's:
sapply(unique(names(dfrm)),
function(x) apply(dfrm[grep(x, names(dfrm))], 1,
function(y) if ( all(is.na(y)) ) {NA} else { sum(y, na.rm=TRUE) }
) )
(Edit note: addressed Tommy counter-example by putting unique around the names(.)[.] construction. The erroneous code was:
sapply(names(dfrm)[unique(duplicated(names(dfrm)))],
function(x) Reduce("+", dfrm[ , grep(x, names(dfrm))]) )
Here is my one liner
# transpose data frame, sum by group = rowname, transpose back.
t(rowsum(t(dfrm), group = rownames(t(dfrm))))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With