
Merge columns of a dataframe by two conditions using aggregate

Tags: r, aggregate

I have a matrix like this:

  P   A   B  C 
  1   2   0  5
  2   1   1  3
  3   0   4  7
  1   1   1  0
  3   1   1  0
  3   0   2  1
  2   3   3  4

I want to merge the rows by P, separately for each column. Every P value should get one row per column, and that row should hold the sum of that column over all rows with that P and 0 in the other columns. The result should be:

 P  A  B  C
 1  3  0  0 
 1  0  1  0 
 1  0  0  5
 2  4  0  0
 2  0  4  0
 2  0  0  7
 3  1  0  0
 3  0  7  0
 3  0  0  8

I already tried aggregate, but it only sums every column for each P value, so I end up with just one row per P.
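For reference, a minimal sketch of that attempt. It assumes the data is stored in a data frame called df; the question does not name the object, and df is simply the name the answer below uses.

```r
# Sample data from the question (the object name `df` is an assumption)
df <- data.frame(
  P = c(1, 2, 3, 1, 3, 3, 2),
  A = c(2, 1, 0, 1, 1, 0, 3),
  B = c(0, 1, 4, 1, 1, 2, 3),
  C = c(5, 3, 7, 0, 0, 1, 4)
)

# Sum every column within each value of P: this gives one row per P
sums <- aggregate(. ~ P, data = df, FUN = sum)
sums
#   P A B C
# 1 1 3 1 5
# 2 2 4 4 7
# 3 3 1 7 8
```

The sums are the ones wanted, but all three sit in a single row per P instead of being spread over one row per column.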

Miguel123 asked Dec 19 '16 09:12


1 Answer

One idea is to split your data frame on P and apply a custom function (fun1) to each piece. The function creates a square matrix of zeros, with one row and one column per non-P column, and replaces its diagonal with the column sums, i.e.

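fun1 <- function(x) {
  # square zero matrix: one row and one column per non-P column
  m1 <- matrix(0, ncol = ncol(x) - 1, nrow = ncol(x) - 1)
  # put each column's sum on the diagonal (x[-1] drops P, the first column)
  diag(m1) <- sapply(x[-1], sum)
  return(m1)
}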

l1 <- split(df, df$P)
do.call(rbind, lapply(l1, fun1))

#       [,1] [,2] [,3]
# [1,]    3    0    0
# [2,]    0    1    0
# [3,]    0    0    5
# [4,]    4    0    0
# [5,]    0    4    0
# [6,]    0    0    7
# [7,]    1    0    0
# [8,]    0    7    0
# [9,]    0    0    8

To get your desired output, add the P column back and restore the column names:

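# data.frame() keeps the sums numeric; cbind() with the character group
# names would coerce every column to character.
# as.numeric() assumes P is numeric, as in the example data.
final_df <- data.frame(rep(as.numeric(names(l1)), each = ncol(df) - 1),
                       do.call(rbind, lapply(l1, fun1)))
names(final_df) <- names(df)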

final_df
#  P A B C
#1 1 3 0 0
#2 1 0 1 0
#3 1 0 0 5
#4 2 4 0 0
#5 2 0 4 0
#6 2 0 0 7
#7 3 1 0 0
#8 3 0 7 0
#9 3 0 0 8
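
The aggregate attempt from the question can also be extended to give the same result without splitting. The following is only a sketch of that alternative, not part of the approach above. The intermediate names sums, m, k, expanded and mask are chosen for illustration, and df is assumed to hold the question's data.

```r
# Sample data from the question (the object name `df` is an assumption)
df <- data.frame(
  P = c(1, 2, 3, 1, 3, 3, 2),
  A = c(2, 1, 0, 1, 1, 0, 3),
  B = c(0, 1, 4, 1, 1, 2, 3),
  C = c(5, 3, 7, 0, 0, 1, 4)
)

# One row of column sums per P
sums <- aggregate(. ~ P, data = df, FUN = sum)
m <- as.matrix(sums[-1])  # the sums without the P column
k <- ncol(m)              # number of value columns

# Repeat each row of sums k times ...
expanded <- m[rep(seq_len(nrow(m)), each = k), , drop = FALSE]
# ... and stack one k x k identity matrix per P value
mask <- diag(k)[rep(seq_len(k), times = nrow(m)), , drop = FALSE]

# Element-wise product keeps only the "diagonal" sum in each repeated row
result <- data.frame(P = rep(sums$P, each = k), expanded * mask,
                     row.names = NULL)
result
```

For the sample data this should give the same nine rows as final_df, with all four columns numeric.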
Sotos answered Oct 23 '22 17:10