
R - How can I generate difference of all combinations of columns in a data frame

Example:

df <- data.frame(A=1:5, B=seq(2,10,2), C=seq(3,15,3))  
df  
A  B  C  
1  2  3  
2  4  6  
3  6  9  
4  8 12  
5 10 15  

What I want is:

A  B  C (A-B) (A-C) (B-C)  
1  2  3    -1    -2    -1  
2  4  6    -2    -4    -2  
3  6  9    -3    -6    -3  
4  8 12    -4    -8    -4  
5 10 15    -5   -10    -5  

This is just a sample. My real data has over 100 columns.
Any suggestions on how to do this in R?

user124543131234523 asked Dec 18 '22 10:12


2 Answers

We can use the FUN argument of combn (shown here on the five-column df defined under "data" below):

combn(seq_along(df), 2, FUN = function(x) df[,x[1]]- df[,x[2]])
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,]   -1   -2    0   -2   -1    1   -1    2    0    -2
#[2,]   -2   -4   -4   -7   -2   -2   -5    0   -3    -3
#[3,]   -3   -6   -8  -12   -3   -5   -9   -2   -6    -4
#[4,]   -4   -8  -12  -17   -4   -8  -13   -4   -9    -5
#[5,]   -5  -10  -16  -22   -5  -11  -17   -6  -12    -6

Also, combn accepts the data.frame itself as its first argument, so FUN receives each pair of columns directly:

combn(df, 2, FUN = function(x) x[,1]-x[,2])
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,]   -1   -2    0   -2   -1    1   -1    2    0    -2
#[2,]   -2   -4   -4   -7   -2   -2   -5    0   -3    -3
#[3,]   -3   -6   -8  -12   -3   -5   -9   -2   -6    -4
#[4,]   -4   -8  -12  -17   -4   -8  -13   -4   -9    -5
#[5,]   -5  -10  -16  -22   -5  -11  -17   -6  -12    -6

data

df <- data.frame(A=1:5, B=seq(2,10,2), C=seq(3,15,3), d=seq(1,25,5), e=seq(3,31,6))
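
The question also asks for labels such as (A-B) next to the original columns. One possible way to get there, as a minimal sketch on the five-column df above, is to build the labels from the same pair order that combn uses. The names diffs, pairs, labels and result are illustrative assumptions, not part of the original answer:

```r
df <- data.frame(A = 1:5, B = seq(2, 10, 2), C = seq(3, 15, 3),
                 d = seq(1, 25, 5), e = seq(3, 31, 6))

# Pairwise differences: one matrix column per pair of df columns
diffs <- combn(seq_along(df), 2, FUN = function(i) df[[i[1]]] - df[[i[2]]])

# Column-name pairs in the same order as the columns of diffs (2 x 10 matrix)
pairs <- combn(names(df), 2)
labels <- paste0("(", pairs[1, ], "-", pairs[2, ], ")")
colnames(diffs) <- labels

# check.names = FALSE keeps non-syntactic names such as "(A-B)" unchanged
result <- data.frame(df, diffs, check.names = FALSE)
names(result)
```

Because both combn calls enumerate pairs in the same order, label k always describes column k of diffs. Individual differences can then be pulled out by name, for example result[["(A-B)"]].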
akrun answered Jan 14 '23 04:01


Here you go

df <- data.frame(A=1:5, B=seq(2,10,2), C=seq(3,15,3), d=seq(1,25,5), e=seq(3,31,6))

> df
  A  B  C  d  e
1 1  2  3  1  3
2 2  4  6  6  9
3 3  6  9 11 15
4 4  8 12 16 21
5 5 10 15 21 27

z = combn(1:ncol(df),2)

> z
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    1    1    2    2    2    3    3     4
[2,]    2    3    4    5    3    4    5    4    5     5

y = apply(z,2,function(x){
  df[,x[1]]-df[,x[2]]
})

Result:

> y
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]   -1   -2    0   -2   -1    1   -1    2    0    -2
[2,]   -2   -4   -4   -7   -2   -2   -5    0   -3    -3
[3,]   -3   -6   -8  -12   -3   -5   -9   -2   -6    -4
[4,]   -4   -8  -12  -17   -4   -8  -13   -4   -9    -5
[5,]   -5  -10  -16  -22   -5  -11  -17   -6  -12    -6

The matrix z tells you which pair of columns was subtracted: column k of y is df[, z[1, k]] - df[, z[2, k]].
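
As a small illustration of reading z, the sketch below looks up which two columns produced one particular column of y and recomputes that difference directly. The index k and the names pair and matches are hypothetical, chosen only for this example:

```r
df <- data.frame(A = 1:5, B = seq(2, 10, 2), C = seq(3, 15, 3),
                 d = seq(1, 25, 5), e = seq(3, 31, 6))
z <- combn(1:ncol(df), 2)
y <- apply(z, 2, function(x) df[, x[1]] - df[, x[2]])

k <- 6                        # hypothetical column of y to inspect
pair <- names(df)[z[, k]]     # names of the two df columns behind y[, k]
pair

# Recompute the difference from the named columns and compare with y[, k]
matches <- all(y[, k] == df[[pair[1]]] - df[[pair[2]]])
matches
```

For k = 6 the lookup gives the pair B and d, so y[, 6] holds B - d.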

Keep in mind that if df has 100 columns, there are choose(100, 2) = 4950 such combinations, so the result will have 4950 columns.
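
The count can be checked before running anything expensive. This is only a quick sanity check, with n and n_pairs as illustrative names:

```r
n <- 100                     # number of columns, as in the question
n_pairs <- choose(n, 2)      # same as n * (n - 1) / 2
n_pairs

# Cross-check against the number of pairs combn actually enumerates
same_count <- ncol(combn(n, 2)) == n_pairs
same_count
```

The number of pairs grows quadratically with the number of columns, which is worth knowing before applying this to much wider data.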

R. Schifini answered Jan 14 '23 04:01