
Given an R dataframe with column A, how do I create two new columns containing all ordered combinations of A?

I have a data.frame with one id column (x below), and a number of variables (y1,y2 below).

    x y1 y2
1   1 43 55
2   2 51 53
[...]

What I would like to generate from this is a dataframe where the first two columns cover every ordered combination of x (except where they are equal) along with columns for each variable related to the order. The data frame header and first two rows would look like this (did this by hand, excuse errors):

xi xj y1i y1j y2i y2j
 1  2  43  51  55  53
 2  1  51  43  53  55
[...]

So each row would contain a source and destination (i and j), followed by the values of each variable (y1, y2) at the source and at the destination.

I'm slowly learning R data manipulation, but this one is stumping me. Kudos for the one line does-it-all answer, as well as a more readable didactic answer.

mindless.panda asked Dec 22 '22 12:12


2 Answers

This works (apart perhaps from the order of the rows and columns):

firstdf  <- data.frame(x  = c( 1, 2, 4, 5), 
                       y1 = c(43,51,57,49), y2 = c(55,53,47,44)) 
co       <- combn(firstdf$x,2)
seconddf <- data.frame(xi = c(co[1,], co[2,]), xj = c(co[2,], co[1,]))
thirddf  <- merge(merge(seconddf, firstdf, by.x = "xj", by.y = "x" ),
                  firstdf, by.x = "xi", by.y = "x", suffixes = c("j", "i") )

to produce

> thirddf
   xi xj y1j y2j y1i y2i
1   1  2  51  53  43  55
2   1  5  49  44  43  55
3   1  4  57  47  43  55
4   2  4  57  47  51  53
5   2  1  43  55  51  53
6   2  5  49  44  51  53
7   4  5  49  44  57  47
8   4  1  43  55  57  47
9   4  2  51  53  57  47
10  5  1  43  55  49  44
11  5  2  51  53  49  44
12  5  4  57  47  49  44 

where the first and fifth rows match your example.

If you take firstdf as given and insist on one line then you can turn this into

merge(merge(data.frame(xi = c(combn(firstdf$x,2)[1,], combn(firstdf$x,2)[2,]), xj = c(combn(firstdf$x,2)[2,], combn(firstdf$x,2)[1,])), firstdf, by.x = "xj", by.y = "x" ), firstdf, by.x = "xi", by.y = "x", suffixes = c("j", "i") )

but I don't really see the point: it calls combn four times and is harder to read.
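
For comparison, here is a sketch of the same idea without merge, assuming the firstdf defined above. expand.grid lists every pair of row indices, the pairs with equal indices are dropped, and the columns are written out in the order asked for in the question. The names idx and result are only illustrative, and the y1/y2 columns are spelled out by hand, so this would need adapting for a different set of variables.

```r
firstdf <- data.frame(x  = c( 1, 2, 4, 5),
                      y1 = c(43,51,57,49), y2 = c(55,53,47,44))

# every ordered pair of row indices (j varies fastest), dropping i == j
idx <- expand.grid(j = seq_len(nrow(firstdf)), i = seq_len(nrow(firstdf)))
idx <- idx[idx$i != idx$j, ]

# look up the source (i) and destination (j) rows, interleaving the columns
result <- data.frame(xi  = firstdf$x[idx$i],  xj  = firstdf$x[idx$j],
                     y1i = firstdf$y1[idx$i], y1j = firstdf$y1[idx$j],
                     y2i = firstdf$y2[idx$i], y2j = firstdf$y2[idx$j])
```

Because the lookup is by row index rather than by a join on x, the rows come out sorted by xi and then xj without any extra sorting step.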

Henry answered Dec 24 '22 01:12


Two lines is the best I can do and still keep it sensible: (Edit: see bottom of answer for one-liner.)

Create some data:

n <- 4
a <- cbind(x=LETTERS[1:n], y=letters[1:n])
a

     x   y  
[1,] "A" "a"
[2,] "B" "b"
[3,] "C" "c"
[4,] "D" "d"

The code:

f <- function(x, i){cbind(i, x[i[,1],], x[i[,2],])}
f(a, t(combn(seq_len(nrow(a)), 2)))

The results:

             x   y   x   y  
[1,] "1" "2" "A" "a" "B" "b"
[2,] "1" "3" "A" "a" "C" "c"
[3,] "1" "4" "A" "a" "D" "d"
[4,] "2" "3" "B" "b" "C" "c"
[5,] "2" "4" "B" "b" "D" "d"
[6,] "3" "4" "C" "c" "D" "d"

EDIT

This can be turned into a one-liner by making use of anonymous functions:

(function(x, i=t(combn(seq_len(nrow(x)), 2))){cbind(i, x[i[,1],], x[i[,2],])})(a)

             x   y   x   y  
[1,] "1" "2" "A" "a" "B" "b"
[2,] "1" "3" "A" "a" "C" "c"
[3,] "1" "4" "A" "a" "D" "d"
[4,] "2" "3" "B" "b" "C" "c"
[5,] "2" "4" "B" "b" "D" "d"
[6,] "3" "4" "C" "c" "D" "d"
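
The results above list each unordered pair once, whereas the question asks for both directions. A sketch of one way to get them, assuming the same a and f as above: stack the index matrix on top of a copy with its two columns swapped. The names idx and res are only illustrative.

```r
n <- 4
a <- cbind(x = LETTERS[1:n], y = letters[1:n])
f <- function(x, i){cbind(i, x[i[,1],], x[i[,2],])}

idx <- t(combn(seq_len(nrow(a)), 2))   # unordered pairs, one per row
idx <- rbind(idx, idx[, 2:1])          # append the same pairs reversed
res <- f(a, idx)                       # 12 rows: 6 pairs in each direction
```

The reversed pairs are appended after the originals, so the rows would still need sorting if a particular order matters.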

Andrie answered Dec 24 '22 00:12