Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Apply function between rows, grouped by a variable, computing all possible combinations between variable in other column

Tags:

dataframe

r

I have a big data.frame in R, whose oversimplified version looks like this (the real data.frame has 20 colors in the "Color" column and 10 different numbers in "Number" column:

Color   Number  Y
blue    1       5
blue    2       3
blue    3       2
red     1       5
red     2       8
red     3       2
green   1       2
green   2       9
green   3       3

For every color in "Color", I would like to apply a function between all combinations of numbers in the "Numbers" column, by comparing the corresponding values of the "Y" column. Lets take a simple function as example:

if x >= y, print 1, else print 0 # where x and y represent the first and second values to be compared, respectively 

I would obtain this as an output data.frame:

Color   Comparison  Y
blue    1_vs_2      1
blue    1_vs_3      1
blue    2_vs_1      0
blue    2_vs_3      1
blue    3_vs_1      0
blue    3_vs_2      0
red     1_vs_2      0
red     1_vs_3      1
red     2_vs_1      1
red     2_vs_3      1
red     3_vs_1      0
red     3_vs_2      0
green   1_vs_2      0
green   1_vs_3      0
green   2_vs_1      1
green   2_vs_3      1
green   3_vs_1      1
green   3_vs_2      0
like image 463
Lucas Avatar asked Dec 19 '22 09:12

Lucas


1 Answers

Have you considered SQL? You could merge the data back onto itself. If you restrict the Color to be the same and the Number to be different you should get every pairwise comparison you want. This is the same concept as @Psidom's answer - he just does it in a data.table join.

library(sqldf)
res <- sqldf("SELECT     l.Color, l.Number as l_number, r.Number as r_number,
                          case when l.Y >= r.Y then 1 else 0 end as Y
              FROM       df as l
              INNER JOIN df as r
              ON         l.Color = r.Color AND
                         l.Number != r.Number
             ")

res$comparison <- paste0(res$l_number,"_vs_",res$r_number)

res

   Color l_number r_number Y comparison
1   blue        1        2 1     1_vs_2
2   blue        1        3 1     1_vs_3
3   blue        2        1 0     2_vs_1
4   blue        2        3 1     2_vs_3
5   blue        3        1 0     3_vs_1
6   blue        3        2 0     3_vs_2
7    red        1        2 0     1_vs_2
8    red        1        3 1     1_vs_3
9    red        2        1 1     2_vs_1
10   red        2        3 1     2_vs_3
11   red        3        1 0     3_vs_1
12   red        3        2 0     3_vs_2
13 green        1        2 0     1_vs_2
14 green        1        3 0     1_vs_3
15 green        2        1 1     2_vs_1
16 green        2        3 1     2_vs_3
17 green        3        1 1     3_vs_1
18 green        3        2 0     3_vs_2
like image 85
Mike H. Avatar answered Mar 09 '23 01:03

Mike H.