Symmetric pairs in R

Tags: loops, r

I have a large data frame that looks like this:

> my_table
   track_fid start_gid end_gid
1          1       100      82
2          2        82     100
3          3       100      82
4          4       100      32
5          5        82     100
6          6        82     100
7          7        82     100
8          8       100      82
9          9        34     100
10        10        31     100

My aim is to add a column to_from at the end and populate it with the characters y or n.

Let's take the first row as an example: start_gid = 100 and end_gid = 82. If another row exists anywhere in the table where the values are the inverse, i.e., where start_gid = 82 and end_gid = 100, I'd like to fill the column to_from of both rows with y. If the inverse does not exist, the first row should be filled with n. The key here is to loop over every row and search for its inverse in the table according to the order of track_fid. If an inverse is found where track_fid is greater, a y should be inserted. Once an inverse has received a y, it cannot be used again.

For example, this would be a sample output:

> output
   track_fid start_gid end_gid to_from
1          1       100      82       y
2          2        82     100       y
3          3       100      82       y
4          4       100      32       n
5          5        82     100       y
6          6        82     100       y
7          7        82     100       n
8          8       100      82       y
9          9        34     100       n
10        10        31     100       n

Is there a way to create such an output in R?

Something along the lines of:

for (i in 2:nrow(my_table)) {
  # rough idea only: this compares each row with the row directly above it
  if (my_table[i - 1, "start_gid"] == my_table[i, "end_gid"]) {
    my_table[i, "to_from"] <- "y" } else { my_table[i, "to_from"] <- "n" }
}


> str(output)
'data.frame':   10 obs. of  4 variables:
 $ track_fid: int  1 2 3 4 5 6 7 8 9 10
 $ start_gid: int  100 82 100 100 82 82 82 100 34 31
 $ end_gid  : int  82 100 82 32 100 100 100 82 100 100
 $ to_from  : Factor w/ 2 levels "n","y": 2 2 2 1 2 2 1 2 1 1
the_darkside asked Feb 06 '23 10:02

1 Answer

I don't see a way to do this without a loop in R. You can do it with for loops plus next and break statements, but when the problem size is large I turn to Rcpp for cases like this.

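library(Rcpp)
sourceCpp(code = "
          #include <Rcpp.h>
          // [[Rcpp::export]]
          Rcpp::LogicalVector myfun(const Rcpp::IntegerVector x, const Rcpp::IntegerVector y) {
            // res(i) becomes true once row i is part of a pair; it starts all false
            Rcpp::LogicalVector res(x.length());
            for (int i=0; i<(x.length()-1); i++) {
              if(res(i)) continue;  // row i was already paired with an earlier row
              for (int j=i+1; j<x.length(); j++) {
                if (res(j)) continue;  // row j is already used, skip it
                if (x(i) == y(j) && x(j) == y(i)) {  // row j is the inverse of row i
                   res(i) = true;
                   res(j) = true;
                   break;  // take only the first free inverse
                }
              }
            }
            return res;
          }
          ")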

DF$from_to <- myfun(DF$start_gid, DF$end_gid)
#   track_fid start_gid end_gid from_to
#1          1       100      82    TRUE
#2          2        82     100    TRUE
#3          3       100      82    TRUE
#4          4       100      32   FALSE
#5          5        82     100    TRUE
#6          6        82     100    TRUE
#7          7        82     100   FALSE
#8          8       100      82    TRUE
#9          9        34     100   FALSE
#10        10        31     100   FALSE
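
For smaller tables, the plain-R route with for, next and break mentioned at the top of this answer might look roughly like the sketch below. It mirrors the logic of myfun() line by line. The helper name match_inverse is made up for this illustration, and the data frame is rebuilt from the values shown in the question so the snippet runs on its own. The last step converts the logical result to the y/n factor the question asked for.

```r
# Sample data copied from the question
my_table <- data.frame(
  track_fid = 1:10,
  start_gid = c(100L, 82L, 100L, 100L, 82L, 82L, 82L, 100L, 34L, 31L),
  end_gid   = c(82L, 100L, 82L, 32L, 100L, 100L, 100L, 82L, 100L, 100L)
)

# Hypothetical helper (name invented for this sketch): plain-R version of
# the same greedy search that myfun() performs in C++
match_inverse <- function(x, y) {
  n <- length(x)
  res <- logical(n)                    # all FALSE: nothing is paired yet
  for (i in seq_len(max(n - 1, 0))) {
    if (res[i]) next                   # row i is already part of a pair
    for (j in (i + 1):n) {
      if (res[j]) next                 # row j is already used, skip it
      if (x[i] == y[j] && x[j] == y[i]) {
        res[i] <- TRUE
        res[j] <- TRUE
        break                          # take only the first free inverse
      }
    }
  }
  res
}

paired <- match_inverse(my_table$start_gid, my_table$end_gid)

# Convert the logical flags to the y/n column requested in the question
my_table$to_from <- factor(ifelse(paired, "y", "n"))
print(my_table)
```

Like the Rcpp version, this scans all later rows for each unpaired row, so the work grows roughly quadratically with the number of rows. That is the reason the compiled version is preferable for large inputs.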
Roland answered Feb 16 '23 20:02