I have a large data frame that looks like this:
> my_table
   track_fid start_gid end_gid
1          1       100      82
2          2        82     100
3          3       100      82
4          4       100      32
5          5        82     100
6          6        82     100
7          7        82     100
8          8       100      82
9          9        34     100
10        10        31     100
My aim is to add a column to_from at the end and populate it with the characters y or n.
Let's take the first row as an example: the value in start_gid is 100 and the value in end_gid is 82. If another row exists anywhere in the table where the values are the inverse, i.e., where start_gid = 82 and end_gid = 100, I'd like to fill the to_from column of both rows with y. If the inverse does not exist, the first row should be filled with n. The key here is to loop over every row in order of track_fid and search the table for its inverse. If an inverse is found with a greater track_fid, a y should be inserted. Once an inverse has received a y, it cannot be used again.
For example, this would be a sample output:
> output
   track_fid start_gid end_gid to_from
1          1       100      82       y
2          2        82     100       y
3          3       100      82       y
4          4       100      32       n
5          5        82     100       y
6          6        82     100       y
7          7        82     100       n
8          8       100      82       y
9          9        34     100       n
10        10        31     100       n
Is there a way to create such an output in R?
Something along the lines of:
my_table$to_from <- "n"
for (i in 2:nrow(my_table)) {
  # only compares each row with the previous one
  if (my_table[i - 1, "start_gid"] == my_table[i, "end_gid"]) {
    my_table[i, "to_from"] <- "y"
  }
}
> str(output)
'data.frame': 10 obs. of 4 variables:
$ track_fid: int 1 2 3 4 5 6 7 8 9 10
$ start_gid: int 100 82 100 100 82 82 82 100 34 31
$ end_gid : int 82 100 82 32 100 100 100 82 100 100
$ to_from : Factor w/ 2 levels "n","y": 2 2 2 1 2 2 1 2 1 1
I don't see a way to do this without a loop. In plain R you can do it with for loops and next and break statements, but when the problem size is large I turn to Rcpp instead.
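library(Rcpp)
sourceCpp(code = "
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::LogicalVector myfun(const Rcpp::IntegerVector x, const Rcpp::IntegerVector y) {
  // x = start_gid, y = end_gid; res(i) becomes TRUE once row i is paired
  Rcpp::LogicalVector res(x.length());
  for (int i = 0; i < (x.length() - 1); i++) {
    if (res(i)) continue;          // row i was already used as an inverse
    for (int j = i + 1; j < x.length(); j++) {
      if (res(j)) continue;        // each row can be paired only once
      if (x(i) == y(j) && x(j) == y(i)) {
        res(i) = true;             // first unpaired inverse with a larger track_fid
        res(j) = true;
        break;
      }
    }
  }
  return res;
}
")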
my_table$to_from <- ifelse(myfun(my_table$start_gid, my_table$end_gid), "y", "n")
my_table
#    track_fid start_gid end_gid to_from
# 1          1       100      82       y
# 2          2        82     100       y
# 3          3       100      82       y
# 4          4       100      32       n
# 5          5        82     100       y
# 6          6        82     100       y
# 7          7        82     100       n
# 8          8       100      82       y
# 9          9        34     100       n
# 10        10        31     100       n
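The nested loop in myfun compares each row with every later row, so it is O(n^2). If that becomes too slow, one option is a single pass: keep, for every (start_gid, end_gid) combination, a queue of rows still waiting for a partner, and let each new row take the earliest waiting row with the inverse combination. As far as I can tell this yields the same pairing as the nested loop, since both always match a row with the earliest available inverse in track_fid order. Below is a minimal sketch in plain C++ (std::vector instead of Rcpp types); the name pair_inverse_rows is made up for illustration, and to call it from R you would still need to wrap it for sourceCpp yourself.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: flags every row that gets paired with a row whose
// (start, end) is its inverse, always taking the earliest unpaired inverse,
// like the nested loop. Uses std::map, so it runs in O(n log n).
std::vector<bool> pair_inverse_rows(const std::vector<int>& start,
                                    const std::vector<int>& end) {
    assert(start.size() == end.size());
    std::vector<bool> paired(start.size(), false);

    // Unpaired rows grouped by their own (start, end) combination, stored in
    // row order, so front() is always the one with the lowest track_fid.
    std::map<std::pair<int, int>, std::deque<std::size_t>> waiting;

    for (std::size_t j = 0; j < start.size(); ++j) {
        // Earlier unpaired rows whose (start, end) equals (end[j], start[j]).
        auto it = waiting.find(std::make_pair(end[j], start[j]));
        if (it != waiting.end() && !it->second.empty()) {
            std::size_t i = it->second.front();  // earliest waiting inverse
            it->second.pop_front();
            paired[i] = true;
            paired[j] = true;
        } else {
            // No partner yet: row j waits for a later inverse row.
            waiting[std::make_pair(start[j], end[j])].push_back(j);
        }
    }
    return paired;
}
```

For the example table, calling it with start = {100, 82, 100, 100, 82, 82, 82, 100, 34, 31} and end = {82, 100, 82, 32, 100, 100, 100, 82, 100, 100} should flag rows 1, 2, 3, 5, 6 and 8, which agrees with the expected output in the question.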