How to bypass a nested for loop?

Question

So the situation is this: I have one data frame with about 100,000 rows of data. I am interested in a particular column, POS, and I want to check whether each value of POS falls between the START and END values in any row of another data frame, and keep track of how many such instances there are.

E.g., in my first data frame, I have something like

ID POS  
A   20  
B   533  
C   600

And in my other data frame, I have stuff like

START      END  
123        150  
489        552  
590        600

I want to know how many items in POS are in any of the START-END ranges. So in this case, there would be 2 items (B and C). Also, if possible, can I get the IDs of the ones with POS between START and END, too?

How can I go about doing that without having to use a nested for loop?

Tim Biegeleisen · Accepted Answer

This is a fairly common problem in database work, so one option is to express it as an SQL range join. Here is a solution using sqldf:

library(sqldf)

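# Join each POS to every START-END range that contains it (BETWEEN is inclusive)
query <- "SELECT POS, ID FROM df1 INNER JOIN df2 "
query <- paste0(query, "ON df1.POS BETWEEN df2.START AND df2.END")
# Returns one row per (POS, range) match
sqldf(query)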

If the ranges in your second data frame might overlap, then the above query could return more than one result for a given POS value. In this case, replace SELECT POS with SELECT DISTINCT POS.
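To make the overlap caveat concrete, here is a minimal sketch. It assumes the sample data from the question plus one extra range (500 to 560) that is hypothetical and added only so that two ranges overlap. The variable names `join_sql`, `all_matches` and `unique_hits` are illustrative as well.

```r
library(sqldf)

# Sample data from the question
df1 <- data.frame(ID = c("A", "B", "C"), POS = c(20, 533, 600),
                  stringsAsFactors = FALSE)
# The last range (500-560) is hypothetical: it overlaps 489-552
df2 <- data.frame(START = c(123, 489, 590, 500),
                  END   = c(150, 552, 600, 560))

# Shared join clause; BETWEEN includes both endpoints
join_sql <- "FROM df1 INNER JOIN df2 ON df1.POS BETWEEN df2.START AND df2.END"

# Without DISTINCT: one row per (POS, range) pair, so B appears twice
all_matches <- sqldf(paste("SELECT df1.ID, df1.POS", join_sql))

# With DISTINCT: one row per matching POS
unique_hits <- sqldf(paste("SELECT DISTINCT df1.ID, df1.POS", join_sql))

nrow(unique_hits)   # number of POS values inside at least one range
unique_hits$ID      # the IDs of those rows
```

Counting the rows of the DISTINCT result gives the number of matching items, and its ID column answers the follow-up question about which IDs matched.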

akrun · Answer

We can use a non-equi join with data.table:

library(data.table)
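# For each df2 range, find df1 rows with START < POS <= END, then count the
# matches (a range with no match gives NA in ID). Use POS >= START to include START.
setDT(df1)[df2, on = .(POS > START, POS <= END)][, sum(!is.na(ID))]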
#[1] 2
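If adding a package is not an option, the same check can be sketched in base R with `outer()`. This is a different technique from the join above, not a restatement of it. It assumes both bounds are inclusive, and `inside`, `hit`, `n_hits` and `hit_ids` are illustrative names.

```r
# Sample data from the question
df1 <- data.frame(ID = c("A", "B", "C"), POS = c(20, 533, 600),
                  stringsAsFactors = FALSE)
df2 <- data.frame(START = c(123, 489, 590), END = c(150, 552, 600))

# inside[i, j] is TRUE when df1$POS[i] lies within [df2$START[j], df2$END[j]]
inside <- outer(df1$POS, df2$START, ">=") & outer(df1$POS, df2$END, "<=")

# A POS counts once, no matter how many ranges contain it
hit <- rowSums(inside) > 0

n_hits  <- sum(hit)      # 2
hit_ids <- df1$ID[hit]   # "B" "C"
```

This avoids an explicit nested loop, but it builds an nrow(df1) by nrow(df2) logical matrix. With 100,000 rows and many ranges that matrix can get large, in which case a join-based approach is likely the better fit.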

Tags: loops, r

Alex Johanssen
