
Alternatives for for loops in R?

I have two files which I'd like to combine using R.

head(bed)
chr8 41513235 41513282 ANK1.Exon1
chr8 41518973 41519092 ANK1.Exon2

The first one gives intervals and their names (chromosome, from, to, name).

head(coverage)
chr1 41513235 20
chr1 41513236 19
chr1 41513237 19

The second one gives the coverage for single bases (chromosome, position, coverage).

I now want to write the name of the matching exon next to each position. Some positions will fall in no exon; I want to delete those afterwards.

I figured out a way to do what I want. However, it needs three for loops and about 15 hours of computing time. Since for loops are not best practice in R, I'd like to know if anyone knows a better way than:

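# Add an empty fourth column that will hold the exon name
coverage <- cbind(coverage, "Exon")
coverage[,4] <- NA

# For every interval, every base in it, and every coverage row:
# copy the interval name when the positions match
for(i in 1:nrow(bed)){
 for(n in bed[i,2]:bed[i,3]){
  for(m in 1:nrow(coverage)){
   if(coverage[m,2]==n){
    coverage[m,4] <- as.character(bed[i,4])
   }
  }
 }
}

# Drop the positions that fall in no exon
coverage <- na.omit(coverage)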

Since all three positions lie in the interval "ANK1.Exon1", the output should look like this:

head(coverage) 
chr1 41513235 20 ANK1.Exon1 
chr1 41513236 19 ANK1.Exon1 
chr1 41513237 19 ANK1.Exon1 
asked May 19 '15 08:05 by Stern
1 Answer

The fastest way to perform what I was looking for was:

library("sqldf")
res <- sqldf("select * from coverage f1 inner join bed f2
on(f1.position >=f2.'from' and f1.position <=f2.'to')")

The calculation time went down to seconds. To get exactly the result shown above, the data frame was then reduced further.

res <- cbind(res[1:4],res[8])

Thank you all for your help.
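
For readers without sqldf, the same lookup can be sketched in base R with findInterval. This is a minimal sketch, not the method used above. It assumes a single chromosome, intervals that do not overlap, and the column names from, to, name and position (from, to and position appear in the query; name is made up). The fourth coverage row is hypothetical and only shows a position being dropped.

```r
# Toy data modelled on the question; column names are assumptions.
bed <- data.frame(
  from = c(41513235, 41518973),
  to   = c(41513282, 41519092),
  name = c("ANK1.Exon1", "ANK1.Exon2"),
  stringsAsFactors = FALSE
)
coverage <- data.frame(
  position = c(41513235, 41513236, 41513237, 41513300),  # last row is hypothetical
  coverage = c(20, 19, 19, 5),
  stringsAsFactors = FALSE
)

# findInterval() needs the interval starts in increasing order.
bed <- bed[order(bed$from), ]

# Index of the last interval whose start is <= position (0 if there is none).
idx <- findInterval(coverage$position, bed$from)

# A hit also requires the position to be at or before that interval's end.
hit <- idx > 0
hit[hit] <- coverage$position[hit] <= bed$to[idx[hit]]

# Attach the exon name and keep only the positions inside an exon.
coverage$Exon <- NA_character_
coverage$Exon[hit] <- bed$name[idx[hit]]
res <- coverage[hit, ]
```

Every comparison here is vectorized, so no explicit loop over positions is needed. Each position receives at most one name, so overlapping intervals would need a different approach.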

Edit: For large datasets where the same positions may appear on more than one chromosome, it is helpful to additionally match on the chromosome:

res <- sqldf("select * from coverage f1 inner join bed f2
on(f1.position >=f2.'from' and f1.position <=f2.'to' and f1.Chromosome = f2.Chromosome)")
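
The chromosome condition can also be sketched in base R with merge followed by a filter. The toy data below is made up to show one position occurring on two chromosomes, and the column names name and coverage are assumptions. Note that merge first builds every coverage/interval pair per chromosome, so on large inputs this may need far more memory than the SQL join.

```r
# Hypothetical toy data: position 41513235 occurs on both chr8 and chr1.
coverage <- data.frame(
  Chromosome = c("chr8", "chr8", "chr1"),
  position   = c(41513235, 41513300, 41513235),
  coverage   = c(20, 19, 19),
  stringsAsFactors = FALSE
)
bed <- data.frame(
  Chromosome = c("chr8", "chr8"),
  from = c(41513235, 41518973),
  to   = c(41513282, 41519092),
  name = c("ANK1.Exon1", "ANK1.Exon2"),
  stringsAsFactors = FALSE
)

# Inner join on chromosome: pairs each position with every interval
# on the same chromosome; chromosomes without intervals drop out.
joined <- merge(coverage, bed, by = "Chromosome")

# Keep only the pairs where the position lies inside the interval.
inside <- joined$position >= joined$from & joined$position <= joined$to
res <- joined[inside, c("Chromosome", "position", "coverage", "name")]
```

With this data only the chr8 row at 41513235 remains; the chr1 row with the same position is not matched to the chr8 exon.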
answered Sep 23 '22 20:09 by Stern