Alternatives for for loops in R?

Tags:

r

code-coverage

bioinformatics

I have two files that I'd like to combine using R.

head(bed)
chr8 41513235 41513282 ANK1.Exon1
chr8 41518973 41519092 ANK1.Exon2

The first one gives intervals and their names (Chromosome, from, to, name).

head(coverage)
chr1 41513235 20
chr1 41513236 19
chr1 41513237 19

The second one gives the coverage for single bases (Chromosome, position, coverage).

I now want the name of each exon written next to each position. Some positions will not fall within any exon, and I want to delete those afterwards.

I figured out a way to do what I want. However, it needs three nested for loops and about 15 hours of computing time. Since for loops are not best practice in R, I'd like to know if anyone knows a better way than:

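# Add an empty fourth column that will hold the exon name
coverage <- cbind(coverage, Exon = NA)

for(i in 1:nrow(bed)){               # every interval
 for(n in bed[i,2]:bed[i,3]){        # every base inside the interval
  for(m in 1:nrow(coverage)){        # every coverage row
   if(coverage[m,2]==n){
    coverage[m,4] <- as.character(bed[i,4])
   }
  }
 }
}

# Drop positions that did not fall within any exon
coverage <- na.omit(coverage)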

Since all three positions lie in the interval "ANK1.Exon1", the output should look like this:

head(coverage) 
chr1 41513235 20 ANK1.Exon1 
chr1 41513236 19 ANK1.Exon1 
chr1 41513237 19 ANK1.Exon1

asked May 19 '15 08:05

Stern

1 Answer

The fastest way to perform what I was looking for was:

library("sqldf")
# Range join: keep each coverage row whose position lies inside a bed interval
res <- sqldf("select * from coverage f1 inner join bed f2
on (f1.position >= f2.'from' and f1.position <= f2.'to')")

The calculation time went down to seconds. To get the exact result shown above, the data frame was then reduced to the columns of interest:

res <- cbind(res[1:4],res[8])
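As a minimal sketch, the column selection could also be done inside the query itself instead of by column index. The sample data below is hypothetical, and the column names (Chromosome, from, to, name, position, coverage) and the alias Exon are assumptions, not taken from the real files:

```r
library(sqldf)

# Hypothetical sample data; column names are assumptions
bed <- data.frame(
  Chromosome = c("chr8", "chr8"),
  from = c(41513235, 41518973),
  to   = c(41513282, 41519092),
  name = c("ANK1.Exon1", "ANK1.Exon2")
)
coverage <- data.frame(
  Chromosome = rep("chr8", 4),
  position = c(41513235, 41513236, 41513237, 41513300),
  coverage = c(20, 19, 19, 5)
)

# Select all coverage columns plus the interval name only,
# so no positional subsetting of the result is needed
res <- sqldf('select f1.*, f2.name as Exon
              from coverage f1 inner join bed f2
              on f1.position >= f2."from" and f1.position <= f2."to"')
```

Selecting by name avoids depending on how many columns each input happens to have. In this sample, the position 41513300 lies in neither interval and is dropped by the inner join.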

Thank you all for your help.

Edit: For large datasets where the same position may appear on more than one chromosome, it is helpful to also match on the chromosome:

res <- sqldf("select * from coverage f1 inner join bed f2
on(f1.position >=f2.'from' and f1.position <=f2.'to' and f1.Chromosome = f2.Chromosome)")
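For comparison, the same range join can be sketched in base R without any package. This is an alternative technique, not the sqldf approach used above, and the sample data and column names are hypothetical. Note that merge() builds every position/interval pair per chromosome before filtering, so it may need a lot of memory on real coverage files:

```r
# Hypothetical sample data; column names are assumptions
bed <- data.frame(
  Chromosome = c("chr8", "chr8"),
  from = c(41513235, 41518973),
  to   = c(41513282, 41519092),
  name = c("ANK1.Exon1", "ANK1.Exon2"),
  stringsAsFactors = FALSE
)
coverage <- data.frame(
  Chromosome = rep("chr8", 4),
  position = c(41513235, 41513236, 41513237, 41513300),
  coverage = c(20, 19, 19, 5),
  stringsAsFactors = FALSE
)

# Pair every position with every interval on the same chromosome
pairs <- merge(coverage, bed, by = "Chromosome")

# Keep only the pairs where the position lies inside the interval
inside <- pairs$position >= pairs$from & pairs$position <= pairs$to
res <- pairs[inside, c("Chromosome", "position", "coverage", "name")]
```

Positions outside every interval never satisfy the filter, so no separate na.omit() step is required.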

answered Sep 23 '22 20:09

Stern
