I am trying to get a FASTA file form NCBI website, I use the following function
getncbiseq <- function(accession){
dbs <- c()
for (i in 1:numdbs){
db <- dbs[i]
choosebank(db)
resquery <- try(query(".tmpquery", paste("AC=", accession)),silent = TRUE)
if (!(inherits(resquery, "try-error"))){
queryname <- "query2"
thequery <- paste("AC=",accession,sep="")
query(`queryname`,`thequery`)
# see if a sequence was retrieved:
seq <- getSequence(query2$req[[1]])
closebank()
return(seq)
}
closebank()
}
print(paste("ERROR: accession",accession,"was not found"))
}
When I try to retrieve the sequence
mydata <- getncbiseq("NC_001477")
Error in getSequence(query2$req[[1]]) : object 'query2' not found
Is there a better way to shorten these loop function also ?
if I use
query('queryname','the query')
#or
query("queryname","thequery")
I get another error
Error in query("queryname", "thequery") : invalid request:"unknown list at (^): \"(^)thequery\""
Thanks for great help. I was stuck at this point for a full day. I finally got the following code worked under windows 10 with R3.4.0(32bits):-
getncbiseq <- function(accession)
{
require("seqinr") # this function requires the SeqinR R package
# first find which ACNUC database the accession is stored in:
dbs <- c("genbank","refseq","refseqViruses","bacterial")
numdbs <- length(dbs)
for (i in 1:numdbs)
{
db <- dbs[i]
choosebank(db)
# check if the sequence is in ACNUC database 'db':
resquery <- try(query(".tmpquery", paste("AC=", accession)), silent = TRUE)
if (!(inherits(resquery, "try-error"))) {
queryname <- "query2"
thequery <- paste("AC=", accession, sep="")
query2 <- query(queryname, thequery)
# see if a sequence was retrieved:
seq <- getSequence(query2$req[[1]])
closebank()
return(seq)
}
closebank()
}
print(paste("ERROR: accession",accession,"was not found"))
}
I think you intended to assign your call to query()
to a variable called query2
, but you forgot to do it. Try this:
if (!(inherits(resquery, "try-error"))) {
queryname <- "query2"
thequery <- paste("AC=", accession, sep="")
query2 <- query(queryname, thequery)
# see if a sequence was retrieved:
seq <- getSequence(query2$req[[1]])
closebank()
return(seq)
}
As you mentioned, the rest of your code also has some quirks and kinks which could probably be improved upon.
Update:
Here is a refactor of your code using sapply
on the dbs
vector instead of an explicit for loop (the latter which is usually frowned upon by R people):
processdbs <- function(x, y) {
choosebank(x)
resquery <- try(query(".tmpquery", paste("AC=", y)), silent = TRUE)
if (!(inherits(resquery, "try-error"))) {
queryname <- "query2"
thequery <- paste("AC=", y, sep="")
query2 <- query(queryname, thequery)
# see if a sequence was retrieved:
seq <- getSequence(query2$req[[1]])
closebank()
return(seq)
}
closebank()
}
getncbiseq <- function(accession) {
dbs <- c("genbank","refseq","refseqViruses","bacterial")
result <- sapply(dbs, processdbs, y=accession)
closebank()
print(paste("ERROR: accession",accession,"was not found"))
}
You may have to do a slight amount of additional work to inspect the result
vector and determine whether a sequence was retrieved anywhere.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With