Here is my code
slidingwindowplotATGC = function(windowsize, inputseq)
{
starts = seq(1, length(inputseq)-windowsize, by = windowsize)
n = length(starts)
chunkGs = numeric(n)
chunkAs = numeric(n)
chunkTs = numeric(n)
chunkCs = numeric(n)
for (i in 1:n) {
chunk = windowsize[starts[i]:(starts[i]+9999)]
chunkG = sum("g" == chunk)/length(chunk)
chunkA = sum("a" == chunk)/length(chunk)
chunkT = sum("t" == chunk)/length(chunk)
chunkC = sum("c" == chunk)/length(chunk)
chunkGs[i] = chunkG
chunkAs[i] = chunkA
chunkTs[i] = chunkT
chunkCs[i] = chunkC
}
plot(starts,chunkGs,type="b",ylim=c(min(min(chunkAs),min(chunkTs),min(chunkCs),min(chunkGs)),max(max(chunkAs),max(chunkTs),max(chunkCs),max(chunkGs))),col = "red")
points(starts,chunkTs,col = "blue")
points(starts,chunkAs,col = "green")
points(starts,chunkCs)
}
Im getting the following error message,
Error in seq.default(1, length(inputseq) - windowsize, by = windowsize) :
wrong sign in 'by' argument
which I never got before when running codes of this sort, infact I re ran old code that worked perfectly before, except this time Im getting this error message which doesn't seem to make any sense at all! I need help with this before I go completely insane... Maybe Im just bad at this program, but it seems to me that it has a mind of its own... I was also getting an error message before regarding the ylim function, stating that it needed to be a finite value, which is what I was giving it? HELP!!!
Change
starts = seq(1, length(inputseq)-windowsize, by = windowsize)
to
starts = seq(1, nchar(inputseq)-windowsize, by = windowsize)
assuming you're using a character vector as inputseq, such as
slidingwindowplotATGC(3, "ATAGACGATACGATACCCCGAGGGTAGGTA")
ETA: Aside from that difference, there are some very serious issues with how you are using character vectors. For example:
windowsize[starts[i]:(starts[i]+9999)]
Why does it look like you are selecting from windowsize, which is just an integer of your window size? Were you trying to select from inputseq?
Even if you were selecting from inputseq, the way to do that is substr(inputseq, start, stop)
Where does the starts[i]+9999 come from? Do you mean starts[i]+windowsize?
You should start over and carefully consider what you are trying to do, and learn the right tools to do it within R.
ETA: Here is a proposed rewrite of what you're trying to do (you'll need to install the zoo package first):
library(zoo)
slidingwindowplotATGC = function(windowsize, inputseq)
{
print(nchar(inputseq)-windowsize)
s = strsplit(inputseq, "")[[1]]
starts = seq(1, nchar(inputseq)-windowsize, by = windowsize)
n = length(starts)
letters = c("a", "c", "g", "t")
colors = c("green", "black", "red", "blue")
counts = t(sapply(letters, function(l) rollapply(s, windowsize, function(x) mean(x == l))))
plot(counts[1, ], type="l", col=colors[1])
for (i in 2:4) {
points(counts[i, ], type="l", col=colors[i])
}
print(counts)
}
slidingwindowplotATGC(10, "aagaaaagatcaaagaccagccgccccaccccccagagccccccc")
This should get you most of the way there. After that, you're on your own ;-)
A further condensation. You need to specify windowsize (width of the window) and by (periodicity of sampling) separately, although I think you wanted them the same (i.e. chop the sequence into exclusive chunks) -- if you want a sliding window you could use by=1.
The error you're seeing above is most likely occurring because for some reason windowsize is greater than nchar(inputseq).
slidingwindowplotATGC = function(windowsize, by, inputseq) {
s = strsplit(inputseq, "")[[1]]
colors = c("green", "black", "red", "blue")
counts = rollapply(factor(s), width=windowsize, by=by,table)
matplot(counts,type="l", lty=1,col=colors)
counts
}
itest <- "aagaaaagatcaaagaccagccgccccaccccccagagccccccc"
slidingwindowplotATGC(10, itest)
You should also check Bioconductor -- it's extremely likely that there's efficient code in there somewhere for doing this sort of summary.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With