I would like to iterate through every possible k-length string (called k-mer) given a list of symbols. For example, if k = 3
and symbols = {A, C, G, T}
, then:
AAA
AAC
AAG
...
TTG
TTT
Here's my code to generate the string:
local k = 3
local bases = {'A', 'C', 'T', 'G'}
-- Generate the string (AAA...AAA)
local kmer_gen = {}
for i = 1,k do kmer_gen[i] = "A" end
local kmer = table.concat(kmer_gen)
It works but it surely does not look good. Can this be achieved more elegantly?
Now, I'm not sure how to iterate through the possible k-mers. One solution would be keeping substituting each character, but this doesn't seen efficient. Another way would be decoding from binary (each 2 bits represent a base), but the implementation is confusing and requires bitwise operations. Any other thoughts?
Here is a solution using an iterator. It's a nice example of coroutines, a technique well worth knowing in Lua. See also http://www.lua.org/pil/9.3.html.
local bases = {'A', 'C', 'T', 'G'}
local function allstrings(n,t,k,s)
k=k or 1
s=s or {}
if k>n then
coroutine.yield(table.concat(s))
else
for i=1,#t do
s[k]=t[i]
allstrings(n,t,k+1,s)
end
end
end
local function kmer(n,t)
return coroutine.wrap(allstrings),n,t
end
for w in kmer(3,bases) do
print(w)
end
Here is a relatively simple tail-recursive solution I would probably use:
local bases = {'A', 'C', 'T', 'G'}
local function kmers(n, prev)
prev = prev or {''}
if n <= 0 then return prev end
local k,r = 1,{}
for i=1,#prev do
for j=1,#bases do
r[k] = prev[i] .. bases[j]
k = k+1
end
end
return kmers(n-1, r)
end
_3mers = kmers(3) -- usage example
Note: You could write r[#r+1]
instead of managing k
manually but doing so is not so complicated and significantly faster in cases like this (the #
operator is O(log n)
).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With