Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Lua: iterate through every possible k-length string given a list of symbols

I would like to iterate through every possible k-length string (called k-mer) given a list of symbols. For example, if k = 3 and symbols = {A, C, G, T}, then:

AAA
AAC
AAG
...
TTG
TTT

Here's my code to generate the string:

local k = 3
local bases = {'A', 'C', 'T', 'G'}

-- Generate the string (AAA...AAA)
local kmer_gen = {}
for i = 1,k do kmer_gen[i] = "A" end
local kmer = table.concat(kmer_gen)

It works but it surely does not look good. Can this be achieved more elegantly?

Now, I'm not sure how to iterate through the possible k-mers. One solution would be keeping substituting each character, but this doesn't seen efficient. Another way would be decoding from binary (each 2 bits represent a base), but the implementation is confusing and requires bitwise operations. Any other thoughts?

like image 523
Fábio Perez Avatar asked Nov 06 '13 08:11

Fábio Perez


2 Answers

Here is a solution using an iterator. It's a nice example of coroutines, a technique well worth knowing in Lua. See also http://www.lua.org/pil/9.3.html.

local bases = {'A', 'C', 'T', 'G'}

local function allstrings(n,t,k,s)
    k=k or 1
    s=s or {}
    if k>n then
        coroutine.yield(table.concat(s))
    else
        for i=1,#t do
            s[k]=t[i]
            allstrings(n,t,k+1,s)
        end
    end
end

local function kmer(n,t)
    return coroutine.wrap(allstrings),n,t
end

for w in kmer(3,bases) do
    print(w)
end
like image 71
lhf Avatar answered Nov 19 '22 08:11

lhf


Here is a relatively simple tail-recursive solution I would probably use:

local bases = {'A', 'C', 'T', 'G'}

local function kmers(n, prev)
  prev = prev or {''}
  if n <= 0 then return prev end
  local k,r = 1,{}
  for i=1,#prev do
    for j=1,#bases do
      r[k] = prev[i] .. bases[j]
      k = k+1
    end
  end
  return kmers(n-1, r)
end

_3mers = kmers(3) -- usage example

Note: You could write r[#r+1] instead of managing k manually but doing so is not so complicated and significantly faster in cases like this (the # operator is O(log n)).

like image 4
catwell Avatar answered Nov 19 '22 08:11

catwell