
How to join an enumerated char list in Elixir

Tags:

elixir

I'm writing a function that converts DNA to RNA. The input comes in as a charlist, 'ATCG', so I loop through each letter and use a map to look up the converted letter. Simple.

The problem is that after enumerating the charlist I can't merge the result back into a charlist. Enum.join returns a string, to_charlist just returns a list of code points, and I don't see any other functions that could help.

Here is my code:

def to_rna(dna) do
  dna_to_rna = %{
   'G' => 'C',
   'C' => 'G',
   'T' => 'A',
   'A' => 'U'
  }
  Enum.map(dna, fn(letter) ->
    Map.get(dna_to_rna, [letter])
  end)    
end

This outputs a list of one-character charlists, e.g. ['U', 'G', 'C']. How do I convert this list into a single charlist, 'UGC'?

Chip Dean asked Feb 27 '17 18:02


2 Answers

Concatenating the Lists

You can use Enum.concat/1 at the end to join the lists:

Enum.concat(['U', 'A', 'G', 'C'])      # =>   'UAGC'
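Applied to the function from the question, that could look roughly like the sketch below. The module name ConcatSketch is made up for illustration, and the mapping is copied from the question as-is:

```elixir
defmodule ConcatSketch do
  # Hypothetical module name, used only so the sketch can be run on its own.
  def to_rna(dna) do
    # Same mapping as in the question: keys and values are one-character charlists.
    dna_to_rna = %{
      'G' => 'C',
      'C' => 'G',
      'T' => 'A',
      'A' => 'U'
    }

    dna
    # Each element is a code point, so wrap it in [] to build a matching key.
    |> Enum.map(fn letter -> Map.get(dna_to_rna, [letter]) end)
    # The map step yields a list of charlists; concat flattens it into one charlist.
    |> Enum.concat()
  end
end

IO.inspect(ConcatSketch.to_rna('ATCG') == 'UAGC')   # prints true
```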

Using a Reducer

You can also use Enum.reduce/3 instead of Enum.map/2 and do it in one step:

def to_rna(dna) do
  mapping = %{
   'G' => 'C',
   'C' => 'G',
   'T' => 'A',
   'A' => 'U'
  }

  Enum.reduce(dna, [], fn(letter, acc) ->
    acc ++ Map.get(mapping, [letter])
  end)  
end

This will give 'UAGC' for the input value 'ATCG'.

Sheharyar answered Nov 05 '22 21:11


Not to necropost, but I recently started learning Elixir and have been doing some Exercism problems as well. This answer might be somewhat off-topic, since I'm proposing a similar solution without the use of a Map.

The function in the Exercism problem (rna-transcription, no link) takes a charlist as input, as described above, and is expected to return a charlist. From the charlist documentation:

A charlist is a list of integers where all the integers are valid code points.

We can also look at Exercism's provided function spec, @spec to_rna([char]) :: [char], and see a charlist is both the expected input and output. The built-in types documentation shows us charlist() is the same as [char()].

From the question's description, we see there's a 1:1 mapping of code point to code point. A map is a totally valid way to represent that, but I'd posit we can solve the problem just using iteration and pattern matching.

I feel a salient point was left out of some of the other answers: why are the Enum elements wrapped with [], e.g. Enum.map(dna, fn x -> Map.get(%{...}, [x]) end)? It's because each iterated element is a single character, i.e. an integer/code point. Wrapping the value in [] turns that code point into a charlist with one character, e.g. 'C'. The keys and values of the example Map are all single-element charlists. This successfully maps nucleotide to nucleotide, but it also changes the element's type from integer to charlist. The result is a list of charlists, not a single flat charlist: 'GCTA' turns into ['C', 'G', 'A', 'U'] (a list of charlists), which is not the same as 'CGAU'. As pointed out in other solutions, Enum.concat/1 and Enum.flat_map/2 are both ways to turn the list of lists into a flat charlist. From the flat_map docs:

... conceptually, [flat_map/2] is similar to a combination of map/2 and concat/1.
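To make the difference concrete, here is a small sketch comparing the two. The variable names (mapping, lookup, nested, flat) are mine, and I'm using Map.fetch!/2 rather than Map.get/2 so that an unknown letter raises instead of returning nil:

```elixir
# Mapping from the question: keys and values are one-character charlists.
mapping = %{'G' => 'C', 'C' => 'G', 'T' => 'A', 'A' => 'U'}

# Wrap the code point in [] to build a key; raises KeyError for unknown letters.
lookup = fn letter -> Map.fetch!(mapping, [letter]) end

# map/2 keeps one charlist per element, so the result is a list of charlists.
nested = Enum.map('GCTA', lookup)

# flat_map/2 concatenates the per-element charlists into one flat charlist.
flat = Enum.flat_map('GCTA', lookup)

IO.inspect(nested == ['C', 'G', 'A', 'U'])   # prints true
IO.inspect(flat == 'CGAU')                   # prints true
IO.inspect(Enum.concat(nested) == flat)      # prints true
```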

So, can we perform the mapping without the intermediate character lists? I believe the answer is "yes." Instead of converting each code point into a charlist, why not just convert the code point to the expected code point directly?

From the doc links above, we know code points are just integers. To demonstrate:

iex[1]> Enum.each('CGAU', &IO.inspect/1)
67
71
65
85
:ok

This means we really just need to map integer/code point values to other code point values. However, mapping raw integers feels a bit unnatural when we're really trying to map character literals. Fortunately, Elixir gives us the ? symbol, which yields the integer code point of the character literal that follows it, e.g.

iex> ?G
71
iex[1]> 'ATCG' == [?A, ?T, ?C, ?G]
true
iex[2]> [?A, ?T, ?C, ?G] == [65, 84, 67, 71]
true

Using this, we can convert individual code points directly to their mapped code points while still writing character literals, e.g. ?G -> ?C. You can still make a Map of these, e.g. %{?G => ?C, ...}; however, I feel this is a good scenario for pattern matching. We can use Enum.map with a function that has multiple clauses. With this approach, the individual code points are mapped 1:1, resulting in a new, flat charlist, so we no longer need concat or flat_map (and save a bit of typing).

@spec to_rna([char]) :: [char]
def to_rna(dna) do
  Enum.map(dna, fn
    ?G -> ?C
    ?C -> ?G
    ?T -> ?A
    ?A -> ?U
  end)
end
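
For comparison, the Map-of-code-points variant mentioned above might look something like this. It's only a sketch: the module name CodePointMapSketch and the @mapping attribute are names I made up, and Map.fetch!/2 is my choice so that an unknown nucleotide raises rather than silently producing nil:

```elixir
defmodule CodePointMapSketch do
  # Hypothetical module and attribute names, for illustration only.
  # Keys and values are plain code points (integers), not charlists.
  @mapping %{?G => ?C, ?C => ?G, ?T => ?A, ?A => ?U}

  @spec to_rna([char]) :: [char]
  def to_rna(dna) do
    # Each element is already a code point, so no [] wrapping is needed
    # and the result is a flat charlist straight out of map/2.
    Enum.map(dna, &Map.fetch!(@mapping, &1))
  end
end

IO.inspect(CodePointMapSketch.to_rna('ATCG') == 'UAGC')   # prints true
```

Both versions raise on a nucleotide that isn't in the mapping: this one with a KeyError from Map.fetch!/2, the pattern-matching one with a FunctionClauseError.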
erratum answered Nov 05 '22 21:11