Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using pattern matching and recursion in Elixir to split the list

I am new in Elixir and new in programming, especially functional programming (less than 1 year experience in Ruby and RoR). For the moment I am reading "Programming Elixir" by Dave Thomas. And I am completely stuck with one problem from the Lists and Recursion theme.

Dave asking to "implement the following Enum functions using no library functions or list comprehensions: ...split ..."

The original function is here.

I solve the problem with rather long, probably not too optimal (and seems to me partially disobeying Dave's restrictions) way:

def split(list, count) do
  if count < 0, do: count = len(list) + count
  list1 = filter1(list, count)
  list2 = list -- list1
  # list2 = filter2(list, list1)
  { list1, list2 }
end

def len([]), do: 0
def len([ _head | tail ]), do: 1 + len(tail)

defp filter1([], _count), do: []
defp filter1([ head | tail], count) do
  if count > 0 do
    [ head | filter1(tail, count - 1) ]
  else
    filter1(tail, count - 1)
  end
end

Browsing through the page with Dave's and other readers solutions I find out pattern which was used by 2 or 3 readers:

def split([head | tail], count) when count > 0 do
  {left, right} = split(tail, count-1)
  {[head | left], right}
end
def split(list, _count), do: {[], list}

This code seems to me rather elegant, but I can not understand how it works. I mean I've tried to comprehend what happening step by step and I failed.

I can imagine what happening in my filter1 recursive function. List is forming like this: [ head_1 | ... head_n | filter1(tail_n, count - n) ]

But I can't understand why { left, right } tuple is matching the recursive call for the function. What should match to the left and what to the right? How this recursion works?...

(The meaning of the second line (of the function) is also not clear for me but I think this is strictly connected with the first question.)

UPD:

Thanks to @Josh Petitt, @tkowal and @CodyPoll I think I moved forward in my comprehension of the case.

Now I am thinking about the recursion-matching pattern discussed in this "pyramidal way":

1  split([1, 2, 3], 2)
2    {left, right} = split([2, 3], 1)
3    {[1 | left], right}
4      {left, right} = split([3], 0)
5      {[1 | [2 | left]], right}
6    {[1 | [2 | []]], [3]}
7  {[1 ,2], [3]}
  • First step (line 1): call the function.
  • Second step (lines 2, 3): match {left, right} tuple to the recursive function call and return {[1 | left], right} tuple
  • Third step (lines 4, 5): match {left, right} tuple to the next recursive call and return {[1 | [2 | left]], right} tuple
  • Fourth step (line 6): since split([3], 0) matching the second clause we get {left, right} = {[], [3]} at this point and we can no replace left and right variables in the line 5 with [] and [3] accordingly
  • Fifth step (line 7): "pipes" do their job and return the list to finally match the left variable

What I still don't understand is how folks come to this type of solution? (Probably experience with both pattern matching and recursion.)

And another thing bothers me. If we take line 3 for example, it is a "return" which contains two variables. But no values was actually matched to this variables. According to my scheme this variables only match their values in line 7.

How Elixir deal with this? Is it some implicit nil matching? Or I am taking the process wrong and there is no actual return until the final step?

like image 436
max.underthesun Avatar asked Dec 25 '22 07:12

max.underthesun


1 Answers

Recursion is sometimes very difficult to understand just looking at the code. Mentally tracking what is put on the stack and what and when it is retrieved can exhaust our working memory very quickly. It can be useful to draw the path of every passage in the hierarchy of the recursion tree, and this is what I've done to try to answer to your question.

To understand how things work in this example, first of all we have to recognize the existence of two distinct stages in the Clause 1, the first stage is the code executed before the recursion, the second stage is the code that will be executed after it.

(to better explain the flow, I've added some variables to the original code)

# Clause 1
def split(in_list, count) when count > 0 do
  # FIRST STAGE
  [head | tail] = in_list

  # RECURSION
  result = split(tail, count - 1)

  # SECOND STAGE
  {left, right} = result 
  return = {[head | left], right}
end

#Clause 2
def split(list, _count), do: return = {[], list}

Now, before continue to reading, please look at the code and try to answer to these questions:

  • after how many iterations of the first block the result variable will be bound for the first time ?
  • How many times the recursion split(tail, count - 1) will be called inside Clause 1 ?
  • How many times the Clause 2 split(list, _count) will be called?
  • What is the role of the Clause 2 ?

And now compare your answers looking at this schema that show every passage and its hierarchy:

(as an example, we split the list [1, 2, 3, 4, 5] after its third element to obtain the tuple {[1, 2, 3], [4, 5]})

split([1,2,3,4,5], 3)

> FIRST STAGE of CLAUSE 1 / ITERATION 1 called as: split( [1, 2, 3, 4, 5], 3 ):
  Got 'head'=1, 'tail'=[2, 3, 4, 5], 'count'=3
  now I'm going to iterate passing the tail [2, 3, 4, 5],
  Clause 1 will match as the counter is still > 0
     > FIRST STAGE of CLAUSE 1 / ITERATION 2 called as: split( [2, 3, 4, 5], 2 ):
       Got 'head'=2, 'tail'=[3, 4, 5], 'count'=2
       now I'm going to iterate passing the tail [3, 4, 5],
       Clause 1 will match as the counter is still > 0
          > FIRST STAGE of CLAUSE 1 / ITERATION 3 called as: split( [3, 4, 5], 1 ):
            Got 'head'=3, 'tail'=[4, 5], 'count'=1
            Now the counter is 0 so I've reached the split point,
            and the Clause 2 instead of Clause 1 will match at the next iteration

> Greetings from CLAUSE 2 :-), got [4, 5], returning {[], [4, 5]}

          < Im BACK to the SECOND STAGE of ITERATION 3
            got result from CLAUSE 2: {[], [4, 5]}
            {left, right} = {[], [4, 5]}
            Now I'm build the return value as {[head | left], right},
            prepending 'head' (now is 3) to the previous value
            of 'left' (now is []) at each iteration,
            'right' instead is always [4, 5].
            So I'm returning {[3], [4, 5]} to iteration 2
     < Im BACK to the SECOND STAGE of ITERATION 2
       got result from previous Clause 1 / Iteration 3, : {[3], [4, 5]}
       {left, right} = {[3], [4, 5]}
       Now I'm build the return value as {[head | left], right},
       prepending 'head' (now is 2) to the previous value
       of 'left' (now is [3]) at each iteration,
       'right' instead is always [4, 5].
       So I'm returning {[2, 3], [4, 5]} to iteration 1
< Im BACK to the SECOND STAGE of ITERATION 1
  got result from previous Clause 1 / Iteration 2, : {[2, 3], [4, 5]}
  {left, right} = {[2, 3], [4, 5]}
  Now I'm build the return value as {[head | left], right},
  prepending 'head' (now is 1) to the previous value
  of 'left' (now is [2, 3]) at each iteration,
  'right' instead is always [4, 5].
  And my final return is at least: {[1, 2, 3], [4, 5]}
{[1, 2, 3], [4, 5]}

In the schema, the beginning of every iteration is marked with

> FIRST STAGE of CLAUSE 1 / ITERATION n called as: ...

meanwhile the beginning of the continuation of the iteration is marked as

< I'm BACK to the SECOND STAGE of ITERATION n

Now we can clearly see that:

  • the first block is iterated three times;
  • the Clause 2 is called just ONE time;
  • second block is iterated three times, the first time it receive the result from the Clause 2, the remaining times from Clause 1;
  • the result of Clause 2 contains the right portion of the splitted list, computed in third iteration of Clause 1.

So, what is the role for Clause 2? It is a trick, a way to pass back, down to the continuation of the iterations, the otherwise inaccessible value of the right part of the splitted list.

Here it is a step-by-step explanation of the code:

In the first stage the value of the first parameter of the function, the variable I've called in_list, is decomposed in its head and tail components:

# FIRST STAGE
[head | tail] = in_list

then the head is pushed on the stack and the tail and the update counter are passed to the recursion:

result = split(tail, count - 1)

after count iterations, all the left-splitted elements are on the stack, and all the right-splitted elements are packed in the tail. The the Clause 2 is now called.

After the Clause 2 call, the recursion continue with the second stage, where the result variable is bound to the two (partially) splitted list returned by the previous split/2 iteration.

Now, at every iteration, we extract the left and right lists fron the result:

{left, right} = result

and add to the left the head popped from the stack ( that was computed in the first stage), returning the result to the caller:

return = {[head | left], right}

so at every iteration the left part grows 'till the final value.

The first result is returned by the Clause 2, matched when the iterations had reached the split point i.e. when count = 0. (Clause 2 will fire just one time). All the subsequent results will be returned by the folded second stages of the Clause 1 iterations.

This is the code to print the above schema:

def split(in_list, count), do: split(in_list, count, 1)

    # Clause 1
    def split(in_list=[head | tail], count, iteration) when count > 0 do
      offset = String.duplicate " ", 5 * (iteration - 1)
      IO.puts offset <> "> FIRST STAGE of CLAUSE 1 / ITERATION #{inspect iteration} called as: split( #{inspect in_list}, #{inspect(count)} ):"
      IO.puts offset <> "  Got 'head'=#{inspect head}, 'tail'=#{inspect tail}, 'count'=#{inspect count}"
      if (count - 1) > 0 do
        IO.puts offset <> "  now I'm going to iterate passing the tail #{inspect(tail)},"
        IO.puts offset <> "  Clause 1 will match as the counter is still > 0"
      else
        IO.puts offset <> "  Now the counter is 0 so I've reached the split point,"
        IO.puts offset <> "  and the Clause 2 instead of Clause 1 will match at the next iteration"
      end

      result = split(tail, count-1, iteration + 1)

      IO.puts offset <> "< Im BACK to the SECOND STAGE of ITERATION #{inspect(iteration)}"
      if (count - 1) == 0 do
        IO.puts offset <> "  got result from CLAUSE 2: #{inspect result}"
      else
        IO.puts offset <> "  got result from previous Clause 1 / Iteration #{iteration + 1}, : #{inspect result}"
      end
      IO.puts offset <> "  {left, right} = #{inspect result}"
      {left, right} = result
      IO.puts offset <> "  Now I'm build the return value as {[head | left], right},"
      IO.puts offset <> "  prepending 'head' (now is #{inspect head}) to the previous value"
      IO.puts offset <> "  of 'left' (now is #{inspect left}) at each iteration,"
      IO.puts offset <> "  'right' instead is always #{inspect right}."
      return = {[head | left], right}
      if (iteration > 1) do
        IO.puts offset <> "  So I'm returning #{inspect return} to iteration #{inspect(iteration - 1)}"
      else
        IO.puts offset <> "  And my final return is at least: #{inspect return} "
      end
      return
    end

    # Clause 2
    def split(list, _count, _iteration) do
      IO.puts ""
      IO.puts "> Greetings from CLAUSE 2 :-), got #{inspect(list)}, returning #{inspect({[], list})}"
      IO.puts ""
      {[], list}
    end

Hope this can help to clarify a little bit the strategy adopted and the internal recursion mechanism.

(my English is not very good, hope someone can fix this text)

like image 93
Guido Avatar answered Dec 28 '22 15:12

Guido