Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting the longest common subsequence in ERLANG

I'm new to this ERLANG, I know the basics. It's like scheme but broader. I know how to create a function but I'm having problems with creating a function that gets the Longest Common Subsequence.

lcs(str1,str2) is a function that will accept two string and output an integer:

lcs(algorithm,logarithm) will output 7 because the longest common subsequence that can be made is lgrithm which is 7 in size.

Any answer is greatly appreciated.

like image 987
ThisGuy Avatar asked Nov 26 '25 06:11

ThisGuy


1 Answers

You have a pretty good implementation of an LCS algorithm available on Rosettacode, which is:

-module(lcs).
-compile(export_all).

lcs_length(S,T) ->
    {L,_C} = lcs_length(S,T,dict:new()),
    L.

lcs_length([]=S,T,Cache) ->
    {0,dict:store({S,T},0,Cache)};
lcs_length(S,[]=T,Cache) ->
    {0,dict:store({S,T},0,Cache)};
lcs_length([H|ST]=S,[H|TT]=T,Cache) ->
    {L,C} = lcs_length(ST,TT,Cache),
    {L+1,dict:store({S,T},L+1,C)};
lcs_length([_SH|ST]=S,[_TH|TT]=T,Cache) ->
    case dict:is_key({S,T},Cache) of
        true -> {dict:fetch({S,T},Cache),Cache};
        false ->
            {L1,C1} = lcs_length(S,TT,Cache),
            {L2,C2} = lcs_length(ST,T,C1),
            L = lists:max([L1,L2]),
            {L,dict:store({S,T},L,C2)}
    end.

lcs(S,T) ->
    {_,C} = lcs_length(S,T,dict:new()),
    lcs(S,T,C,[]).

lcs([],_,_,Acc) ->
    lists:reverse(Acc);
lcs(_,[],_,Acc) ->
    lists:reverse(Acc);
lcs([H|ST],[H|TT],Cache,Acc) ->
    lcs(ST,TT,Cache,[H|Acc]);
lcs([_SH|ST]=S,[_TH|TT]=T,Cache,Acc) ->
    case dict:fetch({S,TT},Cache) > dict:fetch({ST,T},Cache) of
        true ->
            lcs(S,TT,Cache, Acc);
        false ->
            lcs(ST,T,Cache,Acc)
    end.

Used as:

raringbeast:Code pierre $ erl
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:8:8] [async-threads:0] [hipe]
[kernel-poll:false]

Eshell V5.9.1  (abort with ^G)
1> c(lcs).
{ok,lcs}
2> lcs:lcs("logarithm", "algorithm").
"lgrithm"
3> lcs:lcs_length("logarithm", "algorithm").
7

--

Edit: Made the algorithm a little easier to understand

The cache here is just an interesting way to improve algorithm performance in some cases, but this isn't required here. We can just write, removing the cache:

lcs_length([],_T) ->
    0;
lcs_length(_S,[]) ->
    0;
lcs_length([_H|ST],[_H|TT]) ->
    1 + lcs_length(ST,TT);
lcs_length([_SH|ST]=S,[_TH|TT]=T) ->
    L1 = lcs_length(S,TT),
    L2 = lcs_length(ST,T),
    lists:max([L1,L2]). 

To sum up:

  1. The LCS of "" and anything is 0.
  2. Same for the LCS of anything and "".
  3. The LCS of two words starting by the same letter is the LCS of the two words minus the first letter plus 1.
  4. The LCS of two words starting by different letters is the max of (a) the LCS of the first word minus the first letter and the second, and (b) the LCS of the first word and the second minus the first letter.
like image 136
Pierre Avatar answered Nov 28 '25 21:11

Pierre



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!