How to count coincidences on each character of two large strings without triggering the Out of Local Stack exception?

Question

I need a clause that counts char coincidences between two large strings but omitting '_' coincidences. I have this code:

fit(GEN1, GEN2, N, N) :-
   length(GEN1, L1),
   length(GEN2, L2),
   0 is L1*L2.

fit([P1|R1], [P2|R2], N, TOTAL) :-
   member(P1, ['_',a,c,t,g]),
   member(P2, ['_',a,c,t,g]),
   append([P1],[P2],T),
   (  member(T,[[a,a],[c,c],[t,t],[g,g]])
   -> X is N+1
   ;  X is N
   ),
   fit(R1,R2,X,TOTAL).

Where GEN1 and GEN2 are lists containing all characters large strings.

I've tried increasing the stack limit to avoid Out of Local Stack exception with little success.

The issue is that, is called often and in deep recursive clauses. Is there any better way to do this?

EDIT

The clause needs to stop when one or both lists are empty.

EDIT 2

Is worth saying that testings on all answers below were done using 64bit prolog, with the --stack-limit=32g option as my code isn't well optimized and the fit clause is a small part of a larger process, but was the main problem with my code.

EDIT 3

CapelliC code worked using the less resources.

false code using the library(reif) v2 worked the faster.

See Complexity of counting matching elements in two sequences using library(aggregate) for more proposed solutions.

false · Accepted Answer

It seems that there is no point to insist that you have letters out of "_actg" all the time. A generalized definition seems to be sufficient. Using library(reif):

fit([], _, N,N).
fit([_|_], [], N,N).
fit([P1|R1], [P2|R2], N,TOTAL) :-
   if_( ( P1 = P2, dif(P1, '_') ), X is N+1, X = N ),
   fit(R1, R2, X,TOTAL).

Update: please make sure to use v2 of library(reif). The original version did not compile dif/3.

And here a version for systems that can only index on one argument simultaneously:

fit([], _, N,N).
fit([P1|R1], L2, N,TOTAL) :-
    ifit(L2, [P1|R1], N,TOTAL).

ifit([], _, N,N).
ifit([P2|R2], [P1|R1], N,TOTAL) :-
   if_( ( P1 = P2, dif(P1, '_') ), X is N+1, X = N ),
   fit(R1, R2, X,TOTAL).

CapelliC · Answer

if your Prolog has library(aggregate) you can do

fit(GEN1, GEN2, N) :-
  aggregate_all(count, (nth1(P,GEN1,S),nth1(P,GEN2,S),memberchk(S,[a,c,g,t])), N).

edit

Depending on the statistic of data, a noticeable improvement can be obtained just swapping the last two calls, i.e. ...(nth1(P,GEN1,S),memberchk(S,[a,c,g,t]),nth1(P,GEN2,S))...

edit

Of course a tight loop it's better that a double indexed scan. For performance, I would write it like

fit_cc(GEN1, GEN2, N) :-
  fit_cc(GEN1, GEN2, 0, N).

fit_cc([X|GEN1], [Y|GEN2], C, N) :-
  (  X\='_' /*memberchk(X, [a,c,g,t])*/, X=Y
  -> D is C+1 ; D=C
  ),
  fit_cc(GEN1, GEN2, D, N).
fit_cc(_, _, N, N).

but the generality and correctness allowed by library(reif) v2, as seen in @false' answer and comments, seems to be well worth the (pretty small) overhead.

How to count coincidences on each character of two large strings without triggering the Out of Local Stack exception?

Tags:

performance

optimization

stack-overflow

prolog

RezzDeWitt

2 Answers

false

CapelliC

Recent Activity

Donate For Us

How to count coincidences on each character of two large strings without triggering the Out of Local Stack exception?

Tags:

performance

optimization

stack-overflow

prolog

RezzDeWitt

2 Answers

false

CapelliC

Related questions

Recent Activity

Donate For Us