Prolog performance and recursion type

I was playing with permutations in a couple of programs and stumbled upon this little experiment:

Permutation method 1:

permute([], []).
permute([X|Rest], L) :-
    permute(Rest, L1),
    select(X, L, L1).

Permutation method 2:

permute([], []).
permute(L, [P | P1]) :-
    select(P, L, L1),
    permute(L1, P1).

Permutation method 3 (use the built-in):

permute(L, P) :- permutation(L, P).

I understand that it's good practice to use tail recursion, and generally using built-ins is supposed to be efficient. But when I run the following:

?- time(findall(P, permute([1,2,3,4,5,6,7,8,9], P), L)).

I got the following results, which are relatively consistent across several runs:

Method 1:

% 772,064 inferences, 1.112 CPU in 2.378 seconds (47% CPU, 694451 Lips)

Method 2:

% 3,322,118 inferences, 2.126 CPU in 4.660 seconds (46% CPU, 1562923 Lips)

Method 3:

% 2,959,245 inferences, 1.967 CPU in 4.217 seconds (47% CPU, 1504539 Lips)

So the non-tail-recursive method is significantly more efficient in real time.

Is a particular recursion type generally more efficient in real time, all other things being equal? (I know that's not always a simple premise.) What this experiment tells me is that I shouldn't always strive for tail recursion; I may need to do a performance analysis first, and then weigh any performance cost against the other benefits that tail recursion brings.

asked by lurker on Jun 10 '13

3 Answers

Really nice question, +1!

Tail call (and, as a special case, tail recursion) optimization only applies if the predicate is deterministic! This is not the case here, so your predicate will always require the local stack space, no matter in which order you place the goals. The non-tail recursive version is more (time-)efficient here when generating all solutions because it needs to do fewer unifications on backtracking.
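To see the role of determinism in isolation, here is a minimal sketch of my own (the names countdown_det/1 and countdown_nondet/1 are made up for illustration). The first predicate is deterministic, so last-call optimisation lets it run in constant stack space; the second differs only in leaving a choice point at every call, which pins each frame on the stack:

% Deterministic: the cut commits to the first clause, so the
% frame can be reused for the last call at each step.
countdown_det(N) :- N > 0, !, N1 is N - 1, countdown_det(N1).
countdown_det(0).

% Nondeterministic: both clauses remain applicable as far as
% indexing can tell, so a choice point survives every call and
% the stack frames cannot be reclaimed.
countdown_nondet(N) :- N > 0, N1 is N - 1, countdown_nondet(N1).
countdown_nondet(N) :- N =:= 0.

?- countdown_det(10000000).     % succeeds in constant stack space
?- countdown_nondet(10000000).  % may exhaust the local stack, depending on limits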

EDIT: I am expanding on this point since it is well worth studying the performance difference in more detail.

First, for clarity, I rename the two different versions to make clear which version I am talking about:

Variant 1: Non-tail recursive:

permute1([], []).
permute1([X|Rest], L) :-
    permute1(Rest, L1),
    select(X, L, L1).

Variant 2: Tail-recursive:

permute2([], []).
permute2(L, [P|P1]) :-
    select(P, L, L1),
    permute2(L1, P1).

Note again that, although the second version is clearly tail recursive, tail call (and hence also tail recursion) optimisation only helps if the predicate is deterministic, and hence cannot help when we generate all permutations, because choice points are still left in that case.

Note also that I am deliberately retaining the original variable naming and main predicate name to avoid introducing more variants. Personally, I prefer a naming convention that makes clear which variables denote lists by appending an s to their names, in analogy to the regular English plural. Also, I prefer predicate names that more clearly exhibit the (at least intended and desirable) declarative, relational nature of the code, and recommend avoiding imperative names for this reason.
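For instance, under that convention the first variant could be written as follows (my rendering, purely illustrative):

% Xs and Ys denote lists (plural "s"); the name states the relation
% between the two arguments instead of suggesting an action.
list_permutation([], []).
list_permutation([X|Xs], Ys) :-
    list_permutation(Xs, Ys0),
    select(X, Ys, Ys0).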


Consider now unfolding the first variant and partially evaluating it for a list of 3 elements. We start with a simple goal:

?- Xs = [A,B,C], permute1(Xs, L).

and then gradually unfold it by plugging in the definition of permute1/2, while making all head unifications explicit. In the first iteration, we obtain:

?- Xs = [A,B,C], Xs1 = [B,C], permute1(Xs1, L1), select(A, L, L1).

Here, the goal Xs1 = [B,C] is the head unification made explicit.

Now, still one goal of permute1/2 is left. So we repeat the process, again plugging in the predicate's only applicable rule body in place of its head:

?- Xs = [A,B,C], Xs1 = [B,C], Xs2 = [C], permute1(Xs2, L2), select(B, L1, L2), select(A, L, L1).

One more pass of this, and we obtain:

?- Xs = [A,B,C], Xs1 = [B,C], Xs2 = [C], select(C, L2, []), select(B, L1, L2), select(A, L, L1).

This is what the original goal looks like if we just unfold the definition of permute1/2 repeatedly.
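As a quick sanity check (my addition, not part of the original unfolding), the fully unfolded conjunction can be run directly, and it yields the same solutions in the same order as permute1/2; the solution lists below are derived by hand from the clause order of select/3:

?- findall(L, ( [A,B,C] = [a,b,c],
                select(C, L2, []),
                select(B, L1, L2),
                select(A, L, L1) ), Ls).
Ls = [[a,b,c], [b,a,c], [b,c,a], [a,c,b], [c,a,b], [c,b,a]].

?- findall(L, permute1([a,b,c], L), Ls).
Ls = [[a,b,c], [b,a,c], [b,c,a], [a,c,b], [c,a,b], [c,b,a]].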


Now, what about the second variant? Again, we start with a simple goal:

?- Xs = [A,B,C], permute2(Xs, Ys).

One iteration of unfolding permute2/2 yields the equivalent version:

?- Xs = [A,B,C], Ys = [P|P1], select(P, Xs, L1), permute2(L1, P1).

and a second iteration yields:

?- Xs = [A,B,C], Ys = [P|P1], select(P, Xs, L1), P1 = [P2|P3], select(P2, L1, L2), permute2(L2, P3).

I leave the third and last iteration as a simple exercise that I strongly recommend you do.


And from this it becomes clear what we initially probably hadn't expected: a big difference lies in the head unifications, which the first version performs deterministically right at the start, and the second version performs over and over on backtracking.

This famous example nicely shows that, somewhat contrary to common expectation, tail recursion can be quite slow if the code is not deterministic.

answered by mat

I suspect what triggered this investigation was the discussion about tail-recursive sum/2 with and without an accumulator. The sum/2 example is very cut-and-dried: one version does the arithmetic on the stack, the other uses an accumulator (a sketch of both variants appears at the end of this answer). However, like most things in the real world, the general truth is "it depends." For instance, compare the efficiency of methods 1 and 2 under full instantiation:

Method 1:

?- time(permute([1,2,3,4,5,6,7,8,9], [1,2,3,4,5,6,7,8,9])).
% 18 inferences, 0.000 CPU in 0.000 seconds (66% CPU, 857143 Lips)
true ;
% 86,546 inferences, 0.022 CPU in 0.022 seconds (100% CPU, 3974193 Lips)
false.

Method 2:

?- time(permute([1,2,3,4,5,6,7,8,9], [1,2,3,4,5,6,7,8,9])).
% 18 inferences, 0.000 CPU in 0.000 seconds (62% CPU, 857143 Lips)
true ;
% 47 inferences, 0.000 CPU in 0.000 seconds (79% CPU, 940000 Lips)
false.

Method 1 beats method 2 when you're generating solutions (as in your tests), but method 2 beats method 1 when you're simply checking. Looking at the code, it's easy to see why: the first one has to re-permute the whole tail of the list, while the second one just has to try selecting out one item. In this case it may be tempting to point to the generating case and call it the more desirable one, but that determination is simply one of the tradeoffs you must keep track of when dealing with Prolog. It's very difficult to make predicates that are all things to all people and always perform well; you must decide which paths are "privileged" and which are not.
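If both usage modes matter, one could even dispatch on the instantiation pattern. This is just a sketch of mine (the name permute_smart/2 and the dispatch criterion are made up), reusing mat's permute1/permute2 definitions from above:

% Hypothetical wrapper: pick the variant that suits the
% instantiation of the second argument.
permute_smart(L, P) :-
    (   is_list(P)          % P is a proper list: we are checking,
    ->  permute2(L, P)      % and method 2's early select/3 fails fast
    ;   permute1(L, P)      % otherwise we are generating: method 1
    ).                      % needs fewer inferences per solution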

I do vaguely recall someone recently showed an example of appending lists "during the return" and how you could take something that isn't or shouldn't be tail recursive and make it work thanks to unification, but I don't have the link handy. Hopefully whoever brought it up last time (Will?) will show up and share it.
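If I recall the idea correctly, it boils down to the fact that plain append/3, although non-tail-recursive in a functional reading, builds its result top-down through unification: the cell [X|Zs] exists before the recursive call, so no work remains after that call returns. A sketch (this is just the textbook definition):

% The output cell [X|Zs] is constructed *before* the recursive call;
% Zs is a hole that the recursion fills in by unification. With a
% proper list as first argument, the recursive call is a last call.
append([], Ys, Ys).
append([X|Xs], Ys, [X|Zs]) :-
    append(Xs, Ys, Zs).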

Great question, by the way. Your investigation method is valid; you'll just need to take other instantiation patterns into account as well. Speaking personally, I usually try to worry more about correctness and generality than performance up front. If I see immediately how to use an accumulator I will, but otherwise I won't do it that way until I run into an actual need for better performance. Tail recursion is just one method for improving performance; frequently there are other things that need attention just as badly or worse.
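For reference, here is my reconstruction of the sum/2 variants mentioned at the start of this answer; a minimal sketch, with the accumulator version under a name of my own choosing:

% Without accumulator: the addition happens after the recursive
% call returns, so a stack frame is needed per list element.
sum([], 0).
sum([X|Xs], S) :-
    sum(Xs, S0),
    S is S0 + X.

% With accumulator: tail recursive and deterministic (thanks to
% first-argument indexing), so it runs in constant stack space.
sum_acc(Xs, S) :- sum_acc(Xs, 0, S).

sum_acc([], S, S).
sum_acc([X|Xs], S0, S) :-
    S1 is S0 + X,
    sum_acc(Xs, S1, S).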

answered by Daniel Lyons


Really nice question.

While we wait for someone to post a time/space analysis, the one caveat I can offer is that methods 1 and 2 don't terminate when the first argument is free, while method 3 does.
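For example, with the first argument unbound, one would see something like this (illustrative queries, using mat's names from above; exact behaviour may vary between implementations):

?- permute1(Ls, [a,b,c]).
% enumerates permutations, but loops when asked for more solutions

?- permutation(Ls, [a,b,c]).
% the library version can terminate because it establishes the
% lengths of both lists before enumerating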

Anyway, method 1 seems much more efficient than the built-in. Good to know.

edit: and given that the library implementation merely adjusts the instantiation of its arguments and then calls method 1, I'm going to propose your method 2 as an alternative on the SWI-Prolog mailing list (or, if you prefer to do it yourself, let me know).

more edit: I previously forgot to point out that permutation/3 (let's say, method 2) gives lexicographically ordered solutions, while method 1 doesn't. I think that could be a strong preferential requirement for some uses, but it should be expressed as an option, given the performance gain that method 1 allows.
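To make the ordering difference concrete, a quick check of my own (the solution orders are derived by hand from the clause order; mat's names again):

?- findall(P, permute1([1,2,3], P), Ps).
Ps = [[1,2,3], [2,1,3], [2,3,1], [1,3,2], [3,1,2], [3,2,1]].   % not lexicographic

?- findall(P, permute2([1,2,3], P), Ps).
Ps = [[1,2,3], [1,3,2], [2,1,3], [2,3,1], [3,1,2], [3,2,1]].   % lexicographic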

?- time(call_nth(permute1([0,1,2,3,4,5,6,7,8,9],P),1000000)).
% 3,112,758 inferences, 3.160 CPU in 3.162 seconds (100% CPU, 984974 Lips)
P = [1, 4, 8, 3, 7, 6, 5, 9, 2|...] .

?- time(call_nth(permute2([0,1,2,3,4,5,6,7,8,9],P),1000000)).
% 10,154,843 inferences, 9.779 CPU in 9.806 seconds (100% CPU, 1038398 Lips)
P = [2, 7, 8, 3, 9, 1, 5, 4, 6|...] .

YAP shows an even bigger gain!

?- time(call_nth(permute1([0,1,2,3,4,5,6,7,8,9],P),1000000)).
% 0.716 CPU in 0.719 seconds ( 99% CPU)
P = [1,4,8,3,7,6,5,9,2,0]

?- time(call_nth(permute2([0,1,2,3,4,5,6,7,8,9],P),1000000)).
% 8.357 CPU in 8.368 seconds ( 99% CPU)
P = [2,7,8,3,9,1,5,4,6,0]

edit: I've posted a comment on the SWI-Prolog documentation page about this topic.

answered by CapelliC