step_n(0, I, I).
step_n(N, In, Out) :-
N > 0, plus(N1, 1, N), phase_step(In, T),
step_n(N1, T, Out).
phase_step
is a function that transforms data.
Will this step_n
run in almost the same memory as phase_step
? If not, how should I rewrite it to do so? Will this depend on phase_step having a single solution?
EDIT: After some debugging using prolog_current_frame
, I found out that if phase_step
is a simple function like Out is In + 1
, then optimization happens but not in my use case.
Why is TCO dependent on phase_step
predicate?
In many cases, tail recursion is good for performance: When the predicate is deterministic, a tail call means that the Prolog system can automatically reuse the allocated space of the environment on the local stack.
Tail recursion is defined as a recursive function in which the recursive call is the last statement that is executed by the function. So basically nothing is left to execute after the recursion call.
Will this depend on phase_step having a single solution?
Kind of, but a bit stronger still: It depends on phase_step
being deterministic, which means, not leaving any "choice points". A choice point is a future path to be explored; not necessarily one that will produce a further solution, but still something Prolog needs to check.
For example, this is deterministic:
phase_step_det(X, X).
It has a single solution, and Prolog does not prompt us for more:
?- phase_step_det(42, Out).
Out = 42.
The following has a single solution, but it is not deterministic:
phase_step_extrafailure(X, X).
phase_step_extrafailure(_X, _Y) :-
false.
After seeing the solution, there is still something Prolog needs to check. Even if we can tell by looking at the code that that something (the second clause) will fail:
?- phase_step_extrafailure(42, Out).
Out = 42 ;
false.
The following has more than one solution, so it is not deterministic:
phase_step_twosolutions(X, X).
phase_step_twosolutions(X, Y) :-
plus(X, 1, Y).
?- phase_step_twosolutions(42, Out).
Out = 42 ;
Out = 43.
Why is TCO dependent on phase_step predicate?
If there are further paths to be explored, then there must be data about those paths stored somewhere. That "somewhere" is some sort of stack data structure, and for every future path there must be a frame on the stack. This is why your memory usage grows. And with it, the computation time (the following uses copies of your step_n
with my corresponding phase_step
variants from above):
?- time(step_n_det(100_000, 42, Out)).
% 400,002 inferences, 0.017 CPU in 0.017 seconds (100% CPU, 24008702 Lips)
Out = 42 ;
% 7 inferences, 0.000 CPU in 0.000 seconds (87% CPU, 260059 Lips)
false.
?- time(step_n_extrafailure(100_000, 42, Out)).
% 400,000 inferences, 4.288 CPU in 4.288 seconds (100% CPU, 93282 Lips)
Out = 42 ;
% 100,005 inferences, 0.007 CPU in 0.007 seconds (100% CPU, 13932371 Lips)
false.
?- time(step_n_twosolutions(100_000, 42, Out)).
% 400,000 inferences, 4.231 CPU in 4.231 seconds (100% CPU, 94546 Lips)
Out = 42 ;
% 4 inferences, 0.007 CPU in 0.007 seconds (100% CPU, 548 Lips)
Out = 43 ;
% 8 inferences, 0.005 CPU in 0.005 seconds (100% CPU, 1612 Lips)
Out = 43 ;
% 4 inferences, 0.008 CPU in 0.008 seconds (100% CPU, 489 Lips)
Out = 44 ;
% 12 inferences, 0.003 CPU in 0.003 seconds (100% CPU, 4396 Lips)
Out = 43 ;
% 4 inferences, 0.009 CPU in 0.009 seconds (100% CPU, 451 Lips)
Out = 44 . % many further solutions
One way to explore this is using the SWI-Prolog debugger, which has a way of showing you alternatives (= choice points = future paths to be explored):
?- trace, step_n_det(5, 42, Out).
Call: (9) step_n_det(5, 42, _1496) ? skip % I typed 's' here.
Exit: (9) step_n_det(5, 42, 42) ? alternatives % I typed 'A' here.
[14] step_n_det(0, 42, 42)
Exit: (9) step_n_det(5, 42, 42) ? no debug % I typed 'n' here.
Out = 42 ;
false.
?- trace, step_n_extrafailure(5, 42, Out).
Call: (9) step_n_extrafailure(5, 42, _1500) ? skip
Exit: (9) step_n_extrafailure(5, 42, 42) ? alternatives
[14] step_n_extrafailure(0, 42, 42)
[14] phase_step_extrafailure(42, 42)
[13] phase_step_extrafailure(42, 42)
[12] phase_step_extrafailure(42, 42)
[11] phase_step_extrafailure(42, 42)
[10] phase_step_extrafailure(42, 42)
Exit: (9) step_n_extrafailure(5, 42, 42) ? no debug
Out = 42 ;
false.
All of those alternatives correspond to extra interpreter frames. If you use SWI-Prolog's visual debugger, it will also show you a graph representation of your stack, including all open choice points (though I've always found that hard to make sense of).
So if you want TCO and not grow the stack, you need your phase step to execute deterministically. You can do that by making the phase_step
predicate itself deterministic. You can also put a cut after the phase_step
call inside step_n
.
Here are the calls from above with a cut after each phase_step
:
?- time(step_n_det(100_000, 42, Out)).
% 400,001 inferences, 0.017 CPU in 0.017 seconds (100% CPU, 24204529 Lips)
Out = 42 ;
% 7 inferences, 0.000 CPU in 0.000 seconds (83% CPU, 737075 Lips)
false.
?- time(step_n_extrafailure(100_000, 42, Out)).
% 400,000 inferences, 0.023 CPU in 0.023 seconds (100% CPU, 17573422 Lips)
Out = 42 ;
% 5 inferences, 0.000 CPU in 0.000 seconds (93% CPU, 220760 Lips)
false.
?- time(step_n_twosolutions(100_000, 42, Out)).
% 400,000 inferences, 0.023 CPU in 0.023 seconds (100% CPU, 17732727 Lips)
Out = 42 ;
% 5 inferences, 0.000 CPU in 0.000 seconds (94% CPU, 219742 Lips)
false.
Do not place cuts blindly, only once you understand where and why you really need them. Note how in the extrafailure
case the cut only removes failures, but in the twosolutions
case it removes actual solutions.
One helpful tool to understood code performance issues, notably unwanted non-determinism, is a ports profiler tool as found on e.g. ECLiPSe and Logtalk. The Logtalk ports_profiler
tool is portable so we can use it here. We start by wrapping your code (from your gist link):
:- use_module(library(lists), []).
:- object(step).
:- public(step_n/3).
:- use_module(lists, [reverse/2]).
% pattern for the nth digit mth-coeffcient
digit_m(N, M, D) :-
divmod(M, N, Q, _), divmod(Q, 4, _, C),
(C = 0, D = 0; C = 1, D = 1; C = 2, D = 0; C = 3, D = -1).
calculate_digit_n(N, In, D) :-
calculate_digit_n_(N, In, D, 1, 0).
calculate_digit_n_(_, [], D, _, Acc) :- D1 is abs(Acc), divmod(D1, 10, _, D).
calculate_digit_n_(N, [I | Is], D, M, Acc) :-
digit_m(N, M, C), P is C*I, M1 is M+1, Acc1 is Acc+P,
calculate_digit_n_(N, Is, D, M1, Acc1).
phase_step(In, Out) :-
length(In, L), L1 is L + 1, phase_step_(In, Out, L1, 1, []).
phase_step_(_, Out, L, L, Acc) :- reverse(Out, Acc).
phase_step_(In, Out, L, N, Acc) :-
N < L, calculate_digit_n(N, In, D), N1 is N + 1,
phase_step_(In, Out, L, N1, [D | Acc]).
step_n(0, I, I).
step_n(N, In, Out) :-
prolog_current_frame(Fr), format('~w ', Fr),
N > 0, N1 is N - 1, phase_step(In, T),
step_n(N1, T, Out).
:- end_object.
%:- step_n(10, [1, 2, 3, 4, 5, 6, 7, 8], X).
And then (using SWI-Prolog as the backend as that is the Prolog system you told us you're using):
$ swilgt
...
?- {ports_profiler(loader)}.
% [ /Users/pmoura/logtalk/tools/ports_profiler/ports_profiler.lgt loaded ]
% [ /Users/pmoura/logtalk/tools/ports_profiler/loader.lgt loaded ]
% (0 warnings)
true.
?- logtalk_load(step, [debug(on), source_data(on)]).
% [ /Users/pmoura/step.pl loaded ]
% (0 warnings)
true.
?- step::step_n(10, [1, 2, 3, 4, 5, 6, 7, 8], X).
340 15578 30816 46054 61292 76530 91768 107006 122244 137482
X = [3, 6, 4, 4, 0, 6, 7, 8] .
?- ports_profiler::data.
------------------------------------------------------------------------------
Entity Predicate Fact Rule Call Exit *Exit Fail Redo Error
------------------------------------------------------------------------------
step calculate_digit_n/3 0 80 80 0 80 0 0 0
step calculate_digit_n_/5 0 720 720 0 720 0 0 0
step digit_m/3 0 640 640 40 600 0 0 0
step phase_step/2 0 10 10 0 10 0 0 0
step phase_step_/5 0 90 90 0 90 0 0 0
step step_n/3 1 10 11 0 11 0 0 0
------------------------------------------------------------------------------
true.
The *Exit
column is for non-deterministic exists from the procedure box. For help with the tool and with interpreting the table results, see https://logtalk.org/manuals/devtools/ports_profiler.html But is clear by just a glance to the table that both phase_step/2
and step_n/3
are non-deterministic.
Update
Note that tail call optimization (TCO) doesn't mean or require the predicate to be deterministic. In your case, TCO can be applied by a Prolog compiler as the last call in the rule for the step_n/3
predicate is call to itself. That means that a stack frame can be saved on that specific recursive call. It doesn't mean that there are no choice-points being created by what precedes the recursive call. Using once/1
(as you mention on the comments) simply discards the choice-point created when phase_step/2
is called as that predicate itself is non-deterministic. That's what the table shows. The step_n/3
predicate is also non-deterministic and thus calling it creates a choice-point when the first argument is 0
, which happens when you call the predicate with a zero on the first argument or when the proof for the query reaches the base case on this recursive definition.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With