The implementation of Integer>>#factorial
in Pharo is:
factorial
"Answer the factorial of the receiver."
self = 0 ifTrue: [^ 1].
self > 0 ifTrue: [^ self * (self - 1) factorial].
self error: 'Not valid for negative integers'
This a tail-recursive definition. However, I can evaluate 10000 factorial
without error in the workspace.
Does Pharo perform tail-call optimisation in any circumstances, is it doing some other optimisation, or is it just using a really deep stack?
Because F# is a language that heavily uses recursion, its compiler employs a trick that is called "tail call optimization". With this trick, the last function a function calls (even when it is not herself) does not burden the stack. This allows tail recursion to be essentially a loop, in terms of stack pressure.
Regarding functions call optimization, gcc can do tail-call elimination to save the cost of allocating a new stack frame, and tail recursion elimination to turn a recursive function to non-recursive iterative one.
Tail-call optimization is not supported in Python, but we can apply it by changing the code inside the function, however, we prefer to do it automatically using a decorator and without changing the function's code.
There is no mystery in the execution model of Pharo. The recursive fragment
^ self * (self - 1) factorial
that happens inside the second ifTrue:
compiles to the following sequence of bytecodes:
39 <70> self ; receiver of outer message *
40 <70> self ; receiver of inner message -
41 <76> pushConstant: 1 ; argument of self - 1
42 <B1> send: - ; subtract
43 <D0> send: factorial ; send factorial (nothing special here!)
44 <B8> send: * ; multiply
45 <7C> returnTop ; return
Note that in line 43 nothing special happens. The code just sends factorial
in the same way it would, had the selector been any other. In particular we can see that there is no special manipulation of the stack here.
This doesn't mean that there cannot be optimizations in the underlying native code. But that is a different discussion. It is the execution model the one that matters to the programmer because any optimization underneath bytecodes is meant to support this model at the conceptual level.
UPDATE
Interestingly, the non-recursive version
factorial2
| f |
f := 1.
2 to: self do: [:i | f := f * i].
^f
is a little bit slower that the recursive one (Pharo). The reason must be that the overhead associated to increasing i
is a little bit greater than the recursive send mechanism.
Here are the expressions I tried:
[25000 factorial] timeToRun
[25000 factorial2] timeToRun
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With