I am getting stuck with the Wikipedia description of the predecessor function in lambda calculus.
What Wikipedia says is the following:
PRED := λn.λf.λx. n (λg.λh. h (g f)) (λu.x) (λu.u)
Can someone explain reduction processes step-by-step?
Thanks.
Semantics of Lambda Expressions Evaluating a lambda expression is called reduction . The basic reduction rule involves substituting expressions for free variables in a manner similar to the way that the parameters in a function definition are passed as arguments in a function call.
The lambda calculus does not specify an evaluation order.
y 1 Page 2 A reducible expression, or redex, is any expression to which the above (beta-reduction) rule can be immediately applied. For example, λx. (λy. y) z is not a redex, but its nested expression (λy.
In lambda calculus, there are only lambdas, and all you can do with them is substitution. Lambdas are like a function or a method - if you are familiar with programming, they are functions that take a function as input, and return a new function as output.
The second simplification is that the lambda calculus only uses functions of a single input. An ordinary function that requires two inputs, for instance the function, can be reworked into an equivalent function that accepts a single input, and as output returns another function, that in turn accepts a single input. For example,
Recursion is the definition of a function using the function itself. Lambda calculus cannot express this as directly as some other notations: all functions are anonymous in lambda calculus, so we can't refer to a value which is yet to be defined, inside the lambda term defining that same value.
is a constant function. The lambda calculus may be seen as an idealized version of a functional programming language, like Haskell or Standard ML . Under this view, β-reduction corresponds to a computational step. This step can be repeated by additional β-reductions until there are no more applications left to reduce.
Ok, so the idea of Church numerals is to encode "data" using functions, right? The way that works is by representing a value by some generic operation you'd perform with it. We can therefore go in the other direction as well, which can sometimes make things clearer.
Church numerals are a unary representation of the natural numbers. So, let's use Z
to mean zero and Sn
to represent the successor of n
. Now we can count like this: Z
, SZ
, SSZ
, SSSZ
... The equivalent Church numeral takes two arguments--the first corresponding to S
, and second to Z
--then uses them to construct the above pattern. So given arguments f
and x
, we can count like this: x
, f x
, f (f x)
, f (f (f x))
...
Let's look at what PRED does.
First, it creates a lambda taking three arguments--n
is the Church numeral whose predecessor we want, of course, which means that f
and x
are the arguments to the resulting numeral, which thus means that the body of that lambda will be f
applied to x
one time fewer than n
would.
Next, it applies n
to three arguments. This is the tricky part.
The second argument, that corresponds to Z
from earlier, is λu.x
--a constant function that ignores one argument and returns x
.
The first argument, that corresponds to S
from earlier, is λgh.h (g f)
. We can rewrite this as λg. (λh.h (g f))
to reflect the fact that only the outermost lambda is being applied n
times. What this function does is take the accumulated result so far as g
and return a new function taking one argument, which applies that argument to g
applied to f
. Which is absolutely baffling, of course.
So... what's going on here? Consider the direct substitution with S
and Z
. In a non-zero number Sn
, the n
corresponds to the argument bound to g
. So, remembering that f
and x
are bound in an outside scope, we can count like this: λu.x
, λh. h ((λu.x) f)
, λh'. h' ((λh. h ((λu.x) f)) f)
... Performing the obvious reductions, we get this: λu.x
, λh. h x
, λh'. h' (f x)
... The pattern here is that a function is being passed "inward" one layer, at which point an S
will apply it, while a Z
will ignore it. So we get one application of f
for each S
except the outermost.
The third argument is simply the identity function, which is dutifully applied by the outermost S
, returning the final result--f
applied one fewer times than the number of S
layers n
corresponds to.
McCann's answer explains it pretty well. Let's take a concrete example for Pred 3 = 2:
Consider expression: n (λgh.h (g f)) (λu.x). Let K = (λgh.h (g f))
For n = 0, we encode 0 = λfx.x
, so when we apply the beta reduction for (λfx.x)(λgh.h(gf))
means (λgh.h(gf))
is replaced 0 times. After further beta-reduction we get:
λfx.(λu.x)(λu.u)
reduces to
λfx.x
where λfx.x = 0
, as expected.
For n = 1, we apply K for 1 times:
(λgh.h (g f)) (λu.x)
=> λh. h((λu.x) f)
=> λh. h x
For n = 2, we apply K for 2 times:
(λgh.h (g f)) (λh. h x)
=> λh. h ((λh. h x) f)
=> λh. h (f x)
For n = 3, we apply K for 3 times:
(λgh.h (g f)) (λh. h (f x))
=> λh.h ((λh. h (f x)) f)
=> λh.h (f (f x))
Finally, we take this result and apply an id function to it, we got
λh.h (f (f x)) (λu.u)
=> (λu.u)(f (f x))
=> f (f x)
This is the definition of number 2.
The list based implementation might be easier to understand, but it takes many intermediate steps. So it is not as nice as the Church's original implementation IMO.
After Reading the previous answers (good ones), I’d like to give my own vision of the matter in hope it helps someone (corrections are welcomed). I’ll use an example.
First off, I’d like to add some parenthesis to the definition that made everything clearer to me. Let’s redifine the given formula to:
PRED := λn λf λx.(n (λgλh.h (g f)) (λu.x)) (λu.u)
Let’s also define three Church numerals that will help with the example:
Zero := λfλx.x
One := λfλx. f (Zero f x)
Two := λfλx. f (One f x)
Three := λfλx. f (Two f x)
In order to understand how this works, let's focus first on this part of the formula:
n (λgλh.h (g f)) (λu.x)
From here, we can extract this conclusions:
n
is a Church numeral, the function to be applied is λgλh.h (g f)
and the starting data is λu.x
With this in mind, let's try an example:
PRED Three := λf λx.(Three (λgλh.h (g f)) (λu.x)) (λu.u)
Let's focus first on the reduction of the numeral (the part we explained before):
Three (λgλh.h (g f)) (λu.x)
Which reduces to:
(λgλh.h (g f)) (Two (λgλh.h (g f)) (λu.x))
(λgλh.h (g f)) ((λgλh.h (g f)) (One (λgλh.h (g f)) (λu.x)))
(λgλh.h (g f)) ((λgλh.h (g f)) ((λgλh.h (g f)) (Zero (λgλh.h (g f)) (λu.x))))
(λgλh.h (g f)) ((λgλh.h (g f)) ((λgλh.h (g f)) ((λfλx.x) (λgλh.h (g f)) (λu.x)))) -- Here we lose one application of f
(λgλh.h (g f)) ((λgλh.h (g f)) ((λgλh.h (g f)) (λu.x)))
(λgλh.h (g f)) ((λgλh.h (g f)) (λh.h ((λu.x) f)))
(λgλh.h (g f)) ((λgλh.h (g f)) (λh.h x))
(λgλh.h (g f)) (λh.h ((λh.h x) f))
(λgλh.h (g f)) (λh.h (f x))
(λh.h ((λh.h (f x) f)))
Ending up with:
λh.h f (f x)
So, we have:
PRED Three := λf λx.(λh.h (f (f x))) (λu.u)
Reducing again:
PRED Three := λf λx.((λu.u) (f (f x)))
PRED Three := λf λx.f (f x)
As you can see in the reductions, we end up applying the function one time less thanks to a clever way of using functions.
Using add1 as f
and 0 as x
, we get:
PRED Three add1 0 := add1 (add1 0) = 2
Hope this helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With