Cross-posted at https://invenia.github.io/blog/2018/08/20/ycombinator/.
Introduction
This post is a short note on the notorious Y combinator. No, not that company, but the computer sciency objects that looks like this:
Don’t worry if that looks complicated; we’ll get down to some examples and the nitty gritty details in just a second. But first, what even is this Y combinator thing? Simply put, the Y combinator is a higher-order function that can be used to define recursive functions in languages that don’t support recursion. Cool!
For readers unfamiliar with the above notation, the right-hand side of Equation \eqref{eq:Y-combinator} is a lambda term, which is a valid expression in lambda calculus:
- , a variable, is a lambda term;
- if is a lambda term, then the anonymous function is a lambda term;
- if and are lambda terms, then is a lambda term, which should be interpreted as applied with argument ; and
- nothing else is a lambda term.
For example, if we apply to , we find
Although the notation in Equation \eqref{eq:example} suggests multiplication, note that everything is function application, because really that’s all there is in lambda calculus.
Consider the factorial function :
In words, if is zero, return ; otherwise, multiply with . Equation \eqref{eq:fact-recursive} would be a valid expression if lambda calculus would allow us to use in the definition of . Unfortunately, it doesn’t. Tricky. Let’s replace the inner by a variable :
Now, crucially, the Y combinator is precisely designed to construct from :
To see this, let’s denote and verify that indeed equals :
\begin{align} \code{fact2} &= Y\, \code{fact}’ \\ &= (\lambda\, f : (\lambda\, x : f\,(x\, x))\, (\lambda\, x : f\,(x\, x)))\, \code{fact}’ \\ &= (\lambda\, x : \code{fact}’\,(x\, x) )\, (\lambda\, x : \code{fact}’\,(x\, x)) \label{eq:step-1} \\ &= \code{fact}’\, ((\lambda\, x : \code{fact}’\, (x\, x))\,(\lambda\, x : \code{fact}’\, (x\, x))) \label{eq:step-2} \\ &= \code{fact}’\, (Y\, \code{fact}’) \\ &= \code{fact}’\, \code{fact2}, \end{align}
which is exactly what we’re looking for, because the first argument to should be the actual factorial function, in this case. Neat!
We hence see that can indeed be used to define recursive functions in languages that don’t support recursion. Where does this magic come from, you say? Sit tight, because that’s up next!
Deriving the Y Combinator
This section introduces a simple trick that can be used to derive Equation \eqref{eq:Y-combinator}. We also show how this trick can be used to derive analogues of the Y combinator that implement mutual recursion in languages that don’t even support simple recursion.
Again, let’s start out by considering a recursive function:
where is some lambda term that depends on and . As we discussed before, such a definition is not allowed. However, pulling out ,
we do find that is a fixed point of : is invariant under applications of . Now—and this is the trick—suppose that is the result of a function applied to itself: . Then Equation \eqref{eq:fixed-point} becomes
from which we, by inspection, infer that
Therefore,
Pulling out ,
where suddenly a wild Y combinator has appeared.
The above derivation shows that is a fixed-point combinator. Passed some function , gives a fixed point of : satisfies .
Pushing it even further, consider two functions that depend on each other:
\begin{align} f &= \lambda\,x:k_f[x, f, g], & g &= \lambda\,x:k_g[x, f, g] \end{align}
where and are lambda terms that depend on , , and . This is foul play, as we know. We proceed as before and pull out and :
\begin{align} f = \underbrace{ (\lambda\,f’:\lambda\,g’:\lambda\,x:k_f[x, f’, g’]) }_{h_f} \,\, f\, g = h_f\, f\, g \end{align}
\begin{align}
g
= \underbrace{
(\lambda\,f’:\lambda\,g’:\lambda\,x:k_g[x, f’, g’])
}_{h_g} \,\, f\, g
= h_g\, f\, g.
\end{align}
Now—here’s that trick again—let and .1 Then
\begin{align} \hat{f}\,\hat{f}\,\hat{g} &= h_f\,(\hat{f}\,\hat{f}\,\hat{g})\,(\hat{g}\,\hat{f}\,\hat{g}) = (\lambda\,x:\lambda\,y:h_f\,(x\,x\,y)\,(y\,x\,y))\,\,\hat{f}\,\hat{g},\\ \hat{g}\,\hat{f}\,\hat{g} &= h_g\,(\hat{f}\,\hat{f}\,\hat{g})\,(\hat{g}\,\hat{f}\,\hat{g}) = (\lambda\,x:\lambda\,y:h_g\,(x\,x\,y)\,(y\,x\,y))\,\,\hat{f}\,\hat{g}, \end{align}
which suggests that
\begin{align} \hat{f} &= \lambda\,x:\lambda\,y:h_f\,(x\,x\,y)\,(y\,x\,y), \\ \hat{g} &= \lambda\,x:\lambda\,y:h_g\,(x\,x\,y)\,(y\,x\,y). \end{align}
Therefore
\begin{align} f &= \hat{f}\,\hat{f}\,\hat{g} \\ &= (\lambda\,x:\lambda\,y:h_f\,(x\,x\,y)\,(y\,x\,y))\, (\lambda\,x:\lambda\,y:h_f\,(x\,x\,y)\,(y\,x\,y))\, (\lambda\,x:\lambda\,y:h_g\,(x\,x\,y)\,(y\,x\,y)) \\ &= Y_f\, h_f\, h_g \end{align}
where
Similarly,
Dang, laborious, but that worked. And thus we have derived two analogues and of the Y combinator that implement mutual recursion in languages that don’t even support simple recursion.
Implementing the Y Combinator in Python
Well, that’s cool and all, but let’s see whether this Y combinator thing actually works. Consider the following nearly 1-to-1 translation of and to Python:
Y = lambda f: (lambda x: f(x(x)))(lambda x: f(x(x)))
fact = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)
If we try to run this, we run into some weird recursion:
>>> Y(fact)(4)
RecursionError: maximum recursion depth exceeded
Eh? What’s going? Let’s, for closer inspection, once more write down :
After is passed to , is passed to ; which then evaluates , which passes to ; which then again evaluates , which again passes to ; ad infinitum. Written down differently, evaluation of yields
which goes on indefinitely.
Consequently, will not evaluate in finite time, and this is the cause of the RecursionError
.
But we can fix this, and quite simply so: only allow the recursion—the bit—to happen when it’s passed an argument; in other words, replace
Subsituting Equation \eqref{eq:strict-evaluation} in Equation \eqref{eq:Y-combinator}, we find
Translating to Python,
Y = lambda f: (lambda x: f(lambda y: x(x)(y)))(lambda x: f(lambda y: x(x)(y)))
And then we try again:
>>> Y(fact)(4)
24
>>> Y(fact)(3)
6
>>> Y(fact)(2)
2
>>> Y(fact)(1)
1
Sweet success!
Summary
To recapitulate, the Y combinator is a higher-order function that can be used to define recursion—and even mutual recursion—in languages that don’t support recursion. One way of deriving is to assume that the recursive function under consideration is the result of some other function applied to itself: ; after some simple manipulation, the result can then be determined by inspection. Although can indeed be used to define recursive functions, it cannot be applied literally in a contemporary programming language; recursion errors might then occur. Fortunately, this can be fixed simply by letting the recursion in happen when needed—that is, lazily.
-
Do you see why this is the appropriate generalisation of letting ? ↩