ELI5: Tail-calls?

Jtsummers · on July 12, 2022

A tail call is a function call occurring in the final position of a function:

  void foo(...) {
    ...
    bar(x,y,z); // <= a tail call
  }

If the function has a return value (vice void like above):

  int foo(...) {
    ...
    return bar(x,y,z); // <= still a tail call
  }

In the way most languages are compiled, function calls generate a new entry in the call stack (a stack frame). This is necessary for all non-tail calls in order to handle the bookkeeping around what to do when the call finishes, how does the caller resume.

With tail calls, that additional stack frame has no real value (outside, maybe, debugging information to give you a stack trace but traces can be collected other ways). Tail call elimination (or tail call optimization) will reuse the current stack frame rather than construct a new one. This reduces the memory overhead (you aren't constructing unnecessary stack frames) and gives some performance improvement (less bookkeeping overhead, and the function call becomes a simple jump). These two functions can, in principle, get compiled to the same thing if you have TCE:

  uint factorial(uint n) {
    uint acc = 1;
    for(; n > 0; n--) {
      acc *= n;
    }
    return acc;
  }
  uint factorial(uint n, uint acc = 1) {
    if (n == 0) return acc;
    return factorial(n - 1, acc * n);
  }

But while that's a recursive example, tail calls and tail call elimination (TCE) aren't just for recursive cases. My first two examples, though they are just sketches, show a non-recursive example of tail calls. With full TCE (it isn't uncommon to have TCE only apply to self-recursive cases) those examples would also have tail call elimination performed.

ufo · on July 12, 2022

A contrived example: https://godbolt.org/z/17T5MzvGY

The compiler is able to optimize the function calls in the return statement. Note how the calls are compiled into a jmp instruction, instead of a call instruction. This means that it doesn't need a new stack frame for each call. Even if the "x" is very big, it won't blow the stack.

The most basic application of tail call optimization is when you optimize a recursive function. It essentially turns the recursion into a loop. In fact, is how you write loops in certain functional languages. But TCO is not jut for that. It can also be used when one function calls a different function, like in the first example I linked. In this case the gotos can model any state machine, not just a self loop.

qsort · on July 12, 2022

https://www.youtube.com/watch?v=-PX0BV9hGZY