(1) Thank you for implementing this in JSC!! I hope they take it, it makes it into Safari, and the tail-call proposal advances.
(2) I don't think you're exactly right that the call stack is observable via thrown exceptions. There's no formal spec for the v3 exceptions proposal yet, but in the documents and tests there's nothing that would change WebAssembly core to make the call stack observable. (There's no ".stack" property that would be added to Wasm itself.) It's true that the proposal amends the JS API (but only the JS API) to describe a traceStack=true option; from Wasm's perspective, as I understand it, that's just an ordinary exception that happens to include an externref value (just like any other value) to which Wasm attaches no special significance. Web-based engines can attach an informative stack trace if they want, but there's no requirement that frames not have been optimized out. Non-Web engines won't have to think about this at all.
(3) I think the real reason a Wasm engine can't implicitly make tail calls proper is that the spec tests forbid it, basically because they didn't want the implementation base to fragment: some engines would perform an optimization that changes the space complexity of a program, and some programs would come to depend on it. (The spec tests say: "Implementations are required to have every call consume some abstract resource towards exhausting some abstract finite limit, such that infinitely recursive test cases reliably trap in finite time. This is because otherwise applications could come to depend on it on those implementations and be incompatible with implementations that don't do it (or don't do it under the same circumstances.)")
But the issue is much weaker than "call stack is observable" -- it's more like "infinite recursion must trap eventually, but it can be nondeterministic when."
> Implementations are required to have every call consume some abstract resource towards exhausting some abstract finite limit
If an implementation really wanted to, it could get around this by incrementing a "function call counter" that traps at, say, 2^64, rendering the requirement effectively moot. (At one call per nanosecond, reaching 2^64 would take roughly 585 years.)
I feel like these kinds of situations are where being practical makes more sense than being mathematically precise: something concrete like "each function call must consume at least N bits of memory". Or heck, "implementations may not perform tail-call optimization", or even "implementations must be able to reconstruct the full logical call stack at any point".
You don't want it to be implementation-specific whether tail call optimization is performed or not.
The nightmare scenario is that you spend months writing a program that works great in browsers A and B, but when you roll it out to prod it immediately fails in browser C, which doesn't perform TCO.
You only want to write code that depends on TCO if you can get a guarantee that TCO will be performed. This is the motivation for the clang::musttail attribute in C/C++ (I implemented musttail in Clang): https://clang.llvm.org/docs/AttributeReference.html#musttail
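For a sense of what that looks like, here's a minimal C sketch (my own toy example, not taken from the Clang docs). With the attribute, Clang either guarantees the tail call or refuses to compile:

    #include <stdio.h>

    /* Without musttail, whether this recursion is eliminated depends on
       the optimization level; with it, Clang must compile the call as a
       jump (or reject the program), so deep inputs can't overflow the
       stack. */
    long countdown(long n) {
        if (n == 0)
            return 0;
        __attribute__((musttail)) return countdown(n - 1);
    }

    int main(void) {
        printf("%ld\n", countdown(100000000L));  /* fine even at -O0 */
        return 0;
    }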
This might be a silly question, but why is that a nightmare scenario? The exact same argument could be made for any compile-time optimization. If one compiler inlines a function where another doesn't, or unrolls a loop, or performs better constant propagation, any of those could change a program's resource usage from one compiler to the other, leading to exactly the same scenario. But I wouldn't want to forbid improvements altogether out of a desire for consistency.
To me, there's a distinction between code that requires a specific optimization and code that merely benefits from it. If the standard forbids an optimization to avoid the former, it also gives up the latter.
That is exactly why tail call "optimisation" is not, in fact, an optimisation.
Indeed much of this discussion, including the title and text of the original post, carefully avoids using the word "optimisation". Somehow it got introduced at some point in these comments.
To flip it the other way: consider a normal (non-tail-call) program and choose a local variable in a routine that happens to be called quite a lot. Replace it with a list that you append to every time you call the routine, even though only the last entry is ever examined. The program will leak memory, potentially quite quickly, and eventually crash. Is it fair to say it's just less optimised than before? I would say it's worse than that: it has an actual bug.
That's exactly the situation in a program created with the assumption of tail calls that is then used in an environment without them.
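To make the analogy concrete, here's a minimal C sketch (names are mine): the two versions compute the same thing, but the second one's memory use grows with the call count.

    #include <stdlib.h>

    /* Version 1: a plain local; constant space per call. */
    int step_local(int input) {
        int scratch = input * 2;  /* only the current value matters */
        return scratch + 1;
    }

    /* Version 2: the "local" is an append-only list that is never
       freed, even though only the last entry is ever examined. Space
       usage now grows with the number of calls: not a missed
       optimisation, an actual leak. (Error handling omitted.) */
    static int *history;
    static size_t history_len;

    int step_list(int input) {
        history = realloc(history, (history_len + 1) * sizeof *history);
        history[history_len++] = input * 2;
        return history[history_len - 1] + 1;
    }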
> Replace it with a list that you append to every time you call the routine, even though only the last entry is ever examined. The program will leak memory, potentially quite quickly, and eventually crash.
I had to think on this example a bit, but I believe these programs exist and are quite common. It's an accurate description of any program in which memory is allocated and never explicitly freed. Some languages include runtimes that perform garbage collection, but until the collector runs, the program is leaking memory and would eventually crash without it.
Which is to say, I think I agree with you, but had to chew over the example a bit: it isn't a quantitative difference in speed or memory usage, but a qualitative difference in the kinds of programs that are acceptable in a language. Thank you for it.
> Is it fair to say it's just less optimised than before? I would say it's worse than that: it has an actual bug.
Yes, a program that depends on tail call optimization, despite having no guarantee of tail call optimization being performed, has a bug.
That doesn't mean that tail call optimization is not an optimization. In languages that allow, but do not guarantee, tail call elimination, it is an optimization performed by the optimizer that makes a program run more efficiently than it otherwise would. There is a long history of referring to it as an optimization, for example in the LLVM docs: https://llvm.org/docs/LangRef.html#call-instruction
I don't think it makes sense to redefine the word "optimization" such that it excludes examples where a program could erroneously depend on a certain optimization being performed.
OK, if you prefer to be precise, there are two situations:
(1) In languages (or runtimes) where tail calls are not guaranteed to be elided, tail call elision is an optimisation. In that context, you could quite reasonably describe it as tail call optimisation. Programs that are written in these languages and rely on tail call elision are buggy.
(2) In languages (or runtimes) where tail calls are guaranteed to be elided, tail call elision is not an optimisation. In that context, you cannot reasonably describe it as tail call optimisation. Programs that are written in these languages and rely on tail call elision are not buggy (at least, not for that reason!).
This conversation is about making tail call elision a guaranteed feature of Wasm. Therefore, in this context, it is not reasonable to describe tail call elision as an optimisation.
Tail calls can be optimized at the source code level too. When I first read about them decades ago I thought they were a stupid notion, because I didn't understand how a good compiler can optimize them away. Then I wondered why certain people felt the need to write code that way. I still wonder that.
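Concretely, the source-level rewrite looks like this (a toy C example of mine); the loop is essentially what a tail-call-eliminating compiler produces from the recursive form:

    #include <stdint.h>

    /* Tail-recursive form: without guaranteed elimination, a large n
       can overflow the stack. */
    uint64_t sum_rec(uint64_t n, uint64_t acc) {
        if (n == 0)
            return acc;
        return sum_rec(n - 1, acc + n);  /* call in tail position */
    }

    /* Hand-optimized loop form: same result, constant stack. */
    uint64_t sum_loop(uint64_t n) {
        uint64_t acc = 0;
        for (; n != 0; n--)
            acc += n;
        return acc;
    }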
Thanks for that. Having written a threaded interpreter before, that immediately made sense. It does seem like you used some other hints to the compiler as well, so this would fall under "experts only" code, and that's OK. I still wonder how well good old "goto" might do, but once you move away from a huge switch/case to individual functions I can see how it might help a bit (see the sketch below).
I still don't understand why some people seem to like the idea of iterating simpler things with recursion using a tail call.
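For the threaded-interpreter case mentioned above, here's a minimal sketch of tail-call dispatch in C (all names are illustrative; the pattern resembles what interpreters like wasm3 and protobuf's upb parser do). Each handler ends by tail-calling the next handler, so dispatch compiles to one indirect jump per opcode and the C stack stays flat:

    #include <stdint.h>
    #include <stdio.h>

    typedef int64_t op_fn(const uint8_t *pc, int64_t acc);

    static int64_t op_halt(const uint8_t *pc, int64_t acc);
    static int64_t op_inc(const uint8_t *pc, int64_t acc);
    static int64_t op_dec(const uint8_t *pc, int64_t acc);

    static op_fn *const dispatch[] = { op_halt, op_inc, op_dec };

    /* The guaranteed tail call: no stack growth no matter how long
       the bytecode program runs. */
    #define DISPATCH(pc, acc) \
        __attribute__((musttail)) return dispatch[*(pc)]((pc) + 1, (acc))

    static int64_t op_halt(const uint8_t *pc, int64_t acc) {
        (void)pc;
        return acc;
    }
    static int64_t op_inc(const uint8_t *pc, int64_t acc) { DISPATCH(pc, acc + 1); }
    static int64_t op_dec(const uint8_t *pc, int64_t acc) { DISPATCH(pc, acc - 1); }

    int main(void) {
        const uint8_t program[] = { 1, 1, 2, 1, 0 };  /* inc inc dec inc halt */
        printf("%lld\n", (long long)dispatch[program[0]](program + 1, 0));  /* 2 */
        return 0;
    }

Compared with a giant switch or a computed-goto loop, each opcode lives in its own small function, which tends to give the register allocator simpler units to work with.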
> The exact same argument could be made for any compile-time optimization.
No.
For other optimizations, the difference is the speed of execution. For tail call optimization, the difference can be between normal execution and a stack overflow.
Any optimization that removes an unnecessary stack allocation could also be the difference between normal execution and stack overflow, due to decreasing the size of the stack frame.
The point of the existing mandate is precisely to be a problem for recursive programs, so it is a feature, not a bug, of the proposed refinement that it achieves that more reliably.
(Specifically, to prevent one implementation getting an optimization for them that leads people to rely on it, making code that blows up in practical use on other implementations, despite both being nominally correct implementations of the same spec.)
> The spec tests say: "Implementations are required to have every call consume some abstract resource
It's a nice detail for you to highlight, but surely that exists in the test spec specifically because the opcode spec doesn't mandate tail-call handling. If the latter did (à la Scheme), then the test spec would be updated to no longer have this note...?
The proposal is to have explicit tail call instructions (return_call and return_call_indirect), separate from the regular call instruction, which must still consume resources.
I think this would be the correct design even if tail calls were desired from the start. It makes for a better debugging experience and gives language implementations more control. (A trend in some functional languages is to require explicit annotation when tail recursion is desired, e.g. Scala's @tailrec, both to improve debugging for regular functions and to let the compiler check that you don't accidentally stop being tail-recursive.)
More discussion here: https://github.com/WebAssembly/spec/issues/150