the kind of generalized tail call I'm talking about is generally referred to as a continuation. its easiest to think of it as a jump. so you can imagine that if we have jump and a stack register, we can push+jump and thats the same as 'call'. we can pop+jump and thats the same as 'return'. since 'call' and 'return' have side effects, we cant really use them to replace jump.
so thats really straightforward, but things do get a bit screwy. if we can support arbitrary jumps, and the frame storage (arguments + locals) cant necessarily go on a stack because our frames might have arbitrary liftimes, so we cant reclaim them in stack order.
so in a scheme for example we just pull in a gc and track references to these closures and dust our hands off with a smirk. if that doesn't work for you then you have to adopt some kind of framework that lets you know explicitly when these can be released. reference counting is a poor choice here because these references between frames can easily be cyclic.
And yet Clojure doesn’t have general tail calls (it has recur which allows something like tail calls so long as they are non-mutually recursive) and seems to do ok.