the kind of generalized tail call I'm talking about is generally referred to as a continuation. it's easiest to think of it as a jump. so you can imagine that if we have jump and a stack register, we can push+jump and that's the same as 'call'. we can pop+jump and that's the same as 'return'. since 'call' and 'return' have side effects on the stack, we can't really go the other way and use them to replace jump.
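here's a rough sketch of that in python, just to make it concrete. the toy machine, its instruction names ('emit', 'jump', 'call', 'ret', 'halt') and the little program are all made up for illustration, not any real instruction set. the only real control-flow primitive is setting pc; 'call' and 'ret' are spelled out as push+jump and pop+jump.

```python
# a toy machine (hypothetical): the only control-flow primitive is 'jump',
# i.e. assigning to pc. 'call' and 'ret' are built from it plus a stack.
def run(program):
    pc = 0       # program counter
    stack = []   # what the stack register would point into
    trace = []   # order in which 'emit' instructions ran
    while True:
        op, arg = program[pc]
        if op == "emit":
            trace.append(arg)
            pc += 1
        elif op == "jump":
            pc = arg                 # plain jump: touches nothing else
        elif op == "call":
            stack.append(pc + 1)     # push the return address...
            pc = arg                 # ...then jump
        elif op == "ret":
            pc = stack.pop()         # pop the return address and jump to it
        elif op == "halt":
            return trace, stack

program = [
    ("emit", "main: before call"),  # 0
    ("call", 4),                    # 1  push 2, jump to f
    ("emit", "main: after call"),   # 2
    ("halt", None),                 # 3
    ("emit", "f: body"),            # 4
    ("jump", 6),                    # 5  tail call into g: jump, no push
    ("emit", "g: body"),            # 6
    ("ret", None),                  # 7  pops 2, so g returns straight to main
]

trace, stack = run(program)
```

note that f gets into g with a bare jump, so g's 'ret' lands back in main and the stack ends up empty. if f had used 'call' there, it would have pushed an address nobody needed.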
so that's really straightforward, but things do get a bit screwy. if we support arbitrary jumps, then the frame storage (arguments + locals) can't necessarily go on a stack, because our frames might have arbitrary lifetimes, so we can't reclaim them in stack order.
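a small python sketch of the lifetime problem, using closures as a stand-in for frames that something can still jump back into. make_counter and the variable names are just illustrative. the point is that the older frame gets dropped while the younger one is still live, which a stack can't do.

```python
def make_counter(start):
    count = start            # a local living in make_counter's frame
    def step():
        nonlocal count       # step keeps using that local after the return
        count += 1
        return count
    return step              # the frame's storage escapes with the closure

a = make_counter(0)      # frame A is created first
b = make_counter(100)    # frame B is created second

first = a()              # frame A's local is still there after the return
del a                    # A can be released now, while the younger B lives on
second = b()             # frame B's local is unaffected
```

stack discipline would force B to be released before A. here the order depends only on who still holds a reference.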
so in a scheme, for example, we just pull in a gc, track references to these closures, and dust our hands off with a smirk. if that doesn't work for you then you have to adopt some kind of framework that tells you explicitly when these frames can be released. reference counting is a poor choice here because the references between frames can easily be cyclic.
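you can watch the cycle problem happen in cpython, which is refcounted with a tracing collector bolted on for exactly this case. a minimal sketch, assuming cpython (other pythons manage memory differently); the Frame class is a hypothetical stand-in for a heap-allocated frame, not anything from a real runtime.

```python
import gc
import weakref

class Frame:
    """hypothetical heap-allocated frame: a name plus a link to another frame"""
    def __init__(self, name):
        self.name = name
        self.other = None

gc.disable()                  # leave only reference counting running

ping = Frame("ping")
pong = Frame("pong")
ping.other = pong             # ping can continue into pong...
pong.other = ping             # ...and pong back into ping: a cycle

probe = weakref.ref(ping)     # lets us observe whether ping has been freed
del ping, pong                # drop every reference from outside the cycle

# each frame is still referenced by the other, so its count never hits zero
alive_after_refcount = probe() is not None

gc.collect()                  # a tracing pass finds the unreachable cycle
alive_after_gc = probe() is not None

gc.enable()                   # restore the default collector setting
```

with only the counts to go on, the two frames keep each other alive forever. it takes a tracing pass (or some explicit release protocol) to notice that nothing outside the cycle can reach them.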