The problem is that 2400 rps is such a tiny number. You can accidentally DDoS yourself from a few browsers plus a bug that retries a request over and over, and the whole service will melt down in fun ways before you can isolate the cause.
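To make that concrete, here's a back-of-envelope sketch of how a tight client-side retry loop eats a 2400 rps budget. All the numbers besides the capacity (tab count, retry interval) are made-up assumptions for illustration:

```python
# Back-of-envelope: a buggy retry loop with no backoff, running in a
# handful of browser tabs, can exceed the whole service's capacity.
# The tab count and retry interval are illustrative assumptions.
capacity_rps = 2400

tabs = 30                 # a few users with a few tabs each
retry_interval_s = 0.01   # a tight JS retry loop, no backoff

rps_per_tab = 1 / retry_interval_s
accidental_load = tabs * rps_per_tab

print(f"accidental load: {accidental_load:.0f} rps "
      f"({accidental_load / capacity_rps:.0%} of capacity)")
# -> accidental load: 3000 rps (125% of capacity)
```

And because the service is now overloaded, requests fail, which triggers more retries, which is why it melts down rather than merely slowing down.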
The thing limiting you to that number also isn't just the startup cost. If it were, you could just run more things in parallel. The startup cost kills your minimum latency, but the rps limit comes from some other resource running out: CPU, memory, context switching, waiting on other services that are themselves limited, etc. If it's CPU, which is very likely for Python, any little performance regression can melt down the service.
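A rough model of why latency and the rps ceiling are separate limits. Every number here is an assumption for illustration; startup cost is modeled as non-CPU overhead (e.g. waiting on imports or connections), so it adds latency but doesn't count against the CPU budget:

```python
# Sketch: startup cost sets your latency floor; CPU sets your rps ceiling.
# All numbers are illustrative assumptions.
cores = 8
cpu_ms_per_request = 3.0   # CPU time each request actually burns
startup_ms = 50.0          # fixed non-CPU overhead per request (latency only)

# CPU-bound throughput ceiling: adding workers can't push past this.
max_rps = cores * 1000 / cpu_ms_per_request
print(f"throughput ceiling: {max_rps:.0f} rps")

# Minimum latency comes from startup cost; parallelism doesn't lower it.
min_latency_ms = startup_ms + cpu_ms_per_request
print(f"best-case latency: {min_latency_ms:.0f} ms")

# And a modest CPU regression lowers the ceiling proportionally:
regressed_rps = cores * 1000 / (cpu_ms_per_request * 1.2)
print(f"after a 20% CPU regression: {regressed_rps:.0f} rps")
```

Note the asymmetry: you can shave the 50 ms of startup and the ceiling doesn't move, while a 20% CPU regression takes a fifth of your capacity off the top.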
Life is so much easier if you just start from a somewhat performant base. You can get away with being less clever, and mistakes show up as a tolerable bump in resource usage or response times rather than a fail-whale.