Even on Linux with overcommit, allocations can fail in practical scenarios.
You can impose limits per process/cgroup. In server environments it doesn't make sense to run off swap (the perf hit can be so large that everything times out and the service is indistinguishable from being offline), so you can set limits proportional to physical RAM and see processes OOM before the whole system has to resort to the OOM killer.
Processes that don't fork and don't do clever things with virtual memory don't overcommit much, and large enough allocations can fail for real at mapping time, not later when faulting pages in.
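To make that concrete, here is a minimal sketch of observing such a failure with Rust's fallible allocation API instead of letting the process abort (the 64 GiB figure is an arbitrary example and assumes a 64-bit target; whether the mapping is refused up front depends on the system's overcommit and limit configuration):

    fn main() {
        // Ask for an absurdly large buffer up front. If the underlying
        // allocation is refused, try_reserve_exact reports an error
        // instead of aborting the process, so the caller can back off.
        let mut buf: Vec<u8> = Vec::new();
        match buf.try_reserve_exact(64 * 1024 * 1024 * 1024) {
            Ok(()) => println!("reservation succeeded"),
            Err(e) => eprintln!("allocation failed, handling it gracefully: {e}"),
        }
    }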
Additionally, soft limits like https://lib.rs/cap make it possible to reliably observe OOM in Rust on every OS. This is very useful for limiting memory usage of a process before it becomes a system-wide problem, and a good extra defense in case some unreasonably large allocation sneaks past application-specific limits.
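For reference, a sketch of how such a soft limit is typically wired up with the cap crate, based on its documented usage (exact API details may vary between versions; the 512 MiB limit is just an example):

    use cap::Cap;
    use std::alloc;

    // Wrap the system allocator so every allocation is counted against a cap.
    #[global_allocator]
    static ALLOCATOR: Cap<alloc::System> = Cap::new(alloc::System, usize::MAX);

    fn main() {
        // Example: cap this process at 512 MiB of heap.
        ALLOCATOR.set_limit(512 * 1024 * 1024).unwrap();

        // An allocation past the cap fails instead of growing unbounded,
        // and fallible APIs like try_reserve can observe and handle it.
        let mut data: Vec<u8> = Vec::new();
        if data.try_reserve_exact(1024 * 1024 * 1024).is_err() {
            eprintln!("hit the soft limit, backing off");
        }
        println!("currently allocated: {} bytes", ALLOCATOR.allocated());
    }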
These "impossible" things happen regularly in the services I worked on. The hardest part about handling them has been Rust's libstd sabotaging it and giving up before even trying. Handling of OOM works well enough to be useful where Rust's libstd doesn't get in the way.
I hear this claim about swap all the time, and honestly it doesn't sound convincing. Maybe ten or twenty years ago, but today? CAS latency for DIMMs has been going UP, and so has NVMe bandwidth. Depending on your memory access patterns, and on whether the working set fits in the NVMe controller's cache (the recent Samsung 9100, for example, includes 4 GB of DDR4 for caching and prefetch), your application may work just fine.
Swap can be fine on desktops, where usage patterns vary a lot and there are a bunch of idle apps to swap out. It might also be fine on a server with light load, or with a memory leak whose pages just get written out and never touched again.
What I had in mind was servers scaled to run near the maximum capacity of the hardware. When the load exceeds what the server can handle in RAM and starts shoving requests' working memory into swap, you typically won't get higher throughput to catch up with the overload. Swap, even if "fast enough", will slow down your overall throughput exactly when you need it to go faster. This makes requests pile up even more, pushing more of them into swap. Even if it doesn't cause a death spiral, it's not an economical way to run servers.
What you really need to do is shed the load before it overwhelms the server, so that each box runs at its maximum throughput and extra traffic is load-balanced elsewhere, rejected, or at least queued in some more deliberate and efficient fashion, rather than frantically moving the server's working memory back and forth from disk.
You can do this scaling without OOM handling if you have other ways of ensuring limited memory usage or leaving enough headroom for spikes, but OOM handling lets you fly closer to the sun, especially when the RAM cost of requests can be very uneven.
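A rough sketch of that load-shedding idea (the names, the Response type, and the 256 threshold are made up for illustration): cap the number of in-flight requests to what fits in RAM and reject the excess immediately rather than letting it spill into swap.

    use std::sync::atomic::{AtomicUsize, Ordering};

    // Hypothetical cap, sized so the combined working set of this many
    // concurrent requests comfortably fits in physical RAM.
    const MAX_IN_FLIGHT: usize = 256;
    static IN_FLIGHT: AtomicUsize = AtomicUsize::new(0);

    enum Response {
        Ok(String),
        Overloaded, // e.g. HTTP 503, so the load balancer can retry elsewhere
    }

    fn handle_request(body: &str) -> Response {
        // Reject up front instead of admitting more work than fits in RAM.
        if IN_FLIGHT.fetch_add(1, Ordering::SeqCst) >= MAX_IN_FLIGHT {
            IN_FLIGHT.fetch_sub(1, Ordering::SeqCst);
            return Response::Overloaded;
        }
        let result = Response::Ok(format!("processed {} bytes", body.len()));
        IN_FLIGHT.fetch_sub(1, Ordering::SeqCst);
        result
    }

    fn main() {
        match handle_request("hello") {
            Response::Ok(r) => println!("{r}"),
            Response::Overloaded => println!("shed this request"),
        }
    }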
It's almost never the case that memory is uniformly accessed, except for highly artificial loads such as doing inference on a large ML model. If you can stash the "cold" parts of your RAM working set into swap, that's a win and lets you serve more requests out of the same hardware compared to working with no swap. Of course there will always be a load that exceeds what the hardware can provide, but that's true regardless of how much swap you use.
Swap isn't just for when you run out of RAM, though.
Don't look at swap as extra memory on slow HDDs. Look at it as a place the kernel can use when it needs somewhere to put something temporarily.
This can happen fairly easily on large-memory systems when memory gets fragmented: something asks for a chunk that requires a contiguous block larger than any available, and the allocation fails.
I always set up at least a couple of GB of swap now... I won't really miss the storage, and it at least gives the kernel a place to reorganize/compact memory and keep chugging along.
Rust is the problem here.