I recall another complication with websockets: IIRC it's with proxy load balancers, something like needing to bind each connection to a single backend server, even if the backend connection is using HTTP/2. I probably have the details wrong; I'm sure someone will correct my statement.
I think there is a way to do it, but it likely involves custom headers on the initial connection that the load balancer can read to route to the correct origin server.
I imagine the way it might go is that the client would first send an HTTP request to an endpoint that returns routing instructions, and then use that in the custom headers it sends when initiating the WebSocket connection.
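To make that concrete, here's a rough sketch in TypeScript using Node's `ws` client. The `/routing` endpoint, the response shape, and the `X-Backend-Hint` header are all made up for illustration; whatever the load balancer actually keys on would depend on your infra.

```typescript
// Sketch only: endpoint path, header name, and response shape are hypothetical.
import WebSocket from "ws";

interface RoutingInfo {
  backendId: string; // token the load balancer understands
  wsUrl: string;     // endpoint to open the WebSocket against
}

async function connectWithRouting(): Promise<WebSocket> {
  // Step 1: plain HTTP request that returns routing instructions.
  const res = await fetch("https://example.com/routing");
  const route = (await res.json()) as RoutingInfo;

  // Step 2: open the WebSocket with a custom header the load balancer can
  // inspect during the HTTP upgrade and use to pick the origin server.
  // (A browser WebSocket can't set arbitrary headers, so there you'd fall
  // back to a query parameter or cookie instead.)
  return new WebSocket(route.wsUrl, {
    headers: { "X-Backend-Hint": route.backendId },
  });
}
```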
I think it's more that WebSockets are held open for a long time, so if you're not careful, you can get "hot" backends with a lot of connections that you can't shift to a different instance.
It can also be harder to rotate backends since you know you are disrupting a large number of active clients.
The trick to doing this efficiently is to arrange for the live session state to be available (through replication or some data bus) at the alternative backend before cutover.
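A minimal sketch of one way that could look, assuming the live state sits in an in-memory map on the instance being drained and Redis is the shared store the alternative backend reads from (the key layout and TTL are arbitrary):

```typescript
// Sketch: replicate live session state out before dropping connections,
// so whichever backend picks the clients up after cutover can restore it.
import Redis from "ioredis";
import { WebSocket } from "ws";

const redis = new Redis();

// Hypothetical in-memory state held per connected session on this instance.
const sessions = new Map<string, { ws: WebSocket; state: object }>();

async function drain(): Promise<void> {
  for (const [sessionId, { ws, state }] of sessions) {
    // Push the state somewhere every backend can read it...
    await redis.set(`session:${sessionId}`, JSON.stringify(state), "EX", 300);
    // ...then close; the client reconnects and lands on another instance.
    ws.close(1001, "server going away");
  }
}
```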
Another option is to have the client automatically reconnect. That way your backend can just drop the connection when it needs to, and the load balancing infra will make sure the reconnection ends up on a different server.
Of course you do want to make sure the client has exponential backoff and jitter when reconnecting, so as to avoid thundering herd problems.
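Something like this on the client side (browser-style WebSocket; the 1s base and 30s cap are arbitrary):

```typescript
// Reconnecting client sketch: exponential backoff with full jitter.
function connect(url: string, attempt = 0): void {
  const ws = new WebSocket(url);

  ws.addEventListener("open", () => {
    attempt = 0; // reset the backoff once we're connected again
  });

  ws.addEventListener("close", () => {
    // Exponential backoff capped at 30s, with jitter so a fleet of
    // clients doesn't reconnect in lockstep after a backend drops them.
    const base = Math.min(30_000, 1_000 * 2 ** attempt);
    const delay = Math.random() * base;
    setTimeout(() => connect(url, attempt + 1), delay);
  });
}

connect("wss://example.com/socket");
```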
The relevant state will need to be available to all servers as well. Anything that's only known to the original server will be lost as it drops connections. On a modern deployment, with a database and probably Redis available, this isn't too big an ask.
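As a sketch of the read side, any instance a client reconnects to can rehydrate the session from the shared store rather than from process memory (the `session` query parameter and key layout here are hypothetical):

```typescript
// Server-side sketch: any instance can resume a session because the
// state lives in Redis rather than in the process that first held it.
import Redis from "ioredis";
import { IncomingMessage } from "http";
import { WebSocketServer } from "ws";

const redis = new Redis();
const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", async (ws, req: IncomingMessage) => {
  // Hypothetical: the client identifies its session via a query parameter.
  const sessionId = new URL(req.url ?? "/", "http://localhost")
    .searchParams.get("session");
  const saved = sessionId ? await redis.get(`session:${sessionId}`) : null;
  const state = saved ? JSON.parse(saved) : {};
  ws.send(JSON.stringify({ type: "resumed", state }));
});
```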
Horizontal scaling is certainly a challenge. With traditional load balancers, you don't control which instance your clients get routed to, so you end up needing to use message brokers or stateful routing to ensure message broadcasts work correctly with multiple websocket server instances.
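The usual shape of that with a message broker, sketched here with Redis pub/sub and the `ws` server (channel name and port are arbitrary): each instance relays broadcasts to whatever clients happen to be connected to it.

```typescript
// Sketch: fan out broadcasts across websocket server instances via Redis pub/sub.
import Redis from "ioredis";
import { WebSocketServer, WebSocket } from "ws";

const pub = new Redis();
const sub = new Redis(); // a subscribing connection can't issue other commands
const wss = new WebSocketServer({ port: 8080 });

// Every instance subscribes; each one forwards the message to the
// clients that are connected to it specifically.
sub.subscribe("broadcast");
sub.on("message", (_channel, payload) => {
  for (const client of wss.clients) {
    if (client.readyState === WebSocket.OPEN) client.send(payload);
  }
});

// When a message arrives from one of our own clients, publish it so
// clients attached to other instances see it too.
wss.on("connection", (ws) => {
  ws.on("message", (data) => pub.publish("broadcast", data.toString()));
});
```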