With 4B vectors, you can look at methods like quantization and compression, both detailed here for Faiss - https://github.com/facebookresearch/faiss/wiki/Indexing-1G-v...
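
To make the quantization idea concrete, here's a rough sketch of the simplest variant, scalar quantization, in plain Python. It is not Faiss's implementation; the function names and the fixed [-1, 1] value range are assumptions for illustration only.

```python
# Scalar quantization sketch: store each float as a 1-byte code instead of a 4-byte float32.
# The helper names and the [lo, hi] range are illustrative assumptions, not a library API.

def quantize(vec, lo, hi):
    """Map each float in [lo, hi] to an integer code in 0..255 (values outside are clamped)."""
    scale = (hi - lo) / 255
    return bytes(min(255, max(0, round((x - lo) / scale))) for x in vec)

def dequantize(codes, lo, hi):
    """Recover approximate floats from the 1-byte codes."""
    scale = (hi - lo) / 255
    return [lo + c * scale for c in codes]

vec = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes = quantize(vec, -1.0, 1.0)        # 5 bytes instead of 20 bytes as float32
approx = dequantize(codes, -1.0, 1.0)
max_error = max(abs(a - b) for a, b in zip(vec, approx))
print(len(codes), max_error)
```

You trade a small reconstruction error (at most half a quantization step per dimension) for a 4x smaller vector. Product quantization, which the Faiss wiki covers, compresses much harder at the cost of more error.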

Elasticsearch uses HNSW. I'm not sure what options it offers, but quantization/compression will help reduce disk storage requirements. Alternatively, you can run a dimensionality reduction algorithm and store only its output in ES, or pick a model with fewer dimensions. For example, https://huggingface.co/sentence-transformers/all-MiniLM-L6-v... has only 384 dims vs 768/1024/2048/4096.
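
For a sense of scale, here's a back-of-the-envelope calculation of raw vector storage at 4B vectors. It assumes float32 (4 bytes per dim) vs a 1-byte-per-dim quantized encoding, and it counts only the vectors themselves; the HNSW graph and any other index overhead come on top. The function name is made up for this sketch.

```python
# Raw storage estimate for 4B vectors at different dimensionalities.
# Assumptions: 4 bytes/dim for float32, 1 byte/dim for int8; index overhead is NOT included.

N_VECTORS = 4_000_000_000

def raw_size_tb(n_vectors, dims, bytes_per_dim):
    """Size of the raw vector data in decimal terabytes (1 TB = 1e12 bytes)."""
    return n_vectors * dims * bytes_per_dim / 1e12

for dims in (384, 768, 1024, 2048, 4096):
    f32 = raw_size_tb(N_VECTORS, dims, 4)
    i8 = raw_size_tb(N_VECTORS, dims, 1)
    print(f"{dims:>5} dims: float32 {f32:8.3f} TB, int8 {i8:8.3f} TB")
```

Storage scales linearly with dimension count, so 384 dims takes half the space of 768 before any quantization, and the two savings multiply.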