Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I wonder if this is preferable to kServe


llm-d would make sense if you are running a very large production LLM serving setup - say 5+ full H100 hosts. The aim is to be much more focused than kserve is on exactly the needs of serving LLMs. It would of course be possible to run alongside kserve, but the user we are targeting is not typically a kserve deployer today.


Do you think https://github.com/openai/CLIP can be ran on it? LLM makes me think of chatbots but I suppose because it's inference-based it would work. Somewhat unclear on what's the difference between LLMs and inference, I think inference is the type of compute LLMs use.

I wonder if inference-d would be a fitting name.


Inference is the process of evaluating a model ("inferring" a response to the inputs). LLMs are uniquely difficult to serve because they push the limits on the hardware.

The models we support come from the model server vLLM https://docs.vllm.ai/en/latest/models/supported_models.html, which has a focus on large generative models. I don't see CLIP in the list.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: