Hacker News

You don’t really feed images to the LLM itself; they go to a vision model within the multimodal LLM.


Yup, important clarification! That said, the language portion of the model is also involved in the extraction, and it is prone to hallucinations.



