You don’t really feed images to LLMs, rather to a vision model within the multi ... | Hacker News


m3kw9 on Feb 7, 2025 \| on: Why LLMs still have problems with OCR

You don’t really feed images to LLMs; rather, you feed them to a vision model within the multimodal LLM.

ritvikpandey21 on Feb 7, 2025

Yup, important clarification! However, the language portion of the model also takes part in the extraction, and it is prone to hallucinations.
