
Oh cool! But at the cost of twice the VRAM and only having 1/8th of the context, I suppose?


Llama 3 70B takes half as much VRAM as Mixtral 8x22B, but it needs almost twice the FLOPS/bandwidth. Yes, Llama's context is smaller, although that should be fixable in the near future. Another difference is that Llama is English-focused while Mixtral is more multilingual.
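
Rough back-of-envelope sketch of the arithmetic behind this: the parameter counts below (≈70.6B for Llama 3 70B, ≈141B total / ≈39B active for Mixtral 8x22B) and the fp16 weight assumption are approximations, not exact figures from either model card.

    # Approximate comparison: weight VRAM vs per-token compute.
    # Parameter counts and fp16 precision are assumptions; KV cache,
    # activations, and runtime overhead are ignored.

    def weights_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
        """Weight-only memory footprint in GB."""
        return params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 GB

    llama3_70b_total = 70.6   # dense: every parameter is active each token
    mixtral_total    = 141.0  # sparse MoE: all experts must sit in memory
    mixtral_active   = 39.0   # ~2 of 8 experts are used per token

    print(f"Llama 3 70B weights @ fp16:   ~{weights_gb(llama3_70b_total):.0f} GB")
    print(f"Mixtral 8x22B weights @ fp16: ~{weights_gb(mixtral_total):.0f} GB")

    # Per-token FLOPS/bandwidth scale roughly with *active* parameters,
    # hence the dense 70B needs ~2x the compute despite ~half the VRAM.
    print(f"Active-param ratio (Llama / Mixtral): ~{llama3_70b_total / mixtral_active:.1f}x")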



