
Oh cool! But at the cost of twice the VRAM and only having 1/8th of the context, I suppose?


Llama 3 70B takes half as much VRAM as Mixtral 8x22B, but it needs almost twice the FLOPS/bandwidth. Yes, Llama's context is smaller, although that should be fixable in the near future. Another difference is that Llama is English-focused while Mixtral is more multilingual.
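
Rough back-of-envelope sketch of the arithmetic behind this: the parameter counts below (≈70.6B for Llama 3 70B, ≈141B total / ≈39B active for Mixtral 8x22B) and the fp16 weight assumption are approximations, not exact figures from either model card.

    # Approximate comparison: weight VRAM vs per-token compute.
    # Parameter counts and fp16 precision are assumptions; KV cache,
    # activations, and runtime overhead are ignored.

    def weights_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
        """Weight-only memory footprint in GB."""
        return params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 GB

    llama3_70b_total = 70.6   # dense: every parameter is active each token
    mixtral_total    = 141.0  # sparse MoE: all experts must sit in memory
    mixtral_active   = 39.0   # ~2 of 8 experts are used per token

    print(f"Llama 3 70B weights @ fp16:   ~{weights_gb(llama3_70b_total):.0f} GB")
    print(f"Mixtral 8x22B weights @ fp16: ~{weights_gb(mixtral_total):.0f} GB")

    # Per-token FLOPS/bandwidth scale roughly with *active* parameters,
    # hence the dense 70B needs ~2x the compute despite ~half the VRAM.
    print(f"Active-param ratio (Llama / Mixtral): ~{llama3_70b_total / mixtral_active:.1f}x")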



