
(You can't directly compare parameter counts with a mixture-of-experts model, where only a fraction of the parameters are active per token, and the 1.8T rumor says that GPT-4 is an MoE.)


You absolutely can, since it has a size advantage either way: MoE means the expert model performs better *because* of the overall model size.


Fair enough, although it means we don't know whether a 1.8T MoE GPT-4 will have a "size advantage" over Llama 3 400B.
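
To make the total-vs-active distinction concrete, here's a rough back-of-the-envelope sketch in Python. The expert count, expert size, and shared-parameter figures are made-up illustrative numbers chosen so the total lands near 1.8T; they are not confirmed GPT-4 specs, and the 1.8T figure itself is only a rumor.

    def moe_param_counts(n_experts, experts_per_token, expert_params, shared_params):
        # Total = everything stored; active = what a single token actually flows through.
        total = shared_params + n_experts * expert_params
        active = shared_params + experts_per_token * expert_params
        return total, active

    # Assumed illustrative MoE config: 16 experts, 2 routed per token.
    shared = 0.2e12        # attention + embedding parameters shared by every token (assumed)
    per_expert = 0.1e12    # feed-forward parameters inside one expert (assumed)
    total, active = moe_param_counts(16, 2, per_expert, shared)

    dense = 0.4e12         # Llama 3 400B is dense, so all of its parameters are active

    print(f"MoE total:            {total / 1e12:.1f}T")   # 1.8T stored
    print(f"MoE active per token: {active / 1e12:.1f}T")  # 0.4T per forward pass
    print(f"Dense active:         {dense / 1e12:.1f}T")   # 0.4T, all active

Under these assumed numbers, the dense model's 400B is comparable to the MoE's active-per-token count, not to its 1.8T total, which is why a raw parameter-count comparison between the two is ambiguous.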



