When it comes to ML there is no such distinction, though. Bigger models == more capable models, and for bigger models you need the algorithm to scale. It's like asking whether going to 2nm fabs has any benefit other than putting more transistors on a chip. That's the entire point.
I thought the main insights were embeddings, positional encoding, and shortcuts through layers (residual connections) to improve backpropagation.
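For what it's worth, here's a minimal sketch of those three pieces together: an embedding lookup, the sinusoidal positional encoding from "Attention Is All You Need", and a residual shortcut around a layer. The dimensions and `some_layer` stand-in are toy assumptions, just to show the shapes and data flow:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]    # (1, d_model/2)
    angles = pos / np.power(10000, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Toy setup: vocab of 100 tokens, model width 16, a sequence of 5 token ids.
vocab_size, d_model = 100, 16
embedding_table = np.random.randn(vocab_size, d_model) * 0.02
token_ids = np.array([3, 14, 15, 92, 65])

x = embedding_table[token_ids]                                   # embedding lookup
x = x + sinusoidal_positional_encoding(len(token_ids), d_model)  # inject position info

def some_layer(h):
    # Stand-in for an attention/FFN sublayer: any differentiable transform.
    return np.tanh(h @ (np.random.randn(d_model, d_model) * 0.1))

# Residual "shortcut": gradients also flow through the identity path,
# which is what eases backpropagation through deep stacks.
x = x + some_layer(x)
```

None of these scale on their own, of course; the point upthread is that the attention mechanism parallelizes across the sequence, which is what lets you train the bigger models in the first place.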