There is an old saying about the future being here but unevenly distributed. The pods look nice and would be even nicer with some kind of daily meal kit for a few extra dollars.
I mean, you're not wrong. Do people not know how these drugs work? They're not magic; they make you never want to eat and take away the "feeling like shit" when you don't.
That's not the experience my wife has had. She's been on one of these drugs for a while and still gets insanely hungry, but she's able to feel "full" with a much smaller meal than before taking the drug. If she overeats, she feels like shit.
That's a little too sci-fi for me but I am sure many youngsters with higher risk tolerance would be happy to pay a subscription fee for more streamlined nutrition delivery systems.
Well done. I'd change a few things to make it technically more precise. In a few places you use words like "learnable" parameter, but I think this tends to confuse people more than it helps them understand what is going on. People can learn, but parameters can only be modified according to some rule that minimizes or maximizes some objective function of those parameters. People who understand the technical details use words/phrases like "learning" as shorthand, but in an introductory post like this it is useful to be technically precise and avoid anthropomorphisms that can confuse beginners.
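To make the distinction concrete, here is a minimal sketch (my own illustration, not taken from the post under review): a "learnable" parameter is just a number repeatedly modified by a fixed update rule that reduces an objective function, in this case plain gradient descent on f(w) = (w - 3)^2.

```python
# A parameter is not "learning" anything: it is updated by a fixed rule
# (gradient descent) that minimizes an objective function of that parameter.

def objective(w):
    return (w - 3.0) ** 2      # minimized at w = 3

def gradient(w):
    return 2.0 * (w - 3.0)     # derivative of the objective w.r.t. w

w = 0.0        # initial parameter value
lr = 0.1       # step size
for _ in range(100):
    w -= lr * gradient(w)      # the update rule: no cognition, just arithmetic

print(round(w, 4))  # converges toward 3.0
```

Each step is deterministic arithmetic; "learning" is only a convenient label for repeated application of the minimization rule.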
I wonder how long it will take before it devolves into complete incoherence. It already seems incoherent, so it will probably be completely unreadable within a few updates.
The value proposition of Cerebras is that they can compile existing graphs to their hardware and allow inference at lower costs and higher efficiencies. The title does not say anything about creating or optimizing new architectures from scratch.
That's correct, and if you read the whole thing you will realize that it is followed by "... to leap over GPUs", which indicates that they're not literally referring to optimizing the weights of the graph on a new architecture, or freshly initialized variables on an existing one.
"Trains" has no other sensible interpretation in the context of LLMs. My impression was that they had trained models better than those trained on GPUs, presumably because they trained faster and for longer than Meta, but that interpretation was far from what the content actually delivered.
Also interesting to see the omission of deepinfra from the price table, presumably because it would be cheaper than Cerebras, though I didn't even bother to check at that point because I hate these cheap clickbaity pieces that attempt to enrich some player at the cost of everyone's time or money.
Good luck with their IPO. We need competition, but we don't need confusion.
What are you confused about? Their value proposition is very simple and obvious: custom hardware with a compiler that transforms existing graphs into a format that runs at lower cost and higher efficiency, because it utilizes a special instruction set only available on Cerebras silicon.
The title is clickbait, but that's how marketing works whether we like it or not. The achievement is real: Cerebras improved their software, and inference is much faster as a result. I find it easy to forgive annoying marketing tactics when they're being used to promote something cool.
It is textbook bait and switch. If the achievement is important, use the correct title. An advance in actual training performance, or a better model, is very important and interests a different set of people, with deeper pockets, than those who care about inference.
On May 3, 2021 I wrote a note to myself about the type of people OpenAI was hiring: "looks like OpenAI is getting into the military business by hiring former CIA clandestine operator Will Hurd" https://en.wikipedia.org/wiki/Will_Hurd. Seems like I was right, but this should be expected, because every corporation is in one way or another linked to the military-industrial complex.
In the most general case there is no technique that can determine if two programs are equivalent other than running both programs on some set of inputs and verifying that the outputs (after termination) are the same. Every other technique must cut out all possible sources of non-termination to get around the halting problem in order to make the resulting equivalence relation on the set of programs effectively computable and constructively provable.
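A minimal sketch of that testing-based check (the function names are my own illustration, not from any particular tool): run both programs on a sample of inputs and compare outputs. A mismatch proves the programs are inequivalent; agreement on the sample proves nothing about equivalence in general, which is exactly the limitation described above.

```python
# Differential testing: the only generally applicable equivalence check.
# A counterexample refutes equivalence; agreement on samples does not prove it.

def prog_a(n):
    return sum(range(n + 1))    # 0 + 1 + ... + n, computed by iteration

def prog_b(n):
    return n * (n + 1) // 2     # the same sum, via the closed-form formula

def find_counterexample(inputs, f, g):
    """Return the first input where f and g disagree, or None if none is found."""
    for x in inputs:
        if f(x) != g(x):
            return x
    return None

print(find_counterexample(range(1000), prog_a, prog_b))  # None: no counterexample found
```

Note that the sample must consist of inputs on which both programs terminate; restricting to such inputs is precisely the move that sidesteps the halting problem.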