I use Twitter daily and the site is a shell of its former self. It's slow, prone to bugs, filled with bots, the number of real users has cratered, user reports go nowhere, there's no support team, the ads are now bot accounts posting crap like "Today is a good day, be sure to make advantageous", there are no new features besides projects that were already in flight pre-Musk, they've actually removed a lot of features (like Circles, block lists, etc.), and much more. He took an otherwise functioning social media service and forced it into maintenance mode. He also fired all of the people who kept the user base alive, so now it's flooded with bots (which he presumably likes so he can boast about engagement being up). So yes, it's still around, but it's dying and the skeleton crew he has left can't do anything.
In other words, he destroyed it.
Keeping a website afloat is far less capital intensive than running a car maker. If you are building cars, you need continuous, massive investment in development. To run a social media website you just need competent staff and enough to cover hardware costs.
You can free-float X to a certain extent; you just have to make sure neither users nor advertisers run away. If you are building cars and a model completely falls flat, you are easily down billions of dollars.
>We made several new observations on scaling behavior during the development of Llama 3. For example, while the Chinchilla-optimal amount of training compute for an 8B parameter model corresponds to ~200B tokens, we found that model performance continues to improve even after the model is trained on two orders of magnitude more data. Both our 8B and 70B parameter models continued to improve log-linearly after we trained them on up to 15T tokens. Larger models can match the performance of these smaller models with less training compute, but smaller models are generally preferred because they are much more efficient during inference.
Can someone experienced please explain this? Does this mean a lean model with more training time and/or more (or better) training data will perform better than a fat model?
Yes. Llama 3 8B outperforms Llama 2 70B (in the instruct-tuned variants).
"Chinchilla-optimal" is about choosing model size and/or dataset size to maximize the accuracy of your model under a fixed training budget (fixed number of floating point operations). For a given dataset size it will tell you the model size to use, and vice versa, again under the assumption of a fixed training budget.
However, what people have realized is that inference compute matters at least as much as training compute. You want to optimize training and inference cost together, not in isolation. Training a smaller model means your accuracy will not be as good as it could have been with a larger model using the same training budget; however, you'll more than make it up in your inference budget. So in most real-world cases it doesn't make sense to be "Chinchilla-optimal".
What Meta is saying here is that there is no accuracy ceiling. You can keep increasing training budget and dataset size to increase accuracy seemingly indefinitely (with diminishing returns). At least as far as they have explored.
What's interesting about the minimization of combined training + (model lifetime) inference cost is that it is going to look different for different companies, depending on what their inference volume is...
Meta have a massive user base, and if they are using these models to run their own business, then that implies massive inference volume, and it might make economic sense for them to put more money into training (to make smaller/cheaper models more powerful) than it would for other companies with lower inference volume.
To put it another way, it wouldn't be surprising - if their internal use of these models is very high - to see Meta continuing to release models that, size for size, beat the competition, since they are incentivized to pump more tokens through them during training.
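A toy way to see that economic argument (the 6·N·D training and 2·N-per-token inference FLOP estimates, and all of the volume numbers below, are assumptions for illustration, not Meta's figures):

```python
# Toy total-cost comparison: one-off training cost plus lifetime inference cost.
# Assumed rules of thumb: training ~= 6*N*D FLOPs, inference ~= 2*N FLOPs per token.
# Numbers are illustrative only.

def total_flops(params: float, train_tokens: float, lifetime_inference_tokens: float) -> float:
    training = 6 * params * train_tokens
    inference = 2 * params * lifetime_inference_tokens
    return training + inference

lifetime = 1e14  # hypothetical lifetime inference volume: 100T tokens served

small = total_flops(8e9, 15e12, lifetime)   # 8B model, heavily overtrained on 15T tokens
large = total_flops(70e9, 2e12, lifetime)   # 70B model on 2T tokens (hypothetical, near Chinchilla-ish)

print(f"8B  trained on 15T tokens: {small:.2e} total FLOPs")
print(f"70B trained on  2T tokens: {large:.2e} total FLOPs")
# At high enough inference volume, the small, heavily trained model wins on total compute,
# which is the incentive described in the parent comments.
```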
Somewhere I read that the Llama 2 8B model could be undertrained by 100-1000x. So is it possible to train a model with 8B/100 = 80M parameters to perform as well as the Llama 2 8B model, given enough training time and training tokens?
It's unclear. It might take a larger dataset than actually exists, or more compute than is practical. Or there may be a limit that we just haven't reached yet; this actually seems quite likely. The scaling "laws" are really more like guidelines and they are likely wrong when extrapolated too far.
They're saying that with this architecture there's a tradeoff between training and inference cost, where a 10x smaller model (much cheaper to run at inference) can match a bigger model if the smaller one is trained on 100x the data (much more expensive to train), and that the improvement continues log-linearly.
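Here's a minimal sketch of what "improves log-linearly" means, with made-up coefficients: loss drops by roughly a fixed amount for every 10x more training tokens.

```python
import math

# Illustrative log-linear scaling: loss ~= a - b * log10(tokens).
# The coefficients a and b are invented for this example; real curves
# come from fitting actual training runs.
a, b = 4.0, 0.25

def toy_loss(tokens: float) -> float:
    return a - b * math.log10(tokens)

for tokens in (2e11, 2e12, 1.5e13):  # ~200B, 2T, 15T tokens
    print(f"{tokens:.1e} tokens -> toy loss {toy_loss(tokens):.3f}")
# Each 10x increase in tokens subtracts the same fixed amount (b) from the loss:
# improvement keeps coming, but with diminishing returns per token.
```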
Is this generating videos as streaming content, e.g. like an mp4 video? As far as I can see, it is doing that. Is it possible for AI to actually produce the 3D models?
What kind of compute resources are required to produce the 3D models?
Let's say that the owner of the capital wants to fund hardworking individuals like Shuji Nakamura. At the same time, most individuals seeking capital for research are just pretending to work hard, so how does the capital owner identify the true warriors?
About a decade ago, when Facebook (after IPOing at $38) came down to $20, I was on a bus talking with a friend about buying a bit and holding it forever (like Buffett). He laughed at me. A lot of folks on the bus gave me strange looks.
So far it has been the best decision (investment-wise) for me.
The only effort I put in was to read their IPO prospectus and the latest 10-Q (at that time).
In 2011, after selling my company to Goog, a salesperson from Goldman called me trying to sell me pre-IPO Facebook stock. I told him I thought FB was a fad, and that given he worked for Goldman, they wouldn't be selling it if they thought it had value.
Stop listening to anything other than your own analysis. Just read the 10-Q. After cutting the fat, it's about 8-10 pages, every 3 months. Let's say you follow 20 companies. That's about the size of one book every three months. Not an entertaining read, but definitely an enriching one.
The 10-Q is there on the SEC website[1]. The summary is a quick read[2]. Just read it like you would read a textbook. Read the same for AAPL, GOOG, AMZN. Do the same for two other sectors. In total, about 20 companies. Very quickly you see the patterns emerge.
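If you want to pull these programmatically rather than click through the site, here's a rough sketch against EDGAR's submissions endpoint. The CIK value and the exact JSON field names are my assumptions from memory, so double-check them against the SEC's documentation:

```python
import requests

# Sketch: list recent 10-Q filings for a company via SEC EDGAR's submissions API.
# Assumptions: the data.sec.gov submissions endpoint, these JSON field names, and
# Meta's CIK; the SEC asks for a descriptive User-Agent on requests. Verify on sec.gov.
CIK = "0001326801"  # Meta Platforms (assumed CIK, zero-padded to 10 digits)
URL = f"https://data.sec.gov/submissions/CIK{CIK}.json"

resp = requests.get(URL, headers={"User-Agent": "your-name your-email@example.com"})
resp.raise_for_status()
recent = resp.json()["filings"]["recent"]

for form, date, accession, doc in zip(
    recent["form"], recent["filingDate"], recent["accessionNumber"], recent["primaryDocument"]
):
    if form == "10-Q":
        # Filing documents live under Archives/edgar/data/<cik>/<accession-no-dashes>/<doc>
        path = accession.replace("-", "")
        print(f"{date}  https://www.sec.gov/Archives/edgar/data/{int(CIK)}/{path}/{doc}")
```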
In the context of META, here is a quick summary.
Besides the sequential increases in MAU, DAU, DAP, and MAP, one thing that stands out to me is:
"We anticipate our full-year 2024 capital expenditures will be in the range of $30-35 billion, with growth driven by
investments in servers, including both non-artificial intelligence (AI) and AI hardware, and data centers as we ramp up
construction on sites with the new data center architecture we announced late last year."
That's a lot of investment, mostly in NVDA hardware. Expect NVDA to rise. This is also good news for other suppliers: AMD, INTC, TSMC. The benefit for META will be apparent in a few quarters.
Thank you for this info and for including the links. Looking through the full 10-Q[1], I don't see the summary[2] or a link to it. How do you go about finding summaries for other 10-Qs from other companies? I was trying to work backwards from the full 10-Q to find the summary so I could do the same for other companies.