Yes, it's an early-stage technology and the logistics of scaling are non-trivial. Yet, if you look at the numbers, they've been scaling surprisingly fast: a sustained rate of 5x per year for 5 years in weekly paid rides.
Why? The press release is much more useful for the vast majority of HN readers in my opinion. The paper is something you read if you want to know more so the right place for it is the comments.
In general, not referring to this specific case, scientific papers are often written for people with specialized background and are hard to understand for people without that background, even if they're otherwise smart and educated.
Just to say, I actually disagree entirely. I almost never find press releases valuable. Papers are just a format (with some writing-style conventions that tend to follow the given field of study); they may be intimidating for many, but the hacker spirit and ethos is to dive in and tackle them, and that will pay far more dividends for everyone reading than consuming more advertising. :)
Unless you are actually familiar with odontology or work in the field, the paper carries little significance for the average layperson (most of us on HN).
What are you working on specifically? I've been vaguely following poker research since Libratus; the last paper I read was ReBeL. Has there been any meaningful progress after that?
I was thinking about developing a 5-max poker agent that can play decently (not superhumanly), but it still seems like somewhat uncharted territory. There's Pluribus, but it's limited to fixed stack sizes, very complex, and very computationally demanding to train (and, I think, also during gameplay).
I don't see why an LLM can't learn to play a mixed strategy. An LLM outputs a distribution over all tokens, which is then randomly sampled from.
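To make that concrete, here's a toy sketch in plain Python (no actual LLM involved, and the action probabilities are made up): sampling from an output distribution over action tokens already realizes a mixed strategy.

```python
import random
from collections import Counter

# Hypothetical output distribution over action tokens; the numbers are invented.
action_probs = {"fold": 0.25, "call": 0.45, "raise": 0.30}

def sample_action(probs):
    # Analogous to sampling the next token from the model's output distribution.
    actions, weights = zip(*probs.items())
    return random.choices(actions, weights=weights, k=1)[0]

counts = Counter(sample_action(action_probs) for _ in range(10_000))
print({a: round(c / 10_000, 3) for a, c in counts.items()})
# The empirical frequencies approach the target mixture, i.e. a mixed strategy.
```

The open question is whether the model can learn the right probabilities, not whether it can mix.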
Text-trained LLMs are likely not a good solution for optimal play: just as in chess, the position changes too much, there's too much exploration, and too much accuracy is needed.
CFR is still the best approach. However, as in chess, we need a network that can help evaluate the position. Unlike chess, the hard part isn't computing a value; it's knowing what the current game state is. For that, we need something unique.
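For anyone who hasn't seen CFR before, the core building block is regret matching at each decision point. Here's a minimal, self-contained sketch (rock-paper-scissors instead of poker, to keep it short; it's not any particular library's implementation):

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    # +1 win, 0 tie, -1 loss for the player choosing a against b.
    return [[0, -1, 1], [1, 0, -1], [-1, 1, 0]][a][b]

def strategy_from_regrets(regrets):
    # Regret matching: play each action in proportion to its positive regret.
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    return [p / total for p in positives] if total > 0 else [1.0 / ACTIONS] * ACTIONS

def train(iterations=100_000):
    regrets = [0.0] * ACTIONS
    opp_regrets = [0.0] * ACTIONS
    strategy_sum = [0.0] * ACTIONS
    for _ in range(iterations):
        s = strategy_from_regrets(regrets)
        o = strategy_from_regrets(opp_regrets)
        a = random.choices(range(ACTIONS), weights=s)[0]
        b = random.choices(range(ACTIONS), weights=o)[0]
        # Regret = what each alternative would have earned minus what we earned.
        for alt in range(ACTIONS):
            regrets[alt] += payoff(alt, b) - payoff(a, b)
            opp_regrets[alt] += payoff(alt, a) - payoff(b, a)
        for i in range(ACTIONS):
            strategy_sum[i] += s[i]
    total = sum(strategy_sum)
    return [x / total for x in strategy_sum]

print(train())  # the average strategy approaches the equilibrium (1/3, 1/3, 1/3)
```

CFR runs this kind of update at every information set of the game tree, which is exactly where the blow-up (and the need for a learned evaluator) comes from.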
I'm pretty convinced that this is solvable. I've been working on rs-poker for quite a while. Right now we have a whole multi-handed arena implemented, and a multi-threaded counterfactual regret framework (no memory fragmentation, good cache coherency).
With BERT and some clever sequence encoding we can create a powerful agent. If anyone is interested, my email is: elliott.neil.clark@gmail.com
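In case "clever sequence encoding" sounds vague, here's roughly the kind of thing I mean (the token scheme below is made up for illustration and is not what rs-poker actually does): flatten the hole cards, board, and action history into a sequence of discrete IDs that a BERT-style encoder can consume through an embedding layer.

```python
# Hypothetical tokenization of a hand history into integer IDs. The vocabulary,
# special tokens, and layout are invented for this sketch.
RANKS = "23456789TJQKA"
SUITS = "cdhs"
CARD_TOKENS = {r + s: i for i, (r, s) in
               enumerate((r, s) for r in RANKS for s in SUITS)}            # 0..51
ACTION_TOKENS = {a: 52 + i for i, a in
                 enumerate(["fold", "check", "call", "bet", "raise"])}      # 52..56
SPECIAL = {"[CLS]": 57, "[SEP]": 58, "[FLOP]": 59, "[TURN]": 60, "[RIVER]": 61}

def encode_hand(hole_cards, board_by_street, actions):
    tokens = [SPECIAL["[CLS]"]]
    tokens += [CARD_TOKENS[c] for c in hole_cards]
    tokens.append(SPECIAL["[SEP]"])
    for street, cards in zip(["[FLOP]", "[TURN]", "[RIVER]"], board_by_street):
        tokens.append(SPECIAL[street])
        tokens += [CARD_TOKENS[c] for c in cards]
    tokens.append(SPECIAL["[SEP]"])
    tokens += [ACTION_TOKENS[a] for a in actions]
    return tokens

seq = encode_hand(["As", "Kd"],
                  [["7h", "8h", "2c"], ["Qs"], []],   # river not dealt yet
                  ["raise", "call", "bet", "call", "check", "check"])
print(seq)
```

Bet sizes, positions, and stack depths would need their own tokens (or a separate numeric channel), which is where most of the "clever" part actually lives.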
I'm not working on game-related topics lately; I'm in the industry now (algo trading) and also a little bit out of touch.
> Has there been any meaningful progress after that?
There are attempts [0] at making the algorithms work for exponentially large beliefs (= ranges). In poker these are constant-sized (each player receives 2 cards at the start, so a range is always over at most C(52,2) = 1326 hands), which is not the case in most games. In many games you repeatedly draw cards from a deck, and the number of histories/infosets grows exponentially.
But nothing works well for search yet, and it is still an open problem. For just policy learning without search, RNAD [2] works okay-ish from what I've heard, but it is finicky with hyperparameters to get it to converge.
Most of the research I've seen is concerned with making regret minimization more efficient, most notably Predictive Regret Matching [1].
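Roughly, the predictive variant is regret matching+ where the next strategy is computed from the accumulated (thresholded) regrets plus a prediction of the upcoming instantaneous regret, usually just the last one observed. A sketch of how I understand it (not the authors' code):

```python
# Sketch of predictive regret matching+ as I understand it from [1].
def positive(v):
    return [max(x, 0.0) for x in v]

def normalize(v):
    total, n = sum(v), len(v)
    return [x / total for x in v] if total > 0 else [1.0 / n] * n

class PredictiveRMPlus:
    def __init__(self, num_actions):
        self.regrets = [0.0] * num_actions      # cumulative regrets, kept non-negative (RM+)
        self.prediction = [0.0] * num_actions   # guess for the next instantaneous regret

    def strategy(self):
        # Play proportionally to [regrets + prediction]^+ instead of just [regrets]^+.
        return normalize(positive([r + m for r, m in zip(self.regrets, self.prediction)]))

    def observe(self, instantaneous_regret):
        # Standard RM+ update, then predict the next regret will look like this one.
        self.regrets = positive([r + g for r, g in zip(self.regrets, instantaneous_regret)])
        self.prediction = list(instantaneous_regret)
```

The prediction step is where the speed-up comes from, as far as I can tell.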
> I was thinking about developing a 5-max poker
Oh, sounds like a lot of fun!
> I don't see why an LLM can't learn to play a mixed strategy. An LLM outputs a distribution over all tokens, which is then randomly sampled from.
I tend to agree; I wrote more in another comment. It's just not something an off-the-shelf LLM would do reliably today without a lot of non-trivial modifications.
Waymo's self-driving cars are scaling quickly. With some inaccuracy, it can be said that the problem is solved: we have the technology for a full-scale deployment, and we just need to do the boring work of deploying it everywhere.
> Also, when you look at these cars and there’s no one driving, I actually think it’s a little bit deceiving because there are very elaborate teleoperation centers of people kind of in a loop with these cars. I don’t have the full extent of it, but there’s more human-in-the-loop than you might expect. There are people somewhere out there beaming in from the sky. I don’t know if they’re fully in the loop with the driving. Some of the time they are, but they’re certainly involved and there are people. In some sense, we haven’t actually removed the person, we’ve moved them to somewhere where you can’t see them.
A lot of things in science/technology have been invented essentially by accident, though, with little to no understanding of why they worked. Who's to say aging can't be similar?