As someone who has implemented some RL algorithms (including all the ones mentioned in the article) and applied them to a real-world game, I would be surprised if the implementation is not buggy. That is one of the most striking things about RL: how hard bugs are to find, since they generally only degrade performance instead of causing a crash or obviously wrong behavior. The fact that he doesn't mention spending a massive amount of time debugging, plus the longish list of things that were tried that really should have worked but didn't, suggests to me it's probably still buggy. I suppose it's possible that LLMs could be particularly good at RL code since they've seen it repeated so many times... but I would be skeptical without hard evidence.
I accepted the bugginess in the browser game as unavoidable, and probably had too much faith in the LLM implementations, but I did a bit more troubleshooting than I mentioned. The progressive improvement over episodes (and, intuitively, that PPO > the others) gave me some confidence, and I've since used a similar setup on 2048 with more results showing improvement over episodes: https://wandb.ai/noahpunintended/2048-raspberry-pi?nw=nwuser...
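For reference, the charts in that project are just per-episode results; the kind of logging behind them looks roughly like the sketch below (the project name, loop, and reward values here are placeholders to show the pattern, not the actual training code):

```python
# Rough sketch of per-episode logging to Weights & Biases; the returns are fake
# and only illustrate the logging pattern, not a real 2048 agent.
import random
import wandb

run = wandb.init(project="2048-rl-demo")  # hypothetical project name

for episode in range(500):
    # In a real run this would be the total reward collected over one 2048 episode.
    episode_return = random.gauss(mu=episode, sigma=50.0)
    run.log({"episode": episode, "episode_return": episode_return})

run.finish()
```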
Yeah, essentially this. The irony of wanting to obscure information by submitting it to a model API isn't lost on me, but it was the easiest way I could think of. I wanted some way of making the key content in my picture the only thing left unblurred.
It caught my eye too, but the article was interesting, so I'll forgive OP :-)
On the topic of Tamagotchi: if you happen to have a Flipper Zero, there is an emulator for it :-) My kid enjoyed it for a while, and it saved me a few bucks from having to buy one.
Blurring is never the solution, as it can be unblurred in most cases (look up Mr whirlwind). Also, Gemini sounds like overkill for the task of blurring in general. Inkscape and GIMP can do it for free (provided you have a computer and not, say, just an iPad).
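If you'd rather script it than open an editor, a solid fill with something like Pillow does the job and, unlike a blur, leaves nothing to reconstruct (the file name and box coordinates below are made up):

```python
# Paint an opaque rectangle over the sensitive region instead of blurring it;
# a solid fill destroys the original pixels, so there's nothing to "unblur".
from PIL import Image, ImageDraw

img = Image.open("screenshot.png")                  # hypothetical input file
draw = ImageDraw.Draw(img)
draw.rectangle((120, 80, 480, 140), fill="black")   # made-up box around the secret
img.save("screenshot_redacted.png")
```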
I had this idea during the pandemic, 5 years ago now, and even did some of the work to figure out the variables I'd need to extract to make it work, but I never found the time/motivation to work on it for real. Really happy to see someone put in the effort.
The sample efficiency of RL algorithms, even for simple games, is not very good, which usually means we need a lot of episodes for the policy to learn to excel. Being able to run the policy in environments that can be parallelized and accelerated could help a lot - for example, running a batch of browsers or tabs simultaneously :)
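As a rough illustration, Gymnasium's vector API makes this kind of batching straightforward; the sketch below uses CartPole and random actions as stand-ins for the browser game and the learned policy:

```python
# Minimal sketch of stepping several environment copies in parallel with Gymnasium;
# CartPole and the random actions are placeholders for the real game and agent.
import gymnasium as gym

num_envs = 8
# AsyncVectorEnv runs each copy in its own process; SyncVectorEnv is the in-process variant.
envs = gym.vector.AsyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(num_envs)]
)

obs, info = envs.reset(seed=42)
for _ in range(1_000):
    actions = envs.action_space.sample()  # a trained policy would choose actions here
    obs, rewards, terminations, truncations, infos = envs.step(actions)
envs.close()
```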
A corporate firewall is blocking this since it's a "newly registered domain", but I wanted to note that Tamagotchi got me to revisit Digimon after I learned that Digimon was created as a way for Bandai to sell to boys. Color me surprised when I learned that Tamagotchi was aimed at girls, but I played with mine like there was no tomorrow; with the Pokemon hype of the late '90s it came to many of us at the right time.
Surprisingly, and I just looked it up, you can buy the original classic ones for about $20 straight off Amazon.