
My mind went to Q-learning.


My mind went to some kind of Q-learning combined with something like Monte Carlo Tree Search and an A*-style heuristic, to effectively combine Q-learning with short-horizon planning.


This was AlphaGo and AlphaZero, right?


Likewise. I can already imagine A* being useful for efficiently solving basic algebra and proofs.

It could form the basis of a generalized planning engine, and that planning engine could potentially be dangerous given the inherently competitive reasoning behind any minimax-style approach.


Ok so maybe nothing to do with A*, but actually a way for GPT-powered models or agents to learn through automated reinforcement learning. Or something.

I wonder if DeepMind is working on something similar also.

If your hunch is right, this could lead to the type of self-improvement that scares people.


Could easily be both


Mine went to Q from Star Trek.


But were you thinking of Q, Q, or Q?



