My mind went to some kind of Q-learning combined with something like Monte Carlo Tree Search, using an A*-style heuristic to combine learned value estimates with short-horizon planning.
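Just to make the hunch concrete, here's a toy sketch of what "learned values as an A*-style heuristic" could mean: a best-first search where the heuristic slot is filled by a value estimate you'd normally get from Q-learning. Everything here is hypothetical, the `value_estimate` function is a hand-coded stand-in (Manhattan distance on a grid) for a learned V(s)/max-a Q(s,a), not anything from a real system.

```python
import heapq

GOAL = (4, 4)  # toy goal state on a 5x5 grid

def value_estimate(state):
    # Stand-in for a learned cost-to-go estimate (e.g. derived from Q-values).
    # Here: exact Manhattan distance, so the search behaves like textbook A*.
    return abs(GOAL[0] - state[0]) + abs(GOAL[1] - state[1])

def neighbors(state):
    # Deterministic toy transitions: 4-connected moves inside the grid.
    x, y = state
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < 5 and 0 <= ny < 5:
            yield (nx, ny)

def plan(start):
    # A*-shaped best-first search: priority f = cost so far + value_estimate.
    frontier = [(value_estimate(start), 0, start, [start])]
    seen = set()
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == GOAL:
            return path
        if state in seen:
            continue
        seen.add(state)
        for nxt in neighbors(state):
            if nxt not in seen:
                heapq.heappush(
                    frontier,
                    (g + 1 + value_estimate(nxt), g + 1, nxt, path + [nxt]),
                )
    return None
```

With an exact heuristic this is just A*; the speculative part is swapping in a learned estimate, which is where the Q-learning/MCTS flavor would come from.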
likewise. i can already imagine a* being useful for efficiently solving basic algebra and proofs.
it could form the basis of a generalized planning engine, and that engine could potentially be dangerous given the inherently competitive reasoning behind any minimax-style approach.
Ok so maybe nothing to do with A*, but actually a way for GPT-powered models or agents to learn through automated reinforcement learning. Or something.
I wonder if DeepMind is working on something similar also.
If your hunch is right, this could lead to the type of self-improvement that scares people.