I read that as "the tools (their capabilities) are external to the model".

Even if a RAG / agentic model learns from tool results, that doesn't automatically internalize the tool. You can't get yesterday's weather or major recent events from an offline model, unless it was updated in that time.

I often wonder whether this is how large chat and cloud AI providers cache expensive RAG-related data though :) Like, decreasing the likelihood of tool usage for certain input patterns once the model has been patched with some recent, vetted interactions, in case that's even possible?
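To make the "skip the tool for familiar inputs" idea concrete, here is a minimal sketch of the much simpler version I can actually picture: a plain TTL cache in front of the tool call, keyed on a normalized query string. To be clear, this is not model patching and not anything keyed on activations, and I have no idea whether any provider does it this way. `ToolResultCache`, `get_or_call` and the normalization are all made-up names and assumptions for illustration.

```python
import time


class ToolResultCache:
    """Hypothetical TTL cache sitting in front of an expensive tool call."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so expiry can be tested
        self.store = {}      # normalized query -> (timestamp, result)

    @staticmethod
    def key(query):
        # Crude normalization (assumption): lowercase, collapse whitespace.
        return " ".join(query.lower().split())

    def get_or_call(self, query, tool):
        """Return (result, tool_was_called)."""
        k = self.key(query)
        now = self.clock()
        hit = self.store.get(k)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1], False  # fresh entry: tool is skipped
        result = tool(query)      # miss or stale entry: call the tool
        self.store[k] = (now, result)
        return result, True
```

The TTL is what keeps the "yesterday's weather" problem in check: once an entry is older than `ttl_seconds`, the tool gets called again instead of serving a stale answer.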

Perplexity, for example, seems like they're probably invested in some kind of activation-pattern-keyed caching... at least that was my first impression back when I first used it. Felt like decision trees, a bit like Akinator back in the day, but supercharged by LLM NLP.