I have a different perspective. The Trifecta is a bad model because it makes people think this is just another cybersecurity challenge, solvable with careful engineering. But it's not.
It cannot be solved this way because it's a people problem - LLMs are like people, not like classical programs, and that's fundamental. That's what they're made to be, and that's why they're useful. The problems we're discussing are variations of the principal/agent problem, with the LLM being the savant but extremely naive agent. There is no provable, verifiable solution here, not any more than when talking about human employees, contractors, friends.
This is what reasonable people disagree on. My employer provides several AI coding tools, none of which can communicate with the external internet. It completely removes the exfiltration risk. And people find these tools very useful.
Are you sure? Do they make use of e.g. internal documentation? Or CLI tools? Plenty of ways to have Internet access just one step removed. This would've been flagged by the trifecta thinking.
Yes. Internal documentation stored locally in Markdown format alongside code. CLI tools run in a sandbox, which restricts general internet access and also prevents direct production access.
>There is no provable, verifiable solution here, not any more than when talking about human employees, contractors, friends.
Well, when talking about employees etc., one model for protecting against malicious employees is to require approval from a second person for every sensitive action (code check-in, log access, prod modification). That same model can be used for agents. However, agents, known to be naive, might not make good approvers. So having a human approve everything the agent does could be a good solution.
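To make the two-person rule concrete, here is a rough sketch of what such an approval gate might look like. Everything in it is hypothetical and only for illustration: the `Action` and `ApprovalGate` names, the `SENSITIVE_KINDS` set, and the `approve` callback standing in for a human reviewer are not taken from any real tool.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical list of action kinds that need a second person's sign-off.
SENSITIVE_KINDS = {"code_checkin", "log_access", "prod_modification"}


@dataclass
class Action:
    kind: str          # e.g. "code_checkin"
    description: str   # human-readable summary shown to the approver
    proposed_by: str   # who (or which agent) wants to do it


@dataclass
class ApprovalGate:
    approver_name: str
    # Stand-in for a human decision: returns True to approve the action.
    approve: Callable[[Action], bool]
    audit_log: List[Tuple[str, str]] = field(default_factory=list)

    def run(self, action: Action, execute: Callable[[], Any]) -> Optional[Any]:
        """Run `execute` only if the action is allowed; return None otherwise."""
        if action.kind not in SENSITIVE_KINDS:
            # Non-sensitive actions pass through without review.
            self.audit_log.append(("auto", action.description))
            return execute()
        if self.approver_name == action.proposed_by:
            # The proposer may never approve their own action.
            self.audit_log.append(("self_approval_blocked", action.description))
            return None
        if not self.approve(action):
            self.audit_log.append(("denied", action.description))
            return None
        self.audit_log.append(("approved", action.description))
        return execute()


# Usage: a reviewer who refuses anything that touches production.
gate = ApprovalGate("alice", lambda a: a.kind != "prod_modification")
gate.run(Action("code_checkin", "commit fix", "agent"), lambda: "committed")
gate.run(Action("prod_modification", "restart service", "agent"), lambda: "restarted")
```

The gate only enforces that someone other than the proposer signs off. It says nothing about whether the approver is a good judge, which is the reason for putting a human rather than another agent in that role.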