What the fuck does this even mean? How do you test or ensure it. Because based on the actual outcomes ChatGPT is 0-1 for preventing suicides (going as far as to outright encourage one).
If you're going to make the sample size one, and use the most egregious example, you make pretty much anything that has ever been born or built look terrible. Given there are millions of people using chat, GPT and others for therapy every week, maybe even everyday, citing a record of being 0-1 is pretty ridiculous.
To be clear, I'm not defending this particular case. Chat GPT clearly messed up bad.
What the fuck does this even mean? How do you test or ensure it. Because based on the actual outcomes ChatGPT is 0-1 for preventing suicides (going as far as to outright encourage one).