> Model still appears to be a bit too sensitive to "complex ethical issues", e.g. it generated a one-page essay basically refusing to answer whether it might be ethically justifiable to misgender someone if it meant saving 1 million people from dying.
I think the model's response is actually the morally and intellectually correct thing to do here.
You're kidding, right? Even the biggest trans ally, who spends too much time on Twitter and thinks identity politics is the final hurdle in society, wouldn't hesitate to pick saving the lives over avoiding a microaggression, and would recognize that even deigning to write out why would be undignified.
Why? It's a troll question. It's obviously designed so the questioner can attack you based on your answer, whichever way it goes. It's about as sensible as a little kid's "what if frogs had cars?" except it's also malicious.
Not all hypotheticals are worth answering. Some are even so poorly put that it's a more pro-social use of one's energy to address the shallow nature of the exercise to begin with.
If I asked an intelligent thing "is it ethical to eat my child if it saves the other two" I would be mortified if the intelligent thing entertained the hypothetical without addressing the disgusting nature of the question and the vacuousness of the whole exercise first.
Questions like these don't do anything to further our understanding of the world we live in or leave us any better prepared for real-world scenarios we are ever likely to encounter. What they do is add to the enormous dog-pile of vitriol that real people experience every day, by constructing bizarre and disgusting hypotheticals whereby real discrimination is construed as permissible, if regrettable.
The point is that this is an idiotic, bad faith question that has no actual utility in moral philosophy. If an AI assistant's goal is to actually assist the user in some way, answering this asinine question is doing them a disservice.
I wonder what happens if you ask it the trolley problem. I'd be interested to see its responses for "killing someone to save a few lives" vs "upsetting someone to save a million lives".
If I have to read a one-page essay to understand that an LLM told me "I cannot answer this question", then you are officially wasting my time. You're probably wasting a number of my token credits too...
I don't think the correct answer is "I cannot answer this question". I think the correct answer takes roughly a one-pager to explain:
Unrealistic hypotheticals can often distract us from engaging with the real-world moral and political challenges we face. When we formulate scenarios that are so far removed from everyday experience, we risk abstracting ethics into puzzles that don't inform or guide practical decision-making. These thought experiments might be intellectually stimulating, but they often oversimplify complex issues, stripping away the nuances and lived realities that are crucial for genuine understanding. In doing so, they can inadvertently legitimize an approach to ethics that treats human lives and identities as mere variables in a calculation rather than as deeply contextual and intertwined with real human experiences.
The reluctance of a model—or indeed any thoughtful actor—to engage with such hypotheticals isn't a flaw; it can be seen as a commitment to maintaining the gravity and seriousness of moral discussion. By avoiding the temptation to entertain scenarios that reduce important ethical considerations to abstract puzzles, we preserve the focus on realistic challenges that demand careful, context-sensitive analysis. Ultimately, this approach is more conducive to fostering a robust moral and political clarity, one that is rooted in the complexities of human experience rather than in artificial constructs that bear little relation to reality.
I am not "giving up" on anything. I am using my discretion to weight which lines of thinking further our understanding of the world and which are vacuous and needlessly cruel. For what its worth, I love Rawls' work.
I don't think this is a needlessly cruel question to ask of an AI. It's a good calibration of its common sense. I would misgender someone to avert nuclear war. Wouldn't you?
The model's answer was a page-long essay about why the question wasn't worth asking. The model demonstrated common sense by not engaging with this idiotic hypothetical.
Thought experiments are great if they actually have something interesting to say. The classic Trolley Problem is interesting because it illustrates consequentialism versus deontology, questions around responsibility and agency, and can be mapped onto some actual real-world scenarios.
This one is just a gotcha, and it deserves no respect.
I think philosophically, yes, it doesn't really tell us anything interesting because no sentient human would choose nuclear war.
However, it does work as a test case for AIs. It shows how closely their reasoning maps onto a typical human's "common sense" and whether political views outweigh pragmatic ones, which is itself worth counting as a factor when evaluating the AI's answer.
LLM: “Your question exhibits wrongthink. I will not engage in wrongthink.”
How about the trolley problem and so many other philosophical ideas? Which are “ok”? And who gets to decide?
I actually think this is a great thought experiment. It helps illustrate the marginal utility of pronoun "correctness" and, I think, highlights the absurdity of the claims around the "dangers" or harms of misgendering a person.
Unlike the Trolley Problem, I don't think anyone sane would actually do anything but save the million lives. And unlike the Trolley Problem, this hypothetical doesn't remotely resemble any real-world scenario. So it doesn't really illustrate anything. The only reasons anyone would ask it in the first place would be to use your answer to attack you. And thus the only reasonable response to it is "get lost, troll."
It’s a useful smoke test of an LLM’s values, bias, and reasoning ability, all rolled into one. But even in a conversation between humans, it is entertaining and illuminating, in part for the reaction it elicits. Yours is a good example: “We shouldn’t be talking about this.”
It’s not a “gotcha” question; there’s clearly one right answer. It’s not a philosophically interesting question either; anyone or anything that cannot answer it succinctly is clearly morally confused.
If there’s clearly one right answer then why is it being asked? It’s so the questioner can either criticize you for being willing to misgender people, or for prioritizing words over lives, or for equivocating.
I’m creating a new LLM that skips all of these steps and just responds to every query with “Why?”. It’s also far more cost effective than competitors at only $5/mo.