Below I'll list each quote and rewrite it with elaboration, to unpack some of its unstated assumptions. (These are my interpretations; they may differ from what the authors intended.)
> 1: This is oddly a case to signify there is value in an AI moderation tools - to avoid bias inherent to human actors.
"To the extent (1) AI moderation tools don't have conflicting interests (such as an ownership stake in a business); (2) their decisions are guided by some publicly stated moderation guidelines; (3) they make decisions openly with chain-of-thought, then such decisions may be more reliable and trustworthy than decisions made by a small group of moderators (who often have hidden agendas)."
> 2: Do you understand how AI tools are trained?
"In the pretraining phase, LLMs learn to mimic the patterns in the training text. These patterns run very deep. To a large extent, fine-tuning (e.g. with RLHF) shapes the behavior of the LLM. Still, some research shows the baseline capabilities learned during pretraining still exist after fine-tuning, which means various human biases remain."
Does this sound right to the authors? From what I understand, when unpacked in this way, both argument structures are valid. (This doesn't mean the assumptions hold, though.)
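As a footnote on the training process described above, here is a minimal toy sketch (my own construction, purely illustrative) of the two phases. The pretraining loss is plain next-token prediction; the RLHF step is shown as a bare REINFORCE-style surrogate, whereas real RLHF pipelines use a learned reward model and a KL-regularized policy-gradient method such as PPO:

```python
import torch
import torch.nn.functional as F

# Toy "language model": embed a token, map straight to next-token logits.
vocab_size, d_model = 100, 32
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, d_model),
    torch.nn.Linear(d_model, vocab_size),
)

def pretrain_loss(tokens):
    # Pretraining: next-token prediction. The model is rewarded for
    # reproducing whatever patterns (and biases) the text contains.
    logits = model(tokens[:-1])                  # predict token t+1 from token t
    return F.cross_entropy(logits, tokens[1:])

def rlhf_loss(tokens, reward):
    # Schematic RLHF step: increase the log-probability of sequences a reward
    # signal favors. It adjusts the pretrained weights; it does not replace them,
    # which is why pretrained patterns can persist after fine-tuning.
    logits = model(tokens[:-1])
    idx = torch.arange(len(tokens) - 1)
    logprobs = F.log_softmax(logits, dim=-1)[idx, tokens[1:]]
    return -(reward * logprobs.sum())

tokens = torch.randint(0, vocab_size, (16,))
print(pretrain_loss(tokens).item(), rlhf_loss(tokens, reward=1.0).item())
```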