Very few humans are as good as these models at arithmetic. And CoT is not "mostly fake"; that's not a correct interpretation of that research. It can be deceptive, but so can human justifications of their own actions.
Humans can learn the symbolic rules of arithmetic and then apply them correctly to problems of any size, bounded only by time and modulo lapses of concentration. LLMs fundamentally do not work this way, which is a major shortcoming.
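To make that concrete (my own illustration, not from the original comment): grade-school long addition is one such symbolic rule. A few lines of code capture it, and the same rule works on operands of any length, which is exactly the property being claimed for humans and denied of LLMs.

```python
# A minimal sketch of a learned symbolic rule: digit-by-digit long
# addition with carries. Correctness does not degrade with input size;
# only time bounds it.
def long_add(a: str, b: str) -> str:
    # Pad to equal length so we can walk both numbers right to left.
    n = max(len(a), len(b))
    a, b = a.zfill(n), b.zfill(n)
    carry, digits = 0, []
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))   # write the ones digit
        carry = total // 10              # carry the tens digit
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

# Works on 100-digit operands as reliably as on 2-digit ones.
assert long_add("17", "25") == "42"
assert long_add("9" * 100, "1") == "1" + "0" * 100
```

A model that had truly internalized the rule would show the same flat error curve across operand lengths; LLM accuracy instead tends to fall off as the numbers grow.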
They can convincingly mimic human thought, but the illusion falls apart under closer inspection.