I'm seeing this reaction a lot from younger people (say, roughly under 25). And it's a shame this new suspicion has now translated into a prohibition on the use of dashes.
It's utterly uncommon in the kind of casual writing for which people are using AI; that's why it got noticed. Social media posts, blogs, ...
AI almost certainly picked it up mainly from typeset documents, like PDF papers.
It's also possible that some models have a tokenizing rule for recognizing faked-out em-dashes made of hyphens and turning them into real em-dash tokens.
On my own (long abandoned) blog, about 20% of (public) posts seem to contain an em dash: https://shreevatsa.wordpress.com/?s=%E2%80%94 (going by 4 pages of search results for the em dash vs 21 pages in total).
Ironically, I love using em dashes in my writing, but if I ever have to AI generate an email or summary or something, I will remove it for this exact reason.
That's simply not true, and pointlessly derogatory.
This article does not appear to be AI-written, but use of the emdash is undeniably correlated with AI writing. Your reasoning would only make sense if the emdash existed on keyboards. It's reasonable for even good writers to not know how or not care to do the extra keystrokes to type an emdash when they're just writing a blog post - that doesn't mean they have bad writing skills or don't understand grammar, as you have implied.
> That's simply not true, and pointlessly derogatory.
That same critique should first be aimed at the topmost comment, which has the same problem plus the added guilt of originating (A) a false dichotomy and (B) the derogatory tone that naturally colors later replies.
> It's reasonable for even good writers to not know how or not care
The text is true, but in context there's an implied fallacy: If X is "reasonable", it does not follow that Not-X is unreasonable.
More than enough (reasonable) real humans do add em-dashes when they write. When it comes to a long-form blog post—like this one submitted to HN—it's even more likely than usual!
> the extra keystrokes
Such as alt + numpad 0150 on Windows, which has served me well when on that platform for... gosh, decades now.
That's an en dash, not an em dash. An em dash is longer and, as far as I know, LibreOffice doesn't have a built-in way to make one (though you may have added it to the autocorrect settings yourself).
Thanks! I did confuse the two despite knowing of both.
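For anyone who wants to check the two code points themselves, here is a small sketch. It assumes the Alt code with a leading zero indexes into Windows-1252, which is the usual default on Western-locale Windows but not guaranteed everywhere; `alt_code_char` is just a name made up for this illustration.

```python
import unicodedata


def alt_code_char(code: int) -> str:
    """Return the character a Windows Alt+0<code> entry would produce,
    ASSUMING the active ANSI code page is Windows-1252."""
    return bytes([code]).decode("cp1252")


for code in (150, 151):
    char = alt_code_char(code)
    # Prints the Unicode code point and official character name.
    print(f"Alt+0{code} -> U+{ord(char):04X} {unicodedata.name(char)}")
```

Under that assumption, 0150 comes out as the en dash (U+2013) and 0151 as the em dash (U+2014), which matches the correction above.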
So ":---:" does work for the em dash? I thought something with fewer keystrokes would work, too; at least I remember getting the em dash from less, but perhaps I just typed it so quickly I did not realize it was indeed ":---:".
I don't think the character is that uncommon in the output of slightly-sophisticated writers and is not hard to generate (e.g., on macOS pressing option-shift-minus generates an em-dash).
In fact, on macOS and iOS simply typing two dashes (--) gets autocorrected to an em dash. I used it heavily, which was a bit sloppy since it doesn't also insert the customary hair spaces around the em dash.
Incidentally, I turned this autocorrection off when people started associating em dashes with AI writing. I now leave them as manual double dashes--even less correct than before, but at least people are more likely to read my writing.
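A rough sketch of what a double-hyphen substitution like that could look like, with the optional hair spaces. This is not Apple's implementation, just an illustration; `autocorrect_dashes` and its `hair_spaces` flag are hypothetical names.

```python
import re

EM_DASH = "\u2014"
HAIR_SPACE = "\u200a"


def autocorrect_dashes(text: str, hair_spaces: bool = False) -> str:
    """Replace each run of exactly two hyphens with an em dash.

    Hypothetical helper for illustration only; real autocorrect
    systems may apply different rules.
    """
    replacement = EM_DASH
    if hair_spaces:
        replacement = f"{HAIR_SPACE}{EM_DASH}{HAIR_SPACE}"
    # The lookarounds leave longer hyphen runs such as "---" untouched.
    return re.sub(r"(?<!-)--(?!-)", replacement, text)


sample = "double dashes--even less correct"
print(autocorrect_dashes(sample))
print(autocorrect_dashes(sample, hair_spaces=True))
```

Going the other way (turning the substitution off) simply means leaving the `--` in the text as typed.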
That's a silly take. Just because they existed and were proper grammar before AI slop popularized them doesn't mean they're not statistically likely to indicate slop today, depending on the context.
What's sillier is people associating em-dashes with AI slop specifically because they are unsophisticated enough never to have learned how to use them as part of their writing, and assuming everyone else must be as poor of a writer as they are.
It's the literary equivalent of thinking someone must be a "hacker" because they have a Bash terminal open.
It doesn't really matter. Before LLMs, they were relatively rarely seen; after LLMs, they are commonly seen in AI-written text. It's not unreasonable for people to associate them with being AI-written.
They weren't "relatively rarely seen". If you have seen a Word document, chances are good that it had em-dashes in it simply because it would often autocorrect to that. In the Apple ecosystem, this sort of autocorrect is provided by the OS itself, so it extends to a lot more content produced.
I'm pretty sure that all the comments about how it was "rarely seen" are because people weren't paying attention to them before in the way they do now.
In any case, to dismiss something as AI slop based solely on this one thing is both lazy and rude, and should be treated as such.
It's how they're used though, not just that they are used. For example, hyphenating words is common enough, but what wasn't all that common prior to LLMs is people using them to combine sentences -- like this.
In general while reading blogs, Reddit, HN, YouTube comments, X posts and so forth, I rarely if ever saw run-on sentences like that combined with em-dashes. Sure, they existed, but it was pretty uncommon up until about two years ago and now I see them all the time. So anecdotally, there is absolutely a shift in usage right around the time that LLMs took off. I wouldn't judge a professional article or book by its use of em-dashes, but I absolutely judge user-generated text on the web.
It's not the only thing I judge on, though. It's just one of a number of red flags. Frequent uses of lists (of usually three items) and of bullet points too -- and run-on sentences, not just the em-dashes. Yes, I did that on purpose.
You're overthinking it. LLMs exploded the prevalence of em-dashes. That doesn't mean you should assume any instance of an em-dash means LLM content, but it's a reasonable heuristic at the moment.
I dunno, I feel like the base rate fallacy [0] could easily become a factor... Especially if we don't even have an idea what the false-positive or false-negative rates are yet, let alone true prevalence.
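To make the base-rate worry concrete, here is a back-of-the-envelope sketch using Bayes' rule. Every number in it is invented purely for illustration (nobody in this thread has real rates), and `p_llm_given_dash` is a hypothetical helper name.

```python
def p_llm_given_dash(prior: float, hit_rate: float, false_alarm_rate: float) -> float:
    """Bayes' rule: probability a text is LLM-written given it has an em dash.

    prior            -- assumed share of texts that are LLM-written
    hit_rate         -- assumed share of LLM texts containing an em dash
    false_alarm_rate -- assumed share of human texts containing an em dash
    """
    flagged_llm = prior * hit_rate
    flagged_human = (1 - prior) * false_alarm_rate
    return flagged_llm / (flagged_llm + flagged_human)


# Made-up inputs: 10% of posts are LLM-written, 60% of those contain
# an em dash, and 20% of human posts do too.
print(round(p_llm_given_dash(0.10, 0.60, 0.20), 2))  # → 0.25
```

With those made-up inputs, three out of four em-dash posts would still be human-written, even though LLM text is three times as likely to contain one. Different inputs give very different answers, which is the point: without the actual rates, the heuristic's reliability is unknown.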
> That doesn't mean you should assume any instance of an em-dash means LLM content
No, it doesn't. But people are putting that out there, people are getting accused of using AI because they know how to use em dashes properly, and this is dumb.