Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

For web scraping specifically, I’ve developed key parts of commercial systems to automatically bypass reCAPTCHA, Arkose Labs (Fun Captcha), etc.

If someone dedicated themselves to it, there’s a lot more that these solutions could be doing to distinguish between humans and bots, but it requires true specialized talent and larger expenses.

Also, for a handful of the companies which make the most popular captcha solutions, I don’t think the incentives align properly to fully segregate human and bot traffic at this time.

I think we’re still very much still picking at the very lowest hanging fruit, both for anti-bot countermeasures and anti-anti-bot (counter-countermeasures).

Personally I believe this will finally accelerate once AI’s can play computer games via a camera, keyboard, and mouse. And when successors GPT-3 / PaLM can participate well in niche discussion forums like HackerNews or the Discord server for Rust.

Until then it’s mainly a cost filter or confidence modification. As long as enough bots are blocked so that the ones which remain are technically competent enough to not stress the servers, most companies don’t care. And as long as the businesses deploying reCAPTCHA are reasonably confident that most of the views they get are humans (even if that belief is false), Google doesn’t have a strong incentive to improve the system.

Reddit doesn’t seem to care much either. As long as the bots which participate are “good enough”, it drives engagement metrics and increases revenue.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: