This sounds like it's coming from someone who hasn't had any real experience with large scale spam problems.
We operate a forum with 250k members and ~800k posts per month, with a new registration every minute. We get plenty of spam bots even with a captcha (via Mechanical Turk etc.), and without one it's unworkable. Captcha is a necessary evil, but it does help.
This seems to be coming from someone dealing with a site where spam wouldn't be that much of a problem, who would sign up to animoto to spam? Very silly post.
They are proposing alternative methods that don't put the burden on the user in the form of explicit action. By using honeypot fields that would only be filled out by a robot, and timestamp analysis which effectively detects automatic form submission, they can weed out the bots without asking their users to do anything.
Honeypot fields and their ilk are easy to bypass with a focused attack. For smaller sites, that's fine - who's going to make the effort to target you? Keep out the opportunistic bots rattling your contact form, and life's good.
For juicier targets, something more sophisticated is necessary. Captchas are one answer.
Nothing, they're a good solution for this company. My point was the conclusions were based on this company, not everyone who suffers spam, the article has since been updated with:
> For some reason this article has hit the front page of Hacker News and is getting quite a lot of traffic. I should mention that yes, I acknowledge CAPTCHAs are of course sometimes unavoidable. That doesn’t mean, however, that we should ever feel good about using them, nor should we fool ourselves that users don’t mind them.
Which was my point. When spam is a serious issue then captchas are unavoidable 99% of the time.
If your site is running on popular forum software, robot-only inputs and timestamp analysis would eliminate most of your problems.
Spammers are probably not targeting your website in particular, rather the software your forum is run on. If you add atypical anti-spam measures you'll separate yourself from others using the same platform, defeating the typical phpbb or vbulletin bot which probably accounts for most of your spam.
The thing is, though, for anything small scale a simple "Type Human" box is 100% effective against random spam bots. For anything like what you are experiencing, you are being targeted, so even with the best CAPTCHA around, spam is still going to be a problem.
I honestly believe that CAPTCHAs are one of the most evil things on the internet and that there are many valid and better ways to avoid spam.
BTW, I'm not just some random guy with an axe to grind over this. I wrote http://www.wausita.com/captcha/ as an example of how trivial 90% of the CAPTCHAs on the web are to decode.
A lot of targeted attacks use humans to decipher captchas. Hell, a lot of programs used by internet marketing will display a captcha to decipher every 2 seconds in order to post in a forum/website. Invisible fields are in my opinion a much better solution, but who cares what I think, did you try other solutions before calling them silly?
You have a valid point. However, this is pointing out the ignorance of developers (myself included): most of the time a captcha is unnecessary and hurts the UX.
So yes, who would spam Animoto? But you know what, now that their spam filter is good enough, their user registrations have increased. Better, more "foolproof" techniques will always be needed, and directed attacks are hard to prevent, but getting to a good-enough point is great as well.
And the problem you face may be a smaller percentage of sites than animoto, who do need captcha especially if you are the target of a directed attack.
Since several commenters have been asking for an explanation of honeypots and timestamps, here's a link[1] I happened to run across just recently and a quick explanation.
- Honeypots: Add a field to your form that is styled to be invisible to normal human users, such as being located off the screen, sized to 1 pixel, or placed behind/under images on the page. Bots examine a page through HTML rather than through eyesight and will not distinguish these fields. Reject submissions which have entered text in the honeypot fields.
- Timestamps: Some spambots operate by 'playback' - a human fills the form out correctly once, then copy-and-pastes the form output into a script that replaces the comment text/etc. with desired spam links. Place a hidden field in your form that contains a timestamp (possibly hashed or combined with other form output). Reject submissions which contain a timestamp far in the past, indicating a bot which is 'playing back' an old submission.
The idea with defeating spam is not to be 100% accurate with unbeatable security, since no matter your system, a bot tailored to your site can defeat it. However, putting several simple techniques together can defeat general-purpose bots that shotgun spam across many sites. This reduces spam to levels that are manageable by hand.
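For the curious, the two checks above can be sketched in a few lines of server-side Python. The field names here are made up for illustration; the honeypot would be whatever hidden field your own form uses:

```python
import time

# Hypothetical field names -- adjust to match your form.
HONEYPOT_FIELD = "state"      # hidden from humans via CSS
TIMESTAMP_FIELD = "form_ts"   # hidden input set when the form is rendered
MAX_AGE_SECONDS = 24 * 3600   # reject 'played back' forms older than a day

def looks_like_bot(form):
    """Return True if a submission trips the honeypot or timestamp check."""
    # Honeypot: a human never sees this field, so any value means a bot.
    if form.get(HONEYPOT_FIELD, "").strip():
        return True
    # Timestamp: reject submissions replaying a form rendered long ago.
    try:
        rendered_at = float(form.get(TIMESTAMP_FIELD, "0"))
    except ValueError:
        return True  # missing or tampered timestamp
    return time.time() - rendered_at > MAX_AGE_SECONDS
```

In practice you would also sign the timestamp (see the hashing suggestion elsewhere in this thread) so a bot can't just send a fresh one.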
That kind of text will still confuse the majority of Web users. It's the kind of thing my mom or wife would show to me and say, "So am I supposed to put my state in? Why do they have it there if I'm not supposed to fill it in?"
roel_v makes a valid point - the majority of web users will not see this text. The whole point is that it's hidden and will only be 'seen' by people using screen readers. The text could easily be changed to "To reduce spam we have included this extra field. If you are a human, please leave it blank."
It was a valid question - the GP was talking about blind users, and then some guy who didn't even read the discussion properly comes waltzing in with a non sequitur about his mom.
Another form of timestamp analysis is to detect submissions that happen too quickly. A spammer's signup script is likely to fill out the form and submit it nearly instantly. Of course a spammer could beat this by waiting a small randomized amount of time, but that makes spam signups more expensive and might also deter them.
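A minimal sketch of that "too fast" check, with an assumed threshold you'd tune for your own forms:

```python
import time

MIN_SECONDS = 3.0  # assumed threshold; humans rarely submit faster

def submitted_too_fast(rendered_at, submitted_at=None):
    """Flag submissions completed faster than a human plausibly could."""
    if submitted_at is None:
        submitted_at = time.time()
    return (submitted_at - rendered_at) < MIN_SECONDS
```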
Many automated form fillers for normal people, such as LastPass or even Firefox's form filler, will fill out and submit forms quickly as well. Perhaps not as quickly as an automated script, but worth looking out for.
A very rudimentary defense at best - any spammer or scraper worth their salt is randomizing their timing. Better yet, have timings derived from real users. For a dedicated attack (or even a category-specific attack like forum signups) timing would solve little.
This is a good method... especially if you look at it over the course of more than one page. If you have a multi-page signup funnel, you can watch the time it takes someone to get from the first form to the last.
I have traditionally used honeypot fields in this manner. Recently, however, I have noticed some false positives because of autofill features in browsers (especially Chrome). To work around this, I would add to the above that it may be useful to remove the field in the form's submit event handler and then test for its presence on the backend. Alternatively, just use the timestamp approach.
Much appreciated, especially for the link provided. Quite helpful. Thinking about it, for user registration (I don't have comments) this seems pretty reasonable. Combined with a few fun tricks on the input fields, this should be sufficient, especially given the other mechanisms on my site. One less captcha on the web can only be better :)
This is why I'm terrible at statistics. To me, it looks almost like wave function collapse.
The results are good only if you do a set number of observations (say 500) instead of waiting for a significant result (say it happens at 623). But what if you had decided to run 623 tests at the beginning?
No problem with that. But compare these two experiments:
    # Experiment 1: collect all 623 results, then test once at the end
    for i in range(623):
        data.add_result()
    s = calculate_significance(data)
    if s > 0.95:
        publish()

    # Experiment 2: test after every result and stop at the first success
    for i in range(623):
        data.add_result()
        s = calculate_significance(data)
        if s > 0.95:
            publish()
            break
The second one gives you many more chances to succeed, which must result in your confidence in the answer going down.
I run a Travel Blog host, and get several hundred spam attempts per day, accounting for more than 90% of the posts on the site. Still, I refuse to put CAPTCHAs in between my users and what they want to accomplish. It's just a terrible user experience.
Instead, I use a combination of human detection scripts, bayesian filtering, and moderation. Combined, this keeps the site pretty much 100% spam free from the perspective of our end users, and more importantly, Googlebot.
I have been using pseudo-timestamps and honeypot fields for a while now and it has worked pretty well for me. I get a bit of spam every now and then but it is usually someone manually copy-pasting. I could safely block those too but it is infrequent enough that I don't need to bother.
Wait until your site gets popular. It'll get fun sooner or later.
The latest thing I'm seeing on my site is a robot that automates real web browsers, jumps between ip addresses, scrapes real user content off the site, then posts it back using some form of Markov generator to make the content look unique. It'll do that on new accounts for weeks before trying to insert any links.
It's amazing the lengths spammers will go to to get their content onto your site. In this case, the crawler is clearly written specifically for my site, even though it's only PR4 and nofollows all its links. It's no wonder 99% of the content on big sites like Blogger is spam.
What I've done recently is give a text field a worthless class, then use javascript to change its class to one with display:none. Sure, bots may execute the javascript, but doing so makes them run much slower, so I'd think the majority of them don't.
There's no single way to block spam. I just showed two of the most basic methods I used. There are lots more that can be setup if spam starts to become a problem without resorting to captchas.
You just gave me an interesting (but somewhat unrelated) idea.
You can "sign" a timestamp by appending a hash of that timestamp with some secret value. This way, whenever the user submits your form, you can reliably determine when it was requested without storing anything on the server.
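A stateless sketch of that idea using an HMAC (the secret and the one-hour expiry here are placeholders):

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # server-side secret, never sent to the client

def issue_token(now=None):
    """Embed in the form: a timestamp plus an HMAC so it can't be forged."""
    ts = str(int(now if now is not None else time.time()))
    sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}:{sig}"

def verify_token(token, max_age=3600, now=None):
    """Return True if the token is authentic and not too old."""
    try:
        ts, sig = token.split(":", 1)
    except ValueError:
        return False
    expected = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    current = now if now is not None else time.time()
    return 0 <= current - int(ts) <= max_age
```

Because the signature covers the timestamp, a bot can't fabricate a fresh-looking token without the server secret, and the server stores nothing per-request.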
The second half of the article reveals that (in a specific case) removing the CAPTCHA improved conversion from 48% to 64%. I didn't much like the rest of the article, but this is interesting.
What they failed to mention is what percent of that boost came from autofillers/spammers.
They say they successfully used timestamp/honeypots to keep out spammers; if so, how many spammers did they keep out? If it was tons, then say so, that's useful information. If it wasn't very many, then they didn't need the CAPTCHA in the first place.
I'd be interested in knowing whether the application itself is designed to be immune to autofilled accounts. Assuming people use it to create slideshows they can then share with their family/friends and not socially/crowdsourced a la flickr, a bunch of bots with garbage accounts no one has to look at wouldn't actually harm anyone else's experience of the site.
I've got to say that from a developer's perspective it's worth trying, for the benefit of your customers, not to put a CAPTCHA in a form if at all possible. No one enjoys filling out a CAPTCHA. I'd suggest trying honeypot fields, timestamps, hashed value matching, etc., which are all invisible to the end user.
I think not being a lazy developer, so that your customers don't have to make as much effort, is a good thing. Only when other methods don't work should you employ a CAPTCHA.
This just encourages spambots to upgrade their technology. You could upgrade spambots quite easily by just running them inside a headless browser with full javascript support, like phantomjs.
There is very little distinction between writing a phantomjs unit test and writing a spambot.
Spammers make money off of volume. The more expensive it is to deploy each individual spam, the less money spammers make. Processing and running javascript, retrieving and inlining external CSS, and rendering HTML all take time. Writing custom bots also takes time.
I think the goal is to make spammers lose the arms race simply because the payoff has become too small. If we can do that without CAPTCHAs, so much the better.
You can apply the same to CAPTCHAs as well; it's not hard to automate CAPTCHA input either. But the reality of the situation is that the vast majority of spam bots are simple, and every additional check you put into your form increases its effectiveness by an order of magnitude.
@mayank is right: when you gain popularity you are EXPOSED, and any developer can spam your system without much effort.
Captchas are effective even if they are EVIL; anyhow, I think that one day we'll find a viable solution against spam.
You can't do the same with captcha because that would require a degree of brute forcing. And my point was that existing spambots could trivially be upgraded to handle hidden form values and keystroke timers and other automated javascript validation.
1. you run your existing spambot software through phantomjs.
2. your unmodified bot fills in all the visible fields without changing a single line of code, and the webkit backend transparently computes your hashes and other automated javascript "human" tests.
3. again, your existing "stupid" spambot code submits your form, and your site is now overrun by spam.
With Captcha, you get an image and a unique ID that is validated at the server. Sure, you could run it through mechanical turk, but I'm guessing that a few CPU cycles to load a webkit backend is still vastly cheaper than farming work out to MechTurk.
My point is that you wouldn't even have to change your spambot software to defeat these "new" validations, and they can be trivially overcome, as opposed to MechTurk+reCaptcha. Add to that the benefits of targeting sites that are relatively spam-free, and you have a real incentive for spammers to simply plug-in phantomjs instead of using WWW::Mechanize or what have you.
The point is that all these measures are 'trivial' to break, and so are captchas. Except with captchas you impose a burden on your user, whereas with the other techniques you can offload that burden to the developer. I'm not sure what the 'existing' part in 'existing spambot' has to do with it - the time it would take to add farmed captcha solving is marginal (you don't even have to Mechanical Turk it - most captchas are broken with OCR software readily available on the underground market anyway).
captcha = sign of a clueless or lazy developer, or both. I don't put up with it anymore - I have yet to encounter a single registration that I actually need that uses a captcha. I'm not the only one, either.
Sometimes I think we have it backwards. Instead of trying to determine if someone IS a spammer, why not try to figure out if they're definitely NOT a spammer?
So start with the pessimistic view that they are, and that they need to be shown a CAPTCHA. Then do some analysis to try to figure out if they're legit, e.g. time spent on the page, mouse/keyboard interaction, geo-location, referrer etc.
If they all check out, don't show them the CAPTCHA (perhaps just rely on honeypot inputs); otherwise show them a CAPTCHA as a next step after posting content (and apologise in case it's a false positive).
I'd just like to point out that the way they did their A/B testing might be flawed, you can't run the test until you get a certain confidence, you have to decide beforehand how long you'll run it. They seem to have run it until they got 99% confidence, which is probably the wrong way to go about it.
Here's an idea: Force registrants to submit a computationally expensive token along with their registration form. Perhaps it's computed with javascript. Users usually spend more than 15 seconds on the form anyways, and spammers will hate to peg their hardware like that.
Add 100ms-of-2011-avg-CPU computation and tie it to the submit button (avoiding any complications from interleaving with user activity). That deals with first-order dumb bots and makes life a little harder for the JavaScript-executing (but still volume-based) folks. Marry it to a Bayesian system to handle the third-order Mechanical Turk-style miscreants.
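A hashcash-style proof-of-work is one way to implement this. The sketch below is in Python for clarity (in practice the `solve` half would run in the browser's JavaScript); the difficulty value is an assumption you'd tune to hit your target solve time:

```python
import hashlib

DIFFICULTY = 12  # leading zero bits required; tune so solving takes ~100ms

def _valid(challenge, nonce):
    """True if sha256(challenge:nonce) has DIFFICULTY leading zero bits."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0

def solve(challenge):
    """Client side: brute-force a nonce (the expensive part)."""
    nonce = 0
    while not _valid(challenge, nonce):
        nonce += 1
    return nonce

def verify(challenge, nonce):
    """Server side: a single cheap hash checks the work was done."""
    return _valid(challenge, nonce)
```

The asymmetry is the point: the client burns thousands of hashes per submission while the server spends one, so bulk spam gets expensive without the user typing anything.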
The article states that Animoto use "honeypot fields and timestamp analysis" instead of CAPTCHAs, which they claim has been effective to date. What do you think of this?
I use honeypot fields myself and they stop a ton of spam submissions. I'm sure timestamp analysis can be very effective too. I'm totally a fan. But are there bots smart enough to defeat it? You bet!
Some of my forms also have a CAPTCHA. I think it's got to be case-by-case. Do you have something desirable to bad guys (like the signup for a new Yahoo account, or a high-ranking blog about pharmaceuticals)? Do you have tools in place to deal with spam submissions effectively when they do occur? Will a bunch of bots signing up for accounts degrade service for legitimate visitors?
For example, the Contact Our Sales Team form definitely does not have a CAPTCHA. The sales team will gladly sort through a pile of junk if it means one more inbound lead. But the Post a Comment form would be an absolute disaster without a strong CAPTCHA. A surprising amount of junk gets through anyway, in fact. (As far as I can tell, it's actual humans in developing countries copy/pasting into comments by hand. Blocking referrers from Google that contain the phrase "post a comment below" made a dent.)
Think he probably means spammers are searching for the phrase 'post a comment below' on Google looking for forms they can spam. You'll see this search term in the HTTP referrer header.
Edit: obviously you could just avoid using this phrase on your site instead.
If timestamp analysis is effective now, it won't be forever. It would be trivially easy to program an autofiller to leave pseudo-random pauses between filling individual fields. If this becomes a much more common technique, the spammers will adapt.
We were getting Spam bots on our forum which uses the same registration info as our game. We used Captcha for a bit, but also noticed a big decrease in conversion rate, so then we tweaked the forum software a bit to require that you have gained at least 1 level in the game before you can post to the forum and now no captcha and no Spam.
I was surprised to find that when I pressed control-f and typed "duh" that zero results were found in the comments.
However flawed the experiment might've been, it's obvious that if you add barriers (e.g., CAPTCHAs) before some end goal and detract from user experience then you decrease your conversion rate.
Try mollom (http://mollom.com/). It uses text analysis for the most part and only uses CAPTCHA if its not sure. Even though I don't have a huge site it blocks a lot for me.
CloudFlare (http://cloudflare.com) also works great, since it does a quick Project Honeypot check on any suspicious visitors (along with a bunch of other good stuff).
CAPTCHAs were designed to tell computers and humans apart. Initially, they were simple tests which required users to identify certain words. However, computer vision is growing by leaps and bounds, so these tests have become so complicated that even humans find them difficult to comprehend. CAPTCHAs have gone from simple tests to extremely complicated ones over the last 10 years, but the design has never changed. We need an overhaul of CAPTCHA design. They need to be both usable and secure.
P.S. I'm working on the project to make CAPTCHAs more usable. We will have some updates soon. :)
The study he did isn't broadly valid because he only tested using a captcha system that is quite abysmal, and for which the results were not surprising.
If he wants to increase conversion rates, he should get rid of the irrelevant fields such as date of birth, zip code, country, gender, and check-to-agree to legal contract.
Ha, checking the actual site, "sign up" leads to "pricing" and not a sign up page. So much for their grave concern about losing sign ups at each stage.
On the other hand, his link to an article about including Honeypot fields is good advice and valuable. Timestamp analysis is not so great since it requires javascript and cookies. The more stuff you require the more users drop off. The problem with captchas is bad captchas that are impossible for humans to decode. Sometimes the reason these are used is because simpler captchas are implemented in a faulty manner that allows spammers to decode them without even having to do OCR. So the site developer upgrades to more complex captchas rather than fix the underlying problem that is breaking the captcha security.
I think it depends on the kind of CAPTCHA, how many people will give up. Some captchas are literally easier to read for a machine than for a human. For example, some use simple rotated text in unreadable grey on grey. Humans can hardly read it, but an algorithm doesn't care about the contrast at all. Very stupid. A captcha should be as easy to read by humans as possible.
Before arguing that "CAPTCHAs are a necessary evil", it pays to know the lifetime value of a user/customer for your site. It's likely that the cost of dealing with the spam would be lower than the amount of revenue lost from your CAPTCHA-impaired conversion rate.
If it's so hard to tell the true humans from the machines (CAPTCHA) shouldn't it be a lot easier to tell true machines from humans and human/machine combinations? (Human/machine combination, like a person in a debugger with some reverse engineering tools.)
Couldn't this be used to increase the security of computer systems? What if one could extend this to be able to tell particular machines from humans, human/machine combos, and counterfeit machines. I suspect one can do this. I have been working on this problem for the past 3 months, and I'm about to implement it and publish it on the App Store.
I suspect one technique would be to add extra fields to the HTML form that are hidden when the page is viewed in an actual browser. Any submissions with values specified for these fields would likely come from a bot, since a normal user would not have been able to enter anything.
I'm guessing the timestamp, in its simplest form would just submit the time when the page was loaded as a hidden variable in the form, and compare it to the time the form was submitted.
If it's less than something reasonable for a person (say, 20 seconds or something), then it was clearly auto-filled.
With a little help from javascript, you could even expand this to the individual fields.
As someone mentioned in the comments above, it would be pretty trivial for spammers to adapt to this if they thought it was common, with a few random pauses. Perhaps they already have...
Bots tend to fill in every input field they encounter. So you could add an empty hidden input field to your form and check if the field has been populated.
Another way is to look how long it took to open the page which contains the form and the form got submitted by injecting a timestamp. Bots are way faster than humans.
The Project Honeypot website can help you with setting up a honeypot as well as blocking spammers other users have already detected: http://www.projecthoneypot.org/
I could be mistaken, but I think Project Honeypot is trying to address a different problem - harvested email addresses.
I believe the Honeypot concept that has been discussed on here is referring to creation of a honeypot field on a web form, tempting the bot to fill it in. Many bots will blindly try to submit something into each field, just to make sure that they get all the required fields on their form submission.
By adding a honeypot field, and adding text that instructs humans to leave it blank, a very high percentage of bot submissions will be detected, with few false positives.
Furthermore, you can hide the field from humans, with CSS tricks, as others mentioned. Make it 1 pixel. Make it hidden. etc.
> "In addition to including specially tagged spam trap addresses, some honey pots also include special HTML forms. Comment spammers are identified by watching what information is posted to these forms."
You're absolutely right that fake fields like that are a good way to catch bots, though, and that making your site unique is a great way to avoid being targeted by mass attacks that go after, say, all MediaWiki sites. Of course that doesn't help when you're big enough to be worth attacking specifically, but it makes things a little harder for the spammers.
I'd love to see Google tackle this by identifying this spam and immediately penalizing the links they are spamming.
I assume the spam is there in the first place to increase search engine rankings; so why not update the Google ranking algorithms (for example) to identify this spam and immediately give the targeted site (but not the site with the spam on it!) a terribly low rating?
Then, hopefully, the incentive to spam in the first place is gone.
The bottom line is that people don't like CAPTCHAs, and irritating potential customers/users within the first five minutes of a visit cannot leave a good impression. Most people don't really understand what they're used for, and they get frustrated when they cannot read them and/or get rejected. I have definitely been taking steps to limit my use of them or dispense with them altogether.
2. Identify bad blocks of IPs. If it's a datacenter, someone is probably running spamming software on a dedicated server or VPS. Maybe get your hands on some of those open proxy lists that are floating around.
3. Use your data to prune bad accounts, throttle or block creation of new ones, etc.
A word of warning: between forged IPs, compromised systems, and formerly hostile IP space given to new owners, an IP blacklist will eventually hit legitimate customers. I speak from experience on this since I had the same bright idea.
You are right about the blacklist, however, it's very unlikely you will have legitimate users coming from datacenter IPs. I've used this trick to prune hundreds/thousands of bad accounts in a couple of forums. You need to be careful with it, but I think it's a worthwhile method.
We've found that required email confirmations can drop conversion rates by 60%. Captchas I wouldn't worry about unless you have serious spam problems. It seems better to detect unhuman-like engagement and only then push a captcha.
When the client loads the page, the server sends a hash of the timestamp and asks for the client to store it. When the client submits the form, it also sends the stored hash.
This exploits the fact that bots don't usually run javascript or load all resources on a page.
I understand the honeypot technique, which is quite cool. However, what is this timestamp analysis stuff? Does anyone have a link to a decent explanation, or care to explain it in a few words?
They then removed the CAPTCHA, and it boosted the conversion rate up to 64%. In conversion rate lingo, that’s an uplift of 33.3%!
Pretty sure that 33% was bots, lol.
And they do train the bots to avoid honeypot fields and timestamp analysis - all they have to do is look for type=hidden or display:none/visibility:hidden in the CSS.
I use simple math instead of word captchas, seems easier on people.
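A math captcha of this sort fits in a few lines. This is a hypothetical sketch (question format and digit range are arbitrary), and note it only stops bots that don't bother parsing the question:

```python
import random

def make_math_captcha(rng=None):
    """Return a simple arithmetic question and its expected answer."""
    rng = rng or random.Random()
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} + {b}?", a + b

def check_answer(submitted, expected):
    """Tolerate surrounding whitespace; reject anything non-numeric."""
    try:
        return int(str(submitted).strip()) == expected
    except ValueError:
        return False
```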
Pro tip: You can usually get away with entering invalid similar characters on recaptcha when the word is really blurry. Substitute 'ri' for 'n', for example.
I like to do this as a game, to see what I can get away with, adds some fun to the drudgery of typing in a captcha.
The really blurry word is the word they're trying to OCR; generally it doesn't matter what you type as it'll accept it provided the other word is entered correctly.
Of course your 'game' is hurting reCaptcha's goal of digitizing books.
Compare: "security labels in clothing are a way of announcing to the world that you've got a theft problem, that you don't know how to deal with it, and that you've decided to offload the frustration of the problem onto your user-base. Security labels suck, because you can't properly try some pieces of clothing on with those labels in them, which means sales go down."
Such complaining doesn't accomplish a thing, unless you tell them about an effective alternative. If you don't change anything about the trade-off they have knowingly made, nothing will change. To have any chance of convincing anyone, you at least need to explain the alternatives. Everyone that reads this post just shrugs their shoulders and ignores you, because their captchas effectively solve a problem they and their clients would suffer from without those captchas.
In this case, if you open with
Using a CAPTCHA is a way of announcing to the world that
you’ve got a spam problem, that you don’t know how to deal
with it, and that you’ve decided to offload the
frustration of the problem onto your user-base.
then I think it is very dissatisfying[1] to follow up later with
They replaced the CAPTCHA with honeypot fields and
timestamp analysis, which has apparently proven to be very
effective at preventing spam while being completely
invisible to the end user.
which indicates that you have no idea about alternatives for fighting spam, apart from some measures that have 'apparently' helped in one particular case. It's not better than someone in a bar complaining about stupid government rules, without any idea or suggestion for how to improve things.
[1] it said 'hypocritical' here. That is not the correct word for it.
That he offered up the word "apparently", even with strong supporting evidence, shows that he's being an objective reporter and a good scientist. I'm disheartened that this would earn somebody ridicule here.
The plural of 'anecdote' is not data. Simply reporting an anecdote makes you neither a scientist nor a journalist, no matter how strongly the anecdote supports your feelings on some matter. In the end, this is about his feelings on captchas. He hasn't made the case that a better trade-off between fighting spam and a higher conversion is possible; he has only suggested something based on an anecdote. As others immediately questioned: what happened to spam levels? 'Apparently' is not good enough when dealing with that serious problem.
You're right that it's a gaping hole in the article, but it's not hypocritical. All web projects involve people from different disciplines working together. My background, for example, is in Psychology and User Research. People in my role can relay user needs, goals and expectations to you, but we can't tell you what development approach to use to solve it.
Your analogy doesn't hold: security labels in real-world stores don't cause a percentage of customers to give up their purchases in frustration.
Having said that, in a forum like HN, most readers would expect both a statement of the problem and some proposed solutions. Frankly, when I posted this article, I didn't expect it to get onto the No. 1 spot on the front page. It must be a slow news day.
You're right, 'hypocritical' is not the correct word here, as you are not displaying behavior for which you criticise others.
As for the security labels: a few days ago I wanted to fit a belt with an awkward security label that prevented a proper fit. It was an additional bump that wasn't overcome and may have been the only thing preventing me from buying it.