Bots are an enormous source of traffic on the non-profit academic cultural heritage site I work on, which gets very little human traffic compared to a successful for-profit site.
But the bots on my site -- at least the obvious ones that lead me to call them a large source of traffic -- are all well-behaved: they send clear user-agents and they respect robots.txt, so I could keep them out if I wanted to.
I haven't wanted to, because why would I? I have modified robots.txt to keep the bots out of some mindless loops where they were trying every combination of search criteria, crawling a combinatorial expansion of every possible search results page. That was doing neither of us any good and was exceeding the capacity of our Papertrail plan (which is what brought it to our attention). And every actual data page is listed in a sitemap that is available to the bots if they want it; they don't need to tree-search every possible search results page!
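If it helps to see the shape of it, the rule was roughly the following (a sketch with made-up paths and a made-up sitemap URL; ours differ, but the idea is the same): disallow the search-results URL space while pointing crawlers at the sitemap for the actual data pages.

    # Hypothetical robots.txt sketch -- path and sitemap URL are invented
    User-agent: *
    # Keep crawlers out of the combinatorial search-results space
    Disallow: /search
    # Every actual data page is reachable through the sitemap instead
    Sitemap: https://example.org/sitemap.xml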
In some cases I've done extra work to change URL patterns so that robots.txt could more easily keep bots out of such useless corners without banning them altogether. Because... why not? The more exposure the better; all our info is public. We like our pretty good organic Google SEO, and while I don't think anyone else is seriously competing with Google, I don't want to privilege Google by blocking everyone else out either.
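Concretely (again with invented paths, not our real ones), the URL-pattern work amounted to moving search-results pages off query-string URLs, which the original robots.txt standard can't match cleanly, onto a dedicated path prefix that one simple rule can cover:

    # Before (hypothetical): results at /catalog?q=ships&facet=date&page=3
    # could only be excluded with wildcard rules like "Disallow: /*?",
    # which not every crawler supports.
    # After: the same pages served under /catalog/search?q=ships&...,
    # so a single prefix rule excludes them all:
    User-agent: *
    Disallow: /catalog/search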