
Do you know if there's a way to rate limit logged-in users differently than visitors of a site?


Rate limiting can be a double-edged sword; you may be better off giving a scraper the highest bandwidth so it's gone sooner. Otherwise, something like making a zip or other sort of compilation of the site available may be an option.

Just what kind of scraper you have is the concern:

does the scraper just want a bunch of stock images;

or does the scraper have FOMO on web trinkets;

or does the scraper want to mirror/impersonate your site?

The last option is the most concerning, because then either:

the scraper is mirroring because your site is cool and a local UI/UX is wanted;

or the scraper is phishing, smishing, or otherwise duping your users.


Yeah, good points to consider. I think the sites that would be scraped the most are those where the data is regularly and reliably up-to-date, and there's a large volume of it - so not just one scraper but many different parties may try to scrape every page on a daily or weekly basis.

I feel that ruling should have the caveat that if a fairly priced paid API for the publicly listed data exists, then scrapers must legally use it (say, priced no more than 5% above the CPU/bandwidth/etc. cost of the scraping behaviour). Ideally there would also be a rule that, at minimum, there is a delay before they can republish that data without your permission, so that you, as the platform/source/reason for the data being up-to-date, aren't harmed as well - otherwise regular visitors may start going to the competitor publishing the data, which could kill the source platform over time.


Absolutely, you just have to check the session cookie.


nginx can be set up to do that using the session cookie.
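A minimal sketch of how that might look, assuming the app sets a cookie named "session_id" for logged-in users (the cookie name, zone names, rates, and backend address are all placeholder assumptions). Visitors without the cookie get a strict per-IP limit; a looser global cap applies to everyone. Note this only checks for the cookie's presence, which a scraper can fake, so validate sessions at the application layer if that matters.

    # Fragment of the http {} block.
    # Requests with an empty limit key are not counted by nginx,
    # so clients sending the session cookie skip the "anon" zone.
    map $cookie_session_id $anon_key {
        ""      $binary_remote_addr;   # no cookie: rate-limit by client IP
        default "";                    # cookie present: empty key, zone skipped
    }

    limit_req_zone $anon_key           zone=anon:10m     rate=1r/s;    # visitors
    limit_req_zone $binary_remote_addr zone=everyone:10m rate=10r/s;   # global cap

    server {
        listen 80;
        location / {
            limit_req zone=anon     burst=5  nodelay;
            limit_req zone=everyone burst=50 nodelay;
            proxy_pass http://127.0.0.1:8080;   # placeholder backend
        }
    }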



