
Victor Stinner (a Python core developer focused on performance) has a great talk about this question.

There are actually a lot of reasons:

- Performance is limited by CPython's old design, and any fork has to deal with all of that legacy code.

- CPython is limited to one thread executing Python bytecode at a time because of the GIL.

- Implementation details are baked in everywhere: specific memory allocators, C structures, reference counting, a specific garbage collector, etc.

You can find the talk here: https://youtu.be/TXRPCZ7Nmh4
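The GIL point is easy to demonstrate with a hypothetical micro-benchmark (not from the talk): on a standard CPython build, CPU-bound work gets no speedup from threads, because only one thread runs Python bytecode at a time.

```python
# Sketch: compare a CPU-bound countdown run sequentially vs. in two threads.
# Under the GIL, the threaded version takes about as long as the sequential
# one (often slightly longer, due to lock contention).
import threading
import time

def count(n):
    # Pure-Python CPU-bound loop; never releases the GIL voluntarily.
    while n > 0:
        n -= 1

N = 2_000_000

start = time.perf_counter()
count(N)
count(N)
sequential = time.perf_counter() - start

start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, two threads: {threaded:.2f}s")
```

The same experiment with `multiprocessing` (separate interpreters, separate GILs) does scale across cores, which is the usual workaround.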


I can confirm that: scraping Google at any real speed takes huge effort and money. At our best we could scrape about 2,500 SERPs per IP.

But I must say that proxy services and the like did not help us much, because most of their IPs were already banned before we got to use them.


Yeah, and then you recycle the IP back into the pool for the next guy to work with. An operation I know of was fetching 6+ million SERPs a day; its proxy budget was hundreds of thousands of dollars a year.


How does it work out for IPv6?


IPv6 is just not widely used, so when you do use it, you stick out like a sore thumb. Think like a Bayesian: for Google, it's easy to just block whole /32s of IPv6 space.


At what point would you consider IPv6 "widely used"? It's currently 30% of traffic:

https://www.google.com/intl/en/ipv6/statistics.html


I bet you half of that is scrapers!


Complete /64s get blocked.
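The blocking granularity being discussed is cheap to compute on the blocker's side. A sketch using Python's stdlib `ipaddress` module (the address below is a documentation example, not a real target): given one abusive address, derive the enclosing /64 (typical per-subscriber allocation) or /32 (a whole provider allocation) and ban the entire prefix.

```python
# Sketch: compute the /64 and /32 prefixes enclosing a single IPv6 address.
import ipaddress

addr = ipaddress.ip_address("2001:db8:85a3:8d3:1319:8a2e:370:7348")  # example address

# strict=False lets us pass a host address and get back its containing network.
block_64 = ipaddress.ip_network(f"{addr}/64", strict=False)  # per-subscriber block
block_32 = ipaddress.ip_network(f"{addr}/32", strict=False)  # whole-allocation block

print(block_64)  # 2001:db8:85a3:8d3::/64
print(block_32)  # 2001:db8::/32
print(block_64.num_addresses)  # 18446744073709551616 (2**64 addresses in one /64)
```

This is why per-address bans are pointless for IPv6: a single subscriber can hop across 2^64 addresses inside their /64 for free, so blocks land on the prefix instead.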


Is it compatible with ClickHouse?

