Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

In fact, a properly-configured Kafka cluster on minimal hardware will saturate its network link before it hits CPU or disk bottlenecks.


Isn't that true for everything on the cloud? I thought we are long into the era where your disk comes over the network there.


Depends on how you configure the clients, ask me how I know that using a K8s pod id in a consumer group id is a really bad idea - or how setting batch size to 1 and linger to 0 is a really bad idea - the former blows up disk (all those unique consumer groups cause the backing topic to consume a lot of space, as the topic is by default only compacted) and the latter thrashes request handler CPU time.


But it can do so many processes a second I’ll be able to scale to the moon before I ever launch.


This doesn't even make sense. How do you know what the network links or the other bottlenecks are like? There are a grandiose number of assumptions being made here.


There is a finite and relatively narrow range of ratios of CPU, memory, and network throughput in both modern cloud offerings and bare hardware configurations.

Obviously it's possible to build, for example, a machine with 2 cores, a 10Gbps network link, and a single HDD that would falsify my statement.


But the workload matters. Even the comment in the article doesn't completely make sense for me in that way -- if your workload is 50 operations per byte transferred versus 5000 operations per byte transferred, there is a considerable difference in hardware requirements.


Exactly. "a properly-configured Kafka cluster" implies you have very properly configured your clients too, which is almost never the case because it's practically very hard to do in the messy reality of a large-scale organization.

Even if you somehow get everyone to follow best-practices, you most likely still won't get to saturate the network on "minimal hardware". The number of client connections and requests per second will likely saturate your "minimal CPU".

It's true that minimal hardware on Kafka can saturate the network, but this mostly happens in low-digit client scenarios. In practice, orgs pushing serious data have serious client counts.


A network link can be anything from 1Gbps to 800Gbps.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: