Writing to Kafka allowed them to continue their current ingestion process into MariaDB at the same time as ClickHouse. Kafka consumer groups allow the data to be consumed twice by different consumer pools that have different throughput without introducing bottlenecks.
From experience the Kafka tables in ClickHouse are not stable at a high volumes, and harder to debug when things go sideways. It is also easier to mutate your data before ingestion using Vector's VRL scripting language vs. ClickHouse table views (SQL) when dealing with complex data that needs to be denormalized into a flat table.
> Writing to Kafka allowed them to continue their current ingestion process into MariaDB at the same time as ClickHouse.
The one they're going to shut down as soon as this works? Yeah, great reason to make a permanent tech choice for a temporary need. Versus just keeping the MariaDB stuff exactly the same on the PHP side and writing to 2 destinations until cutover is achieved. Kafka is wholly unnecessary here. Vector is great tech but likely not needed. Kafka + Vector is absolutely the incorrect solution.
Their core problem is the destination table schema (which they did not provide) and a very poorly chosen primary key + partition.
From experience the Kafka tables in ClickHouse are not stable at a high volumes, and harder to debug when things go sideways. It is also easier to mutate your data before ingestion using Vector's VRL scripting language vs. ClickHouse table views (SQL) when dealing with complex data that needs to be denormalized into a flat table.