
saisrirampur · 2025-12-13T20:16:53 1765657013

Thank you, Paul! Great to see Supabase wrappers evolve. I really love the async streaming feature. It helps address use cases involving (reliably) moving larger datasets from ClickHouse to Postgres for supporting (stricter) transactional workloads.

Very excited to continue working closely to further integrate these amazing open source database technologies and make it easier for users. :)

saisrirampur · 2025-12-12T23:35:04 1765582504

More on use-cases involving TimescaleDB replication/migration to ClickHouse https://clickhouse.com/blog/timescale-to-clickhouse-clickpip...

saisrirampur · 2025-12-12T23:19:12 1765581552

I love DuckDB from a product perspective and appreciate the engineering excellence behind it. However, DuckDB was primarily built for seamless in-process analytics, data science, and data-preparation/ETL workloads rather than real-time customer-facing analytics.

ClickHouse’s bread and butter is real-time analytics for customer-facing applications, which often come with demanding concurrency and latency requirements.

Ack, totally makes sense that both are amazing technologies - you could try both and test them at the scale your real-time application may reach, and then choose the technology that best fits your needs. :)

Ritewut · 2025-12-12T23:56:38 1765583798

I tested DuckDB and even Motherduck and this was my takeaway. Square hole, round peg situation.

saisrirampur · 2025-12-12T22:27:28 1765578448

Great question! If you’re starting a greenfield application, pg_clickhouse makes a lot of sense since you’ll be using a unified query layer for your application.

Now, coming to your question about replication: you can use PeerDB (acquired by ClickHouse https://github.com/PeerDB-io/peerdb), which is laser-focused and battle-tested at scale for Postgres-to-ClickHouse replication. Once the data is replicated into ClickHouse, you can start querying those tables from within Postgres using pg_clickhouse. In ClickHouse Cloud, we offer ClickPipes for Postgres CDC/replication, which is a managed service version of PeerDB and is tightly integrated with ClickHouse. There could also be non-transactional tables that you ingest directly into ClickHouse and still query using pg_clickhouse.

So TL;DR: Postgres for OLTP; ClickHouse for OLAP; PeerDB/ClickPipes for data replication; pg_clickhouse as the unified query layer. We are actively working on making this entire stack tightly integrated so that building real-time apps becomes seamless. More on that soon! :)

oulipo2 · 2025-12-13T09:27:32 1765618052

Nice! Right now I'm using Timescaledb, do you think it makes sense to move to a Postgres+CH setup instead? or only if I hit the limit of timescaledb?

Also what would be the benefit for me of querying clickhouse from Postgres, rather than directly through my backend via an ORM/SDK? is that because it would allow me to do JOINs?

What would be the typical setup if I want to JOIN analytical data (eg my IoT device readings) from CH with some business data (eg the user owning the device) from my Postgres? Would I replicate that business data to CH to do the join there, or would that be typically the exact use-case for pg_clickhouse?

saisrirampur · 2025-12-14T02:07:51 1765678071

Great questions! ClickHouse is a purpose-built analytical database with thousands of optimizations for analytics, which is why it’s typically faster and more scalable than TimescaleDB. Here’s a post that covers real scenarios where users have moved workloads from Timescale to ClickHouse: https://clickhouse.com/blog/timescale-to-clickhouse-clickpip...

If your operational (OLTP) tables are reasonably big, the recommended approach is to replicate them into ClickHouse and let ClickHouse handle the joins. This avoids cross-database joins and lets the execution be pushed fully into ClickHouse. You can use ClickPipes/PeerDB to make that super easy. https://clickhouse.com/docs/integrations/clickpipes/postgres... https://clickhouse.com/docs/integrations/clickpipes/postgres

Where pg_clickhouse fits: If you’re already using Postgres for OLTP and want to offload analytics to ClickHouse without rewriting your app, the pg_clickhouse extension helps. It lets you run OLTP and OLAP queries from Postgres, while pushing the analytical queries—and their joins—down to ClickHouse, where the replicated data lives. Going native i.e. querying ClickHouse directly for OLAP will be the most optimal and is recommended if your analytics is advanced/complex. We will be evolving pg_clickhouse over the coming months to support pushdown for more and more complex/advanced queries :)

oulipo2 · 2025-12-14T10:01:46 1765706506

Very interesting! So right now I'm developing the backend, so I can still move analytics to CH, but I'm still wondering whether it would make sense because it might not be so large that it requires it (eg 50G/year of data I'd say)

And on the other hand, I can imagine that there could be plenty of footguns with replication to another database (not instant, what about schema changes, backfills, what if some database is shutdown for update while replicating, etc), so I'm a bit cautious about having a complex setup right now

Would you have some basic examples of a "mini-backend" Postgres+Clickhouse replication, using docker-compose + Typescript/Python or something, so I could play with it and take a look at what could be the operational complexity?

saisrirampur · 2025-12-14T16:55:41 1765731341

You should just give it a shot in 10-15 min and see how it looks with ClickHouse. We made it that simple with ClickPipes :). Don’t intend to sell here, but it is as simple as signing up for a trial on ClickHouse Cloud, clicking a few buttons, and watching PG data get synced.

In regards to footguns with replication, totally understand you being cautious. For the last 2 years, PeerDB/ClickPipes has been laser-focused on just Postgres CDC to provide a dead simple yet highly reliable experience. The product has 100s of features, addresses 100s of footguns, and is actively being enhanced. Sharing some customers using this in production: https://clickhouse.com/blog/postgres-cdc-year-in-review-2025... You should give it a shot to see how easy it is. :)

In regards to a sample application, here is one: https://github.com/ClickHouse/HouseClick It showcases the PG + CH stack. We just merged a PR to integrate pg_clickhouse too. The good news is that there is a blog planned in a couple of weeks which showcases a tightly integrated PG + CH experience with CDC and pg_clickhouse, all in OSS. It will have docker-compose too. Your question lines up with what we are thinking of next, and I couldn’t resist revealing it. ;) :)

oulipo2 · 2025-12-14T20:22:42 1765743762

Nice! Looking forward to reading it!

saisrirampur · 2025-12-12T21:53:48 1765576428

Appreciate you chiming in! We evaluated almost all the FDWs and landed on clickhouse_fdw (built by Ildus) as the most mature option. However, it hadn’t been maintained since 2020. We used it as the base, and the goal is to take it to the next level.

Our main focus is comprehensive pushdown capabilities. It was very surprising to see how much the Postgres FDW framework has evolved over the years and the number and types of hooks it now provides for pushdown. This is why we decided to lean into FDW rather than build an extension from the bottom up. But we may still do that within pg_clickhouse for a few features, wherever the FDW framework becomes a restriction.

We’ve made notable progress over the last few months, including support for pushdown of custom aggregations and SEMI JOINs/basic subqueries. Fourteen of twenty-two TPCH queries are now fully pushdownable.

We’ll be doubling down to add pushdown support for much more complex queries, CTEs, window functions, and more. More on the future here - https://github.com/ClickHouse/pg_clickhouse?tab=readme-ov-fi... All with the goal of enabling users to build fast analytics from the Postgres layer itself but still using the power of ClickHouse!

DetroitThrow · 2025-12-13T00:47:09 1765586829

>All with the goal of enabling users to build fast analytics from the Postgres layer itself but still using the power of ClickHouse!

That would be incredible! So many times I want to reach for ClickHouse but whatever company I'm at has so much inertia built into PG. Pleease add CTE support.

And yes I'm aware of PeerDB or whatever that project is called. This is still or even more helpful.

saisrirampur · 2025-12-13T06:02:02 1765605722

Totally! Making things way easier on the app and query side is very important, which is why we plan to invest heavily in this going forward.

With respect to data replication, it gets really hard and has its challenges as data sizes grow - reliably moving tens of terabytes at speed, handling intricate quirks around replication slots, enterprise-grade observability etc. PeerDB/ClickPipes is designed to solve these problems. I wrote a blog post covering this in more detail here: https://clickhouse.com/blog/postgres-cdc-year-in-review-2025

That said, point taken - we will ensure query and app migration is seamless as well and reduce friction in integrating Postgres and ClickHouse. pg_clickhouse is a step in that direction! :)

__s · 2025-12-13T01:12:38 1765588358

You're replying to the CEO of PeerDB. We recognize CDC is only one tool in the integration toolbox, which is why we're prioritizing this

DetroitThrow · 2025-12-15T15:57:57 1765814277

I shouldn't be so flippant on here, of course I'm talking to the guy who wakes up and hears this every day.

I really appreciate the work that he and y'all are doing on both sides of the equation, it's great for every org that wants to use ClickHouse but can't.

saisrirampur · 2025-12-12T21:19:34 1765574374

Good idea! Btw, ClickHouse does provide a HTTP interface directly, too! https://clickhouse.com/docs/interfaces/http
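For readers who want to try the HTTP interface mentioned above, here is a minimal Python sketch using only the standard library. It assumes a ClickHouse server reachable on the default HTTP port 8123 with no authentication; the function names `build_query_url` and `run_query` are made up for this illustration and are not part of any ClickHouse client.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_query_url(sql: str, host: str = "localhost", port: int = 8123) -> str:
    """Build a GET URL that passes the SQL text in the `query` parameter."""
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f"http://{host}:{port}/?{urlencode({'query': sql}, quote_via=quote)}"


def run_query(sql: str, host: str = "localhost", port: int = 8123) -> str:
    """Send the query over HTTP and return the raw response body as text.

    Assumes an unauthenticated server is listening; raises URLError otherwise.
    """
    with urlopen(build_query_url(sql, host, port), timeout=10) as response:
        return response.read().decode("utf-8")
```

With a local server running, `run_query("SELECT 1")` should return the result as plain text. Queries that modify data are normally sent as a POST body rather than in the URL, which this sketch does not cover.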

saisrirampur · 2025-10-13T20:39:14 1760387954

Sai from ClickHouse here. Very compelling story! Really love your emphasis on using the right tool for the right job - power of row vs column stores.

We recently added a MySQL/MariaDB CDC connector in ClickPipes on ClickHouse Cloud. This would have simplified your migration from MariaDB.

https://clickhouse.com/docs/integrations/clickpipes/mysql https://clickhouse.com/docs/integrations/clickpipes/mysql/so...

saisrirampur · 2025-09-05T23:33:49 1757115229

Sai from ClickHouse here. Adding to the above, we just released a blog that presents JOIN benchmarks of ClickHouse against Snowflake and Databricks. This is after the recent enhancements made to the ClickHouse core. https://clickhouse.com/blog/join-me-if-you-can-clickhouse-vs.... The benchmarks cover two dimensions: speed and cost.

twotwotwo · 2025-09-06T00:49:48 1757119788

This is really encouraging! Commented elsewhere in the thread but this was one of the main odd points I ran into when experimenting with ClickHouse, and the changes in the PR and mentioned in the recent video about join improvements (https://www.youtube.com/watch?v=gd3OyQzB_Fc&t=137s) seem to hit some of the problems. I'm curious whether "condition pushdown" mentioned in the video will make it so "a.foo_id=3 and b.foo_id=a.foo_id" doesn't need "b.foo_id=3" added for optimal speed.
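To make the "a.foo_id=3 and b.foo_id=a.foo_id" question concrete, here is a small Python sketch of the general idea of inferring extra constant filters from join equalities. It is an illustration only, not how ClickHouse implements condition pushdown; the function name and the input format are assumptions made up for this example.

```python
def infer_constant_filters(equalities, constants):
    """Propagate `column = constant` filters across `column = column` equalities.

    equalities: iterable of (column, column) pairs, e.g. ("a.foo_id", "b.foo_id")
    constants:  dict of column -> constant, e.g. {"a.foo_id": 3}
    Returns a dict holding the original and the inferred constant filters.
    Conflicting constants within one class are not detected in this sketch.
    """
    parent = {}

    def find(col):
        # Union-find lookup with path halving.
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]
            col = parent[col]
        return col

    # Columns joined by equality end up in the same equivalence class.
    for left, right in equalities:
        parent[find(left)] = find(right)

    # Record the known constant for each class root.
    class_value = {}
    for col, value in constants.items():
        class_value[find(col)] = value

    # Every column in a class with a known constant gets that filter.
    return {
        col: class_value[find(col)]
        for col in list(parent)
        if find(col) in class_value
    }


filters = infer_constant_filters([("a.foo_id", "b.foo_id")], {"a.foo_id": 3})
```

Here `filters` contains both `a.foo_id = 3` and the inferred `b.foo_id = 3`, which is the filter that otherwise has to be added by hand.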

I also share nrjames's curiosity about whether the spill-to-disk situation has improved. Not having to even think about whether a join fits in memory would be a game changer.

nrjames · 2025-09-05T23:40:13 1757115613

Will Clickhouse spill to disk yet when joins are too large for memory?

saisrirampur · 2025-08-17T19:52:13 1755460333

Very interesting take — I see where you’re coming from. Yes, there are caveats and differences between ClickHouse and Postgres. Much of this stems from the nature of the workloads they are built for: Postgres for OLTP and ClickHouse for OLAP.

We’ve been doing our best to address and clarify these differences, whether through product features like this one or by publishing content to educate users. For example: https://clickhouse.com/blog/postgres-to-clickhouse-data-mode... https://www.youtube.com/watch?v=9ipwqfuBEbc.

From what we’ve observed, the learning curve typically ranges from a few weeks for smaller to medium migrations to 1–2 months for larger ones moving real-time OLAP workloads from Postgres to ClickHouse. Still, customers are making the switch and finding value — hundreds (or more) are using both technologies together to scale their real-time applications: Postgres for low-latency, high-throughput transactions and ClickHouse for blazing-fast (100x faster) analytics.

We’re actively working to bridge the gap between the two systems, with features like faster UPDATEs, enhanced JOINs and more. That’s why I’m not sure your comment is fully generalizable — the differences largely stem from the distinct workloads they support, and we’re making steady progress in narrowing that gap.

- Sai from the ClickHouse team here.

smarx007 · 2025-08-17T20:06:45 1755461205

How much of the ISO/IEC 9075:2023 SQL standard does CH conform to?

oulipo · 2025-08-17T20:05:58 1755461158

What would be the best Postgres + CH setup to combine both? Something using CDC to apply changes to CH?

saisrirampur · 2025-08-17T20:15:12 1755461712

Great question! Exactly: CDC from Postgres to ClickHouse, plus adapting the application to start using ClickHouse for analytics. Through the PeerDB acquisition, ClickHouse now has native CDC capabilities that work at any scale (few 10s of GB to 10s of TB Postgres databases). You can use ClickPipes if you’re on ClickHouse Cloud, or PeerDB if you’re using ClickHouse OSS.

Sharing a few links for reference: https://clickhouse.com/docs/integrations/clickpipes/postgres https://github.com/PeerDB-io/peerdb https://clickhouse.com/cloud/clickpipes/postgres-cdc-connect... https://clickhouse.com/blog/clickhouse-acquires-peerdb-to-bo...

Here is a short demo/talk that we did at our annual conference, Open House, that talks about this reference architecture https://clickhouse.com/videos/postgres-and-clickhouse-the-de...

saisrirampur · 2025-05-11T15:43:12 1746978192

Neat project! Quick question: will this work only if the entire row is a duplicate? Or also when just a set of columns (e.g. the primary key) conflicts, with a guarantee that only the latest version of the conflicting row is present? I’m assuming the former because you are deduping before data is ingested into ClickHouse. I could be missing something, so wanted to confirm.

- Sai from ClickHouse

super_ar · 2025-05-11T17:06:33 1746983193

Thanks, Sai! Great question. The deduplication works based on the user-defined key, not the entire row. You can specify which field (e.g. a primary key like event_id) to use as the deduplication key. Within a defined time window, GlassFlow guarantees that only the first event with a given key will be forwarded to ClickHouse. Subsequent duplicates are rejected. Our idea was to keep ClickHouse as clean as possible.
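The behaviour described above (first event per key wins within a time window) can be modelled with a short Python sketch. This is not GlassFlow's implementation; the class name, the method name, and the choice to measure the window from the first forwarded event are all assumptions made for illustration.

```python
class FirstWinsDeduplicator:
    """Forward only the first event per key within a time window."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        # key -> timestamp of the event that was forwarded for that key.
        # Old keys are never evicted here, so memory grows with distinct keys.
        self._first_seen = {}

    def accept(self, key, timestamp: float) -> bool:
        """Return True if the event should be forwarded, False if it is a duplicate."""
        seen_at = self._first_seen.get(key)
        if seen_at is not None and timestamp - seen_at < self.window_seconds:
            return False  # same key seen inside the window: reject
        self._first_seen[key] = timestamp
        return True


dedup = FirstWinsDeduplicator(window_seconds=60)
decisions = [
    dedup.accept("event-1", 0),   # first occurrence: forwarded
    dedup.accept("event-1", 30),  # duplicate inside the window: rejected
    dedup.accept("event-2", 31),  # different key: forwarded
    dedup.accept("event-1", 61),  # window has passed: forwarded again
]
```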

saisrirampur · 2025-05-11T17:48:22 1746985702

Got it. Thanks for the clarification. That might not work if the ingested row represents an UPDATE. In Postgres CDC, we replicate an UPDATE as a new version of the row, and that newest version is the one you want to retain. For most customers, using FINAL (with the correct ORDER BY key as needed) works well for deduplication, and query performance is still great. But in cases where it isn't, customers typically resort to tuning faster merges with ReplacingMergeTree or Materialized Views (either aggregating or refreshable) to manage deduplication.
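To show why "first event wins" and "latest version wins" differ for UPDATEs, here is a small Python model of keeping the newest version per key. It only mimics the idea of retaining the latest row version; it is not ClickHouse code, and the column names `id`, `version`, and `status` are hypothetical.

```python
def latest_versions(rows, key="id", version="version"):
    """Keep, for every key, the row with the highest version number.

    On equal versions the later row in the input wins.
    """
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[version] >= current[version]:
            latest[row[key]] = row
    return list(latest.values())


rows = [
    {"id": 1, "version": 1, "status": "created"},
    {"id": 2, "version": 1, "status": "created"},
    {"id": 1, "version": 2, "status": "shipped"},  # an UPDATE replicated as a new version
]
result = sorted(latest_versions(rows), key=lambda r: r["id"])
```

A first-wins filter on `id` would have dropped the third row, so the table would still show `created` for id 1. Keeping the highest version retains `shipped` instead.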

Anyway, great work so far! I like how well you articulated the problem. Best wishes.