Thank you, Paul! Great to see Supabase wrappers evolve. I really love the async streaming feature. It helps address use cases involving (reliably) moving larger datasets from ClickHouse to Postgres for supporting (stricter) transactional workloads.
Very excited to continue working closely to further integrate these amazing open source database technologies and make it easier for users. :)
I love DuckDB from a product perspective and appreciate the engineering excellence behind it. However, DuckDB was primarily built for seamless for in-process analytics, data science, data-preparation/ETL workloads than real-time customer facing analytics.
ClickHouse’s bread and butter is real-time analytics for customer-facing applications, which often come with demanding concurrency and latency requirements.
Ack, totally makes sense that both are amazing technologies - you could try both and test them at the scale your real-time application may reach, and then choose the technology that best fits your needs. :)
Great question! If you’re starting a greenfield application, pg_clickhouse makes a lot of sense since you’ll be using a unified query layer for your application.
Now, coming to your question about replication: you can use PeerDB (acquired by ClickHouse https://github.com/PeerDB-io/peerdb), which is laser-focused and battle-tested at scale for Postgres-to-ClickHouse replication. Once the data is replicated into ClickHouse, you can start querying those tables from within Postgres using pg_clickhouse. In ClickHouse Cloud, we offer ClickPipes for Postgres CDC/replication, which is a managed service version of PeerDB and is tightly integrated with ClickHouse. Now there could be non-transcational tables that you can directly ingest to ClickHouse and still query using pg_clickhouse.
So TL;DR: Postgres for OLTP; ClickHouse for OLAP; PeerDB/ClickPipes for data replication; pg_clickhouse as the unified query layer. We are actively working on making this entire stack tightly integrated so that building real-time apps becomes seamless. More on that soon! :)
Nice! Right now I'm using Timescaledb, do you think it makes sense to move to a Postgres+CH setup instead? or only if I hit the limit of timescaledb?
Also what would be the benefit for me of querying clickhouse from Postgres, rather than directly through my backend via an ORM/SDK? is that because it would allow me to do JOINs?
What would be the typical setup if I want to JOIN analytical data (eg my IoT device readings) from CH with some business data (eg the user owning the device) from my Postgres? Would I replicate that business data to CH to do the join there, or would that be typically the exact use-case for pg_clickhouse?
Great questions! ClickHouse is a purpose-built analytical database with thousands of optimizations for analytics, which is why it’s typically faster and more scalable than TimescaleDB. Here’s a post that covers real scenarios where users have moved workloads from Timescale to ClickHouse:
https://clickhouse.com/blog/timescale-to-clickhouse-clickpip...
Where pg_clickhouse fits: If you’re already using Postgres for OLTP and want to offload analytics to ClickHouse without rewriting your app, the pg_clickhouse extension helps. It lets you run OLTP and OLAP queries from Postgres, while pushing the analytical queries—and their joins—down to ClickHouse, where the replicated data lives. Going native i.e. querying ClickHouse directly for OLAP will be the most optimal and is recommended if your analytics is advanced/complex. We will be evolving pg_clickhouse over the coming months to support pushdown for more and more complex/advanced queries :)
Very interesting! So right now I'm developing the backend, so I can still move analytics to CH, but I'm still wondering whether it would make sense because it might not be so large that it requires it (eg 50G/year of data I'd say)
And on the other hand, I can imagine that there could be plenty of footguns with replication to another database (not instant, what about schema changes, backfills, what if some database is shutdown for update while replicating, etc), so I'm a bit cautious about having a complex setup right now
Would you have some basic examples of a "mini-backend" Postgres+Clickhouse replication, using docker-compose + Typescript/Python or something, so I could play with it and take a look at what could be the operational complexity?
You should just give it a shot in 10-15 min and see how it looks with ClickHouse. We made it that simple with ClickPipes :). Don’t intend to sell here, but it is as simple as signing up for trial on ClickHouse Cloud and clicking a few buttons and start seeing PG data getting synced.
In regards to footguns with replication, totally understand you being cautious. Last 2 years at PeerDB/ClickPipes was laser focused on just Postgres CDC to provide a dead simple yet highly reliable experience. The product has 100s of features, addresses 100s of footguns and actively being enhanced. Sharing some customers using this production https://clickhouse.com/blog/postgres-cdc-year-in-review-2025... You should give it a shot to see how easy it is. :)
In regards to sample apication, here is one, https://github.com/ClickHouse/HouseClick It showcases PG + CH stack. We just merged a PR to integrate pg_clickhouse too. The good news is that, there is a blog planned in a couple of weeks which showcases a tightly integrated experience PG +CH with CDC and pg_clickhouse, all in OSS. It will have docker-compose too. Your question adds up to what we are thinking next, I couldn’t resist myself to reveal it. ;) :)
Appreciate you chiming in! We evaluated almost all the FDWs and landed on clickhouse_fdw (built by Ildus) as the most mature option. However, it hadn’t been maintained since 2020. We used it as the base, and the goal is to take it to the next level.
Our main focus is comprehensive pushdown capabilities. It was very surprising to see how much the Postgres FDW framework has evolved over the years and the number and types of hooks it now provides for push down. This is why we decided to lean into FDW than build an extension bottoms up. But we may still do that within pg_clickhouse for a few features, wherever FDW framework becomes a restriction.
We’ve made notable progress over the last few months, including support for pushdown of custom aggregations and SEMI JOINs/basic subqueries. Fourteen of twenty-two TPCH queries are now fully pushdownable.
We’ll be doubling down to add pushdown support for much more complex queries, CTEs, window functions, and more. More on the future here - https://github.com/ClickHouse/pg_clickhouse?tab=readme-ov-fi...
All with the goal of enabling users to build fast analytics from the Postgres layer itself but still using the power of ClickHouse!
>All with the goal of enabling users to build fast analytics from the Postgres layer itself but still using the power of ClickHouse!
That would be incredible! So many times I want to reach for ClickHouse but whatever company I'm at has so much inertia built into PG. Pleease add CTE support.
And yes I'm aware of PeerDB or whatever that project is called. This is still or even more helpful.
Totally! Making things way easier on the app and query side is very important, which is why we plan to invest heavily in this going forward.
With respect to data replication, it gets really hard and has its challenges as data sizes grow - reliably moving tens of terabytes at speed, handling intricate quirks around replication slots, enterprise-grade observability etc. PeerDB/ClickPipes is designed to solve these problems. I wrote a blog post covering this in more detail here: https://clickhouse.com/blog/postgres-cdc-year-in-review-2025
That said, point taken - we will ensure query and app migration is seamless as well and reduce friction in integrating Postgres and ClickHouse. pg_clickhouse is a step in that direction! :)
I shouldn't be so flippant on here, of course I'm talking to the guy who wakes up and hears this every day.
I really appreciate the work that he and y'all are doing on both sides of the equation, it's great for every org that wants to use ClickHouse but can't.
Sai from ClickHouse here. Adding to above, we just released a blog that presents JOIN benchmarks of ClickHouse against Snowflake and Databricks. This is after the recent enhancements made to the ClickHouse core. https://clickhouse.com/blog/join-me-if-you-can-clickhouse-vs.... The benchmarks is around 2 dimensions of both speed and cost.
This is really encouraging! Commented elsewhere in the thread but this was one of the main odd points I ran into when experimenting with ClickHouse, and the changes in the PR and mentioned in the recent video about join improvements (https://www.youtube.com/watch?v=gd3OyQzB_Fc&t=137s) seem to hit some of the problems. I'm curious whether "condition pushdown" mentioned in the video will make it so "a.foo_id=3 and b.foo_id=a.foo_id" doesn't need "b.foo_id=3" added for optimal speed.
I also share nrjames's curiosity about whether the spill-to-disk situation has improved. Not having to even think about whether a join fits in memory would be a game changer.
Very interesting take — I see where you’re coming from. Yes, there are caveats and differences between ClickHouse and Postgres. Much of this stems from the nature of the workloads they are built for: Postgres for OLTP and ClickHouse for OLAP.
From what we’ve observed, the learning curve typically ranges from a few weeks for smaller to medium migrations to 1–2 months for larger ones moving real-time OLAP workloads from Postgres to ClickHouse. Still, customers are making the switch and finding value — hundreds (or more) are using both technologies together to scale their real-time applications: Postgres for low-latency, high-throughput transactions and ClickHouse for blazing-fast (100x faster) analytics.
We’re actively working to bridge the gap between the two systems, with features like faster UPDATEs, enhanced JOINs and more. That’s why I’m not sure your comment is fully generalizable — the differences largely stem from the distinct workloads they support, and we’re making steady progress in narrowing that gap.
Great question, exactly CDC from Postgres to ClickHouse and adapting the application to start using ClickHouse for analytics. Through the PeerDB acquisition, ClickHouse now has native CDC capabilities that work at any scale (few 10s of GB to 10s of TB Postgres databases). You can use ClickPipes if you’re on ClickHouse Cloud, or PeerDB if you’re using ClickHouse OSS.
Neat project! Quick question, will this work only if the entire row is a duplicate? Or even if just a set of columns (ex: primary key) conflict and you guarantee only presence of the latest version of the conflict? I’m assuming former because you are deduping before data is ingested into ClickHouse. I could be missing something, wanted to confirm.
Thanks, Sai! Great question. The deduplication works based on the user-defined key, not the entire row. You can specify which field (e.g. a primary key like event_id) to use as the deduplication key. Within a defined time window, GlassFlow guarantees that only the first event with a given key will be forwarded to ClickHouse. Subsequent duplicates are rejected. Our idea was to keep ClickHouse as clean as possible.
Got it. Thanks for the clarification. That might not work if the ingested row represents an UPDATE. We do this in Postgres CDC by replicating an UPDATE as a new version of the row and that is what you want to retain. For most customers using FINAL (with the correct ORDER KEY as needed) works well for deduplication and query performance is still great. But in cases where it isn't, customers typically resort to tuning faster merges with ReplacingMergeTree or Materialized Views (either aggregating or refreshable) to manage deduplication.
Anyway, great work so far! I like how well you articulated the problem. Best wishes.
Very excited to continue working closely to further integrate these amazing open source database technologies and make it easier for users. :)
reply