None of this. It's in the blog post in a lot of detail =)
The 5ms write latency is because the backend distributed block storage layer does synchronous replication to multiple servers, for high availability and durability, before acknowledging a write. (And, to be honest, this path has not yet been heavily optimized for latency.)
I'm not aware of any published source for this time limit, nor of any way to reduce it.
The docs do say, however, "If the volume has been impaired for more than 20 minutes, you can contact the AWS Support Center." [0], which suggests it's some expected cleanup/remount interval.
That is, it is something that we regularly encounter when EC2 instances fail, so we were sharing from personal experience.
Tiger Cloud certainly continues to run on AWS. We have built it to rely on fairly low-level AWS primitives like EC2, EBS, and S3 (as opposed to some of the higher-level service offerings).
Our existing Postgres fleet, which uses EBS for storage, still serves thousands of customers today; nothing has changed there.
What’s new is Fluid Storage, our disaggregated storage layer that currently powers the new free tier (while in beta). In this architecture, the compute nodes running Postgres still access block storage over the network. But instead of that being AWS EBS, it’s our own distributed storage system.
From a hardware standpoint, the servers that make up the Fluid Storage layer are standard EC2 instances with fast local disks.
- EBS typically operates in the millisecond range. AWS' own documentation suggests "several milliseconds"; our own experience with EBS is 1-2 ms. Reads/writes to local disk alone are certainly faster, but it's more meaningful to compare this against other forms of network-attached storage.
- If durability matters, async replication isn't really the right baseline for local-disk setups. Most production deployments of Postgres (and other databases) rely on synchronous replication -- or "semi-sync," which still waits for acknowledgment from at least one standby (or a quorum of them) before committing -- which in the cloud lands you back in the single-digit millisecond range for writes (see the sketch below).
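For concreteness, here's a minimal sketch of what that looks like in stock Postgres, assuming a primary with two standbys whose names (standby_a, standby_b) are purely illustrative: COMMIT doesn't return until at least one standby has acknowledged the WAL, and that network round trip is where the extra milliseconds come from.

```sql
-- Run on the primary. Standby names below are hypothetical placeholders.
-- COMMIT waits until at least ONE of the two listed standbys acknowledges.
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (standby_a, standby_b)';

-- 'on' waits for the standby to flush the WAL to disk;
-- 'remote_write' / 'remote_apply' trade durability vs. latency differently.
ALTER SYSTEM SET synchronous_commit = 'on';

SELECT pg_reload_conf();
```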
Our experience is that ClickHouse and Timescale are designed for different workloads, and that Timescale is optimized for many of the time-series workloads people run in production:
TimescaleDB primarily serves operational use cases: developers building products on top of live data, where you're regularly streaming in fresh data and often know a priori what many queries look like, because those queries power your live APIs, dashboards, and product experience.
That's different from a data warehouse or many traditional "OLAP" use cases, where you might dump a big dataset statically and then occasionally run ad-hoc queries against it. This is the big weather-dataset file sitting on your desktop that you occasionally query while on holiday.
So it's less about "can you store weather data?" and more about what that use case looks like: How are the queries shaped? Are you saving a single dataset for ad-hoc queries across the entire dataset, or continuously streaming in new data and aging out or de-prioritizing old data?
In most of the products we serve, customers are often interested either in recent data in a very granular format ("shallow and wide") or in longer historical queries along a well-defined axis ("deep and narrow").
For example, this is where the benefits of TimescaleDB's segmented columnar compression emerge. It optimizes for the queries that are very common in your application, e.g., an IoT application that groups or filters by deviceID, crypto/fintech analysis based on the ticker symbol, product analytics based on tenantID, etc.
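As a rough sketch of how that is expressed in TimescaleDB (the hypertable and column names here are made-up placeholders), you tell compression which key to segment by and how to order rows within each segment:

```sql
-- Hypothetical IoT hypertable; adjust names to your schema.
ALTER TABLE sensor_readings SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'device_id',  -- the key your queries filter/group on
  timescaledb.compress_orderby   = 'time DESC'
);

-- Compress chunks once they're older than 7 days.
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');
```

Queries that filter on device_id then only need to decompress the matching segments rather than entire chunks.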
If you look at ClickBench, what most of the queries effectively say is: scan ALL the data in your database and GROUP BY one of the 100 columns in the web-analytics logs.
There are almost no time predicates in the benchmark ClickHouse created, but perhaps that's not surprising, given it was designed for ad-hoc weblog analytics at Yandex.
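To make the contrast concrete, here are two illustrative query shapes (all table and column names are made up): a ClickBench-style full-scan aggregation versus the kind of time-bounded, key-filtered query an operational time-series app tends to run.

```sql
-- ClickBench-style ad-hoc analytics: scan everything, no time predicate.
SELECT user_agent, count(*) AS hits
FROM weblogs
GROUP BY user_agent
ORDER BY hits DESC
LIMIT 10;

-- Operational time-series query: narrow time window, filtered on a segment key.
SELECT time_bucket('1 minute', time) AS minute, avg(value) AS avg_value
FROM sensor_readings
WHERE device_id = 'dev-123'
  AND time > now() - INTERVAL '1 hour'
GROUP BY minute
ORDER BY minute;
```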
So yes, Timescale serves many products today that use weather data, but it has made different choices than ClickHouse (or things like DuckDB, pg_analytics, etc.) to serve those more operational use cases.
However, if you really want to optimize data currently residing in Postgres for analytical workloads, as the original comment suggests, consider moving to a dedicated OLAP DB like ClickHouse.
What we ended up doing is maintaining metadata in Postgres while storing the time-series data in ClickHouse. Thanks for making / working on ClickHouse. I appreciate it very much.
Indeed, ClickHouse results were run on an older instance type of the same family and size (c5.4xlarge for ClickHouse and c6a.4xlarge for Timescale), so if anything ClickHouse results are at a slight disadvantage.
Eh, c6a is also an AMD Rome which has worse memory bandwidth at the tails and weaker per thread performance than Cascadelake (c5). I don't understand anything about this particular benchmark, but I wouldn't compare them simply as "older vs newer".
Timescale was certainly the first choice, as we were already using Postgres. However, we could not get it to perform well, as our timestamps are simulated / non-monotonic. We also ultimately need to be able to manage low trillions of points in the long run. InfluxDB was also evaluated but had a number of issues as well (though I am certain both it and Timescale would work fine for some use cases).
I think perhaps because ClickHouse is a little more general purpose, it was easier to map our use case to it. Also, one thing I appreciate about ClickHouse is it doesn't feel like a black box - once you understand the data model it is very easy to reason about what will work and what will not.
Can you say more about "dynamic labels"? Do you just mean that, as your data evolves, you want to add new types of "key-value" pairs?
The most common approach here is just to store the set of "dynamic" labels in JSON, which can be evolved arbitrarily.
And we've found that this type of data actually compresses quite well in practice.
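A minimal sketch of that pattern (table and column names are only illustrative): keep well-known fields as regular columns and put the open-ended labels in a jsonb column, optionally with a GIN index for containment lookups.

```sql
CREATE TABLE metrics (
  time   timestamptz NOT NULL,
  value  double precision,
  labels jsonb                 -- arbitrary, evolving key-value labels
);
SELECT create_hypertable('metrics', 'time');

-- Optional: speeds up lookups like  WHERE labels @> '{"region": "eu-west-1"}'
CREATE INDEX metrics_labels_idx ON metrics USING GIN (labels);
```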
Also, regarding compression: Timescale supports transparent mutability on compressed data, so you can directly INSERT/UPDATE/UPSERT/DELETE into compressed data. Under the covers, it does smart optimizations to asynchronously map individual mutations into segment-level decompress/recompress operations.
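From the application's point of view, these are just ordinary statements against the (compressed) hypertable; for example, with placeholder names again:

```sql
-- Plain SQL; no manual decompress/recompress step in application code.
UPDATE sensor_readings
   SET value = 0
 WHERE device_id = 'dev-123'
   AND time BETWEEN '2024-01-01' AND '2024-01-02';

DELETE FROM sensor_readings
 WHERE device_id = 'dev-456'
   AND time < now() - INTERVAL '90 days';
```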