
It’s not quite the same thing but nearby:

I built an EAV secondary index system on top of Postgres to accelerate Notion's user-defined-schema "Databases" feature about a year ago. By secondary index, I mean the EAV table was used for queries that returned IDs, and we hydrated the full objects from another store.
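Roughly the shape of table I mean (names are illustrative, not the real schema):

  CREATE TABLE eav_index (
    entity_id   uuid  NOT NULL,  -- the page/row being indexed
    attribute   text  NOT NULL,  -- user-defined property name or ID
    value       text  NOT NULL,  -- property value, normalized to text
    PRIMARY KEY (entity_id, attribute)
  );

  -- Secondary index so "find all entities where attribute X has value Y"
  -- can be answered with an index scan.
  CREATE INDEX eav_attr_value_idx ON eav_index (attribute, value, entity_id);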

We’d heard that “EAV in Postgres is bad” but wanted to find out for ourselves. Our strategy was to push the whole query down to Postgres and avoid doing query planning in our application code.

When we first turned it on in our dogfood environment, the results looked quite promising: a large improvement over the baseline system at p75. Above that, though, things looked rough, and at p95 queries would never complete (they timed out after 60s).

It worked great if you wanted to filter and sort on the same single attribute. The problem queries were the ones that filtered and sorted on multiple different attributes. We spent a few weeks fixing the most obviously broken classes of query and learned a lot about common table expressions, all the different join types, and strategies for hinting the Postgres query planner. Performance up to p95 was looking good, but beyond p95 we still had a lot of timeouts.
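The problematic shape, roughly: filtering on one attribute while sorting on another forces a self-join on the EAV table (hypothetical query against the sketch table above):

  -- "Status = 'Done', ordered by Due Date": each attribute needs its own
  -- scan of eav_index, joined back together on entity_id.
  SELECT f.entity_id
  FROM eav_index f
  JOIN eav_index s
    ON s.entity_id = f.entity_id AND s.attribute = 'due_date'
  WHERE f.attribute = 'status' AND f.value = 'Done'
  ORDER BY s.value
  LIMIT 100;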

It turns out that using an EAV table leaves Postgres's statistics system totally oblivious to the shape of the objects, so the query planner can be very silly when you JOIN. It would do things like forget about the value index and use a primary-key scan for some arms of the join, because the index didn't look effective enough.
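You can see the problem in pg_stats: every user-defined attribute shares one value column, so there is only one blended set of statistics for the whole table (again using the hypothetical names above):

  -- n_distinct and most_common_vals here are blended across every
  -- user-defined attribute, so the selectivity estimate for any one
  -- attribute can be wildly off.
  SELECT attname, n_distinct, most_common_vals
  FROM pg_stats
  WHERE tablename = 'eav_index' AND attname = 'value';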

It was clear we'd need to move a lot of query planning into the application, maintain our own "table" statistics, and do app joins instead of Postgres joins wherever Postgres was going to mess things up. That last part was the final nail in the coffin: we really couldn't lean on joins in Postgres at all, because we had no way to know when the query planner was going to be silly.
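What "app joins" meant in practice, very roughly: one cheap single-attribute query per attribute, with the intersection and final ordering done in application code (illustrative sketch only):

  -- Query 1: IDs matching the filter (fast single-attribute index scan).
  SELECT entity_id FROM eav_index
  WHERE attribute = 'status' AND value = 'Done';

  -- Query 2: sort keys for those IDs, fetched separately ($1 is a bound
  -- array of IDs); the intersection and ORDER BY happen in the app.
  SELECT entity_id, value FROM eav_index
  WHERE attribute = 'due_date' AND entity_id = ANY($1);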

It was worth doing for the learning! I merged a PR deleting the EAV code about a month ago, and we rolled out a totally different design to production last week :)



I really love Postgres, but I'll never not laugh at the fact that duplicating a CTE caused my query to go faster... (60s to 5s)

Postgres really trips up when you start joining tables

Sometimes you can fix it with MATERIALIZED / NOT MATERIALIZED hints on CTEs, but a lot of the time you just have to create materialized views, or denormalize your data into application-managed tables that act as manual materialized views
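For reference, the two workarounds look roughly like this (the CTE hint needs Postgres 12+; table names are made up):

  -- Force Postgres to compute the CTE once instead of inlining it into
  -- the outer query (use NOT MATERIALIZED for the opposite effect):
  WITH recent AS MATERIALIZED (
    SELECT id, account_id FROM events
    WHERE created_at > now() - interval '7 days'
  )
  SELECT * FROM recent JOIN accounts ON accounts.id = recent.account_id;

  -- Or precompute and refresh explicitly:
  CREATE MATERIALIZED VIEW recent_events AS
    SELECT id, account_id FROM events
    WHERE created_at > now() - interval '7 days';
  REFRESH MATERIALIZED VIEW recent_events;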


Does postgres not have the ability to hint or force indexes?

A long, long time ago I found that quite helpful with MySQL.
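The MySQL feature in question, for comparison (hypothetical table and index names):

  -- MySQL lets you steer or pin the optimizer to a specific index:
  SELECT *
  FROM orders FORCE INDEX (idx_orders_customer)
  WHERE customer_id = 42;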


It does not, and that fact is the #1 downside of Postgres. It is not predictable or controllable at scale, and comes with inherent risk because you cannot “lock into” a good query plan. I have been paged at 3 am a few times because Postgres decided it didn’t like a perfectly reasonable index anymore and wanted to try a full table scan instead :(
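The closest thing in stock Postgres is flipping planner cost settings per transaction, which is a blunt instrument and nothing like actually pinning a plan:

  BEGIN;
  -- Make sequential scans look prohibitively expensive for this
  -- transaction only, nudging the planner back toward the index:
  SET LOCAL enable_seqscan = off;
  SELECT * FROM orders WHERE customer_id = 42;
  COMMIT;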


Nope, weirdly Postgres still doesn't have that ability even today.


It’s not in core, but there are multiple extensions that provide this functionality
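pg_hint_plan is the usual example; hints go in a comment at the start of the query (table and index names here are hypothetical):

  -- Requires the pg_hint_plan extension to be installed and loaded;
  -- the hint comment must come first in the query text:
  /*+ IndexScan(orders idx_orders_customer) */
  SELECT * FROM orders WHERE customer_id = 42;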



