On top of that I don't think it's fair to say it's 10x faster when it btree was ...

jschafer · on Dec 23, 2024

I do not know if it is still the case, but the last time I looked into the source code SQLite did hash all strings to the exact same value.

So the bloom filter optimization does not work there.

It had to do with the different ways strings can be compared with collating functions, as strings may be equal even if they have different bytes: https://sqlite.org/forum/forumpost/0846211821

thaumasiotes · on Dec 23, 2024

Why do you want a UUID index? Use an integer index and have the UUID in another column.

dymk · on Dec 23, 2024

Because if you want to refer to things by a UUID, now you have two indexes

leourbina · on Dec 23, 2024

UUIDs are very wasteful [1]. For most use cases you can replace them with much shorter strings and still have very low chances of collisions [2]

[1] https://henvic.dev/posts/uuid/

[2] https://alex7kom.github.io/nano-nanoid-cc/

cogman10 · on Dec 23, 2024

Call me crazy, but I'm simply splitting my UUID into the higher and lower bits and indexing off that.

IE

    CREATE TABLE foo(
        id_ms    UNSIGNED BIG INT NOT NULL,
        id_ls    UNSIGNED BIG INT NOT NULL,
        PRIMARY KEY (id_ms, id_ls)
    ) WITHOUT ROWID;

That works well with UUIDv7 and is just storing 128bits rather than a full string. In most languages it's pretty trivial to turn 2 longs into a UUID and vice versa.

diroussel · on Dec 25, 2024

Is there any advantage to this approach over Postgres native uuid support which should store the same number of bits?

cogman10 · on Dec 26, 2024

No. This approach is strictly for DBS like sqlite without uuid or 128bit integer support.

PittleyDunkin · on Dec 23, 2024

Sure, at cost of increased complexity of access. Sometimes the waste is worth the simplicity.

dymk · on Dec 23, 2024

Sounds complex, just use a UUID. If that’s the dominating factor for storage, then you have a different problem to solve.

sgarland · on Dec 24, 2024

In SQLite, if you were to define a TEXT column (or anything other than INTEGER, for that matter) with a UUID as the PK, you’d already have two indices, because it stores data based on the rowid [0]. So you’d already have a level of indirection, where the “PK” would be pointing to the rowid.

You could define the table as WITHOUT ROWID [1], but as docs point out, the average row size shouldn’t exceed 200 bytes for the default 4 KiB page size. Since a UUID in text form is at best 32 chars, that doesn’t leave much for the rest of the columns.

[0]: https://www.sqlite.org/lang_createtable.html#rowid

[1]: https://www.sqlite.org/withoutrowid.html