Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Things that don’t work well with MySQL’s FOREIGN KEY implementation (openark.org)
75 points by grep_it on May 1, 2023 | hide | past | favorite | 35 comments


Just to throw additional warnings on the pile, MySQL FKs are not ANSI SQL compliant and can be set up on non-unique columns, which can be a major pain when porting across DBs.

https://dev.mysql.com/doc/refman/8.0/en/ansi-diff-foreign-ke...


Whoa, I've never made a foreign key on a non-unique column, had no idea MySQL would allow that. I guess I've pretty much always done foreign keys pointing to primary keys, so they're definitely unique.


Did you know you can also have a BLOB as a primary key for a table? I found that in an app I just took over the other day. Was shocked.


> A FOREIGN KEY constraint that references a non-UNIQUE key is not standard SQL but rather an InnoDB extension.

Anyone have any background on why this exists? For what purpose would you want a non-unique FK; what are the semantics of a FK that resolves to multiple different records?


I don't know why this was implemented in MySQL's implementation but here's an example scenario where the general capability where you have a logical foreign key but not necessarily uniqueness in the pyhsical target. Pardon the jargon:

In a reporting/ROLAP dimensional data model, you may have a slowly changing dimension where you have something a table design like:

  customer_id (varchar(N)),
  current_record_flag boolean, 
  record_effective_start_date, 
  record_effective_end_date, 
  attribute1, 
 attribute2, etc
and a sales transaction table:

  customer_id -- non-unique FK
  product_id
  sale_date 
  sold_quantity
  unit_price
  total_sales_amount
The customer ID in the sales table really relates back to the customer ID in the customer dimension table but not to a single UNIQUE row in the customer dimension table... unless you are, as you always are, either querying on current_record_flag='Y' (for seeing current customer attribute values), or a specific date value that falls between record_effective_date and record_end_date (for seeing customer attribute values at the time of the sales transaction_date).

(And optimal query performance for these types of analytic use case often involves using a columnar storage engine rather than InnoDB.)


Thanks for the example. I can see how a temporal/stochastic dimension can be important for some domains, but it seems like a crime to overload the relational FK semantics to support it. Just add one more layer of data abstraction (e.g. Org/Person <- temporal Customer profile <- Sale, or temporal Attributes -> Customer <- Sale), the RDBMS is made for this after all... Or use a DB with proper temporal support.


I could probably fall into this and get side-tracked for the rest of my workday, but could you explain how update/delete cascades work in a FK like this?


Updates and deletes in ROLAP dimensional data models are usually performed via explicit ETL jobs, not triggers nor particularly update/delete cascading triggers.

When a customer record is deleted in a source system, you typically do not want to delete (much less cascading delete) it from a downstream ROLAP data model because you are trying to maintain a history of past activity for reporting and/or audit purposes. Instead you might soft-delete a customer dimension table record and/or mark its effective end date as having occurred.

If a customer record is updated in a source system, depending on the design you want to implement for your use cases and history you want to retain or ignore, you may choose to update the corresponding "current" customer row in your customer dimension, or you may, for important attribute changes, update the end-date of the current customer row and insert a new second row with the latest set of customer attributes and new record effective dates (typically putting the effective end date in some distant future date).

In a standard third normal data model for an application, cascading update/deletes enable you to maintain a consistent state across a series of tables. In a dimensional data model, you typically just ensure that any dimensions get updated first, then perhaps outrigger/snowflake dimensions, then your core transactional tables. And transactional tables are rarely updated; usually you try never to touch them and you use ETL jobs to update the dimensions.

There are different styles/techniques for different use cases. At the risk of side-tracking you, a couple random points of reference: https://www.kimballgroup.com/data-warehouse-business-intelli... https://www.kimballgroup.com/2008/08/slowly-changing-dimensi... https://www.kimballgroup.com/2013/02/design-tip-152-slowly-c...


Sorry if this is an absurdly simple question... What is the point of a foreign key that has no cascade? Just to help speed up indexing?


Not sure the specific MySQL thinking, but FKs in a large data model help new people understand or confirm relationships, and also make importing your data model into an ERD tool for documenting to others much easier. FKs which enforce relational integrity are always useful against bugs. Some query planners (e.g Redshift) even use FK relationships to optimize query plans even when there aren’t indexes nor constraints provided by FKs.


Its a shortcut to support use of a denornalized schema where, in a properly normalized schema, there’d be abother table where the row was unique.


Always use a nornalized schema. The ability to see the future is invaluable in a database.


I am probably in the minority and would get shouted at by a DBA of yore, but I use foreign keys for referential integrity and… that’s it. We use soft deletes for almost all rows, so the whole cascading side is less relevant for us than getting an error in test or even production because a missing relation has been used. Combined with using non-integer primary keys, it’s godsend.


I remember working on a large database without foreign keys for a while. When asked why did not use foreign keys, I was told that they didn't like the opaqueness of CASCADE, which I could understand. I guess I did not give it much thought afterwards, until I later ended up in a shop, where the database did include foreign keys, but only with RESTRICT. It was eye-opening how useful foreign keys were, when they were just integrity checks.


OP. I agree! RESTRICT is by far the best rule to use, and makes the most sense. Perhaps to balance my post a bit, and for what it's worth, I don't advocate for "don't ever use foreign keys" as a blanket statement. My experience was one where using foreign keys did not make sense. I do wish they were more operationally friendly.


Whoops. I wrote "OP" when I really meant "Post author". I'm a bit rusty with HN notations.


what does 'more operationally friendly' mean? TIA


Like the issues I mention in my post: modifying the data type of a column that is used by a foreign key; otherwise the fact you can't run Online DDL on a table that participates in foreign key relationship ; that INSTANT does not support (yet?) adding/removing foreign key constraints ; that cascaded writes are not written the the binary logs.

These are all things that the casual developer doesn't deal with when designing a schema and writes an app that INSERTs/DELETEs/UPDATEs to tables with foreign keys. But once there's a need for a change; once you wire 3rd party tools onto your database, that's where the operations hit a wall.


I didn't realise grep_it and you (shlomi-noach) were the same, apologies


We are not the same. My mistake for writing "OP".


eBay released a blog/paper over two decades ago talking about how they achieved scale in part by removing FK constraints from their database. It was vogue for the next decade when SAN/DAS IOPS were ~1-3k, and FK constraints could lead to more disk writes (like Postgres multixact).


I was a DBA for 5 years and we outlawed cascading anything be they updates or deletes. So you’re not doing anything wrong.


What's wrong with non-integer primary keys?


You can pack fewer of them into a page, which means more disk IO for things like index scans.


besides performance for using something that is not often tested by devs working in performance, bad support for locale collating and open you for bugs such as indexing on email but allowing application to store both upper and lower case etc.


I would expect they're referring to UUIDs or something


I will take another rare opportunity of anything MySQL ends up being on HN, Considering [1] mySQL v5.x will EOL this year. And MySQL 8.0 with EOL in 2026. Does anyone knows if MySQL 9.0 will come anytime soon?

[1] https://endoflife.software/applications/databases/mysql


only oracle knows. and they don’t share answer yet, on this site or any other

only public news so far is extremely brief twitter mention of future switch to separate LTS releases from feature releases

big picture, hard to see what would motivate them to major re-invest in current mysql product model! amazon, planetscale, and co all profit off of oracle’s mysql server development efforts. and oracle does not get anything in return

assume this why more and more mysql dev efforts go to saas-only product like “mysql heatwave”!


Nice article.

Note that there's more than one 'MySQL FOREIGN KEY' implementation. MySQL Ndb Cluster also supports foreign keys with some differences wrt the InnoDB implementation :

- NDB, therefore not limited to a single MySQL Server, shard etc - Not limited to references between tables in a single database - Supports NoAction deferred constraint checks - Cascaded changes Binlogged independently as part of RBR (Nice side effect of reducing replica apply time work) ...

https://dev.mysql.com/blog-archive/foreign-keys-in-mysql-clu...

Some of the issues described wrt DDL limitations are shared.

Many schemas seem to overuse foreign keys perhaps under the assumption that they are required for or accelerate joins?


innodb supports cross-schema foreign keys, there’s no limitation to tables in a single database

can be terrible in mysql 8 though due to metadata locks now extending across foreign key boundaries. this means alter in one schema can block things in other schema if foreign key across databases

speaking of, am surprised that blog post author doesnt discuss the new mysql 8 metadata locking behavior, is new major problem with mysql foreign keys!


You are right about cross schema foreign keys being supported, my mistake.


The last time I checked, MySQL still can’t do foreign keys with binary blobs. (I think it works with some very specific limitations but not enough to actually use in the real world.)


I have the strong feeling I would never do this regardless of the DB engine for performance and storage reasons. I would try to hash the blob into a bigint and use that as the PK/FK.

Out of curiosity, is there a scenario you can share in which using a binary blob as a PK/FK would be the best solution?


I'm not talking huge blobs, just (fixed-size) 8- to 16-byte uuid-like fields. You can obviously marshal 8-byte fields to a bigint but you can't do that with 16-byte fields.


It was interesting to hear this aside about the MySQL roadmap:

> MySQL is pushing towards INSTANT DDL

When I need MySQL these days I automatically go for MariaDB instead, but I guess they are going to diverge more and more. Does anyone more involved in the MySQL/MariaDB world have any thoughts about how they choose and the future of those two projects?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: