Just to throw additional warnings on the pile, MySQL FKs are not ANSI SQL compliant and can be set up on non-unique columns, which can be a major pain when porting across DBs.
Whoa, I've never made a foreign key on a non-unique column, had no idea MySQL would allow that. I guess I've pretty much always done foreign keys pointing to primary keys, so they're definitely unique.
> A FOREIGN KEY constraint that references a non-UNIQUE key is not standard SQL but rather an InnoDB extension.
Anyone have any background on why this exists? For what purpose would you want a non-unique FK; what are the semantics of a FK that resolves to multiple different records?
I don't know why this was implemented in MySQL's implementation but here's an example scenario where the general capability where you have a logical foreign key but not necessarily uniqueness in the pyhsical target. Pardon the jargon:
In a reporting/ROLAP dimensional data model, you may have a slowly changing dimension where you have something a table design like:
customer_id -- non-unique FK
product_id
sale_date
sold_quantity
unit_price
total_sales_amount
The customer ID in the sales table really relates back to the customer ID in the customer dimension table but not to a single UNIQUE row in the customer dimension table... unless you are, as you always are, either querying on current_record_flag='Y' (for seeing current customer attribute values), or a specific date value that falls between record_effective_date and record_end_date (for seeing customer attribute values at the time of the sales transaction_date).
(And optimal query performance for these types of analytic use case often involves using a columnar storage engine rather than InnoDB.)
Thanks for the example. I can see how a temporal/stochastic dimension can be important for some domains, but it seems like a crime to overload the relational FK semantics to support it. Just add one more layer of data abstraction (e.g. Org/Person <- temporal Customer profile <- Sale, or temporal Attributes -> Customer <- Sale), the RDBMS is made for this after all... Or use a DB with proper temporal support.
I could probably fall into this and get side-tracked for the rest of my workday, but could you explain how update/delete cascades work in a FK like this?
Updates and deletes in ROLAP dimensional data models are usually performed via explicit ETL jobs, not triggers nor particularly update/delete cascading triggers.
When a customer record is deleted in a source system, you typically do not want to delete (much less cascading delete) it from a downstream ROLAP data model because you are trying to maintain a history of past activity for reporting and/or audit purposes. Instead you might soft-delete a customer dimension table record and/or mark its effective end date as having occurred.
If a customer record is updated in a source system, depending on the design you want to implement for your use cases and history you want to retain or ignore, you may choose to update the corresponding "current" customer row in your customer dimension, or you may, for important attribute changes, update the end-date of the current customer row and insert a new second row with the latest set of customer attributes and new record effective dates (typically putting the effective end date in some distant future date).
In a standard third normal data model for an application, cascading update/deletes enable you to maintain a consistent state across a series of tables. In a dimensional data model, you typically just ensure that any dimensions get updated first, then perhaps outrigger/snowflake dimensions, then your core transactional tables. And transactional tables are rarely updated; usually you try never to touch them and you use ETL jobs to update the dimensions.
Not sure the specific MySQL thinking, but FKs in a large data model help new people understand or confirm relationships, and also make importing your data model into an ERD tool for documenting to others much easier. FKs which enforce relational integrity are always useful against bugs. Some query planners (e.g Redshift) even use FK relationships to optimize query plans even when there aren’t indexes nor constraints provided by FKs.
I am probably in the minority and would get shouted at by a DBA of yore, but I use foreign keys for referential integrity and… that’s it. We use soft deletes for almost all rows, so the whole cascading side is less relevant for us than getting an error in test or even production because a missing relation has been used. Combined with using non-integer primary keys, it’s godsend.
I remember working on a large database without foreign keys for a while. When asked why did not use foreign keys, I was told that they didn't like the opaqueness of CASCADE, which I could understand. I guess I did not give it much thought afterwards, until I later ended up in a shop, where the database did include foreign keys, but only with RESTRICT. It was eye-opening how useful foreign keys were, when they were just integrity checks.
OP. I agree! RESTRICT is by far the best rule to use, and makes the most sense.
Perhaps to balance my post a bit, and for what it's worth, I don't advocate for "don't ever use foreign keys" as a blanket statement. My experience was one where using foreign keys did not make sense. I do wish they were more operationally friendly.
Like the issues I mention in my post: modifying the data type of a column that is used by a foreign key; otherwise the fact you can't run Online DDL on a table that participates in foreign key relationship ; that INSTANT does not support (yet?) adding/removing foreign key constraints ; that cascaded writes are not written the the binary logs.
These are all things that the casual developer doesn't deal with when designing a schema and writes an app that INSERTs/DELETEs/UPDATEs to tables with foreign keys. But once there's a need for a change; once you wire 3rd party tools onto your database, that's where the operations hit a wall.
eBay released a blog/paper over two decades ago talking about how they achieved scale in part by removing FK constraints from their database. It was vogue for the next decade when SAN/DAS IOPS were ~1-3k, and FK constraints could lead to more disk writes (like Postgres multixact).
besides performance for using something that is not often tested by devs working in performance, bad support for locale collating and open you for bugs such as indexing on email but allowing application to store both upper and lower case etc.
I will take another rare opportunity of anything MySQL ends up being on HN, Considering [1] mySQL v5.x will EOL this year. And MySQL 8.0 with EOL in 2026. Does anyone knows if MySQL 9.0 will come anytime soon?
only oracle knows. and they don’t share answer yet, on this site or any other
only public news so far is extremely brief twitter mention of future switch to separate LTS releases from feature releases
big picture, hard to see what would motivate them to major re-invest in current mysql product model! amazon, planetscale, and co all profit off of oracle’s mysql server development efforts. and oracle does not get anything in return
assume this why more and more mysql dev efforts go to saas-only product like “mysql heatwave”!
Note that there's more than one 'MySQL FOREIGN KEY' implementation.
MySQL Ndb Cluster also supports foreign keys with some differences wrt the InnoDB implementation :
- NDB, therefore not limited to a single MySQL Server, shard etc
- Not limited to references between tables in a single database
- Supports NoAction deferred constraint checks
- Cascaded changes Binlogged independently as part of RBR
(Nice side effect of reducing replica apply time work)
...
innodb supports cross-schema foreign keys, there’s no limitation to tables in a single database
can be terrible in mysql 8 though due to metadata locks now extending across foreign key boundaries. this means alter in one schema can block things in other schema if foreign key across databases
speaking of, am surprised that blog post author doesnt discuss the new mysql 8 metadata locking behavior, is new major problem with mysql foreign keys!
The last time I checked, MySQL still can’t do foreign keys with binary blobs. (I think it works with some very specific limitations but not enough to actually use in the real world.)
I have the strong feeling I would never do this regardless of the DB engine for performance and storage reasons. I would try to hash the blob into a bigint and use that as the PK/FK.
Out of curiosity, is there a scenario you can share in which using a binary blob as a PK/FK would be the best solution?
I'm not talking huge blobs, just (fixed-size) 8- to 16-byte uuid-like fields. You can obviously marshal 8-byte fields to a bigint but you can't do that with 16-byte fields.
It was interesting to hear this aside about the MySQL roadmap:
> MySQL is pushing towards INSTANT DDL
When I need MySQL these days I automatically go for MariaDB instead, but I guess they are going to diverge more and more. Does anyone more involved in the MySQL/MariaDB world have any thoughts about how they choose and the future of those two projects?
https://dev.mysql.com/doc/refman/8.0/en/ansi-diff-foreign-ke...