> A FOREIGN KEY constraint that references a non-UNIQUE key is not standard SQL ...

gregw2 · on May 1, 2023

I don't know why this was implemented in MySQL's implementation but here's an example scenario where the general capability where you have a logical foreign key but not necessarily uniqueness in the pyhsical target. Pardon the jargon:

In a reporting/ROLAP dimensional data model, you may have a slowly changing dimension where you have something a table design like:

  customer_id (varchar(N)),
  current_record_flag boolean, 
  record_effective_start_date, 
  record_effective_end_date, 
  attribute1, 
 attribute2, etc

and a sales transaction table:

  customer_id -- non-unique FK
  product_id
  sale_date 
  sold_quantity
  unit_price
  total_sales_amount

The customer ID in the sales table really relates back to the customer ID in the customer dimension table but not to a single UNIQUE row in the customer dimension table... unless you are, as you always are, either querying on current_record_flag='Y' (for seeing current customer attribute values), or a specific date value that falls between record_effective_date and record_end_date (for seeing customer attribute values at the time of the sales transaction_date).

(And optimal query performance for these types of analytic use case often involves using a columnar storage engine rather than InnoDB.)

pphysch · on May 1, 2023

Thanks for the example. I can see how a temporal/stochastic dimension can be important for some domains, but it seems like a crime to overload the relational FK semantics to support it. Just add one more layer of data abstraction (e.g. Org/Person <- temporal Customer profile <- Sale, or temporal Attributes -> Customer <- Sale), the RDBMS is made for this after all... Or use a DB with proper temporal support.

noduerme · on May 1, 2023

I could probably fall into this and get side-tracked for the rest of my workday, but could you explain how update/delete cascades work in a FK like this?

gregw2 · on May 2, 2023

Updates and deletes in ROLAP dimensional data models are usually performed via explicit ETL jobs, not triggers nor particularly update/delete cascading triggers.

When a customer record is deleted in a source system, you typically do not want to delete (much less cascading delete) it from a downstream ROLAP data model because you are trying to maintain a history of past activity for reporting and/or audit purposes. Instead you might soft-delete a customer dimension table record and/or mark its effective end date as having occurred.

If a customer record is updated in a source system, depending on the design you want to implement for your use cases and history you want to retain or ignore, you may choose to update the corresponding "current" customer row in your customer dimension, or you may, for important attribute changes, update the end-date of the current customer row and insert a new second row with the latest set of customer attributes and new record effective dates (typically putting the effective end date in some distant future date).

In a standard third normal data model for an application, cascading update/deletes enable you to maintain a consistent state across a series of tables. In a dimensional data model, you typically just ensure that any dimensions get updated first, then perhaps outrigger/snowflake dimensions, then your core transactional tables. And transactional tables are rarely updated; usually you try never to touch them and you use ETL jobs to update the dimensions.

There are different styles/techniques for different use cases. At the risk of side-tracking you, a couple random points of reference: https://www.kimballgroup.com/data-warehouse-business-intelli... https://www.kimballgroup.com/2008/08/slowly-changing-dimensi... https://www.kimballgroup.com/2013/02/design-tip-152-slowly-c...

noduerme · on May 2, 2023

Sorry if this is an absurdly simple question... What is the point of a foreign key that has no cascade? Just to help speed up indexing?

gregw2 · on May 2, 2023

Not sure the specific MySQL thinking, but FKs in a large data model help new people understand or confirm relationships, and also make importing your data model into an ERD tool for documenting to others much easier. FKs which enforce relational integrity are always useful against bugs. Some query planners (e.g Redshift) even use FK relationships to optimize query plans even when there aren’t indexes nor constraints provided by FKs.

dragonwriter · on May 1, 2023

Its a shortcut to support use of a denornalized schema where, in a properly normalized schema, there’d be abother table where the row was unique.

dllthomas · on May 1, 2023

Always use a nornalized schema. The ability to see the future is invaluable in a database.