dbt Constraints: Automatic Primary Keys, Unique Keys, and Foreign Keys for Snowflake

dbt Constraints is a new package that generates database constraints based on the tests in a dbt project. Based on the very positive feedback from initial users, I believe that most Snowflake customers should consider adding it to their projects.

I wrote dbt Constraints because I recognized the need for a package that would automatically create referential constraints. Most dbt users are already identifying unique keys and foreign keys in their dbt tests but are not adding these constraints to Snowflake. The package is so simple that you only have to enable it in your packages.yml and it will scan your existing tests to create constraints.

I have released this project through Snowflake’s open source community on Github, Snowflake-Labs, and it is available through the dbt Hub. It is currently compatible with Snowflake and PostgreSQL only.

Why data engineers should add referential integrity constraints

The primary reason to add constraints to your database tables is that many tools including DBeaver and Oracle SQL Developer Data Modeler can correctly reverse-engineer data model diagrams if there are primary keys, unique keys, and foreign keys on tables. Most BI tools will also add joins automatically between tables when you import tables that have foreign keys. This can both save time and avoid mistakes.

In addition, although Snowflake doesn’t enforce most constraints, the query optimizer can consider primary key, unique key, and foreign key constraints during query rewrite if the constraint is set to RELY. Since dbt can test that the data in the table complies with the constraints, this package creates constraints on Snowflake with the RELY property to improve query performance.

Many other databases including PostgreSQL, SQL Server, Oracle, MySQL, and DB2 can use referential integrity constraints to perform “join elimination” to remove tables from an execution plan. This commonly occurs when you query a subset of columns from a view and some of the tables in the view are unnecessary. Even on databases that do not support join elimination, some BI and visualization tools will also rewrite their queries based on constraint information, producing the same effect.

Finally, although most columnar databases including Snowflake do not use or need indexes, most row-oriented databases including PostgreSQL require indexes on their primary key columns in order to perform efficient joins between tables. Typically a primary key or unique key constraint is enforced on such databases using such indexes. Having dbt create the unique indexes automatically can slightly reduce the degree of performance tuning necessary for row-oriented databases. Row-oriented databases frequently also need indexes on foreign key columns but that is something best added manually.

Please note

When you add this package, dbt will automatically begin to create unique keys for all your existing unique and dbt_utils.unique_combination_of_columns tests and foreign keys for existing relationship tests. The package also provides three new tests (primary_key, unique_key, and foreign_key) that are a bit more flexible than the standard dbt tests. These tests can be used inline, out-of-line, and can support multiple columns when used in the tests: section of a model.

Disabling automatic constraint generation

The dbt_constraints_enabled variable can be set to false in your project to disable automatic constraint generation. By default dbt Constraints only creates constraints on models. To allow constraints on sources, you can set dbt_constraints_sources_enabled to true. The package will verify that you have sufficient database privileges to create constraints on sources.

vars:
# The package can be temporarily disabled using this variable
dbt_constraints_enabled: true
# The package can also add constraints on sources if you have sufficient privileges
dbt_constraints_sources_enabled: false
# You can also be specific on which constraints are enabled for sources
# You must also enable dbt_constraints_sources_enabled above
dbt_constraints_sources_pk_enabled: true
dbt_constraints_sources_uk_enabled: true
dbt_constraints_sources_fk_enabled: true

Installation

packages:
- package: Snowflake-Labs/dbt_constraints
version: 0.3.0
  • Run dbt deps.
  • Optionally add primary_key, unique_key, or foreign_key tests to your model like the following examples.
- name: DIM_ORDER_LINES
columns:
# Single column inline constraints
- name: OL_PK
tests:
- dbt_constraints.primary_key
- name: OL_UK
tests:
- dbt_constraints.unique_key
- name: OL_CUSTKEY
tests:
- dbt_constraints.foreign_key:
pk_table_name: ref('DIM_CUSTOMERS')
pk_column_name: C_CUSTKEY
tests:
# Single column constraints
- dbt_constraints.primary_key:
column_name: OL_PK
- dbt_constraints.unique_key:
column_name: OL_ORDERKEY
- dbt_constraints.foreign_key:
fk_column_name: OL_CUSTKEY
pk_table_name: ref('DIM_CUSTOMERS')
pk_column_name: C_CUSTKEY
# Multiple column constraints
- dbt_constraints.primary_key:
column_names:
- OL_PK_COLUMN_1
- OL_PK_COLUMN_2
- dbt_constraints.unique_key:
column_names:
- OL_UK_COLUMN_1
- OL_UK_COLUMN_2
- dbt_constraints.foreign_key:
fk_column_names:
- OL_FK_COLUMN_1
- OL_FK_COLUMN_2
pk_table_name: ref('DIM_CUSTOMERS')
pk_column_names:
- C_PK_COLUMN_1
- C_PK_COLUMN_2

dbt_constraints Limitations

Generally, if you don’t meet a requirement, tests are still executed but the constraint is skipped rather than producing an error.

  • All models involved in a constraint must be materialized as table, incremental, or snapshot.
  • If source constraints are enabled, the source must be a table. You must also have the OWNERSHIP table privilege to add a constraint. For foreign keys you also need the REFERENCES privilege on the parent table with the primary or unique key.
  • All columns on constraints must be individual column names, not expressions. You can reference columns on a model that come from an expression.
  • Constraints are not created for failed tests
  • primary_key, unique_key, and foreign_key tests are considered first and duplicate constraints are skipped. One exception is that you will get an error if you add two different primary_key tests to the same model.
  • Foreign keys require that the parent table have a primary key or unique key on the referenced columns. Unique keys generated from standard unique tests are sufficient.
  • The order of columns on a foreign key test must match between the FK columns and PK columns
  • Consistent with ANSI SQL 92, the foreign_key test will ignore any rows with a null column, even if only one of two columns in a compound key is null. If you also want to ensure FK columns are not null, you should add standard not_null tests to your model.

Legal

This is a community-developed package, not an official Snowflake offering. It comes with no support or warranty. However, feel free to raise a github issue if you find a bug or would like a new feature.

Licensed under the Apache License, Version 2.0

--

--

Dan Flippo
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Sr Solutions Architect @Snowflake; SME: dbt, Kafka, Oracle BI & DB, StreamSets, Vertica. Views & opinions are my own and do not represent those of my employer.