Bi-temporal Data Modeling in Postgres with Range Types & Constraints
Beyond `updated_at`: True Temporal Modeling
As senior engineers, we've all implemented audit trails or versioning systems. The common approach involves created_at and updated_at timestamps, perhaps with a separate _history table populated by triggers. While functional, this pattern often leads to complex application logic, potential race conditions, and difficult-to-query historical states. It fundamentally fails to distinguish between when a fact was true in the real world and when the database learned about it*.
This is the critical distinction addressed by bi-temporal data modeling. It tracks two dimensions of time:
By modeling both dimensions, we can answer incredibly powerful questions like:
* "What was the customer's shipping address on July 15th, 2022?" (Querying valid time)
"What did our system think* the customer's shipping address was on July 15th, 2022, when we queried it back on August 1st, 2022?" (Querying both valid and transaction time)
This capability is indispensable in regulated industries like finance, insurance, and healthcare, but its utility extends to any system requiring robust auditability and historical accuracy. Instead of building this complex logic into our application layer, we can leverage powerful, native PostgreSQL features to enforce bi-temporal integrity at the database level. This post details a production-grade pattern using tsrange types and EXCLUDE USING GIST constraints.
Section 1: The Bi-temporal Schema Design
Our foundation will be a single table that stores both current and historical states. We'll use a canonical example: employee salaries. A salary is a fact (employee_id, amount) that is valid for a specific period.
The Core Columns
Our employee_salaries table will replace simple start_date and end_date columns with PostgreSQL's native range types. This is a crucial first step, as range types provide a more robust and queryable abstraction for time periods.
* id: A unique primary key for the row itself.
* employee_id: A foreign key to the employee this salary belongs to.
* salary_amount: The monetary value of the salary.
* valid_range: A tsrange (timestamp range) representing the Valid Time. The period during which this salary is effective in the real world.
* tx_range: A tsrange representing the Transaction Time. The period during which this row is considered the current truth in the database.
Let's look at the initial DDL:
-- Enable necessary extensions
CREATE EXTENSION IF NOT EXISTS btree_gist;
-- The main table for storing bi-temporal salary data
CREATE TABLE employee_salaries (
id BIGSERIAL PRIMARY KEY,
employee_id INT NOT NULL,
salary_amount NUMERIC(10, 2) NOT NULL,
valid_range tsrange NOT NULL,
tx_range tsrange NOT NULL
);
A key convention we'll use is that an open-ended (infinite) upper bound on a range signifies "current" or "forever." For valid_range, it means the salary is effective indefinitely until a new one is set. For tx_range, it means this database row is the current, active record.
The Declarative Integrity Constraint: EXCLUDE USING GIST
This is the heart of our pattern. We need a constraint that enforces a critical business rule: For any given employee, their valid time periods for salaries cannot overlap. However, this rule only applies to the current view of the data. Historical records (i.e., corrected mistakes) can and should overlap.
A standard UNIQUE constraint cannot handle this logic. We need a more powerful tool: an exclusion constraint.
An exclusion constraint ensures that if any two rows are compared on the specified columns or expressions using the specified operators, at least one of the operator comparisons will return false or null.
Here's the constraint that enforces our bi-temporal integrity:
ALTER TABLE employee_salaries
ADD CONSTRAINT no_overlapping_current_salaries
EXCLUDE USING GIST (
employee_id WITH =,
valid_range WITH &&
)
WHERE (upper(tx_range) IS NULL);
Let's dissect this advanced constraint:
* EXCLUDE USING GIST: This specifies that we're using a GIST index to enforce the constraint. GIST (Generalized Search Tree) indexes are capable of indexing complex data types like ranges, which is essential here.
* employee_id WITH =: This part of the constraint says "group rows by employee_id." The constraint will be checked for rows that have the same employee_id.
* valid_range WITH &&: The && operator is the "overlaps" operator for range types. This part says "check if the valid_range of any two rows in the group overlap."
* WHERE (upper(tx_range) IS NULL): This is the most critical piece. It makes the constraint partial. The exclusion logic is only applied to rows where the transaction time's upper bound is NULL (i.e., infinite). This means the constraint only applies to records that are currently considered "live" in the database. Historical records, which will have a non-null upper(tx_range), are exempt from this check.
With this single constraint, PostgreSQL now guarantees that we can never have two active salary records for the same employee covering the same period, preventing a whole class of data integrity bugs at the database level.
Section 2: Temporal Operations via Stored Procedures
Standard INSERT, UPDATE, and DELETE statements are insufficient for a bi-temporal model. An "update" might be a correction of a past mistake or the creation of a new successive record. A "delete" is never a true DELETE, but rather the logical end of a record's validity. To handle this complexity and ensure atomicity, we must encapsulate our logic in PL/pgSQL functions.
Initial Record Creation (The `INSERT`)
Creating the very first salary for an employee is straightforward. We insert a new row where both valid_range and tx_range start now and have no upper bound.
CREATE OR REPLACE FUNCTION create_initial_salary(
p_employee_id INT,
p_salary_amount NUMERIC,
p_start_date TIMESTAMPTZ DEFAULT NOW()
)
RETURNS VOID AS $$
BEGIN
INSERT INTO employee_salaries (employee_id, salary_amount, valid_range, tx_range)
VALUES (
p_employee_id,
p_salary_amount,
tsrange(p_start_date, NULL, '[)'), -- Valid from start_date onwards
tsrange(NOW(), NULL, '[)') -- Transactionally current from now onwards
);
END;
$$ LANGUAGE plpgsql;
Usage:
SELECT create_initial_salary(101, 75000.00, '2023-01-01');
Updating a Salary (Succession)
This is the most common operation: an employee gets a raise. This is not a correction; the old salary was correct for its time. We need to end the validity of the old record and create a new one.
This operation involves two steps that must be atomic:
valid_range by setting the upper bound to the effective date of the new salary.INSERT a new salary record with a valid_range starting from the effective date.Crucially, the tx_range of the old record remains open, because it was never a mistake. We are simply evolving the state of the world.
CREATE OR REPLACE FUNCTION update_salary(
p_employee_id INT,
p_new_salary_amount NUMERIC,
p_effective_date TIMESTAMPTZ
)
RETURNS VOID AS $$
DECLARE
current_record_id BIGINT;
BEGIN
-- Lock the current record to prevent race conditions
SELECT id INTO current_record_id
FROM employee_salaries
WHERE employee_id = p_employee_id
AND upper(tx_range) IS NULL
AND upper(valid_range) IS NULL
FOR UPDATE;
IF NOT FOUND THEN
RAISE EXCEPTION 'No current salary record found for employee %', p_employee_id;
END IF;
-- 1. End the valid_range of the current record
-- Note: This creates a new historical version of the row and ends the old one.
-- We'll handle this via a correction pattern for simplicity and full auditability.
-- First, we end the transaction time of the original row.
UPDATE employee_salaries
SET tx_range = tsrange(lower(tx_range), NOW(), '[)')
WHERE id = current_record_id;
-- 2. Re-insert the old record with its new, bounded valid_range
INSERT INTO employee_salaries (employee_id, salary_amount, valid_range, tx_range)
SELECT employee_id, salary_amount, tsrange(lower(valid_range), p_effective_date, '[)'), tsrange(NOW(), NULL, '[)')
FROM employee_salaries
WHERE id = current_record_id;
-- 3. Insert the new salary record
INSERT INTO employee_salaries (employee_id, salary_amount, valid_range, tx_range)
VALUES (p_employee_id, p_new_salary_amount, tsrange(p_effective_date, NULL, '[)'), tsrange(NOW(), NULL, '[)'));
END;
$$ LANGUAGE plpgsql;
This implementation is subtle. Instead of a simple UPDATE, we perform a "correction" on the old row to shorten its valid_range. This involves ending the transaction time of the original unbounded row and inserting two new current rows: one representing the old salary with its now-bounded validity, and one for the new salary. This maintains a perfect, immutable audit trail.
Correcting a Mistake (Correction)
A correction is fundamentally different from an update. It means a previous record was wrong from the start. For example, a salary was entered as $80,000 but should have been $85,000 for the same period.
- Find the incorrect, active record.
tx_range), marking it as no longer valid in the database's view of history.INSERT a new record with the corrected data but the exact same valid_range as the original. This new record's tx_range starts now.CREATE OR REPLACE FUNCTION correct_salary(
p_record_id BIGINT,
p_correct_salary_amount NUMERIC
)
RETURNS VOID AS $$
DECLARE
v_employee_id INT;
v_valid_range tsrange;
BEGIN
-- Find the record to correct and lock it
SELECT employee_id, valid_range INTO v_employee_id, v_valid_range
FROM employee_salaries
WHERE id = p_record_id
AND upper(tx_range) IS NULL
FOR UPDATE;
IF NOT FOUND THEN
RAISE EXCEPTION 'No active record found with ID % to correct', p_record_id;
END IF;
-- 1. End the transaction time of the incorrect record
UPDATE employee_salaries
SET tx_range = tsrange(lower(tx_range), NOW(), '[)')
WHERE id = p_record_id;
-- 2. Insert the new, corrected record
-- It has the same valid_range but a new tx_range starting now
INSERT INTO employee_salaries (employee_id, salary_amount, valid_range, tx_range)
VALUES (v_employee_id, p_correct_salary_amount, v_valid_range, tsrange(NOW(), NULL, '[)'));
END;
$$ LANGUAGE plpgsql;
Deleting a Record (Annulment)
A bi-temporal delete, or annulment, simply means marking a record as no longer transactionally current. We never use the DELETE statement.
CREATE OR REPLACE FUNCTION annul_salary(p_record_id BIGINT)
RETURNS VOID AS $$
BEGIN
UPDATE employee_salaries
SET tx_range = tsrange(lower(tx_range), NOW(), '[)')
WHERE id = p_record_id
AND upper(tx_range) IS NULL;
IF NOT FOUND THEN
RAISE WARNING 'No active record found with ID % to annul', p_record_id;
END IF;
END;
$$ LANGUAGE plpgsql;
Section 3: The Power of Time-Travel Queries
With our data structured this way, we can now perform powerful historical queries with surprising simplicity using the range containment operator @>.
Scenario Setup:
Let's assume the following operations occurred for employee 101:
Query 1: What is the current state of salaries?
This is the most common query. It asks for the database's current understanding of reality.
SELECT employee_id, salary_amount, lower(valid_range) as effective_from
FROM employee_salaries
WHERE upper(tx_range) IS NULL -- Only currently active records
AND valid_range @> NOW(); -- Only records valid in the real world right now
This query will correctly return that employee 101's salary is $85,000.
Query 2: What was the salary for Employee 101 on Feb 1, 2023?
This is a standard valid-time query. We look at the current truth (upper(tx_range) IS NULL) and find the record whose valid_range contained that date.
SELECT salary_amount
FROM employee_salaries
WHERE employee_id = 101
AND upper(tx_range) IS NULL
AND valid_range @> '2023-02-01'::timestamptz;
Result: $78,000.00. The system correctly reports the corrected salary, not the initial mistake.
Query 3: What did the system *think* the salary was on Feb 1, 2023, if we had asked on March 1, 2023?
This is the full bi-temporal, time-travel query. We are pinning both transaction time and valid time.
* Transaction Time: We want the state of the database as it was on 2023-03-01. This means we need records where the tx_range contains this timestamp.
* Valid Time: We want to know the salary that was effective on 2023-02-01. This means the valid_range must contain this timestamp.
SELECT salary_amount
FROM employee_salaries
WHERE employee_id = 101
AND tx_range @> '2023-03-01'::timestamptz
AND valid_range @> '2023-02-01'::timestamptz;
Result: $75,000.00. This is correct! On March 1, the correction from June 1 had not yet happened, so the system's knowledge was the original, incorrect salary.
Query 4: Show the full history of Employee 101's salary, including corrections.
We can see the entire timeline, ordering by valid time, then transaction time to see how corrections evolved.
SELECT
salary_amount,
valid_range,
tx_range
FROM employee_salaries
WHERE employee_id = 101
ORDER BY lower(valid_range), lower(tx_range);
This would yield a result like:
| salary_amount | valid_range | tx_range |
|---|---|---|
| 75000.00 | ["2023-01-01 00:00:00+00",) | ["2023-01-01 00:00:00+00", "2023-06-01 00:00:00+00") |
| 78000.00 | ["2023-01-01 00:00:00+00",) | ["2023-06-01 00:00:00+00", "2023-12-15 00:00:00+00") |
| 78000.00 | ["2023-01-01 00:00:00+00", "2024-01-01 00:00:00+00") | ["2023-12-15 00:00:00+00",) |
| 85000.00 | ["2024-01-01 00:00:00+00",) | ["2023-12-15 00:00:00+00",) |
This output perfectly captures the entire history: the initial value, its supersession by a correction, and the final succession to a new salary.
Section 4: Performance, Edge Cases, and Production Hardening
This pattern is powerful, but requires careful consideration in a production environment.
Performance and Indexing
The EXCLUDE constraint automatically creates a GIST index. This index is crucial for two things: enforcing the constraint and accelerating our time-travel queries. Let's analyze a typical query plan.
EXPLAIN ANALYZE SELECT * FROM employee_salaries WHERE employee_id = 101 AND valid_range @> '2023-02-01'::timestamptz AND upper(tx_range) IS NULL;
Index Scan using no_overlapping_current_salaries on employee_salaries ...
Index Cond: ((employee_id = 101) AND (valid_range @> '2023-02-01 00:00:00+00'::timestamp with time zone))
Filter: (upper(tx_range) IS NULL)
...
The query planner can use our GIST index on (employee_id, valid_range) to efficiently find candidate rows. The upper(tx_range) IS NULL part is applied as a filter. For optimal performance on full bi-temporal queries, a multi-column GIST index on (employee_id, valid_range, tx_range) might be beneficial, but test against your specific query patterns.
For very large datasets, partitioning the employee_salaries table (e.g., by employee_id) is a viable strategy to keep indexes small and queries fast.
Edge Case: Contiguity and Gaps
Our EXCLUDE constraint prevents overlaps but does not enforce contiguity. It's possible to create a gap in an employee's salary history. For example:
* Salary 1: Valid [2022-01-01, 2023-01-01)
* Salary 2: Valid [2023-02-01, infinity)
There is a one-month gap in valid time. If this is not permissible by business rules, you must enforce it. This can be done within the PL/pgSQL functions. Before inserting a new record, the function can check that its lower(valid_range) is equal to the upper(valid_range) of the preceding record.
Timezones and Precision
Always use TIMESTAMPTZ (timestamp with time zone). All timestamps should be stored in UTC. Application logic should handle conversions to and from local timezones. This prevents a world of pain related to daylight saving changes and distributed teams.
Be mindful of range boundary inclusivity/exclusivity ([) means inclusive-exclusive). A consistent convention is critical. [) is often preferred as it ensures two contiguous ranges [A, B) and [B, C) do not overlap at point B.
Interfacing with ORMs
This pattern can be challenging for traditional ORMs (e.g., Django ORM, TypeORM, GORM) that are built around simple UPDATE and DELETE semantics. Direct use of this model often requires bypassing the ORM's standard save/delete methods and calling your PL/pgSQL functions directly.
A common pattern is to create a service layer in your application that acts as the sole interface to the temporal table, ensuring all modifications go through the trusted stored procedures. The ORM can still be used for reads, especially for the "current state" queries.
Conclusion
By leveraging PostgreSQL's native range types and partial exclusion constraints, we can build a bi-temporal data model that is robust, auditable, and largely self-managing. The integrity of the data is enforced by the database itself, not by fallible application code. While the initial setup and the procedural wrappers for CUD operations are more complex than a standard mutable table, the payoff is immense. You gain the ability to perform precise time-travel queries, maintain a perfect and immutable audit log, and build systems with a level of data integrity that is simply not achievable with updated_at columns alone. This is a powerful pattern for any senior engineer tasked with building systems where historical accuracy is non-negotiable.