Bi-temporal Data Modeling in Postgres with Range Types & Constraints
The Shortcomings of Uni-temporal Data
As senior engineers, we've all implemented audit trails. The common pattern involves a primary data table and a corresponding _history or _audit table, populated by triggers or application logic. While functional, this approach often leads to significant query complexity, potential data drift between the live and history tables, and a nightmare for developers trying to reconstruct the state of the system at a specific point in time.
Another common pattern is using start_date and end_date columns on a single table. This is a step in the right direction, modeling what's known as Valid Time (the time period during which a fact is true in the real world). However, it fails to capture a crucial second dimension: Transaction Time (the time period during which the data was present in the database).
Consider an employee's salary. A promotion is effective from 2023-06-01. Due to a payroll processing delay, this change is only entered into the database on 2023-06-15. On 2023-07-01, we discover the salary was entered incorrectly and needs to be corrected.
With a simple valid_from/valid_to model, you have a problem. Do you overwrite the incorrect record? If you do, you lose the fact that for two weeks, the database believed the salary was something else. This loss of information is unacceptable in financial, medical, and other regulated domains. You can no longer answer the question: "What did we think John's salary was on June 10th, when we ran the report on June 12th?"
This is the problem bi-temporal modeling solves. It tracks two time dimensions:
valid_tstz): When the fact was true in the real world. (e.g., The salary was $90,000 from June 1st to Dec 31st).tx_tstz): When the database knew about this fact. (e.g., We recorded this salary fact on June 15th, and it remained the current record until it was corrected on July 1st).Implementing this correctly requires robust data integrity. You cannot have two different salaries for the same employee that are valid for the same moment in time. This is where PostgreSQL's advanced features—specifically range types and exclusion constraints—transform a complex application-layer problem into a cleanly solved database-level guarantee.
The Power of `tstzrange` and `EXCLUDE` Constraints
Instead of two separate timestamptz columns (valid_from, valid_to), PostgreSQL offers native range types. For our use case, tstzrange (timestamp with timezone range) is perfect. It encapsulates an interval of time as a single data type.
A range is defined by its lower and upper bounds. For example, '[2023-01-01 00:00:00+00, 2024-01-01 00:00:00+00)' represents the entire year of 2023. The [ means the lower bound is inclusive, and ) means the upper bound is exclusive. Unbounded ranges are represented by omitting a bound (e.g., [2023-01-01, ) means from Jan 1st, 2023 onwards).
This is more than just syntactic sugar. PostgreSQL provides a rich set of operators for range types (@> contains, && overlaps, etc.) and, critically, allows them to be indexed using GiST (Generalized Search Tree) indexes. This is the foundation for our bi-temporal integrity model.
The real magic is the Exclusion Constraint (EXCLUDE USING GIST ... WITH &&). An exclusion constraint ensures that if you compare any two rows in the table, the specified condition will not be true. For our purposes, we can declare a constraint that says "no two rows for the same employee can have valid_tstz ranges that overlap (&&)".
This is a database-level guarantee that you can never have conflicting salary records. It's more powerful and reliable than any application-level check or complex trigger logic.
Designing the Bi-temporal Schema
Let's design a table to store employee salaries bitemporally. We need to track the salary for a specific employee (employee_id), the salary amount, the valid time range, and the transaction time range.
We also need a way to link different versions of the same logical record. A common pattern is to use a separate, immutable ID for the entity (salary_id) and use the primary key (id) to uniquely identify each version (row) in the table.
-- Ensure the btree_gist extension is available for creating the exclusion constraint
-- on a combination of a scalar type and a range type.
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE employee_salaries (
-- Technical surrogate primary key for this specific row version.
id BIGSERIAL PRIMARY KEY,
-- Logical identifier for a salary record. All rows with the same salary_id
-- represent the history of a single logical salary assignment.
salary_id UUID NOT NULL DEFAULT gen_random_uuid(),
-- The employee this salary belongs to.
employee_id BIGINT NOT NULL,
-- The actual salary data.
amount NUMERIC(10, 2) NOT NULL,
-- VALID TIME: The time range when this salary was effective in the real world.
valid_tstz TSTZRANGE NOT NULL,
-- TRANSACTION TIME: The time range when this row was the current version in the database.
-- An open-ended upper bound signifies the currently active record.
tx_tstz TSTZRANGE NOT NULL,
-- THE CORE OF BI-TEMPORAL INTEGRITY:
-- This constraint ensures that for any given employee, there are no overlapping
-- valid_tstz ranges for records that are currently active in the database.
-- We check for overlaps only on rows where the transaction time is current ('infinity').
-- The WHERE clause makes this a partial constraint, which is critical for performance
-- and for allowing historical (superseded) records to have overlapping valid times
-- (e.g., when correcting a mistake).
CONSTRAINT no_overlapping_active_salaries
EXCLUDE USING GIST (
employee_id WITH =,
valid_tstz WITH &&
)
WHERE (upper(tx_tstz) IS NULL OR upper(tx_tstz) = 'infinity')
);
-- INDEXING STRATEGY
-- A GiST index is required to support the EXCLUDE constraint.
-- This index is automatically created by the CONSTRAINT declaration above.
-- This composite index is crucial for fast lookups of the current salary
-- for a specific employee at a specific point in valid time.
CREATE INDEX idx_employee_salaries_current_valid_time
ON employee_salaries USING GIST (employee_id, valid_tstz)
WHERE (upper(tx_tstz) IS NULL OR upper(tx_tstz) = 'infinity');
-- This index helps find all versions of a specific salary record, ordered by transaction time.
CREATE INDEX idx_employee_salaries_tx_history
ON employee_salaries (salary_id, tx_tstz DESC);
-- This index is for general 'as-of' queries across all employees.
CREATE INDEX idx_employee_salaries_valid_time_gist
ON employee_salaries USING GIST (valid_tstz);
Let's break down the EXCLUDE constraint. It's a partial constraint applied only to rows that are "currently active" (upper(tx_tstz) is infinity). This is the key insight. We only care about preventing overlaps for the current view of reality. The historical, superseded records (where tx_tstz is closed) are part of the audit log; their valid_tstz ranges can overlap because they represent previous, now-corrected states of the database.
Production Implementation: Bi-temporal Operations
With this schema, standard UPDATE and DELETE statements are forbidden. All modifications are handled through INSERT statements, combined with updates to existing rows to close their tx_tstz range. These operations must be atomic.
Let's define a current_timestamp() helper that we'll use for all transaction times to ensure consistency within a single transaction.
1. Creating the First Salary Record
This is the simplest operation. A new employee is hired, and their first salary is recorded.
-- Scenario: Employee 101 is hired with a salary of $80,000, effective immediately.
DO $$
DECLARE
v_now timestamptz := clock_timestamp(); -- Use clock_timestamp for real time
BEGIN
INSERT INTO employee_salaries (employee_id, amount, valid_tstz, tx_tstz)
VALUES (
101,
80000.00,
tstzrange(v_now, NULL, '[)'), -- Valid from now until further notice
tstzrange(v_now, NULL, '[)') -- Transaction is current from now
);
END $$;
After this, the table contains:
| employee_id | amount | valid_tstz | tx_tstz |
|---|---|---|---|
| 101 | 80000.00 | ["2023-10-27 10:00:00+00",) | ["2023-10-27 10:00:00+00",) |
Both ranges are open-ended, signifying this is the current state of affairs and is valid indefinitely into the future.
2. A Standard Update (e.g., a Promotion)
An "update" is a two-step process: end the current record's transaction time and insert a new one.
-- Scenario: On 2024-01-15, Employee 101 is promoted with a new salary of $95,000,
-- effective from 2024-02-01.
DO $$
DECLARE
v_now timestamptz := clock_timestamp();
v_employee_id BIGINT := 101;
v_new_amount NUMERIC(10, 2) := 95000.00;
v_effective_date timestamptz := '2024-02-01 00:00:00+00';
v_current_salary_id UUID;
BEGIN
-- Step 1: Find the current active record and its logical salary_id.
-- We need to end both its valid_tstz and tx_tstz.
-- We use FOR UPDATE to lock the row and prevent race conditions.
UPDATE employee_salaries
SET
-- The old record is now valid only up to the new effective date.
valid_tstz = tstzrange(lower(valid_tstz), v_effective_date, '[)'),
-- The old record is superseded by this transaction.
tx_tstz = tstzrange(lower(tx_tstz), v_now, '[)')
WHERE
employee_id = v_employee_id
AND upper(tx_tstz) = 'infinity'
RETURNING salary_id INTO v_current_salary_id;
-- If no row was updated, it means there's no current salary. This might be an error
-- or a new insertion. For this example, we assume a record exists.
IF NOT FOUND THEN
RAISE EXCEPTION 'No current salary found for employee %', v_employee_id;
END IF;
-- Step 2: Insert the new record.
-- It carries the same logical salary_id to link it to its history.
INSERT INTO employee_salaries (salary_id, employee_id, amount, valid_tstz, tx_tstz)
VALUES (
v_current_salary_id,
v_employee_id,
v_new_amount,
tstzrange(v_effective_date, NULL, '[)'), -- Valid from the effective date onwards
tstzrange(v_now, NULL, '[)') -- This is the current record from this transaction onwards
);
END $$;
Now the table looks like this (assuming v_now was 2024-01-15 11:30:00+00):
| salary_id | employee_id | amount | valid_tstz | tx_tstz |
|---|---|---|---|---|
| ...uuid1 | 101 | 80000.00 | ["2023-10-27 10:00:00+00","2024-02-01 00:00:00+00") | ["2023-10-27 10:00:00+00","2024-01-15 11:30:00+00") |
| ...uuid1 | 101 | 95000.00 | ["2024-02-01 00:00:00+00",) | ["2024-01-15 11:30:00+00",) |
We have a complete, non-destructive history. The first row shows the old salary, when it was valid, and when the database stopped considering it the "current" truth. The second row shows the new salary, its validity, and that it's the current truth as of our transaction time.
3. Correcting a Past Mistake (The True Power of Bi-temporality)
This is the most complex and valuable scenario.
Scenario: Today is 2024-03-01. We realize the promotion on 2024-02-01 should have been to $92,000, not $95,000. We need to correct this retroactively without losing the fact that the database contained the wrong information from 2024-01-15 until today.
-- Scenario: On 2024-03-01, we correct the salary for employee 101. The amount that was
-- effective from 2024-02-01 should have been $92,000.
DO $$
DECLARE
v_now timestamptz := clock_timestamp(); -- Transaction time is '2024-03-01 ...'
v_employee_id BIGINT := 101;
v_corrected_amount NUMERIC(10, 2) := 92000.00;
-- The valid time range we are correcting.
v_correction_valid_range tstzrange := tstzrange('2024-02-01 00:00:00+00', NULL, '[)');
v_incorrect_record_id BIGINT;
v_incorrect_record_salary_id UUID;
BEGIN
-- Step 1: Find the INCORRECT record that is currently active in the database
-- and whose valid time overlaps with the period we want to correct.
-- We lock it for our transaction.
UPDATE employee_salaries
SET
-- This record is no longer the truth as of this transaction.
tx_tstz = tstzrange(lower(tx_tstz), v_now, '[)')
WHERE
employee_id = v_employee_id
AND upper(tx_tstz) = 'infinity'
AND valid_tstz && v_correction_valid_range -- It must overlap with the validity period we're correcting
RETURNING id, salary_id INTO v_incorrect_record_id, v_incorrect_record_salary_id;
IF NOT FOUND THEN
RAISE EXCEPTION 'No active record found for employee % during the specified valid time to correct.', v_employee_id;
END IF;
-- Step 2: Insert the NEW, CORRECTED record.
-- It has the same valid_tstz as the record it replaces.
-- Its tx_tstz starts now.
INSERT INTO employee_salaries (salary_id, employee_id, amount, valid_tstz, tx_tstz)
VALUES (
v_incorrect_record_salary_id, -- Maintain the same logical entity
v_employee_id,
v_corrected_amount,
v_correction_valid_range, -- The valid time is the same as the record we just superseded
tstzrange(v_now, NULL, '[)') -- But the transaction time starts now
);
END $$;
After this transaction on 2024-03-01, the table state is:
| id | salary_id | employee_id | amount | valid_tstz | tx_tstz |
|---|
| 1 | ...uuid1 | 101 | 80000.00 | ["2023-10-27...","2024-02-01...") | ["2023-10-27...","2024-01-15...") | (Superseded)
| 2 | ...uuid1 | 101 | 95000.00 | ["2024-02-01...",) | ["2024-01-15...","2024-03-01...") | (Superseded by correction)
| 3 | ...uuid1 | 101 | 92000.00 | ["2024-02-01...",) | ["2024-03-01...",) | (Current Truth)
2024-01-15.2024-01-15 and 2024-03-01.2024-02-01 onwards, and it became the database's truth on 2024-03-01.We have lost zero information. We can reconstruct the state of the world and the state of the database at any point in time.
Advanced Query Patterns
Now, let's reap the rewards with queries that are trivial with this model and nearly impossible otherwise.
Query 1: "As-of" Query (What is the current truth?)
"What was Employee 101's salary on 2024-02-15?"
This queries against Valid Time for currently active records.
SELECT amount
FROM employee_salaries
WHERE
employee_id = 101
AND upper(tx_tstz) = 'infinity' -- We want the current truth
AND valid_tstz @> '2024-02-15 00:00:00+00'::timestamptz; -- The valid range contains this timestamp
-- Result: 92000.00
Query 2: "As-was" Query (What did the database think in the past?)
"On 2024-02-20, what did we think Employee 101's salary was for the date 2024-02-15?"
This is the quintessential bi-temporal query. It queries against both Valid Time and Transaction Time.
SELECT amount
FROM employee_salaries
WHERE
employee_id = 101
-- The database record must have existed at this point in transaction time
AND tx_tstz @> '2024-02-20 00:00:00+00'::timestamptz
-- And the record must have been valid for this point in real-world time
AND valid_tstz @> '2024-02-15 00:00:00+00'::timestamptz;
-- Result: 95000.00
This correctly returns the incorrect salary, because on 2024-02-20, the correction had not yet been made.
Query 3: Full History of a Logical Record
"Show me the entire history of changes for Employee 101's salary that started on 2023-10-27."
SELECT
amount,
lower(valid_tstz) as valid_from,
upper(valid_tstz) as valid_to,
lower(tx_tstz) as recorded_at,
upper(tx_tstz) as superseded_at
FROM employee_salaries
WHERE salary_id = (SELECT salary_id FROM employee_salaries WHERE id = 1) -- Get the logical ID from any version
ORDER BY lower(tx_tstz);
This query gives you a perfect, ordered audit log of every state this salary record has ever been in.
Performance, Edge Cases, and Final Considerations
EXCLUDE constraint and for making range queries performant. Without them, queries involving && or @> will result in full table scans. The partial index on active records is especially important as the history grows.timestamptz and tstzrange. Storing timezone-naive timestamps is a recipe for disaster in distributed systems. Ensure your application and database servers have consistent timezone configurations.UPDATE and INSERT pairs for modifications must occur within a single transaction. The DO blocks in the examples ensure this. In an application, this would be handled by your transaction management framework.EXCLUDE constraint only prevents overlaps. If you need to enforce contiguity, you'd typically handle that in the application logic that performs the bi-temporal update, for example by checking that the new valid_from date is exactly equal to the old valid_to date.infinity Gotcha: The concept of an infinite upper bound is represented as NULL in the upper() function for ranges. When writing WHERE clauses, it's safer to check for both upper(tx_tstz) IS NULL and upper(tx_tstz) = 'infinity' to be robust, though modern Postgres versions are more consistent. The partial index WHERE clause reflects this.DELETE operations are an anti-pattern in this model. If a salary was entered for the wrong employee entirely, you would perform a bi-temporal "end-date" operation, setting the upper(valid_tstz) to lower(valid_tstz), effectively making its valid period empty. This preserves the audit trail of the erroneous entry and its correction.By embracing PostgreSQL's native range types and exclusion constraints, you can build exceptionally robust, auditable systems. The logic for maintaining historical accuracy is moved from complex, error-prone application code into declarative, database-enforced integrity. This pattern provides a solid foundation for any system where understanding the state of data through time—both in the real world and within the database itself—is a non-negotiable requirement.