Postgres Bi-temporal Models with Range Types & Exclusion Constraints
The Illusion of History: Why `updated_at` Fails Us
As senior engineers, we've all implemented soft deletes and basic versioning. A simple updated_at timestamp and an is_deleted flag are standard practice. They tell a story, but it's an incomplete and often misleading one. They record when the database row was last mutated, not the state of the business reality it represents.
Consider a simple employee_salaries table. If an admin corrects a payroll error on February 5th, backdating a salary increase that was effective from January 1st, an updated_at column only tells you about the event on February 5th. It erases the previous, incorrect state. You can no longer answer a critical question: "What did our system believe an employee's salary was on January 15th, when we ran payroll?"
This is the fundamental limitation of uni-temporal data models. To achieve true auditability and reproducible history, we must track time across two distinct axes:
Modeling data with both axes is called bi-temporality. It transforms your database from a snapshot of the present into a complete, queryable history of both reality and the system's understanding of it. While this might seem complex, modern PostgreSQL provides powerful, native tools—specifically range types and exclusion constraints—that make this pattern robust and surprisingly elegant.
This article is not an introduction. We will dive directly into building a production-grade bi-temporal model, enforcing its integrity at the database level, and managing its lifecycle with advanced SQL.
Core Schema: Leveraging `TSTZRANGE` for Temporal Atomicity
Our foundation is a table structure that explicitly models the two time axes. We'll use our employee_salaries example. A naive approach might use four TIMESTAMPTZ columns (valid_from, valid_to, transaction_from, transaction_to). This is a mistake. It complicates queries, invites off-by-one errors in intervals, and makes integrity constraints difficult to write.
The correct tool for the job is PostgreSQL's native TSTZRANGE type. It stores a range of timestamps with timezones as a single, atomic unit. This simplifies everything from querying to constraint definition.
Here is our core table structure:
-- Ensure the btree_gist extension is available for our constraints
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE employee_salaries (
-- A surrogate key for the versioned row itself
salary_version_id BIGSERIAL PRIMARY KEY,
-- The natural business key for the entity being versioned
employee_id BIGINT NOT NULL,
-- The actual data payload
salary_amount NUMERIC(10, 2) NOT NULL,
currency CHAR(3) NOT NULL,
-- VALID TIME: When this salary was effective in the real world.
-- '[)' indicates an inclusive lower bound and exclusive upper bound.
valid_range TSTZRANGE NOT NULL,
-- TRANSACTION TIME: When this version of the record was present in the database.
-- An open-ended upper bound ('infinity') signifies the currently active version.
transaction_range TSTZRANGE NOT NULL
);
Key Design Decisions:
* salary_version_id vs employee_id: employee_id identifies the employee, but it's not unique in this table. Each change to an employee's salary creates a new row. salary_version_id is the surrogate primary key that uniquely identifies a specific historical version of a salary record.
* TSTZRANGE: We use this for both valid_range and transaction_range. The [) notation (inclusive-exclusive) is a common convention that simplifies reasoning about adjacent time periods. A salary valid for all of January would be ['2023-01-01', '2023-02-01').
* infinity: An unbounded upper range indicates that the record is currently valid (for valid_range) or is the latest known version in the system (for transaction_range).
Enforcing Temporal Integrity with `EXCLUDE` Constraints
This is where the true power of PostgreSQL comes into play. We can enforce complex business rules about time directly in the schema, preventing data corruption at the source. A UNIQUE constraint checks for equality; an EXCLUDE constraint checks for overlapping values, which is exactly what we need for time ranges.
Our primary business rule is: For any given employee, their salary history cannot have overlapping periods of validity. A person cannot have two different salaries at the same time.
We enforce this with a GIST-indexed EXCLUDE constraint:
ALTER TABLE employee_salaries
ADD CONSTRAINT employee_salaries_valid_range_no_overlap
EXCLUDE USING GIST (employee_id WITH =, valid_range WITH &&);
Let's break this down:
* EXCLUDE USING GIST: Specifies we are using a GiST (Generalized Search Tree) index, which is capable of handling complex data types like ranges and the operators associated with them.
* employee_id WITH =: The constraint applies to groups of rows with the same employee_id. The = operator is the standard equality check.
* valid_range WITH &&: This is the core logic. For any two rows within the employee_id group, this constraint fails if their valid_range values overlap. The && operator is the PostgreSQL "overlaps" operator for range types.
With this single constraint, the database now makes it impossible to insert inconsistent data. For example, if we have a salary valid for ['2023-01-01', '2024-01-01'), any attempt to insert another salary for the same employee that overlaps with this period will result in a constraint violation.
A note on transaction time: We don't typically place an exclusion constraint on transaction_range because our application logic, which we'll define next, will ensure its integrity by design. The logic ensures that for any given state of valid time, only one transaction time range is ever "current" (i.e., has an upper bound of infinity).
The New CRUD: Immutable Operations via PL/pgSQL
Standard INSERT, UPDATE, and DELETE statements are now dangerous. An UPDATE would modify history, and a DELETE would erase it. Instead, we manage data lifecycle through a set of stored procedures that encapsulate the rules of bi-temporality. This ensures consistency and abstracts the complexity away from the application layer.
All operations become a combination of INSERTs and UPDATEs that only ever modify the upper bounds of ranges.
1. Creating a New Salary Record
This is the simplest operation. A new salary is created with a valid_range starting from a specified time (or NOW()) and extending to infinity. Its transaction_range also starts now and extends to infinity, signifying it's the current, known truth.
CREATE OR REPLACE FUNCTION create_new_salary(
p_employee_id BIGINT,
p_salary_amount NUMERIC,
p_currency CHAR(3),
p_valid_from TIMESTAMPTZ DEFAULT NOW()
)
RETURNS VOID AS $$
BEGIN
-- Ensure no other salary is currently active for this employee
IF EXISTS (
SELECT 1 FROM employee_salaries
WHERE employee_id = p_employee_id AND upper(valid_range) = 'infinity'
) THEN
RAISE EXCEPTION 'Employee already has an active salary. Use update_salary() instead.';
END IF;
INSERT INTO employee_salaries(
employee_id,
salary_amount,
currency,
valid_range,
transaction_range
) VALUES (
p_employee_id,
p_salary_amount,
p_currency,
tstzrange(p_valid_from, 'infinity', '[)'),
tstzrange(NOW(), 'infinity', '[)')
);
END;
$$ LANGUAGE plpgsql;
2. Updating a Salary (A Real-World Change)
This handles a promotion or annual salary adjustment. This is not an UPDATE in the SQL sense. It involves two steps atomically executed within a function:
valid_range's upper bound from 'infinity' to NOW().valid_range starting from NOW() to 'infinity'.CREATE OR REPLACE FUNCTION update_salary(
p_employee_id BIGINT,
p_new_salary_amount NUMERIC,
p_effective_date TIMESTAMPTZ DEFAULT NOW()
)
RETURNS VOID AS $$
DECLARE
current_currency CHAR(3);
BEGIN
-- 1. End the validity of the current salary record
UPDATE employee_salaries
SET valid_range = tstzrange(lower(valid_range), p_effective_date, '[)')
WHERE employee_id = p_employee_id
AND upper(valid_range) = 'infinity'
AND upper(transaction_range) = 'infinity'
RETURNING currency INTO current_currency;
IF NOT FOUND THEN
RAISE EXCEPTION 'No active salary found for employee_id % to update.', p_employee_id;
END IF;
-- 2. Insert the new salary record
INSERT INTO employee_salaries(
employee_id,
salary_amount,
currency,
valid_range,
transaction_range
) VALUES (
p_employee_id,
p_new_salary_amount,
current_currency, -- Assuming currency doesn't change
tstzrange(p_effective_date, 'infinity', '[)'),
tstzrange(NOW(), 'infinity', '[)')
);
END;
$$ LANGUAGE plpgsql;
3. Deleting a Salary (A Logical End)
We never use DELETE. To "delete" a salary (e.g., an employee leaves), we simply end its validity. This is identical to the first step of the update_salary function.
CREATE OR REPLACE FUNCTION end_salary(
p_employee_id BIGINT,
p_end_date TIMESTAMPTZ DEFAULT NOW()
)
RETURNS VOID AS $$
BEGIN
UPDATE employee_salaries
SET valid_range = tstzrange(lower(valid_range), p_end_date, '[)')
WHERE employee_id = p_employee_id
AND upper(valid_range) = 'infinity'
AND upper(transaction_range) = 'infinity';
IF NOT FOUND THEN
RAISE EXCEPTION 'No active salary found for employee_id % to end.', p_employee_id;
END IF;
END;
$$ LANGUAGE plpgsql;
4. Correcting a Historical Record
This is the most complex and powerful operation. It's used when we discover a past record was wrong. We need to supersede the incorrect record in transaction time without altering the valid time ranges.
transaction_range's upper bound to NOW().valid_range will be identical to the old, incorrect record's valid_range. Its transaction_range will start from NOW().CREATE OR REPLACE FUNCTION correct_historical_salary(
p_salary_version_id BIGINT,
p_corrected_amount NUMERIC
)
RETURNS VOID AS $$
DECLARE
old_record employee_salaries%ROWTYPE;
BEGIN
-- Lock the row to prevent concurrent modifications
SELECT * INTO old_record FROM employee_salaries WHERE salary_version_id = p_salary_version_id FOR UPDATE;
IF NOT FOUND THEN
RAISE EXCEPTION 'Salary version ID % not found.', p_salary_version_id;
END IF;
-- 1. Supersede the old record in transaction time
UPDATE employee_salaries
SET transaction_range = tstzrange(lower(transaction_range), NOW(), '[)')
WHERE salary_version_id = p_salary_version_id;
-- 2. Insert the new, corrected record
INSERT INTO employee_salaries(
employee_id,
salary_amount,
currency,
valid_range, -- The valid range is preserved
transaction_range -- The new transaction range starts now
) VALUES (
old_record.employee_id,
p_corrected_amount,
old_record.currency,
old_record.valid_range,
tstzrange(NOW(), 'infinity', '[)')
);
END;
$$ LANGUAGE plpgsql;
The Payoff: Powerful Bi-temporal Querying
With our data structured correctly, we can now answer those previously impossible questions with remarkable ease. The @> (contains) operator for ranges is our primary tool.
Scenario Data:
Let's assume an employee (ID: 101) has the following history:
- Hired on 2022-06-01 with a salary of $80,000.
- Gets a raise to $90,000 effective 2023-01-01.
- On 2023-02-15, an admin realizes the initial salary was a typo and should have been $82,000. They correct it.
Query 1: What is the employee's current salary? (As of today)
This is the simplest query, looking for the record that is currently valid and currently known to the system.
SELECT salary_amount
FROM employee_salaries
WHERE employee_id = 101
AND valid_range @> NOW()
AND transaction_range @> NOW();
-- Result: 90000.00
Query 2: What was the employee's salary on 2022-10-01? (As-of query)
We are asking about a past state of reality, based on our current knowledge.
SELECT salary_amount
FROM employee_salaries
WHERE employee_id = 101
AND valid_range @> '2022-10-01'::timestamptz
AND transaction_range @> NOW();
-- Result: 82000.00 (the corrected amount)
Query 3: What did we think the employee's salary was on 2022-10-01, when we ran payroll on 2022-11-01? (As-at query)
This is the holy grail of bi-temporality. We are traveling back in time to query the database as it existed on a previous date.
SELECT salary_amount
FROM employee_salaries
WHERE employee_id = 101
AND valid_range @> '2022-10-01'::timestamptz
AND transaction_range @> '2022-11-01'::timestamptz;
-- Result: 80000.00 (the original, incorrect amount)
This query correctly retrieves the value that was considered true before the correction was made on 2023-02-15. This is essential for auditing, debugging, and re-running historical reports with perfect fidelity.
Query 4: Show the full history of salary validity for an employee.
This shows the evolution of the employee's salary over time.
SELECT salary_amount, valid_range
FROM employee_salaries
WHERE employee_id = 101
AND transaction_range @> NOW()
ORDER BY lower(valid_range);
/* Results:
82000.00 | ["2022-06-01 00:00:00+00", "2023-01-01 00:00:00+00")
90000.00 | ["2023-01-01 00:00:00+00", infinity)
*/
Performance, Indexing, and Production Considerations
A bi-temporal table will grow much faster than a standard one. Performance is a primary concern.
* Indexing: The EXCLUDE constraint already created a crucial multi-column GiST index: (employee_id, valid_range). This index will accelerate any query that filters by employee_id and uses the && (overlaps) or @> (contains) operators on valid_range.
* Query-Specific Indexes: The default GiST index is not a one-size-fits-all solution. For our common queries that check both ranges, a more comprehensive index is needed:
CREATE INDEX idx_employee_salaries_bitemporal_query
ON employee_salaries USING GIST (employee_id, valid_range, transaction_range);
This allows the planner to efficiently narrow down results by employee_id and then apply range conditions on both time axes.
* Table Bloat and Partitioning: Since we never DELETE, the table will grow indefinitely. This leads to performance degradation over time. For high-volume tables, partitioning is not just an option; it's a requirement.
A common strategy is to partition by employee_id (e.g., HASH partitioning) or by a time dimension. Partitioning by lower(valid_range) (RANGE partitioning) can be effective for archiving old data.
-- Example of range partitioning by year of validity
CREATE TABLE employee_salaries_partitioned (
-- ... columns ...
) PARTITION BY RANGE (lower(valid_range));
CREATE TABLE employee_salaries_2022 PARTITION OF employee_salaries_partitioned
FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
-- etc.
Advanced Edge Cases and Solutions
Edge Case 1: Inserting into the Middle of History (Backdating)
What if a salary change was missed and needs to be inserted retroactively? For example, a promotion on March 1st was only entered into the system on April 15th. This requires splitting an existing record.
The operation must:
valid_range.valid_range to end where the new one begins.valid_range.valid_range starting where the new one ends.This logic is complex and is best encapsulated in a sophisticated stored procedure.
-- A simplified example of a backdating function
CREATE OR REPLACE FUNCTION backdate_salary_change(
p_employee_id BIGINT,
p_new_salary_amount NUMERIC,
p_valid_from TIMESTAMPTZ,
p_valid_to TIMESTAMPTZ
)
RETURNS VOID AS $$
DECLARE
target_record employee_salaries%ROWTYPE;
BEGIN
-- Find and lock the record we are splitting
SELECT * INTO target_record
FROM employee_salaries
WHERE employee_id = p_employee_id
AND valid_range @> p_valid_from
AND transaction_range @> NOW()
FOR UPDATE;
IF NOT FOUND THEN
RAISE EXCEPTION 'No salary record found to split at date %', p_valid_from;
END IF;
-- 1. Supersede the entire old record in transaction time
UPDATE employee_salaries
SET transaction_range = tstzrange(lower(transaction_range), NOW(), '[)')
WHERE salary_version_id = target_record.salary_version_id;
-- 2. Re-insert the first part of the split record
INSERT INTO employee_salaries(employee_id, salary_amount, currency, valid_range, transaction_range)
VALUES (target_record.employee_id, target_record.salary_amount, target_record.currency,
tstzrange(lower(target_record.valid_range), p_valid_from, '[)'),
tstzrange(NOW(), 'infinity', '[)'));
-- 3. Insert the new, back-dated record
INSERT INTO employee_salaries(employee_id, salary_amount, currency, valid_range, transaction_range)
VALUES (p_employee_id, p_new_salary_amount, target_record.currency,
tstzrange(p_valid_from, p_valid_to, '[)'),
tstzrange(NOW(), 'infinity', '[)'));
-- 4. Re-insert the last part of the split record, if necessary
IF upper(target_record.valid_range) > p_valid_to THEN
INSERT INTO employee_salaries(employee_id, salary_amount, currency, valid_range, transaction_range)
VALUES (target_record.employee_id, target_record.salary_amount, target_record.currency,
tstzrange(p_valid_to, upper(target_record.valid_range), '[)'),
tstzrange(NOW(), 'infinity', '[)'));
END IF;
END;
$$ LANGUAGE plpgsql;
Edge Case 2: Integration with ORMs
Object-Relational Mappers often struggle with this pattern. They are typically designed around mutable entities and simple primary keys. TSTZRANGE may not be a native type, and the concept of immutable, function-based writes goes against the grain of typical repository.save(entity) patterns.
Solutions:
* Views: Create a view current_employee_salaries that only shows the currently valid and current transaction records. The ORM can interact with this view for read operations.
CREATE VIEW current_employee_salaries AS
SELECT * FROM employee_salaries
WHERE valid_range @> NOW() AND transaction_range @> NOW();
* Native Queries: For all write operations, bypass the ORM's entity management and call the PL/pgSQL functions directly using native query capabilities.
* Database-First Approach: Model the bi-temporal logic entirely in the database with functions and triggers. The application's only responsibility is to call these functions, treating the database as a true service layer.
Conclusion: The Cost and Reward of Perfect History
Implementing a bi-temporal data model is a significant architectural decision. It introduces complexity into your write paths and increases storage requirements. It is not a pattern to be applied universally across your application.
However, for core business entities where auditability, compliance, and historical reporting are paramount—financial transactions, insurance policies, clinical trial data, regulatory submissions—the benefits are immense. You gain:
* Total Auditability: Every state of the system is preserved and queryable.
* Reproducibility: Reports run for a past date will always return the same result, reflecting what the system knew at that time.
* Error Correction without Data Loss: Mistakes can be corrected while preserving the history of the mistake itself, which is often a critical piece of information.
By leveraging PostgreSQL's advanced features like TSTZRANGE and EXCLUDE constraints, you can build a robust, database-enforced bi-temporal system that moves beyond simple versioning and provides a truly complete historical record.