Bi-temporal Data Models in Postgres with Range Types & GIST
The Flaw in Conventional Timestamps
As senior engineers, we've all implemented schemas with created_at and updated_at columns. They serve a basic purpose: tracking the lifecycle of a row within the database. However, this approach conflates two fundamentally different temporal axes:
* Valid Time: the period during which a fact was true in the real world.
* Transaction Time: the period during which that fact was recorded in, and believed by, the database.
Failing to distinguish between these leads to destructive updates. When you UPDATE employees SET salary = 90000 WHERE id = 123;, you overwrite the previous salary. The history of that employee's compensation is lost unless you've implemented a separate, often cumbersome, audit log table. You can answer "What is the employee's salary now?" but you cannot reliably answer "What was the employee's salary on June 1st, 2022?" and you certainly cannot answer "On August 1st, 2023, what did we think the employee's salary was on June 1st, 2022?"
This latter question is the hallmark of bi-temporality and is critical for financial reporting, regulatory compliance, and debugging complex business logic errors. This article presents a robust, production-grade pattern for implementing a bi-temporal model using advanced, native features of PostgreSQL: range types (tstzrange), exclusion constraints, and GIST indexes.
Core Architecture: Valid and Transaction Time Ranges
Our model will replace single timestamp columns with two timestamptz ranges (tstzrange) for every bi-temporal entity.
* valid_range tstzrange: Represents the Valid Time. It's a [lower_bound, upper_bound) interval describing when the record's data was true in reality. The upper bound is often set to 'infinity' for currently valid records.
* tx_range tstzrange: Represents the Transaction Time. It's a [lower_bound, upper_bound) interval describing when this version of the record was present in the database. The upper bound is 'infinity' for the latest version of any given fact. (The short snippet after this list illustrates these range semantics.)
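Since the whole pattern rests on these range semantics, here is a quick standalone illustration (not part of the schema, just example values) of the half-open [) convention and the two operators used throughout, containment (@>) and overlap (&&):
-- tstzrange(lower, upper) defaults to '[)': inclusive lower bound, exclusive upper bound.
SELECT
    tstzrange('2023-01-01 00:00 UTC', '2024-01-01 00:00 UTC')
        @> '2023-06-01 00:00 UTC'::timestamptz AS contains_midpoint,  -- true
    tstzrange('2023-01-01 00:00 UTC', '2024-01-01 00:00 UTC')
        @> '2024-01-01 00:00 UTC'::timestamptz AS contains_upper,     -- false: upper bound is exclusive
    tstzrange('2023-01-01 00:00 UTC', 'infinity')
        && tstzrange('2023-06-01 00:00 UTC', '2023-07-01 00:00 UTC') AS ranges_overlap; -- true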
Let's model an employee's salary. A single row in our employee_salaries table won't just represent a salary; it will represent a salary over a specific period of validity, as known to the system during a specific period of transaction time.
Schema Definition with Exclusion Constraints
The real power of this pattern in PostgreSQL comes from using an exclusion constraint with a GIST index. This allows the database itself to enforce the core business rule of bi-temporality: for a given entity (e.g., an employee), their valid time ranges cannot overlap for records that are currently active in the database.
Here is the schema for our employee_salaries table:
-- btree_gist lets scalar columns (like employee_id) participate in GIST
-- indexes and exclusion constraints alongside range columns.
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE employee_salaries (
    -- A surrogate primary key for the row itself.
    id BIGSERIAL PRIMARY KEY,
    -- A business key to identify the entity (the employee).
    -- This is what we'll use for temporal integrity checks.
    employee_id BIGINT NOT NULL,
    -- The actual data we are tracking.
    salary NUMERIC(10, 2) NOT NULL,
    -- Bi-temporal timestamp ranges.
    -- We use TIMESTAMPTZ for timezone safety.
    valid_range TSTZRANGE NOT NULL,
    tx_range TSTZRANGE NOT NULL,
    -- THE CRITICAL CONSTRAINT
    -- This prevents overlapping valid_range for the same employee_id
    -- ONLY for records that are currently active (UPPER(tx_range) = 'infinity').
    -- The `WHERE` clause makes this a partial constraint, which is key.
    CONSTRAINT no_overlapping_valid_ranges_for_current_transactions
        EXCLUDE USING GIST (
            employee_id WITH =,
            valid_range WITH &&
        ) WHERE (UPPER(tx_range) = 'infinity')
);
-- Create indexes to accelerate temporal queries
CREATE INDEX idx_employee_salaries_valid_range_gist ON employee_salaries USING GIST (valid_range);
CREATE INDEX idx_employee_salaries_tx_range_gist ON employee_salaries USING GIST (tx_range);
CREATE INDEX idx_employee_salaries_employee_id ON employee_salaries (employee_id);
Let's dissect the exclusion constraint:
* EXCLUDE USING GIST: An exclusion constraint can only use an index method that supports every operator it references. GIST supports the range overlap operator (&&), and with btree_gist it also handles scalar equality (=), which is why we created that extension above.
* employee_id WITH =: This specifies that the constraint applies to rows with the same employee_id. The = operator checks for equality.
* valid_range WITH &&: This checks if the valid_range of a new or updated row overlaps with an existing row's valid_range. The && is the overlap operator for range types.
* WHERE (UPPER(tx_range) = 'infinity'): This is the most subtle and important part. The constraint is partial: it only applies to rows that are currently considered active in the database (their transaction time extends to infinity). This allows us to have historical records with overlapping valid times, representing corrections to history, without violating integrity.
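To make the constraint's behavior concrete, here is a small, hypothetical demonstration (illustrative data only, best run in a scratch database): a second currently-active row whose valid_range overlaps an existing active row for the same employee should be rejected, while a row whose tx_range is already closed is allowed to overlap.
-- Seed an active row for an illustrative employee, valid for all of 2023.
INSERT INTO employee_salaries (employee_id, salary, valid_range, tx_range)
VALUES (7, 70000.00,
        tstzrange('2023-01-01 00:00 UTC', '2024-01-01 00:00 UTC'),
        tstzrange(NOW(), 'infinity'));

-- This overlapping, currently-active row violates the exclusion constraint and fails:
INSERT INTO employee_salaries (employee_id, salary, valid_range, tx_range)
VALUES (7, 72000.00,
        tstzrange('2023-06-01 00:00 UTC', '2023-12-01 00:00 UTC'),
        tstzrange(NOW(), 'infinity'));
-- expected: ERROR: conflicting key value violates exclusion constraint
--           "no_overlapping_valid_ranges_for_current_transactions"

-- The same valid_range is fine when the row is historical (tx_range already closed),
-- because the constraint's WHERE clause excludes it from the check:
INSERT INTO employee_salaries (employee_id, salary, valid_range, tx_range)
VALUES (7, 72000.00,
        tstzrange('2023-06-01 00:00 UTC', '2023-12-01 00:00 UTC'),
        tstzrange('2023-06-01 00:00 UTC', '2024-01-01 00:00 UTC'));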
Bi-temporal Operations: Beyond Simple CRUD
Standard INSERT, UPDATE, and DELETE are insufficient. Every data modification becomes a carefully orchestrated set of inserts and updates to maintain the temporal history. We will encapsulate this logic in functions or use CTEs for clarity and atomicity.
Throughout, NOW() refers to transaction_timestamp() (in PostgreSQL they are equivalent), so every call within a single transaction returns the same value; this is what keeps the paired updates and inserts below temporally consistent.
1. Initial Record Creation
When a new employee's salary is set for the first time, effective immediately.
Scenario: On 2023-01-15, we hire an employee (id=42) with a salary of $80,000, effective immediately.
INSERT INTO employee_salaries (employee_id, salary, valid_range, tx_range)
VALUES (
42, -- employee_id
80000.00, -- salary
-- Valid from now until forever (in the real world)
tstzrange('2023-01-15 09:00:00 UTC', 'infinity'),
-- Recorded in the DB from now until forever
tstzrange(NOW(), 'infinity')
);
After this operation, the table contains:
| id | employee_id | salary | valid_range | tx_range |
|---|---|---|---|---|
| 1 | 42 | 80000.00 | ['2023-01-15 09:00Z', infinity) | ['2023-01-15 10:30Z', infinity) |
2. Standard Update (Evolution)
An evolution is a change that happens in the real world and is recorded in the database as it happens. This is the most common type of "update".
Scenario: On 2024-03-01, employee 42 gets a raise to $95,000, effective immediately.
This is a two-step process within a single transaction:
1. The valid_range of the current record is closed at the time of the change.
2. A new record is inserted with a valid_range starting at the time of the change.
Here's how to do it atomically using a Common Table Expression (CTE):
BEGIN;
WITH current_record AS (
    -- Find the currently valid and active record for the employee
    SELECT id
    FROM employee_salaries
    WHERE employee_id = 42
      AND valid_range @> NOW()
      AND UPPER(tx_range) = 'infinity'
    LIMIT 1
),
terminate_old AS (
    -- Update the old record to end its validity period
    UPDATE employee_salaries
    SET valid_range = tstzrange(LOWER(valid_range), NOW()) -- Close the valid range
    WHERE id = (SELECT id FROM current_record)
    RETURNING employee_id, salary, valid_range
)
INSERT INTO employee_salaries (employee_id, salary, valid_range, tx_range)
SELECT
    42,                           -- employee_id
    95000.00,                     -- new salary
    tstzrange(NOW(), 'infinity'), -- new valid_range
    tstzrange(NOW(), 'infinity')  -- new tx_range
FROM terminate_old
WHERE (SELECT id FROM current_record) IS NOT NULL;
COMMIT;
After this, the table looks like this:
| id | employee_id | salary | valid_range | tx_range |
|---|---|---|---|---|
| 1 | 42 | 80000.00 | ['2023-01-15 09:00Z', '2024-03-01 11:00Z') | ['2023-01-15 10:30Z', infinity) |
| 2 | 42 | 95000.00 | ['2024-03-01 11:00Z', infinity) | ['2024-03-01 11:00Z', infinity) |
Notice that row 1's tx_range is still open. This is correct. As of today, it is still a true fact that the employee's salary was $80,000 during that period.
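Because every evolution follows the same choreography, it is worth encapsulating, as suggested earlier. Below is one possible sketch of such a wrapper as a PL/pgSQL function; the function name record_salary_evolution and its signature are illustrative assumptions, not part of the pattern itself.
-- Sketch only: one way to encapsulate the "evolution" choreography in a function.
CREATE OR REPLACE FUNCTION record_salary_evolution(
    p_employee_id BIGINT,
    p_new_salary  NUMERIC(10, 2),
    p_effective   TIMESTAMPTZ DEFAULT NOW()
) RETURNS VOID
LANGUAGE plpgsql AS $$
BEGIN
    -- Close the valid_range of the currently active, currently valid record.
    UPDATE employee_salaries
    SET valid_range = tstzrange(LOWER(valid_range), p_effective)
    WHERE employee_id = p_employee_id
      AND valid_range @> p_effective
      AND UPPER(tx_range) = 'infinity';

    IF NOT FOUND THEN
        RAISE EXCEPTION 'No active salary record for employee %', p_employee_id;
    END IF;

    -- Insert the new version, valid from the effective moment and recorded from now.
    INSERT INTO employee_salaries (employee_id, salary, valid_range, tx_range)
    VALUES (p_employee_id,
            p_new_salary,
            tstzrange(p_effective, 'infinity'),
            tstzrange(NOW(), 'infinity'));
END;
$$;

-- Usage: SELECT record_salary_evolution(42, 95000.00);
Because the function body runs inside the caller's transaction, the UPDATE and INSERT are atomic: a failure in either rolls back both.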
3. Back-dated Update (Correction)
This is where bi-temporality shines. A correction fixes a mistake in the historical record.
Scenario: On 2024-04-10, we discover a data entry error. The initial salary for employee 42 on 2023-01-15 was actually $82,500, not $80,000.
We cannot simply update the old row. That would be rewriting history. Instead, we must:
1. Close the tx_range of the currently active records, marking the previous belief as superseded.
2. Insert corrected records with the same valid_range as the originals, but with a new tx_range starting now.
This is a complex operation:
BEGIN;
-- The time of the correction.
-- In a real app, this would be the single transaction timestamp;
-- we use a variable here for clarity.
DO $$
DECLARE
    correction_ts TIMESTAMPTZ := '2024-04-10 14:00:00 UTC';
BEGIN
    -- Step 1: Terminate the transaction time of all currently active records
    -- for this employee. This closes out the entire previously believed history.
    UPDATE employee_salaries
    SET tx_range = tstzrange(LOWER(tx_range), correction_ts)
    WHERE employee_id = 42
      AND UPPER(tx_range) = 'infinity';

    -- Step 2: Re-insert the corrected history.
    -- We are creating two new rows to replace the two old ones.

    -- The first corrected record (the initial salary)
    INSERT INTO employee_salaries (employee_id, salary, valid_range, tx_range)
    VALUES (
        42,
        82500.00, -- The corrected salary
        tstzrange('2023-01-15 09:00:00 UTC', '2024-03-01 11:00:00 UTC'), -- Original valid_range
        tstzrange(correction_ts, 'infinity')                             -- New tx_range
    );

    -- The second record (the raise) is unaffected in its substance, but must be
    -- re-asserted as part of the new, corrected history.
    INSERT INTO employee_salaries (employee_id, salary, valid_range, tx_range)
    VALUES (
        42,
        95000.00,
        tstzrange('2024-03-01 11:00:00 UTC', 'infinity'),
        tstzrange(correction_ts, 'infinity')
    );
END $$;
COMMIT;
Our table now contains the full, auditable history:
| id | employee_id | salary | valid_range | tx_range |
|---|---|---|---|---|
| 1 | 42 | 80000.00 | ['2023-01-15 09:00Z', '2024-03-01 11:00Z') | ['2023-01-15 10:30Z', '2024-04-10 14:00Z') |
| 2 | 42 | 95000.00 | ['2024-03-01 11:00Z', infinity) | ['2024-03-01 11:00Z', '2024-04-10 14:00Z') |
| 3 | 42 | 82500.00 | ['2023-01-15 09:00Z', '2024-03-01 11:00Z') | ['2024-04-10 14:00Z', infinity) |
| 4 | 42 | 95000.00 | ['2024-03-01 11:00Z', infinity) | ['2024-04-10 14:00Z', infinity) |
Rows 1 and 2 are now historical artifacts. Their tx_range is closed. Rows 3 and 4 represent the new, correct version of history as we know it today.
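Nothing is ever deleted: the superseded rows remain queryable as an audit trail. A query along these lines (illustrative, not part of the pattern itself) lists every version of history ever asserted for the employee, newest belief first:
-- Every assertion ever made about employee 42's salary, including superseded ones.
SELECT
    salary,
    valid_range,
    tx_range,
    UPPER(tx_range) = 'infinity' AS is_current_belief
FROM employee_salaries
WHERE employee_id = 42
ORDER BY LOWER(tx_range) DESC, LOWER(valid_range);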
4. Logical Deletion
A "delete" simply means ending the validity of a fact. For example, if an employee leaves the company.
Scenario: On 2025-01-31, employee 42's employment is terminated.
This is identical to a standard evolution update. We close the valid_range of the current active record.
UPDATE employee_salaries
SET valid_range = tstzrange(LOWER(valid_range), '2025-01-31 17:00:00 UTC')
WHERE employee_id = 42
AND UPPER(valid_range) = 'infinity'
AND UPPER(tx_range) = 'infinity';
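After this, an "as-of" query for any point after the termination timestamp simply returns no rows for employee 42 under the current belief, which is how downstream code should interpret "no current salary". A quick sanity check might look like this (illustrative):
-- Expect zero rows: no salary fact is valid on this date in the current belief.
SELECT salary
FROM employee_salaries
WHERE employee_id = 42
  AND valid_range @> '2025-06-01 00:00:00 UTC'::timestamptz
  AND UPPER(tx_range) = 'infinity';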
Advanced Querying: Unlocking Temporal Insights
The complexity of the write operations pays off in the power of the read queries. GIST indexes on the range columns are essential for performance.
Query 1: "As-of" (Current State at a Point in Valid Time)
Question: "What was employee 42's salary on 2023-06-01?"
This is an "as-of" query. We look for the record that was valid at that point in time and is currently believed to be true.
SELECT salary
FROM employee_salaries
WHERE employee_id = 42
-- The valid range contains the target timestamp
AND valid_range @> '2023-06-01 00:00:00 UTC'::timestamptz
-- And this is the currently accepted version of history
AND UPPER(tx_range) = 'infinity';
This query will correctly return $82,500.00 (from row 3), not the original incorrect value.
Query 2: "As-known-on" (Historical Belief at a Point in Transaction Time)
Question: "On 2024-03-15, what did we think employee 42's salary was for the date 2023-06-01?"
This is the quintessential bi-temporal query. We need to travel back in both valid time and transaction time.
SELECT salary
FROM employee_salaries
WHERE employee_id = 42
-- The fact was valid on this date
AND valid_range @> '2023-06-01 00:00:00 UTC'::timestamptz
-- And the record was known to the database on this date
AND tx_range @> '2024-03-15 00:00:00 UTC'::timestamptz;
This query will correctly return $80,000.00 (from row 1), because on 2024-03-15, we had not yet made the correction. The tx_range for row 1 ['2023-01-15 10:30Z', '2024-04-10 14:00Z') contains our query timestamp.
Query 3: Full History of a Value
Question: "Show me the evolution of employee 42's salary over time."
SELECT
salary,
LOWER(valid_range) AS effective_from,
UPPER(valid_range) AS effective_to
FROM employee_salaries
WHERE employee_id = 42
AND UPPER(tx_range) = 'infinity'
ORDER BY effective_from;
This gives you the currently accepted timeline of salary changes.
Performance, Edge Cases, and Production Considerations
GIST Index Performance
A B-Tree index cannot serve range overlap (&&) or containment (@>) predicates. GIST (Generalized Search Tree) is an extensible index framework built for exactly this kind of predicate: it supports range types natively and can efficiently answer questions like "find all ranges that overlap with this target range."
Let's look at an EXPLAIN ANALYZE for an "as-of" query on a large table:
-- Hypothetical table with 10 million rows
EXPLAIN ANALYZE SELECT salary
FROM employee_salaries
WHERE employee_id = 8675309
AND valid_range @> '2023-10-25 00:00:00 UTC'::timestamptz
AND UPPER(tx_range) = 'infinity';
Without GIST index on valid_range:
Gather (cost=1000.43..263921.43 rows=1 width=12) (actual time=150.321..354.123 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on employee_salaries (cost=0.43..262921.33 rows=1 width=12) (actual time=348.543..351.432 rows=1 loops=3)
Filter: ((upper(tx_range) = 'infinity'::timestamp with time zone) AND (valid_range @> '2023-10-25 00:00:00+00'::timestamp with time zone) AND (employee_id = 8675309))
Rows Removed by Filter: 3333332
Planning Time: 0.215 ms
Execution Time: 354.189 ms
This results in a slow Parallel Sequential Scan over the entire table.
With GIST index on (employee_id, valid_range):
Index Scan using idx_employee_salaries_emp_valid_gist on employee_salaries (cost=0.43..8.45 rows=1 width=12) (actual time=0.045..0.046 rows=1 loops=1)
Index Cond: ((employee_id = 8675309) AND (valid_range @> '2023-10-25 00:00:00+00'::timestamp with time zone))
Filter: (upper(tx_range) = 'infinity'::timestamp with time zone)
Rows Removed by Filter: 0
Planning Time: 0.189 ms
Execution Time: 0.078 ms
The performance difference is staggering: 354ms vs 0.078ms. The GIST index allows the planner to seek directly to the relevant records without scanning the table. Note that the plan above uses a multi-column GIST index built with btree_gist on (employee_id, valid_range), which is more effective than the separate single-column indexes defined in the original schema because both predicates are evaluated inside a single index scan.
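For completeness, that composite index is not part of the schema defined earlier; it would be created along these lines (the index name matches the plan above but is otherwise arbitrary):
-- Composite GIST index combining the scalar business key and the valid-time range.
-- Requires the btree_gist extension created earlier.
CREATE INDEX idx_employee_salaries_emp_valid_gist
    ON employee_salaries
    USING GIST (employee_id, valid_range);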
Edge Case: Unbounded Ranges and `infinity`
The infinity value in range types is a placeholder for "forever", and application logic must handle it deliberately. Note that 'infinity' is an actual timestamptz value, which is not the same thing as an unbounded range built with a NULL bound: UPPER() returns 'infinity' for the former but NULL for the latter, and upper_inf() is true only for the latter. Whichever convention you adopt (this article uses the explicit 'infinity' value throughout), apply it consistently, and when displaying a validity period, transform the "forever" upper bound into NULL or a string like "Present".
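A small illustration of that distinction, plus a display-friendly projection under the 'infinity' convention (illustrative):
-- 'infinity' is a value; a NULL bound means the range is unbounded.
SELECT
    UPPER(tstzrange(NOW(), 'infinity'))     AS upper_with_infinity, -- 'infinity'
    upper_inf(tstzrange(NOW(), 'infinity')) AS inf_with_infinity,   -- false
    UPPER(tstzrange(NOW(), NULL))           AS upper_with_null,     -- NULL
    upper_inf(tstzrange(NOW(), NULL))       AS inf_with_null;       -- true

-- Display-friendly projection of a validity period under the 'infinity' convention.
SELECT
    salary,
    LOWER(valid_range) AS effective_from,
    NULLIF(UPPER(valid_range), 'infinity') AS effective_to -- NULL means "Present"
FROM employee_salaries
WHERE employee_id = 42
  AND UPPER(tx_range) = 'infinity'
ORDER BY effective_from;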
Edge Case: Timezone Management
Using TIMESTAMPTZ is non-negotiable. All timestamps entering the system must have timezone information or be explicitly converted to a standard zone (like UTC). The database connection's timezone setting will affect how timestamps are interpreted and displayed, so it's best practice for applications to set it explicitly (e.g., SET TIME ZONE 'UTC'). Without this discipline, you can introduce subtle off-by-one errors due to daylight saving or server location differences.
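As a concrete example, a session might pin its timezone before doing any temporal work; with the zone fixed, an unqualified literal and an explicitly qualified one resolve to the same instant:
-- Pin the session timezone so timestamps are parsed and displayed predictably.
SET TIME ZONE 'UTC';

-- Both expressions denote the same instant: the first literal is interpreted as UTC
-- because of the session setting, the second is converted from UTC-5.
SELECT '2024-03-01 11:00:00'::timestamptz    AS assumed_utc,
       '2024-03-01 06:00:00-05'::timestamptz AS converted_from_offset;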
When Not to Use This Pattern
Bi-temporal modeling introduces significant complexity. It is not a one-size-fits-all solution.
* Overhead: Write operations are much heavier, and storage requirements increase as you never truly delete data. For high-throughput transactional systems where history is not paramount, this is overkill.
* Query Complexity: While powerful, temporal queries are more complex to write and reason about. This can slow down development and increase the risk of bugs if not handled by a well-tested data access layer.
* Data without a clear 'validity': This pattern is best for "facts" that have a defined period of truth in the real world (salaries, addresses, insurance policy terms). It's less useful for event-like data (e.g., application logs) where an event sourcing model might be more appropriate.
Conclusion
By leveraging PostgreSQL's native range types, GIST indexing, and partial exclusion constraints, we can build a powerful, self-consistent bi-temporal data model directly in the database. This pattern elevates data integrity from a simple application-layer concern to a database-enforced guarantee. It provides a complete, auditable history of data, allowing you to answer not just what the state of your system is, but what it was, and what you thought it was at any point in time. While the implementation complexity is non-trivial, for systems where auditability and historical accuracy are core business requirements, this architecture provides a robust and performant foundation.