Bi-temporal Data Modeling in Postgres with Range Types & GIST
The Bi-temporal Challenge: Beyond Simple Versioning
In systems requiring strict auditing and historical accuracy, tracking data changes is non-negotiable. Most developers are familiar with slowly changing dimensions (SCDs) or simple version tables using start_date and end_date columns. These solve a single temporal problem: Valid Time. Valid time answers the question, "When was this fact true in the real world?"
However, a more complex and critical requirement often emerges in regulated industries: Transaction Time. Transaction time answers the question, "When did the database know about this fact?" The combination of these two time axes is called bi-temporality.
Consider this scenario for an insurance policy:
policy_id = 123 is created with a premium of $100. The effective date (valid time) is also June 1, 2023.$105 from the very beginning.Now, you need to answer two distinct questions:
$105.$100.Naive approaches using a single valid_from/valid_to pair with an updated_at timestamp fail here. Overwriting the original record on August 15 destroys the historical record of the mistake. A simple audit log can show that a change occurred, but reconstructing the state of the entire database at a specific transaction time is computationally expensive and complex.
This is where a true bi-temporal model shines. We will build a robust solution using native PostgreSQL features that are tailor-made for this problem: range types and GIST indexes.
A Production-Grade Schema using Range Types
PostgreSQL's range types allow us to treat a start and end point as a single, atomic data type. This is far superior to managing two separate columns, as it enables powerful operators and indexing strategies. For our bi-temporal model, we'll use tstzrange (timestamp with time zone range).
Our schema will track policy premiums. The key insight is to have two range columns:
valid_range: Represents the Valid Time.tx_range: Represents the Transaction Time.Here is the core DDL:
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE policy_premiums (
-- A surrogate key for the version row itself
id BIGSERIAL PRIMARY KEY,
-- A stable identifier for the logical policy
policy_id UUID NOT NULL,
-- The actual data we are versioning
premium_cents INTEGER NOT NULL CHECK (premium_cents > 0),
-- Bi-temporal axes
valid_range TSTZRANGE NOT NULL,
tx_range TSTZRANGE NOT NULL,
-- This is the magic. For a given policy, the valid time ranges
-- cannot overlap if they are currently active in transaction time.
CONSTRAINT no_overlapping_valid_ranges
EXCLUDE USING gist (policy_id WITH =, valid_range WITH &&)
WHERE (upper(tx_range) IS NULL)
);
-- Create indexes to support our queries
CREATE INDEX idx_policy_premiums_policy_id_tx_range ON policy_premiums USING gist (policy_id, tx_range);
CREATE INDEX idx_policy_premiums_valid_range ON policy_premiums USING gist (valid_range);
Deconstructing the Schema
btree_gist Extension: This is required to allow us to use a scalar type like policy_id within a GIST-based exclusion constraint.policy_id: This is the business key. It remains constant across all versions of a single policy.valid_range: The TSTZRANGE representing the period when the premium_cents value was effective in the real world.tx_range: The TSTZRANGE representing the period when this row was considered the correct version by the database.EXCLUDE Constraint: This is the most critical piece of the schema. Let's break it down: - EXCLUDE USING gist: Specifies that this is a GiST-backed exclusion constraint, necessary for range types.
- (policy_id WITH =, valid_range WITH &&): This defines the check. For any two rows being compared:
- If the policy_id is the same (=), AND
- their valid_ranges overlap (&& is the overlaps operator),
- then the constraint is violated.
- WHERE (upper(tx_range) IS NULL): This is the genius of the pattern. The constraint only applies to currently active records. A record is considered active if its transaction time range is open-ended (upper(tx_range) is infinity, represented as NULL in the upper bound). When we correct or update a record, we will "close" its tx_range, making it a historical record, which then exempts it from this constraint. This allows us to have multiple historical versions that may have overlapping valid times, but only one currently active version for any point in valid time.
Implementing Core Bi-temporal Operations
Managing data in this structure requires careful, atomic operations. We'll encapsulate the logic in PL/pgSQL functions to ensure correctness and transactional integrity.
Initial Record Creation
When a new policy is created, it's active from that moment into the future, for both valid and transaction time.
CREATE OR REPLACE FUNCTION create_policy(p_policy_id UUID, p_premium_cents INTEGER, p_valid_from TIMESTAMPTZ)
RETURNS VOID AS $$
BEGIN
INSERT INTO policy_premiums (policy_id, premium_cents, valid_range, tx_range)
VALUES (
p_policy_id,
p_premium_cents,
-- Valid from the specified date to forever
tstzrange(p_valid_from, 'infinity', '[)'),
-- Known by the DB from now to forever
tstzrange(NOW(), 'infinity', '[)')
);
END;
$$ LANGUAGE plpgsql;
Usage:
-- Create a policy, valid starting June 1, 2023
SELECT create_policy('a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d', 10000, '2023-06-01 00:00:00 UTC');
After running this, the table contains:
| policy_id | premium_cents | valid_range | tx_range |
|---|---|---|---|
| a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d | 10000 | ["2023-06-01 00:00:00+00",infinity) | ["2023-09-10 14:30:00+00",infinity) |
Operation 1: The Business Evolution (A standard update)
This is when a policy changes as part of normal business operations. For example, the premium increases for the next billing cycle.
Logic:
- Find the currently active record for the policy.
valid_range at the time the new change becomes effective.valid_range starts where the old one left off.CREATE OR REPLACE FUNCTION evolve_policy_premium(p_policy_id UUID, p_new_premium_cents INTEGER, p_effective_from TIMESTAMPTZ)
RETURNS VOID AS $$
DECLARE
v_current_tx_time TIMESTAMPTZ := NOW();
BEGIN
-- Step 1 & 2: End the validity of the current active record
UPDATE policy_premiums
SET valid_range = tstzrange(lower(valid_range), p_effective_from, '[)')
WHERE policy_id = p_policy_id
AND upper(tx_range) IS NULL
AND lower(valid_range) < p_effective_from;
IF NOT FOUND THEN
RAISE EXCEPTION 'No active policy found for ID % to evolve.', p_policy_id;
END IF;
-- Step 3: Insert the new version
INSERT INTO policy_premiums (policy_id, premium_cents, valid_range, tx_range)
VALUES (
p_policy_id,
p_new_premium_cents,
tstzrange(p_effective_from, 'infinity', '[)'),
tstzrange(v_current_tx_time, 'infinity', '[)')
);
END;
$$ LANGUAGE plpgsql;
Usage:
Let's say on September 10, 2023, we decide the premium will increase to $120 starting October 1, 2023.
SELECT evolve_policy_premium('a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d', 12000, '2023-10-01 00:00:00 UTC');
Now the table looks like this:
| id | policy_id | premium_cents | valid_range | tx_range |
|---|---|---|---|---|
| 1 | a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d | 10000 | ["2023-06-01 00:00:00+00","2023-10-01 00:00:00+00") | ["2023-09-10 14:30:00+00",infinity) |
| 2 | a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d | 12000 | ["2023-10-01 00:00:00+00",infinity) | ["2023-09-10 15:00:00+00",infinity) |
Notice that both records are active in transaction time (tx_range ends at infinity). Our EXCLUDE constraint is satisfied because their valid_ranges do not overlap.
Operation 2: The Historical Correction
This is the most complex and powerful operation. We need to go back and fix a mistake, while preserving the fact that the mistake existed in our records for a period of time. This is our initial scenario: correcting the $100 premium to $105 back to June 1.
Logic:
- Identify all active records that need to be corrected.
tx_range) at the current time. This marks them as historical, preserving the incorrect record.valid_range as the records they are replacing, but their tx_range will start from the current time.CREATE OR REPLACE FUNCTION correct_policy_data(
p_policy_id UUID,
p_new_premium_cents INTEGER,
p_valid_from TIMESTAMPTZ,
p_valid_to TIMESTAMPTZ -- Can be NULL for 'infinity'
)
RETURNS VOID AS $$
DECLARE
v_correction_time TIMESTAMPTZ := NOW();
v_target_valid_range TSTZRANGE := tstzrange(p_valid_from, p_valid_to, '[)');
rec RECORD;
BEGIN
-- We need to find all active versions that overlap with the correction's validity period.
-- A FOR loop is used because a correction might span multiple existing versions.
FOR rec IN
SELECT * FROM policy_premiums
WHERE policy_id = p_policy_id
AND upper(tx_range) IS NULL
AND valid_range && v_target_valid_range
ORDER BY lower(valid_range)
LOOP
-- Step 1 & 2: End the transaction time for the incorrect record(s)
UPDATE policy_premiums
SET tx_range = tstzrange(lower(tx_range), v_correction_time, '[)')
WHERE id = rec.id;
-- Now, re-insert the historical segments that were NOT part of the correction.
-- This handles cases where the correction is only for a sub-period of an existing record.
-- 1. Handle the part *before* the correction
IF lower(rec.valid_range) < lower(v_target_valid_range) THEN
INSERT INTO policy_premiums (policy_id, premium_cents, valid_range, tx_range)
VALUES (
rec.policy_id, rec.premium_cents,
tstzrange(lower(rec.valid_range), lower(v_target_valid_range), '[)'),
tstzrange(v_correction_time, 'infinity', '[)')
);
END IF;
-- 2. Handle the part *after* the correction
IF upper(rec.valid_range) > upper(v_target_valid_range) THEN
INSERT INTO policy_premiums (policy_id, premium_cents, valid_range, tx_range)
VALUES (
rec.policy_id, rec.premium_cents,
tstzrange(upper(v_target_valid_range), upper(rec.valid_range), '[)'),
tstzrange(v_correction_time, 'infinity', '[)')
);
END IF;
END LOOP;
-- Step 3: Insert the new, corrected record for the target period
INSERT INTO policy_premiums (policy_id, premium_cents, valid_range, tx_range)
VALUES (
p_policy_id,
p_new_premium_cents,
v_target_valid_range,
tstzrange(v_correction_time, 'infinity', '[)')
);
END;
$$ LANGUAGE plpgsql;
Usage:
On September 11, 2023, we realize the original premium from June 1, 2023 should have been $105.
-- Correct the premium from June 1 up to Oct 1
SELECT correct_policy_data('a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d', 10500, '2023-06-01 00:00:00 UTC', '2023-10-01 00:00:00 UTC');
Resulting Table State:
| id | policy_id | premium_cents | valid_range | tx_range |
|---|
| 1 | a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d | 10000 | ["2023-06-01...","2023-10-01...") | ["2023-09-10 14:30:00+00","2023-09-11 10:00:00+00") | <-- Historical (mistake)
| 2 | a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d | 12000 | ["2023-10-01...",infinity) | ["2023-09-10 15:00:00+00","2023-09-11 10:00:00+00") | <-- Historical (superseded)
| 3 | a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d | 10500 | ["2023-06-01...","2023-10-01...") | ["2023-09-11 10:00:00+00",infinity) | <-- Active (correction)
| 4 | a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d | 12000 | ["2023-10-01...",infinity) | ["2023-09-11 10:00:00+00",infinity) | <-- Active (future state)
Rows 1 and 2 now have a closed tx_range. They are immutable historical artifacts. Rows 3 and 4 represent the current truth. The EXCLUDE constraint is satisfied because it only applies to rows where upper(tx_range) is NULL (infinity).
Advanced Bi-temporal Query Patterns
With our data correctly structured, we can now perform powerful time-travel queries.
Query 1: "As-Of" (Valid Time) - The Current Truth
"What is the correct premium for policy 123 on July 1, 2023?"
This query looks for the currently active records (tx_range) and then finds the one whose valid_range contains the target date.
SELECT premium_cents
FROM policy_premiums
WHERE policy_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d'
-- Find the currently active version in the DB
AND tx_range @> NOW()
-- Find the version that was valid on the target date
AND valid_range @> '2023-07-01 00:00:00 UTC'::TIMESTAMPTZ;
Result: 10500 (The corrected value)
Query 2: "As-Was" (Transaction Time) - The Historical Truth
"On September 10, 2023 at 4 PM, what did we think the premium was for July 1, 2023?"
This is the time-travel query. We fix both the transaction time and the valid time.
SELECT premium_cents
FROM policy_premiums
WHERE policy_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d'
-- Find the version the DB knew about at a specific past time
AND tx_range @> '2023-09-10 16:00:00 UTC'::TIMESTAMPTZ
-- Find the version that was valid on the target date
AND valid_range @> '2023-07-01 00:00:00 UTC'::TIMESTAMPTZ;
Result: 10000 (The original, mistaken value)
Query 3: Getting the full valid-time history for a policy *as it is known now*
This is useful for displaying the timeline of a policy to a user.
SELECT
lower(valid_range) AS valid_from,
upper(valid_range) AS valid_to,
premium_cents
FROM policy_premiums
WHERE policy_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d'
AND tx_range @> NOW()
ORDER BY lower(valid_range);
Result:
| valid_from | valid_to | premium_cents |
|---|---|---|
| 2023-06-01 00:00:00+00 | 2023-10-01 00:00:00+00 | 10500 |
| 2023-10-01 00:00:00+00 | NULL | 12000 |
Performance, Indexing, and Edge Cases
This model is powerful, but its performance hinges entirely on correct indexing.
The Role of GIST Indexes
Standard B-tree indexes are ineffective for range-type operators like && (overlaps) or @> (contains). GIST (Generalized Search Tree) indexes are designed for this. They can index multi-dimensional data and handle complex data types.
Our schema included these two crucial indexes:
idx_policy_premiums_policy_id_tx_range: This is a multi-column GIST index. It's the workhorse for most queries, allowing the planner to first filter by policy_id (a B-tree-like operation within the GIST index, thanks to btree_gist) and then efficiently search the tx_range for containment or overlaps.EXCLUDE constraint: The constraint EXCLUDE USING gist (policy_id WITH =, valid_range WITH &&) WHERE (upper(tx_range) IS NULL) implicitly creates a partial GIST index on (policy_id, valid_range) for active rows. This index not only enforces the constraint but also accelerates queries that filter on valid_range for active records.Let's analyze a query plan:
EXPLAIN ANALYZE
SELECT premium_cents
FROM policy_premiums
WHERE policy_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d'
AND tx_range @> '2023-09-10 16:00:00 UTC'::TIMESTAMPTZ
AND valid_range @> '2023-07-01 00:00:00 UTC'::TIMESTAMPTZ;
Query Plan (with GIST indexes):
Index Scan using idx_policy_premiums_policy_id_tx_range on policy_premiums ...
Index Cond: ((policy_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d') AND (tx_range @> '2023-09-10 16:00:00+00'::timestamptz))
Filter: (valid_range @> '2023-07-01 00:00:00+00'::timestamptz)
Rows Removed by Filter: 0
... (Execution time: 0.050 ms)
PostgreSQL uses the GIST index on (policy_id, tx_range) to very quickly find the small number of candidate rows. The valid_range check is then a cheap filter on the result. Without this index, the database would have to perform a full table scan, which would be disastrously slow on a large table.
Edge Case: Timezone Management
Our use of TIMESTAMPTZ and TSTZRANGE is deliberate and critical. All timestamps are stored in UTC. When a client connects with a specific timezone setting, PostgreSQL correctly converts the timestamps for display. All range calculations and comparisons happen in UTC, eliminating an entire class of bugs. Never use TIMESTAMP without timezone for this pattern.
Edge Case: Contiguity and Gaps
The EXCLUDE constraint prevents overlaps in valid time for active records, but it does not prevent gaps. For example, you could end one record's validity on Oct 1 and start the next on Oct 3, leaving a two-day gap where the policy has no defined premium. If this is not allowed by business logic, it must be enforced:
BEFORE INSERT OR UPDATE trigger that checks for gaps between adjacent valid-time records for the same policy. This adds complexity but provides stronger guarantees.Edge Case: The Correction of a Correction
What happens if the $105 correction was also a mistake, and it should have been $102? The model handles this perfectly. You would simply run the correct_policy_data function again. This would:
$105 premium (row 3 in our table).tx_range at the new correction time.$102 premium, a valid_range of [2023-06-01, 2023-10-01), and a tx_range starting from the new correction time.We would now have three generations of belief for that time period, all queryable and perfectly preserved.
Conclusion & Production Considerations
This bi-temporal modeling pattern in PostgreSQL is a powerful, dependency-free solution for systems that demand an unimpeachable audit trail. By leveraging range types, GIST indexes, and exclusion constraints, we build a system where history is never deleted, only superseded, and where we can query the state of the world from any point in valid time, as known by the system at any point in transaction time.
Trade-offs to Consider:
DELETE or truly UPDATE data. Every change creates at least one new row and modifies one existing one. For high-churn data, this can be significant.INSERT/UPDATE statements. All modifications must be routed through carefully constructed functions.SELECT ... WHERE id = ?.Despite these trade-offs, when the business requirement is true bi-temporality—common in finance, insurance, healthcare, and any domain with strong regulatory or compliance needs—this native PostgreSQL pattern offers a robust, performant, and elegant solution that avoids the overhead of external versioning systems or dedicated temporal databases.