Bi-temporal Data Models in Postgres with Range Types & Constraints
Beyond SCD Type 2: True Bi-temporal Modeling in PostgreSQL
For senior engineers working on systems in regulated industries like finance, insurance, or healthcare, managing historical data is not a trivial task. The common "Slowly Changing Dimension" (SCD) Type 2 pattern, with its start_date and end_date columns, is often the first tool reached for. However, it suffers from a fundamental flaw: it conflates valid time (when a fact was true in the real world) with transaction time (when the fact was recorded in the database).
This conflation makes it impossible to answer critical questions simultaneously, such as:
- "What was the price of Product A on January 15th?" (A query against valid time)
Answering the second question is crucial for auditing, reproducing reports, and debugging historical discrepancies. When you correct a past mistake using SCD Type 2, you overwrite history, losing the record of the mistake ever existing. This is where a bi-temporal model excels.
This article dives deep into a production-grade implementation of a bi-temporal data model using advanced PostgreSQL features. We will bypass simplistic approaches and focus on a robust, constraint-driven design using tsrange types and EXCLUDE USING GIST constraints. This provides declarative, database-enforced guarantees of temporal integrity, eliminating entire classes of bugs that plague trigger-based or application-level logic.
The Core Schema: Separating Two Timelines
A bi-temporal model explicitly separates the two time dimensions:
* valid_time: A tsrange (timestamp with time zone range) representing the period during which the record's attributes were factually correct in the real world.
* transaction_time: A tsrange representing the period during which this version of the record existed in the database. For the currently active version of a record, the upper bound of this range is typically 'infinity'.
Let's model a common scenario: tracking product prices. A naive table might look like products(id, price). Our bi-temporal version will track price history in a separate table.
First, ensure the btree_gist extension is enabled, as it's required for our exclusion constraint.
CREATE EXTENSION IF NOT EXISTS btree_gist;
Now, let's define the table. We'll use a product_price_id as a surrogate primary key, a product_id to link to the core product entity, the price, our two temporal ranges, and the crucial exclusion constraint.
CREATE TABLE product_prices (
product_price_id BIGSERIAL PRIMARY KEY,
product_id BIGINT NOT NULL,
price NUMERIC(10, 2) NOT NULL CHECK (price > 0),
valid_time TSRANGE NOT NULL,
transaction_time TSRANGE NOT NULL,
-- This is the core of our temporal integrity.
-- It prevents any two rows for the same product from having
-- overlapping valid_time AND overlapping transaction_time.
EXCLUDE USING GIST (
product_id WITH =,
valid_time WITH &&,
transaction_time WITH &&
)
);
-- Create a GiST index to support the constraint and accelerate queries.
-- Note: The EXCLUDE constraint automatically creates this index.
-- We can also create more specific indexes later for performance.
CREATE INDEX idx_product_prices_temporal ON product_prices USING GIST (product_id, valid_time, transaction_time);
The EXCLUDE constraint is the lynchpin of this design. It's a generalization of a UNIQUE constraint. Here, it states that for any two rows in the table, the combination of (product_id = product_id) AND (valid_time && valid_time) AND (transaction_time && transaction_time) cannot all be true. The && is the "overlaps" operator for range types. This single constraint prevents temporal anomalies at the database level, such as defining two different prices for the same product to be valid at the same time and recorded in the database at the same time.
Query Pattern: The 'As-Of' Query
The most frequent query against a bi-temporal model is the "as-of" query, which asks for the state of the world at a specific point in time.
Let's populate some data. Assume now() is 2023-10-26 10:00:00 UTC.
-- Initial price for Product 123, valid starting today.
INSERT INTO product_prices (product_id, price, valid_time, transaction_time)
VALUES (
123,
99.99,
tsrange('2023-10-26 00:00:00', 'infinity', '[)'),
tsrange(now(), 'infinity', '[)')
);
To find the current price of Product 123:
-- What is the price of product 123 right now?
SELECT product_id, price
FROM product_prices
WHERE product_id = 123
AND valid_time @> now() -- The valid_time range contains the current timestamp
AND transaction_time @> now(); -- The transaction_time range contains the current timestamp
The @> operator checks if a range contains a specific element (in this case, a timestamp). This query is efficient because it can use our multi-column GiST index.
Let's look at the EXPLAIN plan:
-- EXPLAIN ANALYZE SELECT ...
Index Scan using idx_product_prices_temporal on product_prices
Index Cond: ((product_id = 123) AND (valid_time @> '2023-10-26 10:00:00+00'::timestamptz) AND (transaction_time @> '2023-10-26 10:00:00+00'::timestamptz))
... (other details)
The plan confirms an efficient index scan is used, which is critical for performance in large datasets.
Mutation Patterns: The True Challenge
Reading the data is straightforward. The real complexity lies in managing mutations while preserving history and satisfying the exclusion constraint. We'll encapsulate this logic in PL/pgSQL functions to ensure atomicity and correctness.
Pattern 1: A Standard Update (Future Change)
Let's say on 2023-11-01, we decide the price for Product 123 will change to 109.99 starting on 2023-12-01.
This involves two steps performed atomically:
99.99 is no longer valid from 2023-12-01 onwards. We must update its valid_time range.109.99 price, with a valid_time starting 2023-12-01.Crucially, this is a change to valid time, not transaction time. We are not correcting a mistake; we are recording a new, future state of the world. The transaction_time of the existing row remains open because it's still the current record in the database of what we believe the price history to be.
Let's wrap this in a function.
CREATE OR REPLACE FUNCTION update_product_price(
p_product_id BIGINT,
p_new_price NUMERIC(10, 2),
p_effective_from TIMESTAMPTZ
)
RETURNS VOID AS $$
DECLARE
current_record RECORD;
BEGIN
-- Use a FOR UPDATE clause to lock the current active record for this product
-- This prevents race conditions if two updates happen concurrently.
SELECT * INTO current_record
FROM product_prices
WHERE product_id = p_product_id
AND upper(valid_time) = 'infinity'
AND upper(transaction_time) = 'infinity'
FOR UPDATE;
-- If no current record exists, or if the new price is the same, do nothing.
IF NOT FOUND OR current_record.price = p_new_price THEN
RETURN;
END IF;
-- Check if the effective date is in the past, which would be a correction,
-- not a standard update. This function only handles future-dated changes.
IF p_effective_from <= lower(current_record.valid_time) THEN
RAISE EXCEPTION 'Effective date must be in the future. Use correction function for past changes.';
END IF;
-- 1. End the validity of the current price record.
-- We are updating the valid_time, not the transaction_time.
UPDATE product_prices
SET valid_time = tsrange(lower(valid_time), p_effective_from, '[)')
WHERE product_price_id = current_record.product_price_id;
-- 2. Insert the new price record.
-- Its validity starts where the old one ended.
-- Its transaction time is the same as the old one, as this is one logical change.
INSERT INTO product_prices (product_id, price, valid_time, transaction_time)
VALUES (p_product_id, p_new_price,
tsrange(p_effective_from, 'infinity', '[)'),
current_record.transaction_time);
END;
$$ LANGUAGE plpgsql;
Usage:
Let's assume now() is 2023-11-01 12:00:00 UTC.
-- Schedule a price change for Dec 1st.
SELECT update_product_price(123, 109.99, '2023-12-01 00:00:00 UTC');
After this, the table for product_id = 123 will contain two rows, both with an open transaction_time:
price=99.99, valid_time=['2023-10-26 00:00:00', '2023-12-01 00:00:00'), transaction_time=['2023-10-26 10:00:00', 'infinity')price=109.99, valid_time=['2023-12-01 00:00:00', 'infinity'), transaction_time=['2023-10-26 10:00:00', 'infinity')The EXCLUDE constraint is satisfied because their valid_time ranges do not overlap.
Pattern 2: A Correction of History (The Bi-temporal Superpower)
This is where the bi-temporal model truly shines. Imagine on 2023-11-15, we discover that the initial price of 99.99 was a data entry error. It should have been 95.99 from the very beginning (2023-10-26).
We cannot simply UPDATE the old record. That would erase the fact that we ever thought the price was 99.99. We need to preserve the audit trail. This is a change to transaction time.
Here's the process, performed atomically at 2023-11-15 14:00:00 UTC:
transaction_time is open) and whose valid_time overlaps with the period we are correcting. Set the upper bound of their transaction_time to now() (2023-11-15 14:00:00). These records are now considered historical from the database's perspective.valid_time as the records they are replacing, but their transaction_time will start from now().This is a complex operation that absolutely must be encapsulated.
CREATE OR REPLACE FUNCTION correct_product_price_history(
p_product_id BIGINT,
p_corrected_price NUMERIC(10, 2),
p_correction_period TSRANGE
)
RETURNS VOID AS $$
DECLARE
rec RECORD;
correction_time TIMESTAMPTZ := now();
BEGIN
-- Find all currently active versions that overlap with the correction period.
-- Lock them to prevent concurrent modifications.
FOR rec IN
SELECT * FROM product_prices
WHERE product_id = p_product_id
AND upper(transaction_time) = 'infinity'
AND valid_time && p_correction_period
FOR UPDATE
LOOP
-- 1. Invalidate the old record by closing its transaction_time.
UPDATE product_prices
SET transaction_time = tsrange(lower(transaction_time), correction_time, '[)')
WHERE product_price_id = rec.product_price_id;
-- We now re-insert the segments of this invalidated record, some corrected,
-- some as they were, with a new transaction_time.
-- 2a. Insert the corrected version for the overlapping part.
-- The new valid_time is the intersection of the old valid_time and the correction period.
INSERT INTO product_prices (product_id, price, valid_time, transaction_time)
VALUES (rec.product_id, p_corrected_price,
rec.valid_time * p_correction_period, -- The '*' is the intersection operator
tsrange(correction_time, 'infinity', '[)'));
-- 2b. If the original period extended before the correction, re-insert that part.
IF lower(rec.valid_time) < lower(p_correction_period) THEN
INSERT INTO product_prices (product_id, price, valid_time, transaction_time)
VALUES (rec.product_id, rec.price,
tsrange(lower(rec.valid_time), lower(p_correction_period), '[)'),
tsrange(correction_time, 'infinity', '[)'));
END IF;
-- 2c. If the original period extended after the correction, re-insert that part.
IF upper(rec.valid_time) > upper(p_correction_period) THEN
INSERT INTO product_prices (product_id, price, valid_time, transaction_time)
VALUES (rec.product_id, rec.price,
tsrange(upper(p_correction_period), upper(rec.valid_time), '[)'),
tsrange(correction_time, 'infinity', '[)'));
END IF;
END LOOP;
END;
$$ LANGUAGE plpgsql;
Usage:
On 2023-11-15 14:00:00 UTC, we run the correction.
-- Correct the price to 95.99 for the period starting 2023-10-26.
SELECT correct_product_price_history(
123,
95.99,
tsrange('2023-10-26 00:00:00', 'infinity', '[)')
);
Now, let's see the power of this model.
Query 1: What is the price today (2023-11-15)?
SELECT price
FROM product_prices
WHERE product_id = 123
AND valid_time @> '2023-11-15'::timestamptz
AND transaction_time @> '2023-11-15 15:00:00'::timestamptz; -- After the correction
-- Result: 95.99
Query 2: What did we THINK the price was yesterday (2023-11-14)?
SELECT price
FROM product_prices
WHERE product_id = 123
AND valid_time @> '2023-11-15'::timestamptz -- The price valid on that day
AND transaction_time @> '2023-11-14'::timestamptz; -- As known by the DB yesterday
-- Result: 99.99
This ability to query the database's state at any point in its own history is impossible with simpler models and is the core value proposition of a bi-temporal architecture.
Performance Optimization and Indexing at Scale
The default GiST index created by the EXCLUDE constraint is a good starting point, but for high-read workloads, we can do better.
The Problem with GiST for 'Current State' Queries
GiST indexes are effective for range-based queries ("overlaps", "contains"), but they are generally slower than B-Tree indexes for simple equality checks. A very common query is "get me the current price for all products".
SELECT DISTINCT ON (product_id) product_id, price
FROM product_prices
WHERE valid_time @> now()
AND transaction_time @> now()
ORDER BY product_id, transaction_time DESC;
While the GiST index can be used here, it may not be optimal, especially if the number of historical records per product is large. The query planner has to sift through all historical versions that match the index condition.
Solution: Partial B-Tree Index
We can create a highly efficient, smaller B-Tree index that only contains the currently active records. This is a classic PostgreSQL optimization pattern.
CREATE INDEX idx_product_prices_current_active
ON product_prices (product_id)
WHERE upper(transaction_time) = 'infinity' AND upper(valid_time) = 'infinity';
This index is a game-changer for 'current state' queries. Let's analyze a query that can leverage it:
-- Get the current price for a specific product
EXPLAIN ANALYZE
SELECT price
FROM product_prices
WHERE product_id = 123
AND upper(transaction_time) = 'infinity'
AND upper(valid_time) = 'infinity';
With the partial index, the query plan will switch to a much faster Index Scan on idx_product_prices_current_active, as the WHERE clause of the query is a perfect match for the index's predicate. This index is dramatically smaller than the main table, containing only one entry per product, making lookups extremely fast.
Advanced Edge Cases and Considerations
A production system must handle more than just the basics.
Temporal Joins
What if we have another bi-temporal table, say product_inventory(product_id, quantity, valid_time, transaction_time)? How do we answer: "Show me the price and inventory for all products as of last Tuesday at 5 PM?"
This requires a join that respects both time dimensions.
WITH as_of AS (SELECT '2023-10-24 17:00:00'::timestamptz AS moment)
SELECT
p.product_id,
pr.price,
inv.quantity
FROM products p -- Assuming a main products table
LEFT JOIN product_prices pr
ON p.product_id = pr.product_id
AND pr.valid_time @> (SELECT moment FROM as_of)
AND pr.transaction_time @> (SELECT moment FROM as_of)
LEFT JOIN product_inventory inv
ON p.product_id = inv.product_id
AND inv.valid_time @> (SELECT moment FROM as_of)
AND inv.transaction_time @> (SELECT moment FROM as_of);
This query structure correctly joins the state of each temporal table at the same point in both valid and transaction time. Performance here hinges on having effective GiST indexes on both tables.
Enforcing Contiguity (No Gaps)
The EXCLUDE constraint prevents overlaps, but it doesn't prevent gaps. For example, a product's price could be valid until 2023-12-01, and the next price might not start until 2023-12-05, leaving a 4-day gap where the price is undefined.
Enforcing this business rule requires more than a declarative constraint. The most robust solution is to handle it within your mutation functions (update_product_price, etc.). Before committing a transaction, you can run a check:
-- Inside a PL/pgSQL function, after a mutation
PERFORM 1
FROM product_prices p1
LEFT JOIN product_prices p2 ON p1.product_id = p2.product_id
AND p1.valid_time -|- p2.valid_time -- -|- is the 'adjacent' operator
WHERE p1.product_id = p_product_id
AND upper(p1.transaction_time) = 'infinity'
AND upper(p2.transaction_time) = 'infinity'
AND upper(p1.valid_time) <> 'infinity'
GROUP BY p1.product_price_id
HAVING count(p2.product_price_id) = 0;
IF FOUND THEN
RAISE EXCEPTION 'Gap detected in valid_time for product %', p_product_id;
END IF;
This query looks for any currently active record that has a finite end to its validity but does not have another active record adjacent to it. This check ensures a continuous timeline.
Conclusion: A Robust Foundation for Auditable Systems
Implementing a bi-temporal data model in PostgreSQL is a significant architectural decision. It introduces complexity into your mutation logic and requires a deep understanding of PostgreSQL's more advanced features. However, the payoff is immense for systems where data integrity and a complete, provable audit trail are paramount.
By leveraging tsrange types and EXCLUDE USING GIST constraints, we move temporal logic from fragile application code or complex triggers into a declarative, database-enforced guarantee. This provides an incredibly robust foundation. When combined with carefully crafted PL/pgSQL functions for mutations and optimized indexing strategies like partial indexes, this pattern delivers both correctness and performance at scale. While not a fit for every problem, for the right domain, it is an exceptionally powerful and elegant solution to the complex challenge of managing data over time.