Bi-temporal Data Modeling in PostgreSQL with Range Types & Constraints
The Auditability Mandate: Why Simple Versioning Fails
In many domains—finance, insurance, healthcare, logistics—maintaining a simple updated_at timestamp or a basic history table is insufficient. These systems require not just a record of an entity's state over time, but also a record of when the system knew about those states. This two-dimensional time tracking is known as bi-temporality.
Consider a financial pricing system. A pricing error for a stock on 2023-12-15 is discovered and corrected on 2024-01-10. Your system must be able to answer two distinct but critical questions:
2023-12-15? (The corrected value)2023-12-15 if we had queried it on 2023-12-20? (The original, incorrect value)A standard history table, which typically overwrites or appends versions, can usually answer the first question but fundamentally cannot answer the second. It loses the history of the system's own knowledge. Bi-temporal modeling solves this by tracking two time axes:
* Valid Time (valid_ts): The period in the real world when a fact was true. (e.g., The price was $100 from 2023-12-15 to 2023-12-16)
* Transaction Time (tx_ts): The period during which the database recorded and believed the fact to be true. (e.g., We believed this fact from 2023-12-15 09:30:00 until it was corrected on 2024-01-10 14:00:00)
This article details a robust, constraint-enforced implementation of a bi-temporal data model using advanced PostgreSQL features, moving beyond theoretical discussions to production-ready patterns.
Schema Design: Leveraging Range Types and Exclusion Constraints
The naive approach of using four timestamp columns (valid_from, valid_to, tx_from, tx_to) is prone to complex application logic and race conditions. PostgreSQL provides a more elegant and powerful toolkit: TSRANGE types and EXCLUDE USING GIST constraints.
Our canonical example will be a product_prices table. A product has a specific price for a given period (its valid time).
The Core Table Structure
-- Ensure the btree_gist extension is available for our constraints
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE product_prices (
-- A surrogate key for this specific version/row. This is the primary key.
version_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- A stable identifier for the logical entity (the product).
-- All versions of a single product's price share the same product_id.
product_id UUID NOT NULL,
-- The payload data.
price NUMERIC(10, 2) NOT NULL CHECK (price > 0),
currency CHAR(3) NOT NULL,
-- == Bi-temporal Axes ==
-- Valid Time: When was this price effective in the real world?
-- Using tsrange for a continuous period. '[)' means inclusive start, exclusive end.
valid_ts TSRANGE NOT NULL,
-- Transaction Time: When did the database believe this fact?
-- This range starts when the record is inserted and extends to 'infinity'.
-- When corrected/superseded, the upper bound is set to the current time.
tx_ts TSTZRANGE NOT NULL DEFAULT TSTZRANGE(NOW(), 'infinity', '[)'),
-- == The Magic: Ensuring Data Integrity at the DB Level ==
-- This constraint is the cornerstone of our bi-temporal model.
-- For any given product_id, you cannot have overlapping valid_ts ranges
-- for records that are *currently active* in transaction time.
CONSTRAINT no_overlapping_valid_time_for_current_records
EXCLUDE USING GIST (
product_id WITH =,
valid_ts WITH &&
) WHERE (tx_ts @> 'infinity'::timestamptz)
);
-- Indexes are critical for performance of bi-temporal queries.
-- A GiST index is needed to support the exclusion constraint and range operators.
CREATE INDEX idx_product_prices_valid_ts_gist ON product_prices USING GIST (product_id, valid_ts);
-- A standard B-tree index on transaction time is also essential for "as-of" queries.
CREATE INDEX idx_product_prices_tx_ts ON product_prices USING GIST (product_id, tx_ts);
Deconstructing the `EXCLUDE` Constraint
This constraint is the most critical piece of the schema. Let's break it down:
EXCLUDE USING GIST (product_id WITH =, valid_ts WITH &&)
* USING GIST: This specifies that a GiST index is required to efficiently check the constraint.
product_id WITH =: The constraint is applied on a per-product basis. Overlapping prices for different* products are perfectly acceptable.
* valid_ts WITH &&: This is the core logic. The && operator for ranges checks for overlaps. The constraint will fail if a new or updated row has a valid_ts that overlaps with an existing row's valid_ts (for the same product_id).
WHERE (tx_ts @> 'infinity'::timestamptz)
This is the advanced pattern that makes the bi-temporal model work. Without it, the constraint would prevent any overlapping valid_ts ranges, even for historical corrections. This partial constraint applies only to rows that are currently considered true by the system—those whose transaction time range extends to infinity.
This elegantly enforces our primary business rule: "At any given point in time, a product can only have one price."
Bi-temporal Operations: Beyond Standard CRUD
Standard INSERT, UPDATE, and DELETE commands break the bi-temporal model because they destroy information. We must instead use a set of carefully crafted procedures that preserve history.
All operations should be wrapped in atomic PL/pgSQL functions to prevent race conditions and ensure data integrity.
1. Creating a New Logical Entity
This is the simplest operation. It's a direct INSERT that establishes the first version of a product's price.
CREATE OR REPLACE FUNCTION create_new_product_price(
p_product_id UUID,
p_price NUMERIC(10, 2),
p_currency CHAR(3),
p_valid_from TIMESTAMPTZ
) RETURNS UUID AS $$
DECLARE
v_version_id UUID;
BEGIN
INSERT INTO product_prices (product_id, price, currency, valid_ts)
VALUES (p_product_id, p_price, p_currency, TSRANGE(p_valid_from, 'infinity', '[)'))
RETURNING version_id INTO v_version_id;
RETURN v_version_id;
END;
$$ LANGUAGE plpgsql;
Usage:
-- Set the initial price for a new product, valid from today onwards.
SELECT create_new_product_price('a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d', 199.99, 'USD', NOW());
2. Updating a Price (Logical Update)
An "update" is not an UPDATE statement in the traditional sense. It involves two steps:
This preserves the old record as a historical fact that was once believed to be true.
CREATE OR REPLACE FUNCTION update_product_price(
p_product_id UUID,
p_new_price NUMERIC(10, 2),
p_effective_from TIMESTAMPTZ
) RETURNS UUID AS $$
DECLARE
v_current_version RECORD;
v_new_version_id UUID;
BEGIN
-- Use a SERIALIZABLE transaction to prevent race conditions where two updates
-- might try to supersede the same record simultaneously.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Find the current active version that we need to supersede.
-- We lock it to ensure no other transaction can modify it.
SELECT * INTO v_current_version
FROM product_prices
WHERE product_id = p_product_id
AND valid_ts @> p_effective_from
AND tx_ts @> 'infinity'::timestamptz
FOR UPDATE;
IF NOT FOUND THEN
RAISE EXCEPTION 'No active price found for product % at effective date %', p_product_id, p_effective_from;
END IF;
-- 1. Close the transaction time of the old record.
-- Its valid time is split at the effective date of the new price.
UPDATE product_prices
SET tx_ts = TSTZRANGE(lower(tx_ts), NOW(), '[)')
WHERE version_id = v_current_version.version_id;
-- 2. If the old price was valid for a period, we need to re-insert the part
-- of its validity that occurred *before* the new price took effect.
IF lower(v_current_version.valid_ts) < p_effective_from THEN
INSERT INTO product_prices(product_id, price, currency, valid_ts, tx_ts)
VALUES (
v_current_version.product_id,
v_current_version.price,
v_current_version.currency,
TSRANGE(lower(v_current_version.valid_ts), p_effective_from, '[)'),
TSTZRANGE(NOW(), 'infinity', '[)')
);
END IF;
-- 3. Insert the new price record, valid from the effective date onwards.
INSERT INTO product_prices(product_id, price, currency, valid_ts, tx_ts)
VALUES (
p_product_id,
p_new_price,
v_current_version.currency, -- Assuming currency doesn't change
TSRANGE(p_effective_from, 'infinity', '[)'),
TSTZRANGE(NOW(), 'infinity', '[)')
)
RETURNING version_id INTO v_new_version_id;
RETURN v_new_version_id;
END;
$$ LANGUAGE plpgsql;
Usage:
-- On 2024-02-15, we decide the price for product 'a1b2...' will change to $219.99 starting 2024-03-01
SELECT update_product_price('a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d', 219.99, '2024-03-01'::timestamptz);
This operation is complex because it splits the existing validity range. The original record (price: 199.99, valid_ts: [2024-01-01, infinity)) is terminated. Two new records are created:
price: 199.99, valid_ts: [2024-01-01, 2024-03-01)price: 219.99, valid_ts: [2024-03-01, infinity)Both new records have a tx_ts starting from 2024-02-15. The database now correctly records that on and after 2024-02-15, we knew about the upcoming price change on 2024-03-01.
3. Deleting a Price (Logical Deletion)
A "delete" simply terminates the transaction time of the current record. The data is never removed.
CREATE OR REPLACE FUNCTION delete_product_price(
p_product_id UUID,
p_effective_from TIMESTAMPTZ
) RETURNS VOID AS $$
BEGIN
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- This logic is similar to the update. We find the active record,
-- terminate its transaction time, and re-insert the portion of its
-- validity before the deletion date.
WITH current_version AS (
SELECT version_id, valid_ts, tx_ts, product_id, price, currency
FROM product_prices
WHERE product_id = p_product_id
AND valid_ts @> p_effective_from
AND tx_ts @> 'infinity'::timestamptz
FOR UPDATE
),
terminate_old AS (
UPDATE product_prices
SET tx_ts = TSTZRANGE(lower(current_version.tx_ts), NOW(), '[)')
FROM current_version
WHERE product_prices.version_id = current_version.version_id
)
-- Re-insert the valid portion before the deletion date
INSERT INTO product_prices(product_id, price, currency, valid_ts, tx_ts)
SELECT
product_id, price, currency,
TSRANGE(lower(valid_ts), p_effective_from, '[)'),
TSTZRANGE(NOW(), 'infinity', '[)')
FROM current_version
WHERE lower(valid_ts) < p_effective_from;
IF NOT FOUND THEN
RAISE EXCEPTION 'No active price found for product % to delete at effective date %', p_product_id, p_effective_from;
END IF;
END;
$$ LANGUAGE plpgsql;
4. Correcting Historical Data (The Power of Bi-temporality)
This is where the model truly shines. Imagine we discover today (2024-02-15) that a price entered on 2023-11-01 was wrong. It should have been $99.99, not $109.99, for the entire month of November 2023.
This operation terminates the incorrect historical record and inserts a new historical record with the correct data. Both operations happen within a transaction that starts now.
CREATE OR REPLACE FUNCTION correct_historical_price(
p_product_id UUID,
p_correct_price NUMERIC(10, 2),
p_valid_range TSRANGE
) RETURNS UUID AS $$
DECLARE
v_incorrect_version RECORD;
v_new_version_id UUID;
BEGIN
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Find the incorrect historical record(s). This query is tricky because we need to
-- find what we *believed* to be true just before this transaction started.
-- For simplicity, we'll find the version(s) that were active and covered this valid range.
-- A more complex implementation might need to handle multiple corrections.
SELECT * INTO v_incorrect_version
FROM product_prices
WHERE product_id = p_product_id
AND valid_ts && p_valid_range -- It overlaps with the period we are correcting
AND tx_ts @> 'infinity'::timestamptz -- It is currently believed to be true
FOR UPDATE;
IF NOT FOUND THEN
RAISE EXCEPTION 'No current record found to correct for product % in range %', p_product_id, p_valid_range;
END IF;
-- 1. Terminate the incorrect record's transaction time.
UPDATE product_prices
SET tx_ts = TSTZRANGE(lower(tx_ts), NOW(), '[)')
WHERE version_id = v_incorrect_version.version_id;
-- Note: This is a simplified correction. A full implementation would need to handle
-- splitting the original range if the correction only applies to a sub-period.
-- Here we assume the correction replaces the entire old record's validity.
-- 2. Insert the new, correct record.
-- Its valid_ts is in the past, but its tx_ts starts now.
INSERT INTO product_prices(product_id, price, currency, valid_ts, tx_ts)
VALUES (
v_incorrect_version.product_id,
p_correct_price,
v_incorrect_version.currency,
p_valid_range,
TSTZRANGE(NOW(), 'infinity', '[)')
)
RETURNING version_id INTO v_new_version_id;
RETURN v_new_version_id;
END;
$$ LANGUAGE plpgsql;
After this operation, a query asking for the price in November 2023 as of today will get the correct $99.99 price. But a query asking for the price as of December 2023 will still get the original, incorrect $109.99 price, providing a perfect audit trail.
Querying Bi-temporal Data
Querying is where the investment in this model pays off. The range containment operator @> is your primary tool.
Let's assume the following state:
valid_ts: [2023-01-01, 2023-06-01), tx_ts: [2023-01-01, 2023-02-15) (Corrected)valid_ts: [2023-01-01, 2023-06-01), tx_ts: [2023-02-15, infinity) (Correction for v1)valid_ts: [2023-06-01, infinity), tx_ts: [2023-05-20, infinity) (Future price update)As-At Query: "What was the price on a specific date?"
This answers from the perspective of the current state of knowledge.
-- What is the price on 2023-03-15?
SELECT product_id, price, currency
FROM product_prices
WHERE product_id = '...'
AND valid_ts @> '2023-03-15'::timestamptz
AND tx_ts @> 'infinity'::timestamptz; -- Only currently active records
-- Result: $105 (v2)
As-Of Query: "What did we think the state of the world was on a specific date?"
This is a historical audit query.
-- What did we believe the price for 2023-03-15 was, back on 2023-02-01?
SELECT product_id, price, currency
FROM product_prices
WHERE product_id = '...'
AND valid_ts @> '2023-03-15'::timestamptz
AND tx_ts @> '2023-02-01'::timestamptz; -- As we knew it on this date
-- Result: $100 (v1), because the correction (v2) hadn't happened yet.
The Combined Bi-temporal Query
This is the most powerful query, answering both questions simultaneously.
-- Show me the price for the product that was valid on 2023-07-01,
-- as we knew it on 2023-06-15.
SELECT product_id, price, currency
FROM product_prices
WHERE product_id = '...'
AND valid_ts @> '2023-07-01'::timestamptz -- The 'as-at' part
AND tx_ts @> '2023-06-15'::timestamptz; -- The 'as-of' part
-- Result: $110 (v3), because the price update was known on 2023-06-15.
Performance, Edge Cases, and Production Considerations
This model is powerful but comes with overhead and complexity that must be managed.
Indexing Strategy
The GiST indexes on (product_id, valid_ts) and (product_id, tx_ts) are not optional; they are fundamental to the performance of range queries and the exclusion constraint. Without them, PostgreSQL would have to perform sequential scans for every query and every write, leading to unacceptable performance on any non-trivial dataset.
Analyze your query patterns with EXPLAIN ANALYZE. If you frequently query only on valid_ts across all products, a standalone GiST index on valid_ts might be beneficial.
Write Amplification and Table Bloat
Every logical UPDATE or DELETE results in at least one UPDATE and one INSERT. This write amplification means your table will grow much faster than a traditional table. PostgreSQL's MVCC system will create dead tuples for every UPDATE, leading to table bloat.
* Aggressive Autovacuum: Tune autovacuum parameters for your bi-temporal tables to be more aggressive. Lower autovacuum_scale_factor and autovacuum_analyze_scale_factor significantly (e.g., to 0.01 or lower) for these specific tables.
* Partitioning: For very large datasets (billions of rows), partitioning is essential. A common strategy is to partition by product_id (LIST partitioning) or by a component of the transaction time, such as the year tx_ts was created (RANGE partitioning). This keeps indexes smaller and allows for more efficient maintenance.
Transaction Isolation and Concurrency
As shown in the PL/pgSQL functions, SERIALIZABLE isolation is the safest level for complex bi-temporal operations. A REPEATABLE READ transaction could suffer from phantom reads where one transaction starts an update, and a second concurrent transaction doesn't see the first one's target row and tries to update it as well, leading to integrity violations when both try to commit. SERIALIZABLE prevents this but can increase transaction aborts under high contention. You must build retry logic into your application when using serializable transactions.
Clock Skew and Timestamps
Using NOW() in a distributed application environment can be risky if server clocks are not perfectly synchronized. A better practice within PL/pgSQL functions is to use transaction_timestamp(). This function returns a single, stable timestamp for the duration of the entire transaction, preventing inconsistencies where different operations within the same function get slightly different timestamps.
Handling Contiguous Validity
The EXCLUDE constraint prevents overlaps, but it does not prevent gaps in valid_ts. If your business logic requires that a product always has a price (i.e., the valid_ts ranges must be perfectly contiguous), you will need to enforce this with more complex logic, likely in a CONSTRAINT TRIGGER. This trigger would fire at the end of a transaction and check that for a given product_id, the union of all its currently-active valid_ts ranges forms a continuous timeline.
Conclusion
Implementing a bi-temporal data model is a significant engineering investment. It introduces complexity in your write paths and requires a deeper understanding of your database's capabilities. However, for systems where auditability and historical accuracy are non-negotiable, it is an unparalleled pattern. By leveraging PostgreSQL's native TSRANGE types, GiST indexes, and partial exclusion constraints, you can build a robust, performant, and database-enforced bi-temporal system that pushes complex consistency logic out of the application layer and into the database, where it belongs.