Bi-Temporal Modeling in Postgres with Range Types & Exclusion Constraints
The Bi-Temporal Challenge: Beyond Standard Versioning
As senior engineers, we're well-versed in versioning data. The most common pattern is System-Versioned Tables, often implemented as a Slowly Changing Dimension (SCD) Type 2. This uni-temporal model answers the question: "What did this record look like at time T?" It tracks the history of a record's state over its valid time—the time it was true in the real world.
However, a more complex and critical requirement exists in domains like finance, insurance, and legal systems: bi-temporality. A bi-temporal model must answer a fundamentally harder question: "What did we think this record looked like at time T?"
This introduces a second time dimension: transaction time. This is the period during which a given fact was recorded and considered current within the database itself.
- Valid time (valid_range): The time range when a fact is true in the real world. This is business-driven and can be in the past, present, or future. Example: a price change is effective from January 1st to June 30th.
- Transaction time (tx_range): The time range when a fact is stored in the database. The start is when the record is inserted, and the end is when it is superseded by a new version. This is system-driven and always moves forward. Example: we recorded this price change on December 15th.

Managing this bi-temporal state in application logic is a recipe for disaster. It's prone to race conditions, complex validation rules, and subtle bugs that corrupt your historical data. The only robust solution is to enforce these temporal constraints directly within the database. PostgreSQL, with its powerful range types and exclusion constraints, is uniquely equipped for this task.
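Before moving to SQL, the semantics of the two dimensions can be made concrete with a small Python sketch. This is an illustrative model, not library code: the `Interval` class and `INFINITY` sentinel are assumptions standing in for Postgres's half-open `[)` ranges and `'infinity'` bound, with `contains` and `overlaps` mirroring the `<@` and `&&` operators used throughout this article.

```python
from dataclasses import dataclass
from datetime import datetime

# Stand-in for the Postgres 'infinity' timestamp (illustrative only)
INFINITY = datetime.max

@dataclass(frozen=True)
class Interval:
    """A half-open interval [start, end), mirroring tstzrange(a, b, '[)')."""
    start: datetime
    end: datetime

    def contains(self, ts: datetime) -> bool:
        # Element containment, like the Postgres <@ operator
        return self.start <= ts < self.end

    def overlaps(self, other: "Interval") -> bool:
        # Range overlap, like the Postgres && operator on '[)' ranges
        return self.start < other.end and other.start < self.end

# Valid time: the price is effective January 1st through June 30th
valid_range = Interval(datetime(2023, 1, 1), datetime(2023, 7, 1))
# Transaction time: we recorded this fact on December 15th; never superseded
tx_range = Interval(datetime(2022, 12, 15), INFINITY)
```

Note the half-open convention: a fact valid "through June" ends exactly at the first instant of July, so adjacent ranges neither overlap nor leave gaps.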
This article will demonstrate how to build a production-grade, constraint-enforced bi-temporal model that makes data corruption virtually impossible.
The Schema: From Brittle Timestamps to Resilient Ranges
A naive approach might use four timestamp columns: valid_from, valid_to, tx_from, tx_to. This immediately presents problems:

- Integrity must be enforced with CHECK constraints or triggers that are difficult to write and maintain.
- Queries devolve into sprawling BETWEEN clauses, must handle NULL for open-ended times, and are notoriously difficult to get right and index effectively.

The Superior Approach: `tstzrange` and `EXCLUDE USING GIST`
We will leverage two advanced PostgreSQL features:
- tstzrange: A native data type representing a range of timestamps with time zone. It has a lower and an upper bound and understands concepts like boundary inclusion/exclusion and infinity.
- Exclusion constraints (EXCLUDE): A generalization of UNIQUE constraints. They ensure that if any two rows are compared on the specified columns or expressions using the specified operators, at least one of these operator comparisons will return false or null. This is precisely what we need to prevent overlapping time ranges.

Let's design our schema for a products table where we need to track price history bi-temporally.
First, we need the btree_gist extension, which makes B-Tree operators (such as equality, =) usable within a GiST index. Our exclusion constraint needs this to combine the = comparison on product_uuid with the range overlap operators in a single index.
-- Enable the extension required for the exclusion constraint
CREATE EXTENSION IF NOT EXISTS btree_gist;
Now, the table definition. This is the core of our model.
CREATE TABLE products (
    -- A surrogate primary key for the row itself
    id BIGSERIAL PRIMARY KEY,

    -- A stable business identifier for the product entity
    product_uuid UUID NOT NULL,

    -- Bi-temporal validity periods
    valid_range TSTZRANGE NOT NULL,
    tx_range TSTZRANGE NOT NULL,

    -- The actual data we are versioning
    price NUMERIC(10, 2) NOT NULL,
    currency CHAR(3) NOT NULL,

    -- The magic: a multi-dimensional constraint preventing overlaps.
    -- This ensures that for any given product_uuid, no two rows can have
    -- an overlapping valid_range AND an overlapping tx_range.
    EXCLUDE USING GIST (
        product_uuid WITH =,
        valid_range WITH &&,
        tx_range WITH &&
    )
);
-- Note: the EXCLUDE constraint is itself implemented with a GiST index on
-- (product_uuid, valid_range, tx_range), so a separate index for these
-- lookups would be redundant.
Let's break down the EXCLUDE constraint:
- product_uuid WITH =: For any two rows being compared, if their product_uuid values are equal...
- valid_range WITH &&: ...and their valid_range values overlap (the && operator)...
- tx_range WITH &&: ...and their tx_range values also overlap...
- ...then the constraint is violated, and the operation is rejected.
This single constraint provides the atomicity and consistency that would require pages of complex application logic. It is the bedrock of our bi-temporal model.
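To see why all three operators must fire together, here is an illustrative Python sketch of the conflict predicate (the function and field names are assumptions, not Postgres internals). It shows the key mechanism: closing a superseded row's transaction time is exactly what lets the corrected row coexist with it.

```python
from datetime import datetime

INFINITY = datetime.max  # stand-in for an open-ended upper bound

def overlaps(a_start, a_end, b_start, b_end):
    """Half-open [start, end) overlap: the semantics of && on '[)' ranges."""
    return a_start < b_end and b_start < a_end

def violates_exclusion(a: dict, b: dict) -> bool:
    """The predicate the EXCLUDE constraint enforces: two rows conflict only
    if ALL three comparisons hold -- same product, overlapping valid time,
    AND overlapping transaction time."""
    return (
        a["product_uuid"] == b["product_uuid"]
        and overlaps(a["valid_from"], a["valid_to"], b["valid_from"], b["valid_to"])
        and overlaps(a["tx_from"], a["tx_to"], b["tx_from"], b["tx_to"])
    )

# A superseded belief and its correction: same product, same valid time,
# but transaction times that do NOT overlap -- so both rows may coexist.
old = dict(product_uuid="p1",
           valid_from=datetime(2023, 1, 1), valid_to=INFINITY,
           tx_from=datetime(2023, 1, 1), tx_to=datetime(2023, 2, 1))
new = dict(product_uuid="p1",
           valid_from=datetime(2023, 1, 1), valid_to=INFINITY,
           tx_from=datetime(2023, 2, 1), tx_to=INFINITY)
```

If the old row's transaction time were left open, the two rows would overlap in both dimensions and the insert of the correction would be rejected.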
Implementing Write Operations: The Transactional Dance
In a bi-temporal model, you never truly UPDATE or DELETE data. Every change is an INSERT of a new reality, coupled with the closing of a previous reality's transaction timeline. This requires careful transactional logic, best encapsulated in stored procedures.
Let's define our conventions:
- All ranges are half-open: inclusive lower bound, exclusive upper bound ('[)').
- An open-ended period is represented by an upper bound of infinity.
- We use statement_timestamp(), which is constant for the duration of a statement, so every range written by a single procedure call shares the same timestamp.

1. Initial Product Creation
This is the simplest operation. A new product is created with a price that is valid from now until forever, and this fact is recorded in the database from now until forever.
CREATE OR REPLACE FUNCTION create_product(
    p_product_uuid UUID,
    p_price NUMERIC(10, 2),
    p_currency CHAR(3)
) RETURNS VOID AS $$
DECLARE
    current_ts TIMESTAMPTZ := statement_timestamp();
BEGIN
    INSERT INTO products (product_uuid, valid_range, tx_range, price, currency)
    VALUES (
        p_product_uuid,
        -- Valid in the real world from now until forever
        tstzrange(current_ts, 'infinity', '[)'),
        -- Recorded in the DB from now until forever
        tstzrange(current_ts, 'infinity', '[)'),
        p_price,
        p_currency
    );
END;
$$ LANGUAGE plpgsql;
2. The Bi-Temporal "Update": A Correction
This is the most critical and complex operation. Imagine we quoted a price of $99.99, but we realize an hour later it should have been $95.99. This is a correction. The original fact was never true in the real world; we just recorded it incorrectly.
Here's the process:
1. Find the current active record (the one with an open tx_range).
2. Close that record's tx_range at now(). This preserves it as a historical record of what we used to think was true.
3. INSERT a new record with the corrected data. This new record inherits the same valid_range as the record it's correcting, but its tx_range starts from now().

CREATE OR REPLACE FUNCTION correct_product_price(
    p_product_uuid UUID,
    p_new_price NUMERIC(10, 2)
) RETURNS VOID AS $$
DECLARE
    current_ts TIMESTAMPTZ := statement_timestamp();
    current_record products%ROWTYPE;
BEGIN
    -- Step 1: Find the current active record and lock it.
    -- The valid_range filter matters: after a scheduled future change,
    -- several rows for one product can share an open tx_range.
    SELECT * INTO current_record
    FROM products
    WHERE product_uuid = p_product_uuid
      AND upper(tx_range) = 'infinity'
      AND current_ts <@ valid_range
    FOR UPDATE;

    IF NOT FOUND THEN
        RAISE EXCEPTION 'Product with UUID % not found or has no active record.', p_product_uuid;
    END IF;

    -- Step 2: End the transaction time of the old record
    UPDATE products
    SET tx_range = tstzrange(lower(tx_range), current_ts, '[)')
    WHERE id = current_record.id;

    -- Step 3: Insert the new, corrected record.
    -- Note: valid_range is carried over from the original record.
    INSERT INTO products (product_uuid, valid_range, tx_range, price, currency)
    VALUES (
        p_product_uuid,
        current_record.valid_range, -- Inherit the original valid_range
        tstzrange(current_ts, 'infinity', '[)'),
        p_new_price,
        current_record.currency
    );
END;
$$ LANGUAGE plpgsql;
3. The Bi-Temporal "Update": A Real-World Change
Now consider a scheduled price change. The price is $95.99 today, but starting next month, it will be $105.00. This is a change in the valid time, not a correction.
- Find and lock the current active record.
- Truncate its valid_range at the effective date. This requires two steps in our model:
  a. End the tx_range of the current record.
  b. INSERT a new version of the old record with its valid_range now closed.
- INSERT a completely new record for the future price, with its valid_range starting from the effective date.

CREATE OR REPLACE FUNCTION schedule_price_change(
    p_product_uuid UUID,
    p_new_price NUMERIC(10, 2),
    p_effective_date TIMESTAMPTZ
) RETURNS VOID AS $$
DECLARE
    current_ts TIMESTAMPTZ := statement_timestamp();
    current_record products%ROWTYPE;
BEGIN
    IF p_effective_date <= current_ts THEN
        RAISE EXCEPTION 'Effective date must be in the future.';
    END IF;

    -- Step 1: Find and lock the current active record
    SELECT * INTO current_record
    FROM products
    WHERE product_uuid = p_product_uuid
      AND upper(tx_range) = 'infinity'
      AND p_effective_date <@ valid_range -- Ensure the change occurs within the current validity
    FOR UPDATE;

    IF NOT FOUND THEN
        RAISE EXCEPTION 'Product with UUID % not found or has no valid record for the effective date.', p_product_uuid;
    END IF;

    -- Step 2a: End the transaction time of the current record
    UPDATE products
    SET tx_range = tstzrange(lower(tx_range), current_ts, '[)')
    WHERE id = current_record.id;

    -- Step 2b: Insert the 'terminated' version of the old record.
    -- Its validity now ends at the new price's effective date.
    INSERT INTO products (product_uuid, valid_range, tx_range, price, currency)
    VALUES (
        p_product_uuid,
        tstzrange(lower(current_record.valid_range), p_effective_date, '[)'),
        tstzrange(current_ts, 'infinity', '[)'),
        current_record.price,
        current_record.currency
    );

    -- Step 3: Insert the new record for the future price
    INSERT INTO products (product_uuid, valid_range, tx_range, price, currency)
    VALUES (
        p_product_uuid,
        tstzrange(p_effective_date, 'infinity', '[)'),
        tstzrange(current_ts, 'infinity', '[)'),
        p_new_price,
        current_record.currency
    );
END;
$$ LANGUAGE plpgsql;
This transactional logic is complex, but by encapsulating it within the database, we guarantee that no matter how the application calls it, the bi-temporal invariants are always maintained thanks to our EXCLUDE constraint.
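As a cross-check of the transactional dance, the three write operations can be simulated in a few dozen lines of Python. This is an in-memory sketch under simplifying assumptions (no locking, no constraint enforcement, `datetime.max` standing in for infinity; the class and method names are illustrative), not a replacement for the SQL above.

```python
from dataclasses import dataclass
from datetime import datetime

INF = datetime.max  # stand-in for 'infinity'

@dataclass
class Row:
    product: str
    valid_from: datetime
    valid_to: datetime   # exclusive, as in tstzrange(..., '[)')
    tx_from: datetime
    tx_to: datetime      # exclusive; INF means "still current"
    price: float

class BiTemporalStore:
    """In-memory sketch of the three bi-temporal write operations."""

    def __init__(self):
        self.rows = []

    def _current(self, product, now):
        # The row that is still recorded (open tx_to) AND valid at `now`
        return next(r for r in self.rows
                    if r.product == product and r.tx_to == INF
                    and r.valid_from <= now < r.valid_to)

    def create(self, product, price, now):
        # Valid and recorded from now until forever
        self.rows.append(Row(product, now, INF, now, INF, price))

    def correct(self, product, new_price, now):
        old = self._current(product, now)
        old.tx_to = now  # close transaction time; valid time is untouched
        self.rows.append(Row(product, old.valid_from, old.valid_to,
                             now, INF, new_price))

    def schedule(self, product, new_price, effective, now):
        old = self._current(product, now)
        old.tx_to = now
        # Re-insert the old version, validity truncated at the effective date...
        self.rows.append(Row(product, old.valid_from, effective,
                             now, INF, old.price))
        # ...and a brand-new version starting at the effective date
        self.rows.append(Row(product, effective, INF, now, INF, new_price))
```

Running create, correct, and schedule in sequence leaves four rows: two superseded beliefs and two currently recorded versions, mirroring the SQL history built in the next section.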
Querying Bi-Temporal Data: Unlocking History
The real power of this model is revealed when we query it. Our GiST index is crucial for making these queries performant.
Let's populate some data to demonstrate.
-- Let's use a fixed UUID for our example product.
-- Note: statement_timestamp() cannot be overridden, so running this live will
-- record tx times of "now" for all three calls; the comments give the
-- conceptual dates assumed by the queries below. (For reproducible tests,
-- the functions could instead accept an explicit timestamp parameter.)
DO $$
DECLARE
    product_id UUID := 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11';
BEGIN
    -- 2023-01-01: Product created with price $100
    PERFORM create_product(product_id, 100.00, 'USD');

    -- 2023-02-01: We realize the price should have been $95 all along (a correction)
    PERFORM correct_product_price(product_id, 95.00);

    -- 2023-03-01: We schedule a price increase to $110, effective 2023-04-01
    PERFORM schedule_price_change(product_id, 110.00, '2023-04-01 00:00:00 UTC');
END $$;
After these operations, our products table will contain several rows representing the full history. Now, let's ask some questions.
Query 1: What is the current price? ("As-of-Now")
This is the simplest query. We want the record that is currently valid in the real world and is the latest version in our database.
SELECT product_uuid, price, currency
FROM products
WHERE now() <@ valid_range
AND upper(tx_range) = 'infinity';
Query 2: What was the price on 2023-01-15? ("As-of" Query)
This is a standard uni-temporal query. We want the record that was valid on a specific date, according to our current knowledge.
SELECT product_uuid, price, currency
FROM products
WHERE '2023-01-15'::timestamptz <@ valid_range
AND upper(tx_range) = 'infinity';
-- Result: $95.00. We are querying with today's knowledge, which includes the correction.
Query 3: What did we THINK the price was on 2023-01-15, if we had asked on 2023-01-20? ("As-was-known" Query)
This is the quintessential bi-temporal query. We are time-traveling back to a point in the database's history and asking a question about a point in real-world time.
SELECT product_uuid, price, currency
FROM products
WHERE '2023-01-20'::timestamptz <@ tx_range -- The state of the DB on this date
AND '2023-01-15'::timestamptz <@ valid_range; -- The real-world validity on this date
-- Result: $100.00. On Jan 20th, we had not yet made the correction, so we still thought the price was $100.
Query 4: What did we THINK the price was on 2023-01-15, if we had asked on 2023-02-15?
Same question, but now we ask after the correction was made.
SELECT product_uuid, price, currency
FROM products
WHERE '2023-02-15'::timestamptz <@ tx_range -- The state of the DB on this date
AND '2023-01-15'::timestamptz <@ valid_range; -- The real-world validity on this date
-- Result: $95.00. By Feb 15th, the correction was in the database.
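Both "as-was-known" answers can be sanity-checked with a short Python model of the history table (dates as in the example above; the helper name and tuple layout are illustrative, with half-open checks mirroring the `<@` operator).

```python
from datetime import datetime

INF = datetime.max  # stand-in for 'infinity'

# Rows as (valid_from, valid_to, tx_from, tx_to, price): the history left by
# the creation on Jan 1 and the correction on Feb 1.
history = [
    (datetime(2023, 1, 1), INF, datetime(2023, 1, 1), datetime(2023, 2, 1), 100.00),
    (datetime(2023, 1, 1), INF, datetime(2023, 2, 1), INF, 95.00),
]

def price_as_was_known(rows, valid_at, known_at):
    """What did we think the price was at `valid_at`, according to the
    database state as of `known_at`? Both checks are half-open, like <@."""
    for vf, vt, tf, tt, price in rows:
        if vf <= valid_at < vt and tf <= known_at < tt:
            return price
    return None

# Asked on Jan 20th, the correction was not yet recorded -> 100.00
# Asked on Feb 15th, the correction is in the database   -> 95.00
```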
These queries demonstrate the power of the model. The GiST index backing the exclusion constraint on (product_uuid, valid_range, tx_range) is critical for ensuring these lookups are efficient, as it can search across all three dimensions simultaneously.
Performance and Edge Case Considerations
This model is powerful but comes with trade-offs and requires careful management in production.
Performance Implications
- Write amplification: Every logical UPDATE becomes at least one UPDATE and one INSERT. A change to the valid_range is an UPDATE and two INSERTs. This increases write load and can lead to table bloat faster than in a traditional model. Aggressive autovacuum tuning is essential for these tables.
- Query patterns: While GiST indexes make point-in-time lookups fast (via the <@ operator), complex analytical queries that scan large portions of history may require different strategies, such as materialized views or ETL processes into an analytical store.

Advanced Edge Cases
- Migrating existing data: Retrofitting bi-temporality means synthesizing valid_range and tx_range values for every existing row, including deciding how to represent previously open-ended periods. This often requires a carefully scripted data migration, not a simple ALTER TABLE.
- Clock skew: Transaction time is derived from statement_timestamp(). In a distributed system with multiple application servers, clock skew can lead to transaction time inconsistencies. For high-stakes financial systems, it's often recommended to have a single, authoritative time source or to pass the transaction timestamp from a central service rather than relying on the database server's local clock in a multi-node cluster.
- Time zones: Always use TIMESTAMPTZ. Storing timestamps without timezone information in a temporal model will lead to incorrect query results and data corruption when data is written or read from different timezones.

Conclusion: A Powerful Pattern for Critical Systems
Implementing a bi-temporal data model is not a trivial undertaking. It introduces complexity in your write path and requires a shift in thinking about how data is queried. However, by leveraging PostgreSQL's native range types and exclusion constraints, you can offload the most dangerous and error-prone part of this model—the enforcement of temporal invariants—directly to the database.
This creates a system that is not just auditable but provably correct. Your application code is simplified because it no longer needs to worry about preventing temporal overlaps; it can simply perform its business logic, and the database will guarantee the integrity of your historical record. For any system where the history of what was known and when is as important as the data itself, this pattern is the gold standard for building resilient, trustworthy software.