Bi-temporal Data Models in Postgres with Range Types & Constraints

16 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

Beyond SCD Type 2: True Bi-temporal Modeling in PostgreSQL

For senior engineers working on systems in regulated industries like finance, insurance, or healthcare, managing historical data is not a trivial task. The common "Slowly Changing Dimension" (SCD) Type 2 pattern, with its start_date and end_date columns, is often the first tool reached for. However, it suffers from a fundamental flaw: it conflates valid time (when a fact was true in the real world) with transaction time (when the fact was recorded in the database).

This conflation makes it impossible to answer critical questions simultaneously, such as:

  • "What was the price of Product A on January 15th?" (A query against valid time)
  • "On February 1st, what did our database believe the price of Product A was on January 15th?" (A query against transaction time)
  • Answering the second question is crucial for auditing, reproducing reports, and debugging historical discrepancies. When you correct a past mistake using SCD Type 2, you overwrite history, losing the record of the mistake ever existing. This is where a bi-temporal model excels.

    This article dives deep into a production-grade implementation of a bi-temporal data model using advanced PostgreSQL features. We will bypass simplistic approaches and focus on a robust, constraint-driven design using tsrange types and EXCLUDE USING GIST constraints. This provides declarative, database-enforced guarantees of temporal integrity, eliminating entire classes of bugs that plague trigger-based or application-level logic.

    The Core Schema: Separating Two Timelines

    A bi-temporal model explicitly separates the two time dimensions:

    * valid_time: A tsrange (timestamp with time zone range) representing the period during which the record's attributes were factually correct in the real world.

    * transaction_time: A tsrange representing the period during which this version of the record existed in the database. For the currently active version of a record, the upper bound of this range is typically 'infinity'.

    Let's model a common scenario: tracking product prices. A naive table might look like products(id, price). Our bi-temporal version will track price history in a separate table.

    First, ensure the btree_gist extension is enabled, as it's required for our exclusion constraint.

    sql
    CREATE EXTENSION IF NOT EXISTS btree_gist;

    Now, let's define the table. We'll use a product_price_id as a surrogate primary key, a product_id to link to the core product entity, the price, our two temporal ranges, and the crucial exclusion constraint.

    sql
    CREATE TABLE product_prices (
        product_price_id BIGSERIAL PRIMARY KEY,
        product_id BIGINT NOT NULL,
        price NUMERIC(10, 2) NOT NULL CHECK (price > 0),
        valid_time TSRANGE NOT NULL,
        transaction_time TSRANGE NOT NULL,
    
        -- This is the core of our temporal integrity.
        -- It prevents any two rows for the same product from having
        -- overlapping valid_time AND overlapping transaction_time.
        EXCLUDE USING GIST (
            product_id WITH =,
            valid_time WITH &&,
            transaction_time WITH &&
        )
    );
    
    -- Create a GiST index to support the constraint and accelerate queries.
    -- Note: The EXCLUDE constraint automatically creates this index.
    -- We can also create more specific indexes later for performance.
    CREATE INDEX idx_product_prices_temporal ON product_prices USING GIST (product_id, valid_time, transaction_time);

    The EXCLUDE constraint is the lynchpin of this design. It's a generalization of a UNIQUE constraint. Here, it states that for any two rows in the table, the combination of (product_id = product_id) AND (valid_time && valid_time) AND (transaction_time && transaction_time) cannot all be true. The && is the "overlaps" operator for range types. This single constraint prevents temporal anomalies at the database level, such as defining two different prices for the same product to be valid at the same time and recorded in the database at the same time.

    Query Pattern: The 'As-Of' Query

    The most frequent query against a bi-temporal model is the "as-of" query, which asks for the state of the world at a specific point in time.

    Let's populate some data. Assume now() is 2023-10-26 10:00:00 UTC.

    sql
    -- Initial price for Product 123, valid starting today.
    INSERT INTO product_prices (product_id, price, valid_time, transaction_time)
    VALUES (
        123,
        99.99,
        tsrange('2023-10-26 00:00:00', 'infinity', '[)'),
        tsrange(now(), 'infinity', '[)')
    );

    To find the current price of Product 123:

    sql
    -- What is the price of product 123 right now?
    SELECT product_id, price
    FROM product_prices
    WHERE product_id = 123
      AND valid_time @> now() -- The valid_time range contains the current timestamp
      AND transaction_time @> now(); -- The transaction_time range contains the current timestamp

    The @> operator checks if a range contains a specific element (in this case, a timestamp). This query is efficient because it can use our multi-column GiST index.

    Let's look at the EXPLAIN plan:

    text
    -- EXPLAIN ANALYZE SELECT ...
    Index Scan using idx_product_prices_temporal on product_prices
      Index Cond: ((product_id = 123) AND (valid_time @> '2023-10-26 10:00:00+00'::timestamptz) AND (transaction_time @> '2023-10-26 10:00:00+00'::timestamptz))
    ... (other details)

    The plan confirms an efficient index scan is used, which is critical for performance in large datasets.

    Mutation Patterns: The True Challenge

    Reading the data is straightforward. The real complexity lies in managing mutations while preserving history and satisfying the exclusion constraint. We'll encapsulate this logic in PL/pgSQL functions to ensure atomicity and correctness.

    Pattern 1: A Standard Update (Future Change)

    Let's say on 2023-11-01, we decide the price for Product 123 will change to 109.99 starting on 2023-12-01.

    This involves two steps performed atomically:

  • End the current validity: The existing price of 99.99 is no longer valid from 2023-12-01 onwards. We must update its valid_time range.
  • Insert the new price: A new row is created for the 109.99 price, with a valid_time starting 2023-12-01.
  • Crucially, this is a change to valid time, not transaction time. We are not correcting a mistake; we are recording a new, future state of the world. The transaction_time of the existing row remains open because it's still the current record in the database of what we believe the price history to be.

    Let's wrap this in a function.

    sql
    CREATE OR REPLACE FUNCTION update_product_price(
        p_product_id BIGINT,
        p_new_price NUMERIC(10, 2),
        p_effective_from TIMESTAMPTZ
    )
    RETURNS VOID AS $$
    DECLARE
        current_record RECORD;
    BEGIN
        -- Use a FOR UPDATE clause to lock the current active record for this product
        -- This prevents race conditions if two updates happen concurrently.
        SELECT * INTO current_record
        FROM product_prices
        WHERE product_id = p_product_id
          AND upper(valid_time) = 'infinity'
          AND upper(transaction_time) = 'infinity'
        FOR UPDATE;
    
        -- If no current record exists, or if the new price is the same, do nothing.
        IF NOT FOUND OR current_record.price = p_new_price THEN
            RETURN;
        END IF;
    
        -- Check if the effective date is in the past, which would be a correction,
        -- not a standard update. This function only handles future-dated changes.
        IF p_effective_from <= lower(current_record.valid_time) THEN
            RAISE EXCEPTION 'Effective date must be in the future. Use correction function for past changes.';
        END IF;
    
        -- 1. End the validity of the current price record.
        -- We are updating the valid_time, not the transaction_time.
        UPDATE product_prices
        SET valid_time = tsrange(lower(valid_time), p_effective_from, '[)')
        WHERE product_price_id = current_record.product_price_id;
    
        -- 2. Insert the new price record.
        -- Its validity starts where the old one ended.
        -- Its transaction time is the same as the old one, as this is one logical change.
        INSERT INTO product_prices (product_id, price, valid_time, transaction_time)
        VALUES (p_product_id, p_new_price,
                tsrange(p_effective_from, 'infinity', '[)'),
                current_record.transaction_time);
    
    END;
    $$ LANGUAGE plpgsql;

    Usage:

    Let's assume now() is 2023-11-01 12:00:00 UTC.

    sql
    -- Schedule a price change for Dec 1st.
    SELECT update_product_price(123, 109.99, '2023-12-01 00:00:00 UTC');

    After this, the table for product_id = 123 will contain two rows, both with an open transaction_time:

  • price=99.99, valid_time=['2023-10-26 00:00:00', '2023-12-01 00:00:00'), transaction_time=['2023-10-26 10:00:00', 'infinity')
  • price=109.99, valid_time=['2023-12-01 00:00:00', 'infinity'), transaction_time=['2023-10-26 10:00:00', 'infinity')
  • The EXCLUDE constraint is satisfied because their valid_time ranges do not overlap.

    Pattern 2: A Correction of History (The Bi-temporal Superpower)

    This is where the bi-temporal model truly shines. Imagine on 2023-11-15, we discover that the initial price of 99.99 was a data entry error. It should have been 95.99 from the very beginning (2023-10-26).

    We cannot simply UPDATE the old record. That would erase the fact that we ever thought the price was 99.99. We need to preserve the audit trail. This is a change to transaction time.

    Here's the process, performed atomically at 2023-11-15 14:00:00 UTC:

  • Invalidate the incorrect records: Find all records for the product that are currently believed to be true (transaction_time is open) and whose valid_time overlaps with the period we are correcting. Set the upper bound of their transaction_time to now() (2023-11-15 14:00:00). These records are now considered historical from the database's perspective.
  • Insert the corrected records: Re-insert the historical data, but with the corrected values. These new rows will have the same valid_time as the records they are replacing, but their transaction_time will start from now().
  • This is a complex operation that absolutely must be encapsulated.

    sql
    CREATE OR REPLACE FUNCTION correct_product_price_history(
        p_product_id BIGINT,
        p_corrected_price NUMERIC(10, 2),
        p_correction_period TSRANGE
    )
    RETURNS VOID AS $$
    DECLARE
        rec RECORD;
        correction_time TIMESTAMPTZ := now();
    BEGIN
        -- Find all currently active versions that overlap with the correction period.
        -- Lock them to prevent concurrent modifications.
        FOR rec IN
            SELECT * FROM product_prices
            WHERE product_id = p_product_id
              AND upper(transaction_time) = 'infinity'
              AND valid_time && p_correction_period
            FOR UPDATE
        LOOP
            -- 1. Invalidate the old record by closing its transaction_time.
            UPDATE product_prices
            SET transaction_time = tsrange(lower(transaction_time), correction_time, '[)')
            WHERE product_price_id = rec.product_price_id;
    
            -- We now re-insert the segments of this invalidated record, some corrected,
            -- some as they were, with a new transaction_time.
    
            -- 2a. Insert the corrected version for the overlapping part.
            -- The new valid_time is the intersection of the old valid_time and the correction period.
            INSERT INTO product_prices (product_id, price, valid_time, transaction_time)
            VALUES (rec.product_id, p_corrected_price,
                    rec.valid_time * p_correction_period, -- The '*' is the intersection operator
                    tsrange(correction_time, 'infinity', '[)'));
    
            -- 2b. If the original period extended before the correction, re-insert that part.
            IF lower(rec.valid_time) < lower(p_correction_period) THEN
                INSERT INTO product_prices (product_id, price, valid_time, transaction_time)
                VALUES (rec.product_id, rec.price,
                        tsrange(lower(rec.valid_time), lower(p_correction_period), '[)'),
                        tsrange(correction_time, 'infinity', '[)'));
            END IF;
    
            -- 2c. If the original period extended after the correction, re-insert that part.
            IF upper(rec.valid_time) > upper(p_correction_period) THEN
                INSERT INTO product_prices (product_id, price, valid_time, transaction_time)
                VALUES (rec.product_id, rec.price,
                        tsrange(upper(p_correction_period), upper(rec.valid_time), '[)'),
                        tsrange(correction_time, 'infinity', '[)'));
            END IF;
    
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

    Usage:

    On 2023-11-15 14:00:00 UTC, we run the correction.

    sql
    -- Correct the price to 95.99 for the period starting 2023-10-26.
    SELECT correct_product_price_history(
        123, 
        95.99, 
        tsrange('2023-10-26 00:00:00', 'infinity', '[)')
    );

    Now, let's see the power of this model.

    Query 1: What is the price today (2023-11-15)?

    sql
    SELECT price
    FROM product_prices
    WHERE product_id = 123
      AND valid_time @> '2023-11-15'::timestamptz
      AND transaction_time @> '2023-11-15 15:00:00'::timestamptz; -- After the correction
    
    -- Result: 95.99

    Query 2: What did we THINK the price was yesterday (2023-11-14)?

    sql
    SELECT price
    FROM product_prices
    WHERE product_id = 123
      AND valid_time @> '2023-11-15'::timestamptz -- The price valid on that day
      AND transaction_time @> '2023-11-14'::timestamptz; -- As known by the DB yesterday
    
    -- Result: 99.99

    This ability to query the database's state at any point in its own history is impossible with simpler models and is the core value proposition of a bi-temporal architecture.

    Performance Optimization and Indexing at Scale

    The default GiST index created by the EXCLUDE constraint is a good starting point, but for high-read workloads, we can do better.

    The Problem with GiST for 'Current State' Queries

    GiST indexes are effective for range-based queries ("overlaps", "contains"), but they are generally slower than B-Tree indexes for simple equality checks. A very common query is "get me the current price for all products".

    sql
    SELECT DISTINCT ON (product_id) product_id, price
    FROM product_prices
    WHERE valid_time @> now()
      AND transaction_time @> now()
    ORDER BY product_id, transaction_time DESC;

    While the GiST index can be used here, it may not be optimal, especially if the number of historical records per product is large. The query planner has to sift through all historical versions that match the index condition.

    Solution: Partial B-Tree Index

    We can create a highly efficient, smaller B-Tree index that only contains the currently active records. This is a classic PostgreSQL optimization pattern.

    sql
    CREATE INDEX idx_product_prices_current_active
    ON product_prices (product_id)
    WHERE upper(transaction_time) = 'infinity' AND upper(valid_time) = 'infinity';

    This index is a game-changer for 'current state' queries. Let's analyze a query that can leverage it:

    sql
    -- Get the current price for a specific product
    EXPLAIN ANALYZE
    SELECT price
    FROM product_prices
    WHERE product_id = 123
      AND upper(transaction_time) = 'infinity'
      AND upper(valid_time) = 'infinity';

    With the partial index, the query plan will switch to a much faster Index Scan on idx_product_prices_current_active, as the WHERE clause of the query is a perfect match for the index's predicate. This index is dramatically smaller than the main table, containing only one entry per product, making lookups extremely fast.

    Advanced Edge Cases and Considerations

    A production system must handle more than just the basics.

    Temporal Joins

    What if we have another bi-temporal table, say product_inventory(product_id, quantity, valid_time, transaction_time)? How do we answer: "Show me the price and inventory for all products as of last Tuesday at 5 PM?"

    This requires a join that respects both time dimensions.

    sql
    WITH as_of AS (SELECT '2023-10-24 17:00:00'::timestamptz AS moment)
    SELECT
        p.product_id,
        pr.price,
        inv.quantity
    FROM products p -- Assuming a main products table
    LEFT JOIN product_prices pr
        ON p.product_id = pr.product_id
       AND pr.valid_time @> (SELECT moment FROM as_of)
       AND pr.transaction_time @> (SELECT moment FROM as_of)
    LEFT JOIN product_inventory inv
        ON p.product_id = inv.product_id
       AND inv.valid_time @> (SELECT moment FROM as_of)
       AND inv.transaction_time @> (SELECT moment FROM as_of);

    This query structure correctly joins the state of each temporal table at the same point in both valid and transaction time. Performance here hinges on having effective GiST indexes on both tables.

    Enforcing Contiguity (No Gaps)

    The EXCLUDE constraint prevents overlaps, but it doesn't prevent gaps. For example, a product's price could be valid until 2023-12-01, and the next price might not start until 2023-12-05, leaving a 4-day gap where the price is undefined.

    Enforcing this business rule requires more than a declarative constraint. The most robust solution is to handle it within your mutation functions (update_product_price, etc.). Before committing a transaction, you can run a check:

    sql
    -- Inside a PL/pgSQL function, after a mutation
    PERFORM 1
    FROM product_prices p1
    LEFT JOIN product_prices p2 ON p1.product_id = p2.product_id
        AND p1.valid_time -|- p2.valid_time -- -|- is the 'adjacent' operator
    WHERE p1.product_id = p_product_id
      AND upper(p1.transaction_time) = 'infinity'
      AND upper(p2.transaction_time) = 'infinity'
      AND upper(p1.valid_time) <> 'infinity'
    GROUP BY p1.product_price_id
    HAVING count(p2.product_price_id) = 0;
    
    IF FOUND THEN
        RAISE EXCEPTION 'Gap detected in valid_time for product %', p_product_id;
    END IF;

    This query looks for any currently active record that has a finite end to its validity but does not have another active record adjacent to it. This check ensures a continuous timeline.

    Conclusion: A Robust Foundation for Auditable Systems

    Implementing a bi-temporal data model in PostgreSQL is a significant architectural decision. It introduces complexity into your mutation logic and requires a deep understanding of PostgreSQL's more advanced features. However, the payoff is immense for systems where data integrity and a complete, provable audit trail are paramount.

    By leveraging tsrange types and EXCLUDE USING GIST constraints, we move temporal logic from fragile application code or complex triggers into a declarative, database-enforced guarantee. This provides an incredibly robust foundation. When combined with carefully crafted PL/pgSQL functions for mutations and optimized indexing strategies like partial indexes, this pattern delivers both correctness and performance at scale. While not a fit for every problem, for the right domain, it is an exceptionally powerful and elegant solution to the complex challenge of managing data over time.

    Found this article helpful?

    Share it with others who might benefit from it.

    More Articles