Bi-temporal Data Modeling in Postgres with Range Types

Goh Ling Yong

The Bi-temporal Challenge: Beyond Simple History

For any senior engineer who has worked on systems requiring a full audit trail—be it for financial compliance, insurance policy management, or scientific data tracking—the limitations of simple versioning become painfully clear. A typical approach involves adding start_date and end_date columns to a table. This is uni-temporal modeling; it tracks the history of a fact in the real world. This is known as Valid Time.

But what happens when you discover an error in a record entered last week? The fact was wrong from the moment it was recorded. A simple update overwrites the mistake, destroying the audit trail. You know the current state is correct, but you've lost the crucial information that your database previously contained an error.

This is where bi-temporal modeling becomes essential. It introduces a second time axis: Transaction Time.

* Valid Time (valid_time): The time period during which a fact is true in the real world. (e.g., a product's price was $99.99 from Jan 1st to Feb 15th).

* Transaction Time (transaction_time): The time period during which a fact is stored in the database. It represents the lifecycle of the database record itself.

By tracking both, you can answer not only "What was the price of Product X on February 1st?" but also the more complex, audit-critical question: "On March 1st, what did our system believe the price of Product X was on February 1st?"

Manually managing four separate timestamp columns (valid_from, valid_to, transaction_from, transaction_to) with application-level logic is a notorious source of bugs, race conditions, and complex, inefficient queries. Fortunately, PostgreSQL provides a powerful, elegant, and performant solution using two of its most advanced features: Range Types and Exclusion Constraints.
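
Before turning to those features, it is worth making the contrast concrete. A sketch of the manual approach (column names are illustrative) looks like this:

sql
-- The manual, four-column approach. Nothing in the schema itself stops two rows
-- from claiming the same product, the same validity window, and the same
-- transaction window; all of that policing falls to application code.
CREATE TABLE product_prices_manual (
    id BIGSERIAL PRIMARY KEY,
    product_id BIGINT NOT NULL,
    price NUMERIC(10, 2) NOT NULL,
    valid_from TIMESTAMPTZ NOT NULL,
    valid_to TIMESTAMPTZ NOT NULL DEFAULT 'infinity',
    transaction_from TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    transaction_to TIMESTAMPTZ NOT NULL DEFAULT 'infinity'
);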

This article is a deep dive into building a production-grade bi-temporal data model in PostgreSQL, enforcing its integrity at the database level, and querying it efficiently.

The Core Schema: Enforcing Integrity with `EXCLUDE`

Let's model the price history for a set of products. Our goal is to ensure that for any given product, at any point in time, there is only one valid price recorded in the database. This translates to a bi-temporal constraint: the combination of product_id, valid_time, and transaction_time must not overlap.

PostgreSQL's EXCLUDE constraint is tailor-made for this. It's a generalization of a UNIQUE constraint, allowing you to specify that no two rows can have overlapping values for specified columns or expressions, using arbitrary operators.

Here is our core table definition:

sql
-- Ensure the btree_gist extension is available for the multi-column index
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE product_prices (
    -- A surrogate primary key is still best practice for foreign key references.
    id BIGSERIAL PRIMARY KEY,

    -- The natural key for the entity we are tracking.
    product_id BIGINT NOT NULL,

    -- The actual data point we are versioning.
    price NUMERIC(10, 2) NOT NULL CHECK (price > 0),

    -- VALID TIME: When the price is effective in the real world.
    -- Using a timestamp range with time zone support.
    valid_time TSTZRANGE NOT NULL,

    -- TRANSACTION TIME: When this record was present in the database.
    -- An open-ended range signifies the 'current' record.
    transaction_time TSTZRANGE NOT NULL,

    -- The BI-TEMPORAL INTEGRITY CONSTRAINT
    CONSTRAINT product_prices_bitemporal_excl EXCLUDE USING gist (
        product_id WITH =,
        valid_time WITH &&,
        transaction_time WITH &&
    )
);

-- The EXCLUDE constraint automatically creates the multi-column GiST index that
-- backs it (named after the constraint), so a separate CREATE INDEX would only
-- duplicate it. If you ever drop the constraint, recreate the equivalent index
-- for query performance:
-- CREATE INDEX product_prices_bitemporal_idx ON product_prices
--     USING gist (product_id, valid_time, transaction_time);

Let's dissect the EXCLUDE constraint—it's the heart of this entire pattern:

  • USING gist: Exclusion constraints require an index type that can handle the specified operators. The GiST (Generalized Search Tree) index is perfect for geometric and range types. It can efficiently answer questions like "find all ranges that overlap with this target range."
  • product_id WITH =: This part is crucial. We want the overlap constraint to apply per product. The = operator checks for standard equality. This means the constraint will only compare rows that have the same product_id.
  • valid_time WITH &&: The && operator is the 'overlaps' operator for range types. This clause flags pairs of rows whose valid_time ranges intersect.
  • transaction_time WITH &&: Likewise, this flags pairs of rows whose transaction_time ranges intersect.

A new row is rejected only when all three conditions hold at once: same product_id, overlapping valid_time, and overlapping transaction_time. Two versions of the same product may share a valid period as long as their transaction periods are disjoint, but no two rows can claim to be the stored truth for the same product at the same point in both time dimensions. This is the bedrock of our bi-temporal integrity, as the quick test below demonstrates.
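
To see the constraint in action, here is a quick sanity check against the table defined above (the exact error wording varies by PostgreSQL version):

sql
-- Assume this record already exists:
INSERT INTO product_prices (product_id, price, valid_time, transaction_time)
VALUES (1, 19.99,
        tstzrange('2024-01-01', '2024-02-01', '[)'),
        tstzrange('2024-01-01', 'infinity', '[)'));

-- This attempt overlaps it in BOTH time dimensions for the same product,
-- so it fails with an error along the lines of:
--   ERROR: conflicting key value violates exclusion constraint "product_prices_bitemporal_excl"
INSERT INTO product_prices (product_id, price, valid_time, transaction_time)
VALUES (1, 24.99,
        tstzrange('2024-01-15', '2024-03-01', '[)'),
        tstzrange('2024-01-10', 'infinity', '[)'));

-- A row for a different product, or one whose transaction_time does not
-- overlap, is accepted without complaint.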

    Bi-temporal Operations: Beyond CRUD

    In a bi-temporal system, you don't perform standard INSERT, UPDATE, and DELETE operations. Every change is an append-only operation that preserves history. We'll encapsulate this complex logic in PL/pgSQL functions to ensure atomicity and correctness.

    Initial Insertion

    When a product price is first recorded, it's valid from that moment until some unknown future point, and it's being recorded now.

    sql
    -- Function to create the very first price record for a product.
    CREATE OR REPLACE FUNCTION create_initial_price(
        p_product_id BIGINT,
        p_price NUMERIC,
        p_valid_from TIMESTAMPTZ DEFAULT NOW()
    ) RETURNS BIGINT AS $$
    DECLARE
        new_price_id BIGINT;
    BEGIN
        INSERT INTO product_prices (product_id, price, valid_time, transaction_time)
        VALUES (
            p_product_id,
            p_price,
            -- Valid from the specified time until 'forever'.
            tstzrange(p_valid_from, 'infinity', '[)'),
            -- Recorded in the DB from now until 'forever'.
            tstzrange(NOW(), 'infinity', '[)')
        )
        RETURNING id INTO new_price_id;
    
        RETURN new_price_id;
    END;
    $$ LANGUAGE plpgsql;

    Example Usage:

    sql
    -- The price for product 101 is $49.99, effective immediately.
    SELECT create_initial_price(101, 49.99);

    This creates a single row where both valid_time and transaction_time are open-ended ([now, infinity)).
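
    To confirm what was written, take a quick look at the stored ranges (the actual timestamps depend on when you run it; the output shape is what matters):

    sql
    SELECT product_id, price, valid_time, transaction_time
    FROM product_prices
    WHERE product_id = 101;

    -- Expect a single row whose ranges render like:
    --   ["2024-05-01 10:00:00+00",infinity)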

    Correction: Fixing a Past Mistake

    A correction is needed when a previously recorded fact was wrong from the start. For example, the price of $49.99 was a typo; it should have been $45.99. We cannot simply update the old record, as that would erase the fact that the mistake ever existed.

    Instead, we must:

  • Terminate the transaction_time of the incorrect record.
  • Insert a new record with the corrected data. This new record will have the same valid_time as the erroneous one, but a new transaction_time starting from now().

    This operation must be atomic.

    sql
    CREATE OR REPLACE FUNCTION correct_price(
        p_product_id BIGINT,
        p_incorrect_price NUMERIC,
        p_correct_price NUMERIC
    ) RETURNS BIGINT AS $$
    DECLARE
        erroneous_record RECORD;
        new_price_id BIGINT;
    BEGIN
        -- Find the current, active record that needs correction.
        -- We lock it to prevent concurrent modifications.
        SELECT * INTO erroneous_record
        FROM product_prices
        WHERE product_id = p_product_id
          AND price = p_incorrect_price
          AND upper(transaction_time) = 'infinity'
        FOR UPDATE;
    
        IF NOT FOUND THEN
            RAISE EXCEPTION 'No active record found for product % with price % to correct.', p_product_id, p_incorrect_price;
        END IF;
    
        -- 1. Terminate the transaction time of the incorrect record.
        UPDATE product_prices
        SET transaction_time = tstzrange(lower(transaction_time), NOW(), '[)')
        WHERE id = erroneous_record.id;
    
        -- 2. Insert the new, corrected record.
        -- It has the same valid_time but a new transaction_time.
        INSERT INTO product_prices (product_id, price, valid_time, transaction_time)
        VALUES (
            erroneous_record.product_id,
            p_correct_price,
            erroneous_record.valid_time,
            tstzrange(NOW(), 'infinity', '[)')
        )
        RETURNING id INTO new_price_id;
    
        RETURN new_price_id;
    END;
    $$ LANGUAGE plpgsql;

    Example Usage:

    sql
    -- We realize the price for product 101 was a typo.
    -- It should have been $45.99 all along.
    SELECT correct_price(101, 49.99, 45.99);

    Now, a query for the current state will show $45.99. But a time-travel query looking at the database state before this correction will still show the original, incorrect price of $49.99. The audit trail is preserved.
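
    If you want to check this right away rather than waiting for the querying section below, the two lookups are as follows (substitute a timestamp that falls between the original insert and the correction in the second query):

    sql
    -- Current belief: returns 45.99.
    SELECT price
    FROM product_prices
    WHERE product_id = 101
      AND valid_time @> NOW()
      AND transaction_time @> NOW();

    -- Belief as it stood before the correction was recorded: returns 49.99.
    -- Replace the literal with a timestamp from before your correct_price() call.
    SELECT price
    FROM product_prices
    WHERE product_id = 101
      AND valid_time @> TIMESTAMPTZ '2024-05-01 12:00:00+00'
      AND transaction_time @> TIMESTAMPTZ '2024-05-01 12:00:00+00';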

    Update: A Real-World Change

    An update represents a legitimate change in the real world. The old price was correct for its time, but now there's a new price. For example, a planned price increase for product 101 from $45.99 to $55.00, effective next month.

    Here, we modify the valid_time, not the transaction_time. (A fully rigorous bi-temporal update would also close the superseded row's transaction_time and re-insert it with the shortened valid_time, so that the earlier belief about the future stays queryable; the simplified version below adjusts only the valid time of the existing row.)

  • Terminate the valid_time of the current record.
  • Insert a new record with the new price and a new valid_time starting from the effective date.

    sql
    CREATE OR REPLACE FUNCTION update_price(
        p_product_id BIGINT,
        p_new_price NUMERIC,
        p_effective_from TIMESTAMPTZ
    ) RETURNS BIGINT AS $$
    DECLARE
        current_record RECORD;
        new_price_id BIGINT;
    BEGIN
        -- Find the currently valid and active record.
        SELECT * INTO current_record
        FROM product_prices
        WHERE product_id = p_product_id
          AND upper(valid_time) = 'infinity'
          AND upper(transaction_time) = 'infinity'
        FOR UPDATE;
    
        IF NOT FOUND THEN
            RAISE EXCEPTION 'No current, active record found for product % to update.', p_product_id;
        END IF;
    
        IF p_effective_from <= lower(current_record.valid_time) THEN
            RAISE EXCEPTION 'Effective date must be after the start of the current validity period.';
        END IF;
    
        -- 1. Terminate the valid_time of the old record.
        UPDATE product_prices
        SET valid_time = tstzrange(lower(valid_time), p_effective_from, '[)')
        WHERE id = current_record.id;
    
        -- 2. Insert the new record.
        -- Its transaction_time starts now: that is when this row enters the database.
        INSERT INTO product_prices (product_id, price, valid_time, transaction_time)
        VALUES (
            current_record.product_id,
            p_new_price,
            tstzrange(p_effective_from, 'infinity', '[)'),
            tstzrange(NOW(), 'infinity', '[)')
        )
        RETURNING id INTO new_price_id;
    
        RETURN new_price_id;
    END;
    $$ LANGUAGE plpgsql;

    Example Usage:

    sql
    -- Schedule a price increase for product 101 to $55.00, effective December 1st, 2024.
    SELECT update_price(101, 55.00, '2024-12-01 00:00:00-00');
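
    A quick way to see the resulting split (assuming the call succeeded; the first row's lower bound is whenever the initial price was recorded):

    sql
    -- Active rows (open transaction_time) for product 101, ordered by validity.
    SELECT price, valid_time
    FROM product_prices
    WHERE product_id = 101
      AND upper(transaction_time) = 'infinity'
    ORDER BY lower(valid_time);

    -- Expected shape:
    --   45.99 | [<creation time>,"2024-12-01 00:00:00+00")
    --   55.00 | ["2024-12-01 00:00:00+00",infinity)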

    Deletion: Ending a Fact's Validity

    In a bi-temporal system, you rarely physically delete records. A "deletion" simply means that a fact stops being valid at some point. Mechanically, this mirrors an update, except that no new record is inserted: we just terminate the valid_time of the current record.

    sql
    CREATE OR REPLACE FUNCTION discontinue_price(
        p_product_id BIGINT,
        p_discontinue_date TIMESTAMPTZ DEFAULT NOW()
    ) RETURNS VOID AS $$
    DECLARE
        current_record_id BIGINT;
    BEGIN
        -- Find the current, active record to terminate.
        SELECT id INTO current_record_id
        FROM product_prices
        WHERE product_id = p_product_id
          AND upper(valid_time) = 'infinity'
          AND upper(transaction_time) = 'infinity'
        FOR UPDATE;
    
        IF NOT FOUND THEN
            RAISE EXCEPTION 'No current, active record found for product % to discontinue.', p_product_id;
        END IF;
    
        -- Terminate the valid_time.
        UPDATE product_prices
        SET valid_time = tstzrange(lower(valid_time), p_discontinue_date, '[)')
        WHERE id = current_record_id;
    
    END;
    $$ LANGUAGE plpgsql;
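
    Example Usage (the cutoff date here is arbitrary):

    sql
    -- Product 101 is retired; its price stops being valid at the start of July.
    SELECT discontinue_price(101, '2025-07-01 00:00:00-00');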

    Advanced Querying: The Power of Time Travel

    With our data model and operations in place, we can now perform powerful historical queries that would be nightmarish with a manual setup.

    Let's populate some data to demonstrate. Because transaction_time is stamped with NOW(), assume each statement below was actually executed on the date given in its comment:

    sql
    -- Time of initial creation: 2024-01-15
    SELECT create_initial_price(202, 99.99, '2024-01-15 09:00:00-00');
    
    -- Time of correction: 2024-01-20
    -- We realize the price should have been $95.00 from the start.
    SELECT correct_price(202, 99.99, 95.00);
    
    -- Time of update: 2024-02-10
    -- A planned price increase to $110.00, effective on 2024-03-01.
    SELECT update_price(202, 110.00, '2024-03-01 00:00:00-00');

    Query 1: "As-Of" (Current State)

    What is the price of product 202 right now? This is the most common query.

    sql
    SELECT product_id, price, valid_time
    FROM product_prices
    WHERE product_id = 202
      AND valid_time @> NOW()
      AND transaction_time @> NOW();

    The @> operator means 'contains'. We're looking for the record whose valid_time range contains the current timestamp AND whose transaction_time range also contains the current timestamp.

    Query 2: "As-Was-Known" (Time Travel)

    This is the quintessential bi-temporal query. On January 22nd, what did we think the price was for January 16th?

  • The state of the database we care about is from 2024-01-22. (transaction_time)
  • The real-world date we care about is 2024-01-16. (valid_time)

    sql
    SELECT product_id, price
    FROM product_prices
    WHERE product_id = 202
      AND valid_time @> '2024-01-16 00:00:00-00'::timestamptz
      AND transaction_time @> '2024-01-22 00:00:00-00'::timestamptz;

    This query will correctly return $95.00. The correction happened on Jan 20th, so by Jan 22nd, the database knew the correct price.

    Now, let's ask: On January 18th, what did we think the price was for January 16th?

    sql
    SELECT product_id, price
    FROM product_prices
    WHERE product_id = 202
      AND valid_time @> '2024-01-16 00:00:00-00'::timestamptz
      AND transaction_time @> '2024-01-18 00:00:00-00'::timestamptz;

    This query returns the original, incorrect price of $99.99, because on Jan 18th, the correction had not yet been made. This is the power of a true audit trail.

    Query 3: History of a Fact

    How did our knowledge about the price on March 15th evolve over time?

    sql
    SELECT price, transaction_time
    FROM product_prices
    WHERE product_id = 202
      AND valid_time @> '2024-03-15 00:00:00-00'::timestamptz
    ORDER BY lower(transaction_time);

    This will show every version of the price that was considered valid for March 15th, and when each version was recorded in the database.

    Performance, Indexing, and Edge Cases

    This model is powerful, but its performance in a production environment with millions or billions of rows depends entirely on proper indexing and an understanding of its limitations.

    The Multi-Column GiST Index is Non-Negotiable

    The EXCLUDE constraint creates a multi-column GiST index on (product_id, valid_time, transaction_time). This index is what makes the time-travel queries efficient. Let's examine the query plan for our "As-Was-Known" query:

    sql
    EXPLAIN ANALYZE
    SELECT product_id, price
    FROM product_prices
    WHERE product_id = 202
      AND valid_time @> '2024-01-16 00:00:00-00'::timestamptz
      AND transaction_time @> '2024-01-22 00:00:00-00'::timestamptz;

    The output will look something like this:

    text
    Index Scan using product_prices_bitemporal_excl on product_prices ...
      Index Cond: ((product_id = 202) AND (valid_time @> '...'::timestamptz) AND (transaction_time @> '...'::timestamptz))
      ...

    The key is Index Scan. The planner uses the GiST index to rapidly narrow down the search. It first filters by product_id (an exact match) and then uses the tree structure to efficiently find ranges that contain the target timestamps. Without this index, the database would have to perform a full table scan, which is untenable on large datasets.

    Handling Unbounded Ranges (`infinity`)

    A huge percentage of your rows will have upper(transaction_time) = 'infinity'. This skew can, in some older PostgreSQL versions, lead to suboptimal query plans if statistics are not up-to-date. Running ANALYZE product_prices regularly is critical. Modern PostgreSQL versions have improved planners that handle this common pattern more gracefully.
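
    Two standard maintenance statements help here (the statistics target below is an illustrative number, not a recommendation):

    sql
    -- Refresh planner statistics manually after a large backfill or migration.
    ANALYZE product_prices;

    -- Optionally give the planner finer-grained statistics for the skewed range columns.
    ALTER TABLE product_prices ALTER COLUMN transaction_time SET STATISTICS 500;
    ALTER TABLE product_prices ALTER COLUMN valid_time SET STATISTICS 500;
    ANALYZE product_prices;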

    Edge Case: Enforcing Contiguity

    The EXCLUDE constraint prevents overlaps, but it does not prevent gaps. For example, you could have a price valid from Jan 1st to Feb 1st, and the next price starting on Feb 3rd, leaving Feb 2nd with no defined price.

    If your business logic requires strict contiguity, this must be enforced separately. A common approach is to use a BEFORE UPDATE OR INSERT trigger. The trigger would find the preceding and succeeding records for the same product_id and ensure that the new or updated range perfectly abuts its neighbors.
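
    Here is a minimal sketch of such a check, assuming contiguity only needs to hold among currently active rows (open transaction_time). It is illustrative rather than production-ready: like any trigger that inspects other rows, it needs appropriate locking (or a deferred constraint trigger) to be safe under concurrent writers.

    sql
    CREATE OR REPLACE FUNCTION check_price_contiguity() RETURNS trigger AS $$
    DECLARE
        prev_upper TIMESTAMPTZ;
    BEGIN
        -- Find where the closest preceding active range for this product ends.
        SELECT max(upper(valid_time)) INTO prev_upper
        FROM product_prices
        WHERE product_id = NEW.product_id
          AND upper(transaction_time) = 'infinity'
          AND upper(valid_time) <= lower(NEW.valid_time)
          AND id <> NEW.id;

        -- If a predecessor exists, it must end exactly where the new range begins.
        IF prev_upper IS NOT NULL AND prev_upper <> lower(NEW.valid_time) THEN
            RAISE EXCEPTION 'Gap for product %: previous validity ends at %, new range starts at %',
                NEW.product_id, prev_upper, lower(NEW.valid_time);
        END IF;

        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER product_prices_contiguity
        BEFORE INSERT OR UPDATE ON product_prices
        FOR EACH ROW EXECUTE FUNCTION check_price_contiguity();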

    Database Partitioning for Scale

    For truly massive bi-temporal tables (terabytes of data), you will need table partitioning. The most common strategy is to partition by product_id (or another natural key like customer_id).

    * Partitioning by LIST (product_id) or HASH (product_id): This is often the best choice. Queries almost always filter by product_id, so the planner can prune partitions and only scan the single partition relevant to that product. This scales writes and reads beautifully.

    * Partitioning by RANGE (lower(valid_time)): This can be useful for queries that span many products but are limited to a specific time window (e.g., "show all price changes last month"). However, it can create a hot partition for writes, as all current records will go into the latest partition.

    Choosing the right partitioning key is highly dependent on your specific query patterns.
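
    A hash-partitioned layout might look like the sketch below (table and constraint names are invented for illustration). Whether the EXCLUDE constraint can be declared on the partitioned parent depends on your PostgreSQL version; declaring it on each partition is the portable option, and it is sufficient here because HASH (product_id) routes every row for a given product into the same partition.

    sql
    CREATE TABLE product_prices_part (
        id BIGSERIAL,
        product_id BIGINT NOT NULL,
        price NUMERIC(10, 2) NOT NULL CHECK (price > 0),
        valid_time TSTZRANGE NOT NULL,
        transaction_time TSTZRANGE NOT NULL
    ) PARTITION BY HASH (product_id);

    CREATE TABLE product_prices_part_0 PARTITION OF product_prices_part
        FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    -- ...create the partitions for remainders 1 through 3 the same way...

    -- The bi-temporal constraint, declared per partition.
    ALTER TABLE product_prices_part_0
        ADD CONSTRAINT product_prices_part_0_bitemporal_excl
        EXCLUDE USING gist (product_id WITH =, valid_time WITH &&, transaction_time WITH &&);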

    Conclusion: Database-Enforced Veracity

    Implementing a bi-temporal data model is a significant architectural decision. While it introduces complexity in your write operations, the benefits for systems requiring auditability and historical accuracy are immense.

    By leveraging PostgreSQL's native TSTZRANGE and EXCLUDE USING gist constraints, you shift the burden of data integrity from fallible application code to the database itself. This provides several key advantages:

  • Guaranteed Correctness: Race conditions and application bugs cannot corrupt your historical data. The EXCLUDE constraint is an unbreakable guarantee against overlapping states.
  • Simplified Queries: Complex time-travel queries become declarative and readable, using intuitive operators like @> (contains) and && (overlaps).
  • High Performance: When backed by the proper multi-column GiST index, these queries are highly performant, even on very large datasets.

    This pattern is not for every application. But for senior engineers designing systems in regulated industries, or any domain where the history of what was known and when is as important as the history of what was true, the PostgreSQL bi-temporal model is a robust, scalable, and elegant solution.
