Bi-Temporal Data Modeling in Postgres with Range Types

Goh Ling Yong

The Shortcomings of Traditional Auditing

As senior engineers, we've all implemented audit trails. Typically, this involves a products_audit table populated by a trigger on the products table, capturing the old row, the new row, a timestamp, and the user who made the change. This is sufficient for answering the question: "What changed, when, and by whom?"

However, this approach fails spectacularly when faced with more complex temporal queries. Consider these requirements:

  • "As-Was" Query: What was the price of product 123 on February 15th, 2023?
  • Correction Query: We discovered on July 1st that the price recorded from February 1st to February 28th was incorrect due to a data entry error. We need to correct the historical record without losing the fact that we used to think the price was different.
  • True Bi-Temporal Query: What did our system report as the price for product 123 on February 15th, when we ran the report on March 1st? And what does it report for that same date now, after our July 1st correction?

    Simple audit logs make answering these questions a nightmare of procedural code, complex joins, and window functions over a sea of timestamped change records. The data model itself doesn't inherently understand time; it only records discrete events.

    Bi-temporal modeling solves this by treating time as a first-class citizen in the data model. We track two distinct time dimensions:

    * Valid Time (valid_time): The time period during which a fact is true in the real world. For example, a product's price is $99.99 from 2023-01-01 to 2023-06-30.

    * Transaction Time (transaction_time): The time period during which a fact is stored in the database. This is the system's view of reality. A record is considered current as long as its transaction time's upper bound is 'infinity'.

    By modeling both, we can accurately reconstruct the state of our data from any point in valid time, as it was known at any point in transaction time. PostgreSQL, with its powerful tstzrange (timestamptz range) type, GIST indexes, and exclusion constraints, provides the perfect toolkit to implement this pattern elegantly and efficiently at the database layer.
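
    As a quick primer, here is how the range operators used throughout this article behave (a standalone sketch you can run in any psql session):

    sql
    SELECT
        -- && : do two ranges overlap?
        tstzrange('2023-01-01', '2023-06-30', '[)') && tstzrange('2023-06-01', '2023-12-31', '[)') AS overlaps,    -- true
        -- @> : does a range contain a point in time?
        tstzrange('2023-01-01', '2023-06-30', '[)') @> '2023-02-15'::timestamptz AS contains_point,                -- true
        -- An explicit 'infinity' upper bound marks a currently-open period
        upper(tstzrange(now(), 'infinity', '[)')) = 'infinity' AS is_current;                                      -- true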


    The Bi-Temporal Schema Foundation

    Let's abandon the separate audit table pattern. Instead, we'll create a single table that holds all states of our entity—past, present, and future. We will not have a separate products table; the products_history table will be the single source of truth.

    Our schema will manage the history of product attributes. A stable, business-facing ID will link all historical versions of a single product.

    sql
    -- Enable the btree_gist extension to allow exclusion constraints on base types
    CREATE EXTENSION IF NOT EXISTS btree_gist;
    
    CREATE TABLE products_history (
        -- A surrogate primary key for the history row itself.
        history_id bigserial PRIMARY KEY,
    
        -- The stable, business identifier for a product. This links all versions.
        product_id uuid NOT NULL,
    
        -- The actual state of the product at a point in time.
        -- Using JSONB is flexible for evolving schemas.
        product_data jsonb NOT NULL,
    
        -- VALID TIME: When the fact was true in the real world.
        -- The range is inclusive of the lower bound and exclusive of the upper bound '[)'.
        valid_time tstzrange NOT NULL,
    
        -- TRANSACTION TIME: When this fact was recorded in the database.
        -- This allows us to see what the database "knew" and when.
        transaction_time tstzrange NOT NULL,
    
        -- The magic happens here. This constraint ensures that for any given product_id,
        -- there can be no overlapping valid_time ranges for records that are currently active
        -- in the database (i.e., their transaction_time is open-ended).
        EXCLUDE USING gist (
            product_id WITH =,
            valid_time WITH &&
        ) WHERE (upper(transaction_time) IS NULL OR upper(transaction_time) = 'infinity')
    );
    
    -- Create indexes to support all types of temporal queries.
    -- This multi-column GIST index is crucial for performance.
    CREATE INDEX products_history_gist_idx ON products_history USING gist (product_id, valid_time, transaction_time);
    
    -- A b-tree index is still useful for looking up all versions of a single product.
    CREATE INDEX products_history_product_id_idx ON products_history (product_id);

    Deconstructing the Schema and Constraints

    * product_id (UUID): We use a non-sequential UUID as the stable identifier. This is the ID you would expose to other services. The history_id is purely an internal implementation detail.

    * product_data (JSONB): Storing the state in a JSONB column provides schema flexibility. You could also use regular columns, but for entities with many frequently changing attributes, JSONB simplifies the management functions.

    * valid_time (tstzrange): This is the core of the "as-was" capability. We use tstzrange to represent the interval. A price change on July 1st would end the previous record's valid_time at 2023-07-01 00:00:00 and start the new one at the same time.

    * transaction_time (tstzrange): This is our system's timeline. When a record is created, its transaction_time is [now(), infinity). When we update it, we never rewrite the fact itself; we "close" its transaction time by setting the upper bound to now() and insert new rows reflecting our revised belief.

    * The Exclusion Constraint: This is the most critical piece for data integrity. EXCLUDE USING gist (product_id WITH =, valid_time WITH &&) WHERE (...) tells PostgreSQL:

    * For any rows with the same product_id (WITH =)

    * Their valid_time ranges cannot overlap (WITH &&)

    Crucially, this constraint *only* applies to records that are currently active (WHERE (upper(transaction_time) IS NULL OR upper(transaction_time) = 'infinity')).

    This WHERE clause is subtle but vital. It allows us to have overlapping valid_time ranges in our history, which is necessary for making corrections. For example, we can have two records for the same product valid from Feb 1st to Feb 28th, as long as only one of them is currently active in transaction time. The other represents a historical mistake that has since been corrected.
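
    To see the constraint in action, here is a sketch (the uuid is illustrative, and we bypass the management functions defined below purely for demonstration):

    sql
    -- Succeeds: no other active row exists for this product.
    INSERT INTO products_history (product_id, product_data, valid_time, transaction_time)
    VALUES ('11111111-1111-1111-1111-111111111111', '{"price": 10}',
            tstzrange('2023-02-01', '2023-03-01', '[)'),
            tstzrange(now(), 'infinity', '[)'));
    
    -- Fails with a conflict error: same product_id, overlapping valid_time,
    -- and both rows would be active in transaction time.
    INSERT INTO products_history (product_id, product_data, valid_time, transaction_time)
    VALUES ('11111111-1111-1111-1111-111111111111', '{"price": 12}',
            tstzrange('2023-02-15', '2023-03-15', '[)'),
            tstzrange(now(), 'infinity', '[)'));

    The same overlap would be accepted once the first row's transaction_time is closed, because the partial exclusion constraint ignores inactive rows.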


    Implementing Core Operations with PL/pgSQL

    Directly manipulating this table with raw INSERT and UPDATE statements is error-prone. The logic for managing the time ranges should be encapsulated in database functions to ensure atomicity and correctness.

    1. Creating a New Product

    This is the simplest operation. A new product is valid from its creation time until forever, and it's recorded in the database from its creation time until forever.

    sql
    CREATE OR REPLACE FUNCTION create_product(
        p_product_id uuid,
        p_initial_data jsonb
    ) RETURNS void AS $$
    DECLARE
        v_now timestamptz := now();
    BEGIN
        INSERT INTO products_history (product_id, product_data, valid_time, transaction_time)
        VALUES (
            p_product_id,
            p_initial_data,
            tstzrange(v_now, 'infinity', '[)'),
            tstzrange(v_now, 'infinity', '[)')
        );
    END;
    $$ LANGUAGE plpgsql;

    Usage:

    sql
    -- Create a new product
    SELECT create_product(
        'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11',
        '{"name": "Quantum Entangler", "price": 999.99, "stock": 10}'
    );

    2. Updating a Product (Standard Change)

    This is the most common operation. An update signifies that the real-world state of the product has changed now. We must:

  • Find the currently active record for the given product_id.
  • "Close" its transaction_time by setting the upper bound to now(); that row no longer reflects our current belief.
  • Re-insert the old data with its valid_time ended at now(), so the current view of history still records what was true before the change.
  • Insert a new record with the updated data, where both valid_time and transaction_time start from now() and extend to infinity.

    All of this must happen atomically.

    sql
    CREATE OR REPLACE FUNCTION update_product(
        p_product_id uuid,
        p_new_data jsonb
    ) RETURNS void AS $$
    DECLARE
        v_now timestamptz := now();
        v_current_history_id bigint;
        v_current_valid_time tstzrange;
    BEGIN
        -- Find the current active record and lock it for update
        SELECT history_id, valid_time
        INTO v_current_history_id, v_current_valid_time
        FROM products_history
        WHERE product_id = p_product_id
          AND upper(transaction_time) = 'infinity'
        FOR UPDATE;
    
        IF v_current_history_id IS NULL THEN
            RAISE EXCEPTION 'Product with ID % not found or is not active.', p_product_id;
        END IF;
    
        -- 1. Close the current record's transaction time. The open-ended
        --    version of this fact is no longer our current belief.
        UPDATE products_history
        SET transaction_time = tstzrange(lower(transaction_time), v_now, '[)')
        WHERE history_id = v_current_history_id;
    
        -- 2. Re-insert the old data with its valid_time ended at now(), so the
        --    current view of history still knows what was true before the change.
        INSERT INTO products_history (product_id, product_data, valid_time, transaction_time)
        SELECT product_id, product_data,
               tstzrange(lower(v_current_valid_time), v_now, '[)'),
               tstzrange(v_now, 'infinity', '[)')
        FROM products_history
        WHERE history_id = v_current_history_id;
    
        -- 3. Insert the new version of the record
        INSERT INTO products_history (product_id, product_data, valid_time, transaction_time)
        VALUES (
            p_product_id,
            p_new_data,
            tstzrange(v_now, 'infinity', '[)'),
            tstzrange(v_now, 'infinity', '[)')
        );
    END;
    $$ LANGUAGE plpgsql;

    Usage:

    sql
    -- Update the price of our product
    SELECT update_product(
        'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11',
        '{"name": "Quantum Entangler", "price": 1299.99, "stock": 8}'
    );

    3. Deleting a Product (Temporal Deletion)

    We never use the DELETE statement. A "deletion" simply means the product is no longer valid from this point forward: we close the current record's transaction_time and re-insert it with its valid_time ended at now().

    sql
    CREATE OR REPLACE FUNCTION delete_product(
        p_product_id uuid
    ) RETURNS void AS $$
    DECLARE
        v_now timestamptz := now();
        v_current_history_id bigint;
    BEGIN
        -- Find and lock the current active record
        SELECT history_id
        INTO v_current_history_id
        FROM products_history
        WHERE product_id = p_product_id
          AND upper(transaction_time) = 'infinity'
          AND upper(valid_time) = 'infinity'
        FOR UPDATE;
    
        IF v_current_history_id IS NULL THEN
            RAISE EXCEPTION 'Product with ID % not found or already deleted.', p_product_id;
        END IF;
    
        -- 1. Close the transaction_time of the current record
        UPDATE products_history
        SET transaction_time = tstzrange(lower(transaction_time), v_now, '[)')
        WHERE history_id = v_current_history_id;
    
        -- 2. Re-insert it with valid_time ended at now(), so the current view of
        --    history still records the product's state up to the deletion.
        INSERT INTO products_history (product_id, product_data, valid_time, transaction_time)
        SELECT product_id, product_data,
               tstzrange(lower(valid_time), v_now, '[)'),
               tstzrange(v_now, 'infinity', '[)')
        FROM products_history
        WHERE history_id = v_current_history_id;
    END;
    $$ LANGUAGE plpgsql;
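
    Usage, following the same pattern as the other operations:

    sql
    -- Temporally delete our product; its full history remains queryable
    SELECT delete_product('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11');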

    4. Correcting Historical Data (The Advanced Case)

    This is where bi-temporal modeling truly shines. Let's say we discovered that a price was wrong for a period in the past. We need to rewrite history while preserving the original, incorrect record for auditing.

    Scenario: Today is July 1st. We learn the price for product 'a0eebc...' was actually $1100.00 from February 15th to March 10th, not $999.99 as recorded.

    The process is complex:

  • Find all historical records that are affected by this correction (i.e., their valid_time overlaps with the correction period).
  • "Deactivate" these records by closing their transaction_time at now().
  • Insert new, corrected records. These new records will have valid_time ranges corresponding to the corrected period, but their transaction_time will start from now(), indicating when we made the correction.

    This logic is too complex for a single function and often requires careful handling in application code or a more sophisticated stored procedure. Here is a conceptual implementation:

    sql
    CREATE OR REPLACE FUNCTION correct_product_history(
        p_product_id uuid,
        p_correction_data jsonb,
        p_correction_valid_time tstzrange
    ) RETURNS void AS $$
    DECLARE
        v_now timestamptz := now();
        v_affected_record record;
        v_new_valid_time tstzrange;
    BEGIN
        -- The EXCEPTION clause below makes this block a subtransaction:
        -- on error, all changes made inside it are rolled back together
        BEGIN
            -- Loop through all currently active records whose validity overlaps with the correction period
            FOR v_affected_record IN
                SELECT * FROM products_history
                WHERE product_id = p_product_id
                  AND valid_time && p_correction_valid_time
                  AND upper(transaction_time) = 'infinity'
                ORDER BY valid_time
                FOR UPDATE
            LOOP
                -- 1. Deactivate the old, incorrect record by closing its transaction time
                UPDATE products_history
                SET transaction_time = tstzrange(lower(transaction_time), v_now, '[)')
                WHERE history_id = v_affected_record.history_id;
    
                -- 2. Now, re-insert the historical record(s) with corrections.
                -- This is complex because the correction period might split an existing record,
                -- requiring us to insert up to three new records:
                --   a. The part before the correction
                --   b. The corrected part
                --   c. The part after the correction
    
                -- Part before (if any)
                IF lower(v_affected_record.valid_time) < lower(p_correction_valid_time) THEN
                    INSERT INTO products_history(product_id, product_data, valid_time, transaction_time)
                    VALUES (p_product_id, v_affected_record.product_data,
                            tstzrange(lower(v_affected_record.valid_time), lower(p_correction_valid_time), '[)'),
                            tstzrange(v_now, 'infinity', '[)'));
                END IF;
    
                -- The corrected part
                -- We calculate the intersection of the affected record's validity and the correction period
                v_new_valid_time := v_affected_record.valid_time * p_correction_valid_time;
                INSERT INTO products_history(product_id, product_data, valid_time, transaction_time)
                VALUES (p_product_id, p_correction_data,
                        v_new_valid_time,
                        tstzrange(v_now, 'infinity', '[)'));
    
                -- Part after (if any)
                IF upper(v_affected_record.valid_time) > upper(p_correction_valid_time) THEN
                    INSERT INTO products_history(product_id, product_data, valid_time, transaction_time)
                    VALUES (p_product_id, v_affected_record.product_data,
                            tstzrange(upper(p_correction_valid_time), upper(v_affected_record.valid_time), '[)'),
                            tstzrange(v_now, 'infinity', '[)'));
                END IF;
    
            END LOOP;
        EXCEPTION
            WHEN OTHERS THEN
                -- Log the failure, then re-raise to abort the operation
                RAISE WARNING 'An error occurred during correction: %', SQLERRM;
                RAISE;
        END;
    END;
    $$ LANGUAGE plpgsql;

    Note: This correction function is highly complex and demonstrates the logical gymnastics required. In a real-world system, you might break this down further. The key takeaway is the pattern: end the transaction time of the old record and insert new ones with a new transaction time but old/corrected valid times.
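
    Applying it to our scenario (the price was actually $1100.00 from February 15th to March 10th; the full JSONB payload here is illustrative — in practice you might merge the correction into the existing product_data rather than replace it wholesale):

    sql
    SELECT correct_product_history(
        'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11',
        '{"name": "Quantum Entangler", "price": 1100.00, "stock": 10}',
        tstzrange('2023-02-15', '2023-03-10', '[)')
    );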


    Querying Bi-Temporal Data: Unlocking History

    Now for the payoff. With our robust data model, we can answer those difficult temporal questions with surprisingly simple queries.

    Let's assume our products_history table has records from the operations above.

    Query 1: What is the current state of a product? (As-Of Now)

    This is the most common query. It's what you'd use to display product information on a website. We look for the record that is valid now and was recorded now (i.e., its time ranges extend to infinity).

    sql
    SELECT product_id, product_data
    FROM products_history
    WHERE product_id = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'
      AND upper(valid_time) = 'infinity'
      AND upper(transaction_time) = 'infinity';

    Query 2: What was the product's state on a specific date? (As-Was)

    This query asks for the state in the real world on a past date, according to our current understanding.

    "What was the price on 2023-02-20?"

    sql
    SELECT product_id, product_data->>'price' as price
    FROM products_history
    WHERE product_id = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'
      AND valid_time @> '2023-02-20'::timestamptz
      AND upper(transaction_time) = 'infinity'; -- Use the current view of history

    The @> operator checks if the tstzrange contains the specified timestamp.

    Query 3: What did we *think* the state was on a specific date, based on what the database knew at another date? (True Bi-Temporal)

    This is the ultimate test. "On 2023-03-01, what would our system have reported as the price for 2023-02-20?"

    This query is crucial for reproducing old reports, debugging historical discrepancies, and meeting strict regulatory audit requirements.

    sql
    SELECT product_id, product_data->>'price' as price
    FROM products_history
    WHERE product_id = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'
      -- The fact must have been valid in the real world on this date
      AND valid_time @> '2023-02-20'::timestamptz
      -- And the record of that fact must have existed in our database on this date
      AND transaction_time @> '2023-03-01'::timestamptz;

    If a correction was made on July 1st, this query (run today) would show the old, incorrect price, because that's what the database contained on March 1st. Running the same query but with a transaction_time of now() would show the new, corrected price.


    Performance, Indexing, and Production Considerations

    This pattern is powerful, but it comes with performance overhead and operational complexity.

    Indexing Strategy

    The multi-column GIST index is the workhorse of this model:

    CREATE INDEX products_history_gist_idx ON products_history USING gist (product_id, valid_time, transaction_time);

    A GIST index can accelerate queries involving range operators like && (overlaps) and @> (contains). By including product_id first, we allow the planner to quickly narrow down the search to a single product before evaluating the temporal conditions. An EXPLAIN ANALYZE on our bi-temporal query will show a much more efficient Index Scan or Bitmap Heap Scan using this index compared to a full Seq Scan.

    sql
    EXPLAIN ANALYZE
    SELECT product_id, product_data->>'price' as price
    FROM products_history
    WHERE product_id = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'
      AND valid_time @> '2023-02-20'::timestamptz
      AND transaction_time @> '2023-03-01'::timestamptz;

    Table Bloat and Partitioning

    A history table will, by definition, grow indefinitely. This leads to table and index bloat, which can degrade performance over time. For high-throughput systems, you must plan for this.

    Partitioning is the standard solution. You can partition the products_history table by product_id (e.g., PARTITION BY HASH(product_id)) to spread the load. More effectively, you can partition by time, typically transaction_time:

    sql
    CREATE TABLE products_history (
        -- ... columns ...
    ) PARTITION BY RANGE (lower(transaction_time));
    
    -- Create partitions for different time periods
    CREATE TABLE products_history_2023 PARTITION OF products_history
        FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
    CREATE TABLE products_history_2024 PARTITION OF products_history
        FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

    This allows the query planner to perform partition pruning, ignoring entire partitions that don't match the query's time constraints. It also makes archiving or deleting very old data much more efficient (you can DROP or DETACH a partition). One important caveat: constraints on a partitioned table must include the partition key, so the history_id primary key and the exclusion constraint above cannot be carried over as-is (and, depending on your PostgreSQL version, exclusion constraints may not be supported on partitioned tables at all). You may need to enforce the overlap rule inside the management functions instead.
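
    For example, retiring a year of history once its retention period expires is a cheap, metadata-level operation (a sketch; check your regulatory retention requirements first):

    sql
    -- Detach the 2023 partition; it remains queryable as a standalone table
    ALTER TABLE products_history DETACH PARTITION products_history_2023;
    
    -- Once archived to cold storage, drop it entirely
    DROP TABLE products_history_2023;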

    Creating a `current_products` View

    Most application queries will only care about the current state, and repeating AND upper(valid_time) = 'infinity' AND upper(transaction_time) = 'infinity' in every query is cumbersome and error-prone. Create a view:

    sql
    CREATE OR REPLACE VIEW products_current AS
    SELECT
        product_id,
        product_data,
        lower(valid_time) as valid_from,
        lower(transaction_time) as created_at
    FROM products_history
    WHERE upper(valid_time) = 'infinity'
      AND upper(transaction_time) = 'infinity';

    Now, your application can simply SELECT * FROM products_current WHERE product_id = ... for standard operations. For even better performance, you could use a materialized view and refresh it periodically or via a trigger, trading some data freshness for raw read speed.
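
    A minimal sketch of the materialized-view variant (the view and index names are arbitrary; the unique index is what makes a non-blocking concurrent refresh possible):

    sql
    CREATE MATERIALIZED VIEW products_current_mat AS
    SELECT product_id, product_data
    FROM products_history
    WHERE upper(valid_time) = 'infinity'
      AND upper(transaction_time) = 'infinity';
    
    CREATE UNIQUE INDEX products_current_mat_idx ON products_current_mat (product_id);
    
    -- Refresh without locking out readers (requires the unique index above)
    REFRESH MATERIALIZED VIEW CONCURRENTLY products_current_mat;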

    Edge Cases

    * Timezones: Always use timestamptz (timestamp with time zone). All inputs should be in UTC or have an explicit timezone to avoid catastrophic ambiguity.

    * Clock Skew: In a distributed system, now() can differ between application servers and the database. It is highly recommended to use the database's now() or statement_timestamp() as the single source of truth for time, as shown in the PL/pgSQL functions.
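
    The distinction between PostgreSQL's clock functions matters here: now() is frozen at transaction start, which is exactly the property the functions above rely on to stamp every affected row with the same instant.

    sql
    BEGIN;
    -- now() and transaction_timestamp() are identical and frozen for the whole transaction.
    -- statement_timestamp() resets per statement; clock_timestamp() advances continuously.
    -- Avoid the latter two for temporal bookkeeping.
    SELECT now(), transaction_timestamp(), statement_timestamp(), clock_timestamp();
    COMMIT;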

    * Schema Migrations: Adding a new field to product_data is easy. Backfilling that field for all of history is hard. You must decide on a strategy: either backfill with a default value (potentially creating new historical records with a new transaction time) or allow the field to be null in older records and handle it in your application logic.
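
    A sketch of the default-value backfill strategy, using a hypothetical new category attribute (note this rewrites historical rows in place, trading away some transaction-time auditability for simplicity):

    sql
    -- Add a default category to every historical version that lacks one
    UPDATE products_history
    SET product_data = product_data || '{"category": "uncategorized"}'::jsonb
    WHERE NOT product_data ? 'category';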

    Conclusion

    Implementing a bi-temporal data model in PostgreSQL is a significant architectural decision. It introduces complexity into your write path and requires a disciplined approach to data management. However, for systems where full auditability and point-in-time analysis are non-negotiable—such as in finance, insurance, regulatory compliance, or large-scale e-commerce—it is an immensely powerful and robust pattern.

    By leveraging native PostgreSQL features like tstzrange, GIST indexes, and exclusion constraints, you can enforce complex temporal business rules directly within the database, ensuring data integrity and providing a clean, declarative API for querying the multifaceted nature of time. This moves the burden of historical reconstruction from fragile application code into a solid, performant, and reliable database foundation.
