Bi-temporal Data Modeling in PostgreSQL with Range Types & Constraints

Goh Ling Yong

The Auditability Mandate: Why Simple Versioning Fails

In many domains—finance, insurance, healthcare, logistics—maintaining a simple updated_at timestamp or a basic history table is insufficient. These systems require not just a record of an entity's state over time, but also a record of when the system knew about those states. This two-dimensional time tracking is known as bi-temporality.

Consider a financial pricing system. A pricing error for a stock on 2023-12-15 is discovered and corrected on 2024-01-10. Your system must be able to answer two distinct but critical questions:

  • "As-At" Query: What was the correct price of the stock on 2023-12-15? (The corrected value)
  • "As-Of" Query: What did the system report as the price on 2023-12-15 if we had queried it on 2023-12-20? (The original, incorrect value)

A standard history table, which typically overwrites or appends versions, can usually answer the first question but fundamentally cannot answer the second. It loses the history of the system's own knowledge. Bi-temporal modeling solves this by tracking two time axes:

    * Valid Time (valid_ts): The period in the real world when a fact was true. (e.g., The price was $100 from 2023-12-15 to 2023-12-16)

    * Transaction Time (tx_ts): The period during which the database recorded and believed the fact to be true. (e.g., We believed this fact from 2023-12-15 09:30:00 until it was corrected on 2024-01-10 14:00:00)

    This article details a robust, constraint-enforced implementation of a bi-temporal data model using advanced PostgreSQL features, moving beyond theoretical discussions to production-ready patterns.


    Schema Design: Leveraging Range Types and Exclusion Constraints

    The naive approach of using four timestamp columns (valid_from, valid_to, tx_from, tx_to) leads to complex application logic and is prone to race conditions. PostgreSQL provides a more elegant and powerful toolkit: the TSTZRANGE type and EXCLUDE USING GIST constraints.

    Our canonical example will be a product_prices table. A product has a specific price for a given period (its valid time).

    The Core Table Structure

    sql
    -- Ensure the btree_gist extension is available for our constraints
    CREATE EXTENSION IF NOT EXISTS btree_gist;
    
    CREATE TABLE product_prices (
        -- A surrogate key for this specific version/row. This is the primary key.
        version_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
        -- A stable identifier for the logical entity (the product).
        -- All versions of a single product's price share the same product_id.
        product_id UUID NOT NULL,
    
        -- The payload data.
        price NUMERIC(10, 2) NOT NULL CHECK (price > 0),
        currency CHAR(3) NOT NULL,
    
        -- == Bi-temporal Axes ==
    
        -- Valid Time: When was this price effective in the real world?
        -- Using tstzrange for a continuous period. '[)' means inclusive start, exclusive end.
        valid_ts TSTZRANGE NOT NULL,
    
        -- Transaction Time: When did the database believe this fact?
        -- This range starts when the record is inserted and is left unbounded above,
        -- so a containment check against 'infinity' matches it. When corrected or
        -- superseded, the upper bound is set to the current time.
        tx_ts TSTZRANGE NOT NULL DEFAULT TSTZRANGE(NOW(), NULL, '[)'),
    
        -- == The Magic: Ensuring Data Integrity at the DB Level ==
    
        -- This constraint is the cornerstone of our bi-temporal model.
        -- For any given product_id, you cannot have overlapping valid_ts ranges
        -- for records that are *currently active* in transaction time.
        CONSTRAINT no_overlapping_valid_time_for_current_records
        EXCLUDE USING GIST (
            product_id WITH =,
            valid_ts WITH &&
        ) WHERE (tx_ts @> 'infinity'::timestamptz)
    );
    
    -- Indexes are critical for performance of bi-temporal queries.
    -- The exclusion constraint maintains its own partial index for current rows; this
    -- full GiST index additionally covers historical versions for range queries.
    CREATE INDEX idx_product_prices_valid_ts_gist ON product_prices USING GIST (product_id, valid_ts);
    
    -- A second GiST index on transaction time is also essential for "as-of" queries.
    CREATE INDEX idx_product_prices_tx_ts ON product_prices USING GIST (product_id, tx_ts);

    Deconstructing the `EXCLUDE` Constraint

    This constraint is the most critical piece of the schema. Let's break it down:

    EXCLUDE USING GIST (product_id WITH =, valid_ts WITH &&)

    * USING GIST: This specifies that a GiST index is required to efficiently check the constraint.

    * product_id WITH =: The constraint is applied on a per-product basis. Overlapping prices for *different* products are perfectly acceptable.

    * valid_ts WITH &&: This is the core logic. The && operator for ranges checks for overlaps. The constraint will fail if a new or updated row has a valid_ts that overlaps with an existing row's valid_ts (for the same product_id).

    WHERE (tx_ts @> 'infinity'::timestamptz)

    This is the advanced pattern that makes the bi-temporal model work. Without it, the constraint would prevent any overlapping valid_ts ranges, even for historical corrections. This partial constraint applies only to rows that are currently considered true by the system—those whose transaction time range extends to infinity.

    This elegantly enforces our primary business rule: "At any given point in time, a product can only have one price."
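
    To see the constraint in action, here is a minimal sketch (the UUID is a hypothetical product). The second INSERT is rejected, because two rows that are both current in transaction time would have overlapping valid time:

    sql
    -- First current price: valid for all of 2024.
    INSERT INTO product_prices (product_id, price, currency, valid_ts)
    VALUES ('beefbeef-0000-4000-8000-000000000001', 199.99, 'USD',
            TSTZRANGE('2024-01-01', '2025-01-01', '[)'));
    
    -- Overlapping current price for the same product:
    -- fails with a violation of no_overlapping_valid_time_for_current_records.
    INSERT INTO product_prices (product_id, price, currency, valid_ts)
    VALUES ('beefbeef-0000-4000-8000-000000000001', 249.99, 'USD',
            TSTZRANGE('2024-06-01', '2024-09-01', '[)'));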


    Bi-temporal Operations: Beyond Standard CRUD

    Standard INSERT, UPDATE, and DELETE commands break the bi-temporal model because they destroy information. We must instead use a set of carefully crafted procedures that preserve history.

    All operations should be wrapped in atomic PL/pgSQL functions to prevent race conditions and ensure data integrity. One caveat: the isolation level cannot be changed from inside a function that is already executing, so the caller is responsible for opening the surrounding transaction at the appropriate level, as sketched below.
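
    A minimal calling pattern, assuming the update function defined later in this article (the retry loop on serialization failures belongs in application code):

    sql
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT update_product_price('a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d', 219.99, '2024-03-01'::timestamptz);
    COMMIT;
    -- If the transaction aborts with SQLSTATE 40001 (serialization_failure),
    -- the application should simply re-run it.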

    1. Creating a New Logical Entity

    This is the simplest operation. It's a direct INSERT that establishes the first version of a product's price.

    sql
    CREATE OR REPLACE FUNCTION create_new_product_price(
        p_product_id UUID,
        p_price NUMERIC(10, 2),
        p_currency CHAR(3),
        p_valid_from TIMESTAMPTZ
    ) RETURNS UUID AS $$
    DECLARE
        v_version_id UUID;
    BEGIN
        INSERT INTO product_prices (product_id, price, currency, valid_ts)
        VALUES (p_product_id, p_price, p_currency, TSTZRANGE(p_valid_from, 'infinity', '[)'))
        RETURNING version_id INTO v_version_id;
    
        RETURN v_version_id;
    END;
    $$ LANGUAGE plpgsql;

    Usage:

    sql
    -- Set the initial price for a new product, valid from today onwards.
    SELECT create_new_product_price('a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d', 199.99, 'USD', NOW());

    2. Updating a Price (Logical Update)

    An "update" is not an UPDATE statement in the traditional sense. It involves two steps:

  • Closing the transaction time of the current record.
  • Inserting a new record with the new data.

    This preserves the old record as a historical fact that was once believed to be true.

    sql
    CREATE OR REPLACE FUNCTION update_product_price(
        p_product_id UUID,
        p_new_price NUMERIC(10, 2),
        p_effective_from TIMESTAMPTZ
    ) RETURNS UUID AS $$
    DECLARE
        v_current_version RECORD;
        v_new_version_id UUID;
    BEGIN
        -- Run this function inside a SERIALIZABLE transaction started by the caller
        -- (BEGIN ISOLATION LEVEL SERIALIZABLE) to prevent race conditions where two
        -- updates try to supersede the same record simultaneously. The isolation level
        -- cannot be changed here, because the function is already executing inside a query.
    
        -- Find the current active version that we need to supersede.
        -- We lock it to ensure no other transaction can modify it.
        SELECT * INTO v_current_version
        FROM product_prices
        WHERE product_id = p_product_id
          AND valid_ts @> p_effective_from
          AND tx_ts @> 'infinity'::timestamptz
        FOR UPDATE;
    
        IF NOT FOUND THEN
            RAISE EXCEPTION 'No active price found for product % at effective date %', p_product_id, p_effective_from;
        END IF;
    
        -- 1. Close the transaction time of the old record.
        -- Its valid time is split at the effective date of the new price.
        UPDATE product_prices
        SET tx_ts = TSTZRANGE(lower(tx_ts), NOW(), '[)')
        WHERE version_id = v_current_version.version_id;
    
        -- 2. If the old price was valid for a period, we need to re-insert the part
        -- of its validity that occurred *before* the new price took effect.
        IF lower(v_current_version.valid_ts) < p_effective_from THEN
            INSERT INTO product_prices(product_id, price, currency, valid_ts, tx_ts)
            VALUES (
                v_current_version.product_id,
                v_current_version.price,
                v_current_version.currency,
                TSTZRANGE(lower(v_current_version.valid_ts), p_effective_from, '[)'),
                TSTZRANGE(NOW(), NULL, '[)')
            );
        END IF;
    
        -- 3. Insert the new price record, valid from the effective date onwards.
        INSERT INTO product_prices(product_id, price, currency, valid_ts, tx_ts)
        VALUES (
            p_product_id,
            p_new_price,
            v_current_version.currency, -- Assuming currency doesn't change
            TSTZRANGE(p_effective_from, 'infinity', '[)'),
            TSTZRANGE(NOW(), NULL, '[)')
        )
        RETURNING version_id INTO v_new_version_id;
    
        RETURN v_new_version_id;
    END;
    $$ LANGUAGE plpgsql;

    Usage:

    sql
    -- On 2024-02-15, we decide the price for product 'a1b2...' will change to $219.99 starting 2024-03-01
    SELECT update_product_price('a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d', 219.99, '2024-03-01'::timestamptz);

    This operation is complex because it splits the existing validity range. The original record (price: 199.99, valid_ts: [2024-01-01, infinity)) is terminated. Two new records are created:

  • A reassertion of the old price: price: 199.99, valid_ts: [2024-01-01, 2024-03-01)
  • The new price: price: 219.99, valid_ts: [2024-03-01, infinity)

    Both new records have a tx_ts starting from 2024-02-15. The database now correctly records that on and after 2024-02-15, we knew about the upcoming price change on 2024-03-01.
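
    A quick inspection sketch (using the sample product UUID from the usage examples) lists every version the table now holds:

    sql
    SELECT price, valid_ts, tx_ts
    FROM product_prices
    WHERE product_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d'
    ORDER BY lower(valid_ts), lower(tx_ts);
    
    -- Expect three rows: the terminated original (tx_ts closed at 2024-02-15),
    -- the re-asserted 199.99 price, and the new 219.99 price.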

    3. Deleting a Price (Logical Deletion)

    A "delete" simply terminates the transaction time of the current record. The data is never removed.

    sql
    CREATE OR REPLACE FUNCTION delete_product_price(
        p_product_id UUID,
        p_effective_from TIMESTAMPTZ
    ) RETURNS VOID AS $$
    BEGIN
        -- As with updates, run this function inside a caller-controlled SERIALIZABLE
        -- transaction; the isolation level cannot be changed from inside the function.

        -- Fail fast if there is no currently-active record covering the deletion date.
        IF NOT EXISTS (
            SELECT 1
            FROM product_prices
            WHERE product_id = p_product_id
              AND valid_ts @> p_effective_from
              AND tx_ts @> 'infinity'::timestamptz
        ) THEN
            RAISE EXCEPTION 'No active price found for product % to delete at effective date %', p_product_id, p_effective_from;
        END IF;
    
        -- This logic is similar to the update. We find the active record,
        -- terminate its transaction time, and re-insert the portion of its
        -- validity before the deletion date.
        WITH current_version AS (
            SELECT version_id, valid_ts, tx_ts, product_id, price, currency
            FROM product_prices
            WHERE product_id = p_product_id
              AND valid_ts @> p_effective_from
              AND tx_ts @> 'infinity'::timestamptz
            FOR UPDATE
        ),
        terminate_old AS (
            UPDATE product_prices
            SET tx_ts = TSTZRANGE(lower(current_version.tx_ts), NOW(), '[)')
            FROM current_version
            WHERE product_prices.version_id = current_version.version_id
        )
        -- Re-insert the valid portion before the deletion date
        INSERT INTO product_prices(product_id, price, currency, valid_ts, tx_ts)
        SELECT
            product_id, price, currency,
            TSTZRANGE(lower(valid_ts), p_effective_from, '[)'),
            TSTZRANGE(NOW(), NULL, '[)')
        FROM current_version
        WHERE lower(valid_ts) < p_effective_from;
    
    END;
    $$ LANGUAGE plpgsql;
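
    Usage (a sketch; the cutoff date is illustrative):

    sql
    -- From 2024-06-01 onwards the product should have no price on record.
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT delete_product_price('a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d', '2024-06-01'::timestamptz);
    COMMIT;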

    4. Correcting Historical Data (The Power of Bi-temporality)

    This is where the model truly shines. Imagine we discover today (2024-02-15) that a price entered on 2023-11-01 was wrong. It should have been $99.99, not $109.99, for the entire month of November 2023.

    This operation terminates the incorrect historical record and inserts a new historical record with the correct data. Both operations happen within a transaction that starts now.

    sql
    CREATE OR REPLACE FUNCTION correct_historical_price(
        p_product_id UUID,
        p_correct_price NUMERIC(10, 2),
        p_valid_range TSTZRANGE
    ) RETURNS UUID AS $$
    DECLARE
        v_incorrect_version RECORD;
        v_new_version_id UUID;
    BEGIN
        -- Run inside a caller-controlled SERIALIZABLE transaction, as with the other
        -- write operations; the isolation level cannot be changed from within the function.
    
        -- Find the incorrect historical record(s). This query is tricky because we need to
        -- find what we *believed* to be true just before this transaction started.
        -- For simplicity, we'll find the version(s) that were active and covered this valid range.
        -- A more complex implementation might need to handle multiple corrections.
        SELECT * INTO v_incorrect_version
        FROM product_prices
        WHERE product_id = p_product_id
          AND valid_ts && p_valid_range -- It overlaps with the period we are correcting
          AND tx_ts @> 'infinity'::timestamptz -- It is currently believed to be true
        FOR UPDATE;
    
        IF NOT FOUND THEN
            RAISE EXCEPTION 'No current record found to correct for product % in range %', p_product_id, p_valid_range;
        END IF;
    
        -- 1. Terminate the incorrect record's transaction time.
        UPDATE product_prices
        SET tx_ts = TSTZRANGE(lower(tx_ts), NOW(), '[)')
        WHERE version_id = v_incorrect_version.version_id;
    
        -- Note: This is a simplified correction. A full implementation would need to handle
        -- splitting the original range if the correction only applies to a sub-period.
        -- Here we assume the correction replaces the entire old record's validity.
    
        -- 2. Insert the new, correct record.
        -- Its valid_ts is in the past, but its tx_ts starts now.
        INSERT INTO product_prices(product_id, price, currency, valid_ts, tx_ts)
        VALUES (
            v_incorrect_version.product_id,
            p_correct_price,
            v_incorrect_version.currency,
            p_valid_range,
            TSTZRANGE(NOW(), NULL, '[)')
        )
        RETURNING version_id INTO v_new_version_id;
    
        RETURN v_new_version_id;
    END;
    $$ LANGUAGE plpgsql;
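
    Usage, matching the scenario above (a sketch; the range covers the assumed November 2023 correction window):

    sql
    -- Discovered on 2024-02-15: the November 2023 price should have been $99.99.
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT correct_historical_price(
        'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d',
        99.99,
        TSTZRANGE('2023-11-01', '2023-12-01', '[)')
    );
    COMMIT;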

    After this operation, a query asking for the price in November 2023 as of today will get the correct $99.99 price. But a query asking for the price as of December 2023 will still get the original, incorrect $109.99 price, providing a perfect audit trail.


    Querying Bi-temporal Data

    Querying is where the investment in this model pays off. The range containment operator @> is your primary tool.

    Let's assume the following state:

  • v1: price $100, valid_ts: [2023-01-01, 2023-06-01), tx_ts: [2023-01-01, 2023-02-15) (Corrected)
  • v2: price $105, valid_ts: [2023-01-01, 2023-06-01), tx_ts: [2023-02-15, infinity) (Correction for v1)
  • v3: price $110, valid_ts: [2023-06-01, infinity), tx_ts: [2023-05-20, infinity) (Future price update)

    As-At Query: "What was the price on a specific date?"

    This answers from the perspective of the current state of knowledge.

    sql
    -- What is the price on 2023-03-15?
    SELECT product_id, price, currency
    FROM product_prices
    WHERE product_id = '...' 
      AND valid_ts @> '2023-03-15'::timestamptz
      AND tx_ts @> 'infinity'::timestamptz; -- Only currently active records
    
    -- Result: $105 (v2)

    As-Of Query: "What did we think the state of the world was on a specific date?"

    This is a historical audit query.

    sql
    -- What did we believe the price for 2023-03-15 was, back on 2023-02-01?
    SELECT product_id, price, currency
    FROM product_prices
    WHERE product_id = '...'
      AND valid_ts @> '2023-03-15'::timestamptz
      AND tx_ts @> '2023-02-01'::timestamptz; -- As we knew it on this date
    
    -- Result: $100 (v1), because the correction (v2) hadn't happened yet.

    The Combined Bi-temporal Query

    This is the most powerful query, answering both questions simultaneously.

    sql
    -- Show me the price for the product that was valid on 2023-07-01,
    -- as we knew it on 2023-06-15.
    SELECT product_id, price, currency
    FROM product_prices
    WHERE product_id = '...'
      AND valid_ts @> '2023-07-01'::timestamptz -- The 'as-at' part
      AND tx_ts @> '2023-06-15'::timestamptz;   -- The 'as-of' part
    
    -- Result: $110 (v3), because the price update was known on 2023-06-15.

    Performance, Edge Cases, and Production Considerations

    This model is powerful but comes with overhead and complexity that must be managed.

    Indexing Strategy

    The GiST indexes on (product_id, valid_ts) and (product_id, tx_ts) are not optional; they are fundamental to the performance of range queries and the exclusion constraint. Without them, PostgreSQL would have to perform sequential scans for every query and every write, leading to unacceptable performance on any non-trivial dataset.

    Analyze your query patterns with EXPLAIN ANALYZE. If you frequently query only on valid_ts across all products, a standalone GiST index on valid_ts might be beneficial.
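
    For example, a standalone index and a quick plan check might look like this (the index name is illustrative; verify that the planner actually uses it for your workload):

    sql
    CREATE INDEX idx_product_prices_valid_ts_only ON product_prices USING GIST (valid_ts);
    
    EXPLAIN ANALYZE
    SELECT product_id, price
    FROM product_prices
    WHERE valid_ts @> '2023-03-15'::timestamptz
      AND tx_ts @> 'infinity'::timestamptz;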

    Write Amplification and Table Bloat

    Every logical UPDATE or DELETE results in at least one UPDATE and one INSERT. This write amplification means your table will grow much faster than a traditional table. PostgreSQL's MVCC system will create dead tuples for every UPDATE, leading to table bloat.

    * Aggressive Autovacuum: Tune autovacuum parameters for your bi-temporal tables to be more aggressive. Lower autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor significantly (e.g., to 0.01 or lower) for these specific tables, as sketched after this list.

    * Partitioning: For very large datasets (billions of rows), partitioning is essential. A common strategy is to partition by product_id (HASH partitioning) or by a component of the transaction time, such as the year in which lower(tx_ts) falls (RANGE partitioning). This keeps indexes smaller and allows for more efficient maintenance. Be aware that PostgreSQL restricts exclusion constraints on partitioned tables, so verify that your chosen scheme can still enforce the overlap rule; partitioning by product_id, which is already part of the constraint, is the easiest fit.
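
    A sketch of per-table autovacuum tuning (the values are starting points to validate against your own bloat and write rates, not recommendations):

    sql
    ALTER TABLE product_prices SET (
        autovacuum_vacuum_scale_factor = 0.01,
        autovacuum_analyze_scale_factor = 0.01
    );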

    Transaction Isolation and Concurrency

    SERIALIZABLE isolation is the safest level for complex bi-temporal operations, and because the isolation level cannot be changed from inside an already-executing function, the caller should open the transaction with BEGIN ISOLATION LEVEL SERIALIZABLE before invoking these functions. Under REPEATABLE READ, two concurrent transactions can each read the same current version, miss each other's writes, and both attempt to supersede it (a write-skew anomaly), leading to integrity violations when both try to commit. SERIALIZABLE prevents this but can increase transaction aborts under high contention, so you must build retry logic (on SQLSTATE 40001, serialization_failure) into your application.

    Clock Skew and Timestamps

    In a distributed environment it is risky to let application servers stamp temporal data with their own clocks, since those clocks may not be perfectly synchronized; generating timestamps inside the database, as these functions do, avoids the problem. Note that within PL/pgSQL, NOW() is simply an alias for transaction_timestamp(): it returns one stable value for the duration of the entire transaction, so every range closed and opened by a single operation shares exactly the same boundary. Avoid clock_timestamp() and statement_timestamp() here, as they advance during the transaction and would introduce subtle gaps between ranges.
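
    A quick way to convince yourself of this behaviour:

    sql
    BEGIN;
    SELECT now() = transaction_timestamp() AS same_value, clock_timestamp() AS wall_clock;
    SELECT pg_sleep(1);
    SELECT now() = transaction_timestamp() AS still_same, clock_timestamp() AS later_wall_clock;
    COMMIT;
    -- Both equality checks return true; only clock_timestamp() moves.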

    Handling Contiguous Validity

    The EXCLUDE constraint prevents overlaps, but it does not prevent gaps in valid_ts. If your business logic requires that a product always has a price (i.e., the valid_ts ranges must be perfectly contiguous), you will need to enforce this with more complex logic, likely in a CONSTRAINT TRIGGER. This trigger would fire at the end of a transaction and check that for a given product_id, the union of all its currently-active valid_ts ranges forms a continuous timeline.
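
    A minimal sketch of such a trigger is shown below. It only checks the product touched by the current row and is not hardened for production use (for example, it re-runs the check once per modified row):

    sql
    CREATE OR REPLACE FUNCTION check_price_continuity() RETURNS trigger AS $$
    BEGIN
        -- Order the currently-active ranges for this product and look for an
        -- adjacent pair whose bounds do not meet exactly.
        IF EXISTS (
            SELECT 1
            FROM (
                SELECT valid_ts,
                       LEAD(valid_ts) OVER (ORDER BY lower(valid_ts)) AS next_valid_ts
                FROM product_prices
                WHERE product_id = NEW.product_id
                  AND tx_ts @> 'infinity'::timestamptz
            ) ordered
            WHERE next_valid_ts IS NOT NULL
              AND upper(valid_ts) <> lower(next_valid_ts)
        ) THEN
            RAISE EXCEPTION 'Gap in price validity for product %', NEW.product_id;
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;
    
    -- Deferred to commit time, so a terminate-plus-reinsert rewrite is only
    -- checked once the full set of changes is in place.
    CREATE CONSTRAINT TRIGGER ensure_price_continuity
    AFTER INSERT OR UPDATE ON product_prices
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW EXECUTE FUNCTION check_price_continuity();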

    Conclusion

    Implementing a bi-temporal data model is a significant engineering investment. It introduces complexity in your write paths and requires a deeper understanding of your database's capabilities. However, for systems where auditability and historical accuracy are non-negotiable, it is an unparalleled pattern. By leveraging PostgreSQL's native range types, GiST indexes, and partial exclusion constraints, you can build a robust, performant, and database-enforced bi-temporal system that pushes complex consistency logic out of the application layer and into the database, where it belongs.
