Advanced Bi-temporal Modeling in Postgres with Range Types & Constraints

20 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

Beyond Simple Versioning: The Case for Bi-temporal Data

In systems engineering, tracking data changes is a solved problem, right? We have audit logs, updated_at timestamps, and perhaps versioning tables populated by triggers. These uni-temporal solutions track the state of data as it was recorded in the database over time. This is sufficient for many applications, but it fails catastrophically when the business logic requires distinguishing between when an event happened in the real world and when it was recorded in our system.

Consider a payroll system. An employee's salary is set to $80,000, effective January 1st. On February 15th, HR discovers a clerical error; the salary should have been $85,000, also effective January 1st. A simple UPDATE statement destroys information. We lose the fact that for 45 days, the system believed the salary was $80,000. This distinction is critical for auditing, dispute resolution, and accurate historical reporting.

This is the domain of bi-temporal modeling. We manage two distinct time axes:

  • Valid Time (valid_time): The time period during which a fact is true in the real world. In our example, the $85,000 salary is valid from January 1st onwards.
  • Transaction Time (transaction_time): The time period during which a fact is stored in the database. The incorrect $80,000 record was part of the database's state from its creation until February 15th. The corrected $85,000 record is part of the state from February 15th onwards.
  • Implementing this pattern from scratch can be a minefield of overlapping intervals, complex application logic, and subtle data integrity bugs. Fortunately, PostgreSQL provides a powerful, often underutilized, set of tools—namely range types and exclusion constraints—that allow us to enforce bi-temporal integrity directly at the database level.

    This article is not an introduction. It's a production-focused guide to building a robust bi-temporal model in PostgreSQL, complete with schema design, advanced data manipulation functions, complex query patterns, and performance considerations.


    Section 1: The Production-Grade Bi-temporal Schema

    Our goal is to create a table that can immutably store the history of employee salaries, respecting both valid and transaction time. We will use tstzrange for our time axes, which represents a range of timestamp with time zone. The lower bound is inclusive, and the upper bound is exclusive, which is a perfect fit for time-based intervals.

    First, let's enable the necessary extension:

    sql
    CREATE EXTENSION IF NOT EXISTS btree_gist;

    Now, the core table structure. We will track salaries for employees.

    sql
    CREATE TABLE employee_salaries (
        -- A surrogate primary key for unique row identification
        id BIGSERIAL PRIMARY KEY,
    
        -- The business key identifying the employee
        employee_id INT NOT NULL,
    
        -- The actual data point we are versioning
        salary NUMERIC(10, 2) NOT NULL CHECK (salary > 0),
    
        -- VALID TIME: When this salary is effective in the real world
        valid_time TSTZRANGE NOT NULL,
    
        -- TRANSACTION TIME: When this record was considered current in the database
        transaction_time TSTZRANGE NOT NULL,
    
        -- This is the magic. It prevents data integrity violations.
        -- For a given employee, you cannot have two currently active records
        -- whose real-world validity periods (valid_time) overlap.
        CONSTRAINT no_overlapping_current_valid_time
        EXCLUDE USING GIST (employee_id WITH =, valid_time WITH &&)
        WHERE (UPPER(transaction_time) IS NULL OR UPPER(transaction_time) = 'infinity')
    );

    Deconstructing the Schema and Constraints

    * id: A standard surrogate primary key. While our business logic will revolve around employee_id and the time ranges, having a simple, immutable row identifier is crucial for foreign key relationships and unambiguous row references.

    * employee_id: The foreign key to an employees table. This is our partitioning key for the temporal data.

    * valid_time: TSTZRANGE representing the real-world effective period. A range of ['2023-01-01', '2024-01-01') means the salary was valid from the start of Jan 1, 2023, up to, but not including, the start of Jan 1, 2024.

    * transaction_time: TSTZRANGE representing the database record's lifespan. When we insert a new record, we'll typically set this to [now(), 'infinity'). When we supersede it, we'll update the upper bound to now().

    * EXCLUDE USING GIST (...): This is the heart of our data integrity model. Let's break it down:

    * EXCLUDE USING GIST: Specifies an exclusion constraint, which is only possible with index types that support the && (overlaps) operator, like GiST.

    * employee_id WITH =: The constraint applies on a per-employee basis. The equality operator = checks that we are comparing rows for the same employee.

    * valid_time WITH &&: This is the core check. It prevents any two constrained rows from having valid_time ranges that overlap (&& is the overlaps operator for range types).

    WHERE (UPPER(transaction_time) IS NULL OR UPPER(transaction_time) = 'infinity'): This is the most critical and advanced part. The constraint only applies to currently active records. A record is considered active if its transaction time has not been closed (its upper bound is infinity). This allows us to have historical records with overlapping valid_time ranges because they represent different versions of history*. Without this WHERE clause, our bi-temporal model would be impossible to implement, as any correction would violate the constraint.

    Indexing Strategy for Performance

    Querying this table will be complex. We need indexes to support our access patterns. The EXCLUDE constraint already creates a GiST index, but we need more.

    sql
    -- The EXCLUDE constraint implicitly creates this index, but let's be explicit
    -- for clarity. This is our primary tool for finding the current state for an employee.
    CREATE INDEX idx_employee_salaries_current_valid
    ON employee_salaries USING GIST (employee_id, valid_time)
    WHERE (UPPER(transaction_time) IS NULL OR UPPER(transaction_time) = 'infinity');
    
    -- An index to quickly find all versions of a fact for a given valid time period.
    -- Crucial for auditing how our knowledge has changed.
    CREATE INDEX idx_employee_salaries_valid_time_history
    ON employee_salaries USING GIST (employee_id, valid_time, transaction_time);
    
    -- A standard B-tree index for looking up all records for an employee, ordered by transaction start.
    CREATE INDEX idx_employee_salaries_employee_id_transaction_start
    ON employee_salaries (employee_id, LOWER(transaction_time));

    This set of indexes provides good coverage for the types of queries we'll explore later.


    Section 2: Advanced Data Manipulation via PL/pgSQL

    Direct INSERT, UPDATE, and DELETE statements on this table are forbidden. All modifications must be managed through stored procedures to correctly maintain the bi-temporal invariants. An UPDATE is not a modification of a row; it is the end of one record's transaction time and the beginning of another's.

    Initial Record Creation

    This is the simplest operation: creating the first salary record for an employee.

    sql
    CREATE OR REPLACE FUNCTION create_initial_salary(
        p_employee_id INT,
        p_salary NUMERIC(10, 2),
        p_effective_from TIMESTAMPTZ
    ) RETURNS BIGINT AS $$
    DECLARE
        new_salary_id BIGINT;
    BEGIN
        INSERT INTO employee_salaries (employee_id, salary, valid_time, transaction_time)
        VALUES (
            p_employee_id,
            p_salary,
            tstzrange(p_effective_from, 'infinity', '[)'),
            tstzrange(NOW(), 'infinity', '[)')
        )
        RETURNING id INTO new_salary_id;
    
        RETURN new_salary_id;
    END;
    $$ LANGUAGE plpgsql;

    Handling Standard Updates (Successor Records)

    This function handles a standard salary change that takes effect from a certain date. It must find the current active record, end its validity, and insert a new one.

    sql
    CREATE OR REPLACE FUNCTION update_current_salary(
        p_employee_id INT,
        p_new_salary NUMERIC(10, 2),
        p_effective_from TIMESTAMPTZ
    ) RETURNS BIGINT AS $$
    DECLARE
        current_record employee_salaries%ROWTYPE;
        new_salary_id BIGINT;
        transaction_now TIMESTAMPTZ := NOW();
    BEGIN
        -- Find and lock the current active record for this employee that is valid now.
        -- The FOR UPDATE clause is critical to prevent race conditions.
        SELECT * INTO current_record
        FROM employee_salaries
        WHERE employee_id = p_employee_id
          AND UPPER(transaction_time) = 'infinity'
          AND valid_time @> p_effective_from
        FOR UPDATE;
    
        IF NOT FOUND THEN
            RAISE EXCEPTION 'No current salary record found for employee_id % to update at effective date %', p_employee_id, p_effective_from;
        END IF;
    
        -- Edge Case: Ensure the new effective date is not in the past relative to the current record's start.
        IF LOWER(current_record.valid_time) >= p_effective_from THEN
            RAISE EXCEPTION 'The new effective date (%) cannot be on or before the start of the current salary period (%). Use historical correction function instead.', p_effective_from, LOWER(current_record.valid_time);
        END IF;
    
        -- 1. End the old record's transaction time and valid time.
        UPDATE employee_salaries
        SET
            transaction_time = tstzrange(LOWER(transaction_time), transaction_now, '[)'),
            valid_time = tstzrange(LOWER(valid_time), p_effective_from, '[)')
        WHERE id = current_record.id;
    
        -- 2. Insert the new salary record, which is valid from the effective date onwards.
        INSERT INTO employee_salaries (employee_id, salary, valid_time, transaction_time)
        VALUES (
            p_employee_id,
            p_new_salary,
            tstzrange(p_effective_from, 'infinity', '[)'),
            tstzrange(transaction_now, 'infinity', '[)')
        )
        RETURNING id INTO new_salary_id;
    
        RETURN new_salary_id;
    END;
    $$ LANGUAGE plpgsql;

    Key Points of update_current_salary:

    * FOR UPDATE: This is non-negotiable. It places a row-level lock on the current salary record, preventing another transaction from attempting to update it simultaneously. Without this, you would have a classic race condition leading to data corruption.

    * transaction_now variable: We capture NOW() once at the beginning of the function. Using NOW() multiple times within the same transaction is fine as it returns the transaction start time, but explicitly capturing it makes the logic clearer and safer, ensuring the end of the old record and the start of the new one are perfectly contiguous in transaction time.

    * Immutability: We never truly UPDATE the salary value. We terminate the old record's time ranges and INSERT a new one.

    The Bi-temporal Powerhouse: Correcting History

    This is the most complex and powerful operation. We need to correct a historical record. For example, a salary that was thought to be $80k from Jan 1st was actually supposed to be $85k.

    sql
    CREATE OR REPLACE FUNCTION correct_salary_history(
        p_employee_id INT,
        p_corrected_salary NUMERIC(10, 2),
        p_correction_valid_from TIMESTAMPTZ,
        p_correction_valid_to TIMESTAMPTZ -- Use NULL for 'infinity'
    ) RETURNS SETOF BIGINT AS $$
    DECLARE
        transaction_now TIMESTAMPTZ := NOW();
        correction_valid_range TSTZRANGE;
        affected_record employee_salaries%ROWTYPE;
        new_id BIGINT;
        return_ids BIGINT[] := ARRAY[]::BIGINT[];
    BEGIN
        correction_valid_range := tstzrange(p_correction_valid_from, p_correction_valid_to, '[)');
    
        -- Find all currently active records that overlap with the correction period.
        -- This loop is necessary because a correction might span multiple historical periods.
        FOR affected_record IN
            SELECT * FROM employee_salaries
            WHERE employee_id = p_employee_id
              AND UPPER(transaction_time) = 'infinity'
              AND valid_time && correction_valid_range
            ORDER BY valid_time
            FOR UPDATE
        LOOP
            -- 1. Terminate the transaction time of the incorrect historical record.
            -- We are saying "As of `transaction_now`, this record is no longer what we believe to be true".
            UPDATE employee_salaries
            SET transaction_time = tstzrange(LOWER(transaction_time), transaction_now, '[)')
            WHERE id = affected_record.id;
    
            -- 2. Re-insert the portions of the old record that are NOT being corrected.
            -- This creates "islands" of truth around the corrected period.
    
            -- Part before the correction
            IF LOWER(affected_record.valid_time) < LOWER(correction_valid_range) THEN
                INSERT INTO employee_salaries (employee_id, salary, valid_time, transaction_time)
                VALUES (affected_record.employee_id, affected_record.salary,
                        tstzrange(LOWER(affected_record.valid_time), LOWER(correction_valid_range), '[)'),
                        tstzrange(transaction_now, 'infinity', '[)'))
                RETURNING id INTO new_id;
                return_ids := array_append(return_ids, new_id);
            END IF;
    
            -- Part after the correction
            IF UPPER(affected_record.valid_time) > UPPER(correction_valid_range) THEN
                INSERT INTO employee_salaries (employee_id, salary, valid_time, transaction_time)
                VALUES (affected_record.employee_id, affected_record.salary,
                        tstzrange(UPPER(correction_valid_range), UPPER(affected_record.valid_time), '[)'),
                        tstzrange(transaction_now, 'infinity', '[)'))
                RETURNING id INTO new_id;
                return_ids := array_append(return_ids, new_id);
            END IF;
        END LOOP;
    
        -- 3. Insert the new, corrected record for the specified validity period.
        INSERT INTO employee_salaries (employee_id, salary, valid_time, transaction_time)
        VALUES (p_employee_id, p_corrected_salary,
                correction_valid_range,
                tstzrange(transaction_now, 'infinity', '[)'))
        RETURNING id INTO new_id;
        return_ids := array_append(return_ids, new_id);
        
        -- Unnest the array of new IDs to return as a set
        RETURN QUERY SELECT unnest(return_ids);
    
    END;
    $$ LANGUAGE plpgsql;

    This function is dense. It finds every currently-true record that overlaps with the correction period. For each one, it:

    • Terminates it in transaction time (retires it).
  • Re-creates the parts of that record's validity period that fall outside the correction window. This is crucial for preserving history accurately.
    • Finally, after processing all affected historical records, it inserts the single new corrected record.

    This preserves the full audit trail. We can see the old incorrect records, when they were retired, and the new corrected records that replaced them.


    Section 3: The Four Essential Bi-temporal Queries

    Now that we have a robust way to store the data, how do we retrieve it? Querying bi-temporal data requires a shift in thinking. We are always querying based on one or two time dimensions.

    Let's populate some data to make our queries concrete.

    sql
    -- Scenario Setup
    -- Employee 101 starts on Jan 1, 2023 with $80k salary
    SELECT create_initial_salary(101, 80000, '2023-01-01');
    
    -- On July 1, 2023, they get a raise to $90k
    -- We will need to set the transaction time manually for this example to be reproducible
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT update_current_salary(101, 90000, '2023-07-01');
    -- Let's pretend this transaction happened on 2023-06-20
    UPDATE employee_salaries SET transaction_time = tstzrange('2023-06-20', 'infinity', '[)') WHERE salary = 90000;
    UPDATE employee_salaries SET transaction_time = tstzrange(lower(transaction_time), '2023-06-20', '[)') WHERE salary = 80000 AND employee_id = 101;
    COMMIT;
    
    -- On Feb 15, 2024, we discover the initial salary was wrong. It should have been $85k from Jan 1 to July 1.
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT correct_salary_history(101, 85000, '2023-01-01', '2023-07-01');
    -- Let's pretend this transaction happened on 2024-02-15
    UPDATE employee_salaries SET transaction_time = tstzrange('2024-02-15', 'infinity', '[)') WHERE salary = 85000;
    UPDATE employee_salaries SET transaction_time = tstzrange(lower(transaction_time), '2024-02-15', '[)') WHERE salary = 80000 AND employee_id = 101;
    COMMIT;

    Query 1: As-Of Query (What is true *now*?)

    This is the most common query. It asks for the state of the world according to the latest information in the database.

    Question: What is the full salary history for employee 101 as we know it today?

    sql
    SELECT
        employee_id,
        salary,
        valid_time
    FROM employee_salaries
    WHERE employee_id = 101
      AND UPPER(transaction_time) = 'infinity'
    ORDER BY valid_time;

    Result:

    employee_idsalaryvalid_time
    10185000["2023-01-01 00:00:00+00", "2023-07-01 00:00:00+00")
    10190000["2023-07-01 00:00:00+00", "infinity")

    This shows the corrected history. The incorrect $80,000 record is not present because its transaction_time has been closed.

    Query 2: As-Was Query (What did we *think* was true?)

    This is the quintessential audit query. It allows us to travel back in time and see the state of the database itself.

    Question: What did we believe was employee 101's salary on January 30, 2024, for the date of March 15, 2023?

    sql
    SELECT
        employee_id,
        salary,
        valid_time,
        transaction_time
    FROM employee_salaries
    WHERE employee_id = 101
      -- The state of the database as it was on Jan 30, 2024
      AND transaction_time @> '2024-01-30'::timestamptz
      -- The real-world fact we are interested in (March 15, 2023)
      AND valid_time @> '2023-03-15'::timestamptz;

    Result:

    employee_idsalaryvalid_timetransaction_time
    10180000["2023-01-01 00:00:00+00", "2023-07-01 00:00:00+00")["2023-01-01 ...", "2024-02-15 00:00:00+00")

    This query correctly shows that on Jan 30, 2024 (before the correction on Feb 15), we believed the salary for that period was $80,000.

    Query 3: Evolution of a Fact

    This query shows how our knowledge about a specific period in time has changed.

    Question: Show me the entire history of what we thought the salary was for March 15, 2023.

    sql
    SELECT
        salary,
        transaction_time
    FROM employee_salaries
    WHERE employee_id = 101
      AND valid_time @> '2023-03-15'::timestamptz
    ORDER BY LOWER(transaction_time);

    Result:

    salarytransaction_time
    80000["2023-01-01 ...", "2024-02-15 00:00:00+00")
    85000["2024-02-15 00:00:00+00", "infinity")

    This powerful audit trail shows that we initially recorded the salary as $80,000, and this belief was held until Feb 15, 2024, at which point it was corrected to $85,000, which is our current belief.

    Query 4: Point-in-Time Reconstruction

    This query reconstructs the salary history for an employee as it was known at a specific transaction time.

    Question: Show me the full salary history for employee 101 as we knew it on Jan 30, 2024.

    sql
    WITH history_at_point_in_time AS (
        SELECT
            id,
            employee_id,
            salary,
            valid_time,
            transaction_time
        FROM employee_salaries
        WHERE employee_id = 101
          AND transaction_time @> '2024-01-30'::timestamptz
    )
    SELECT
        employee_id,
        salary,
        valid_time
    FROM history_at_point_in_time
    ORDER BY valid_time;

    Result:

    employee_idsalaryvalid_time
    10180000["2023-01-01 00:00:00+00", "2023-07-01 00:00:00+00")
    10190000["2023-07-01 00:00:00+00", "infinity")

    This correctly reconstructs the history before the correction was made, showing the $80,000 salary as the active record for the first half of the year.


    Section 4: Performance, Edge Cases, and Production Hardening

    While powerful, this pattern introduces complexity that requires careful management.

    Performance and Partitioning

    For tables with millions or billions of rows, performance will degrade. The GiST indexes are effective, but they are not as fast as B-tree indexes for all query types. The primary strategy for scaling is partitioning.

    PostgreSQL's declarative partitioning is a perfect fit here. You would almost certainly partition the employee_salaries table by employee_id (either HASH or LIST partitioning).

    sql
    -- Example of partitioning by HASH
    CREATE TABLE employee_salaries_partitioned (
        -- ... same columns as before ...
    ) PARTITION BY HASH (employee_id);
    
    CREATE TABLE employee_salaries_p0 PARTITION OF employee_salaries_partitioned FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    CREATE TABLE employee_salaries_p1 PARTITION OF employee_salaries_partitioned FOR VALUES WITH (MODULUS 4, REMAINDER 1);
    -- etc.

    This ensures that any query for a specific employee only ever has to scan one small partition and its corresponding index, dramatically improving performance.

    Timezone Management

    We deliberately used TIMESTAMPTZ. This is non-negotiable. All timestamps are stored in UTC. Your application server, database connection, and any clients must have their timezones configured correctly. All input timestamps should be unambiguous. The AT TIME ZONE construct in PostgreSQL becomes your best friend for converting user-input local times into correct TIMESTAMPTZ values for querying.

    Edge Case: Contiguous Periods

    The update_current_salary function assumes a simple succession where one period ends and the next begins. What if there is a gap? For example, an employee leaves and is rehired later. The logic must be adapted to handle this. The update function would fail because it wouldn't find a record where valid_time @> p_effective_from. This is correct behavior; you would need a separate rehire_employee function that simply inserts a new record without trying to supersede a previous one.

    Edge Case: Correction at the Boundary

    What happens if a correction is made exactly at the boundary of two existing valid time periods? The logic in correct_salary_history is designed to handle this by splitting and re-inserting the non-affected parts. However, this requires rigorous testing. Off-by-one errors with range boundaries are a common source of bugs. The use of inclusive-lower [ and exclusive-upper ) is the standard, robust way to model this and avoid such errors.

    Conclusion: A Powerful but Sharp Tool

    Bi-temporal data modeling in PostgreSQL using range types and exclusion constraints is a formidable pattern for building systems with high-integrity, auditable data. It moves complex temporal consistency logic from the application layer—where it is difficult to enforce and prone to race conditions—into the database, which is designed to manage data integrity and concurrency.

    This approach is not for every problem. It introduces complexity in data manipulation and requires a new way of thinking about querying. However, for financial systems, insurance, healthcare records, or any domain where the distinction between real-world truth and recorded information is paramount, it provides a level of correctness and auditability that is nearly impossible to achieve with traditional versioning schemes. By mastering these advanced PostgreSQL features, senior engineers can build truly robust and trustworthy systems.

    Found this article helpful?

    Share it with others who might benefit from it.

    More Articles