PostgreSQL Bi-Temporal Modeling for Immutable Audit Trails
Beyond Versioning: The Case for Bi-Temporal Data
In systems requiring stringent auditing and historical analysis, a simple updated_at column or even a separate history table (a pattern often known as Slowly Changing Dimensions Type 2) is insufficient. These uni-temporal approaches only capture the state of the database at a specific point in time, answering the question: "What did our database record on a given date?"
They fail to answer a more critical, nuanced question: "On March 15th, what did our system believe the state of the world was on January 1st?" This distinction is the core of bi-temporal modeling. We track two axes of time:
- Valid time (valid_time): The time period during which a fact is true in the real world. This is business time. For example, an employee's salary is $80,000 from 2023-01-01 to 2023-12-31.
- Transaction time (transaction_time): The time period during which a fact is stored as the current state in the database. This is system time, recording when the database learned about a fact. It is always append-only.

By modeling both, we create an immutable ledger. We never perform UPDATE or DELETE operations in the traditional sense. Instead, we close out the transaction time of the old record and insert a new one. This allows us to perfectly reconstruct the state of our data not just as it was, but as we knew it to be at any point in history, including corrections of past mistakes.
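To make the two axes concrete before we reach the SQL, here is a minimal sketch in Python. It is purely illustrative: integer "day numbers" stand in for timestamps, and `SalaryFact` and `believed_amount` are invented names, not part of the schema below.

```python
from dataclasses import dataclass

INF = float("inf")  # stand-in for an unbounded upper bound

@dataclass(frozen=True)
class SalaryFact:
    employee_id: int
    amount: int
    valid_from: int    # business time: [valid_from, valid_to)
    valid_to: float
    tx_from: int       # system time:   [tx_from, tx_to)
    tx_to: float

def believed_amount(facts, employee_id, valid_at, known_at):
    """What did the system believe, as of `known_at`, the salary was at `valid_at`?"""
    for f in facts:
        if (f.employee_id == employee_id
                and f.valid_from <= valid_at < f.valid_to
                and f.tx_from <= known_at < f.tx_to):
            return f.amount
    return None

# Day 0: salary recorded as 80,000. Day 100: corrected to 82,000.
# The correction closes the first fact's transaction time; valid time is untouched.
facts = [
    SalaryFact(101, 80_000, 0, INF, 0, 100),    # superseded on day 100
    SalaryFact(101, 82_000, 0, INF, 100, INF),  # the correction, still current
]

print(believed_amount(facts, 101, valid_at=50, known_at=200))  # 82000: what we know now
print(believed_amount(facts, 101, valid_at=50, known_at=60))   # 80000: what we knew then
```

Both rows describe the same real-world period; only the second axis distinguishes "what we know now" from "what we knew then".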
This article details a production-grade implementation of a bi-temporal model in PostgreSQL, leveraging its powerful, and often underutilized, features.
The Core Schema: A Financial Services Example
Let's model a salaries table for a financial institution, where tracking salary history, including back-dated corrections, is a legal requirement.
Our schema must enforce several invariants:
- For a given employee, their valid time ranges cannot overlap. An employee cannot have two different salaries at the same time.
- UPDATE and DELETE DML statements must be intercepted.
- Queries must be efficient for both time axes.
Here is the table definition using PostgreSQL's native range type tstzrange and an EXCLUDE constraint.
-- Enable necessary extensions
CREATE EXTENSION IF NOT EXISTS btree_gist;
-- Main table for employee salaries
CREATE TABLE salaries (
-- A surrogate primary key for unique row identification
id BIGSERIAL PRIMARY KEY,
-- The business key, identifying the employee
employee_id INT NOT NULL,
-- The salary amount for the given period
salary_amount NUMERIC(10, 2) NOT NULL,
-- VALID TIME: When this salary was effective in the real world.
-- [start, end) - inclusive start, exclusive end.
valid_time TSTZRANGE NOT NULL,
-- TRANSACTION TIME: When this record was considered current in the database.
-- An unbounded upper value signifies it's the current record.
transaction_time TSTZRANGE NOT NULL DEFAULT TSTZRANGE(NOW(), NULL, '[)'),
-- This is the magic. It prevents overlapping valid_time ranges for the same employee.
-- It only applies to rows that are currently active (transaction_time is current).
-- We use a partial constraint for this.
CONSTRAINT salaries_valid_time_no_overlap
EXCLUDE USING GIST (employee_id WITH =, valid_time WITH &&)
WHERE (UPPER(transaction_time) IS NULL)
);
-- Indexing for performance is critical
-- The EXCLUDE constraint maintains its own partial GiST index; this full
-- GiST index additionally serves range queries over historical rows.
CREATE INDEX salaries_employee_id_valid_time_gist_idx ON salaries USING GIST (employee_id, valid_time);
-- A GiST index on transaction_time supports "as of" queries, which use the
-- @> containment operator (a plain B-tree index cannot serve these).
CREATE INDEX salaries_transaction_time_idx ON salaries USING GIST (transaction_time);
-- Index on employee_id for quick lookups of an employee's entire history.
CREATE INDEX salaries_employee_id_idx ON salaries (employee_id);
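The `&&` overlap test that the EXCLUDE constraint enforces has simple semantics for half-open ranges: two ranges overlap iff each one starts before the other ends. A small Python sketch of just that predicate (with `None` standing in for an unbounded bound, as NULL does in a tstzrange):

```python
def overlaps(a, b):
    """True iff half-open ranges [a0, a1) and [b0, b1) overlap.
    None means unbounded, mirroring NULL bounds in a tstzrange."""
    a0, a1 = a
    b0, b1 = b
    def starts_before_end(start, end):
        return end is None or start is None or start < end
    return starts_before_end(a0, b1) and starts_before_end(b0, a1)

print(overlaps((1, 5), (5, 9)))     # False: [1,5) and [5,9) merely touch
print(overlaps((1, 6), (5, 9)))     # True:  5 lies in both
print(overlaps((1, None), (5, 9)))  # True:  an unbounded upper overlaps anything later
```

Note the first case: adjacent `[)` periods that share a boundary do not overlap, which is exactly the property contiguous salary periods need.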
Deconstruction of the Schema
- tstzrange: We use timestamp with time zone ranges for both time axes. The [) notation means the lower bound is inclusive and the upper bound is exclusive, which is ideal for representing contiguous time periods without gaps or overlaps.
- transaction_time default: TSTZRANGE(NOW(), NULL, '[)') sets the transaction start time to the current moment and leaves the end time NULL (unbounded), signifying this is the current, active record.
- EXCLUDE USING GIST: This is a powerful PostgreSQL constraint that goes beyond UNIQUE. It ensures that for any two rows in the table, the condition (employee_id_1 = employee_id_2) AND (valid_time_1 && valid_time_2) is false. The && operator means "overlaps".
- The WHERE (UPPER(transaction_time) IS NULL) clause is critical. It applies the overlap constraint only to currently active records. We must allow historical records (those with a closed transaction_time) to have overlapping valid_time ranges, as they represent corrected or superseded information.

Intercepting Changes: The Immutability Trigger
To enforce immutability, we will create a PL/pgSQL trigger function that runs BEFORE UPDATE OR DELETE on the salaries table. This function will prevent these operations and instead perform the correct bi-temporal mutations.
CREATE OR REPLACE FUNCTION bi_temporal_salaries_trigger()
RETURNS TRIGGER AS $$
DECLARE
current_tx_time TIMESTAMPTZ := NOW();
BEGIN
-- Guard against recursion: the bookkeeping UPDATE statements issued below
-- fire this same trigger. At nesting depth > 1, let them proceed untouched.
IF pg_trigger_depth() > 1 THEN
RETURN COALESCE(NEW, OLD);
END IF;
-- Intercept DELETE operations
IF (TG_OP = 'DELETE') THEN
-- Instead of deleting, we "close" the transaction time of the current record.
-- This marks it as no longer the current version in the database.
UPDATE salaries
SET transaction_time = TSTZRANGE(LOWER(transaction_time), current_tx_time, '[)')
WHERE id = OLD.id;
-- Prevent the actual DELETE from happening
RETURN NULL;
END IF;
-- Intercept UPDATE operations
IF (TG_OP = 'UPDATE') THEN
-- First, check whether the business key is being changed. Re-keying a row to a
-- different employee would corrupt the audit trail, so reject it outright.
-- Changes to valid_time (e.g., a scheduled period ending) or to salary_amount
-- are handled the same way: close the old version and insert a new one.
IF OLD.employee_id <> NEW.employee_id THEN
RAISE EXCEPTION 'Updating the business key (employee_id) is not allowed. Please create a new record.';
END IF;
-- Otherwise, proceed with the two-step close-and-insert.
-- Step 1: Terminate the old record's transaction time.
UPDATE salaries
SET transaction_time = TSTZRANGE(LOWER(transaction_time), current_tx_time, '[)')
WHERE id = OLD.id;
-- Step 2: Insert a new record with the updated data.
-- The new row carries NEW.valid_time, and its transaction_time starts now.
INSERT INTO salaries (employee_id, salary_amount, valid_time, transaction_time)
VALUES (NEW.employee_id, NEW.salary_amount, NEW.valid_time, TSTZRANGE(current_tx_time, NULL, '[)'));
-- Prevent the actual UPDATE from happening
RETURN NULL;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Attach the trigger to the table
CREATE TRIGGER salaries_bi_temporal_management
BEFORE UPDATE OR DELETE ON salaries
FOR EACH ROW EXECUTE FUNCTION bi_temporal_salaries_trigger();
Trigger Logic Explained
- DELETE: We don't delete the row. We find the row being targeted (WHERE id = OLD.id) and update its transaction_time range, setting the upper bound to now(). The row now represents a historical fact that is no longer current in the database's view. We then RETURN NULL to cancel the original DELETE command.
- UPDATE: The logic is a two-step process within a single transaction (guaranteed by the trigger context):
  1. We terminate the OLD record's transaction_time just like in the DELETE case.
  2. We INSERT a completely new row containing the NEW data. The new row carries the NEW valid_time and gets a brand new transaction_time starting from now().
  3. Again, RETURN NULL cancels the original UPDATE.
This ensures a perfect, unbroken audit trail. Every state the data has ever been in is preserved.
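The trigger's close-then-insert dance can be sketched outside the database. This hedged Python model (rows as plain dicts; `apply_update` is an invented name) shows the invariant that matters: the row count only ever grows, every prior version survives, and exactly one version stays active.

```python
def apply_update(rows, row_id, new_amount, now):
    """Simulate the trigger: close the old version's transaction time,
    then append a new version carrying the same valid time."""
    old = next(r for r in rows if r["id"] == row_id and r["tx_to"] is None)
    old["tx_to"] = now                      # step 1: terminate transaction time
    rows.append({                           # step 2: insert the replacement
        "id": max(r["id"] for r in rows) + 1,
        "amount": new_amount,
        "valid_from": old["valid_from"], "valid_to": old["valid_to"],
        "tx_from": now, "tx_to": None,      # None = currently active
    })

rows = [{"id": 1, "amount": 80_000, "valid_from": 0, "valid_to": None,
         "tx_from": 0, "tx_to": None}]
apply_update(rows, row_id=1, new_amount=82_000, now=100)

print(len(rows))                                          # 2: nothing was destroyed
print([r["amount"] for r in rows if r["tx_to"] is None])  # [82000]: one active version
```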
Real-World Scenarios and Query Patterns
Let's populate our table and demonstrate the power of this model.
-- Scenario 1: Initial salary for Employee 101
-- Recorded on Jan 1, 2023. Valid from Jan 1, 2023 onwards.
INSERT INTO salaries (employee_id, salary_amount, valid_time)
VALUES (101, 80000.00, TSTZRANGE('2023-01-01', NULL, '[)'));
-- Let's simulate time passing. (now() cannot actually be overridden in a normal
-- session; read the timestamps below as the wall-clock time at which this
-- transaction runs: 2023-06-01 10:00 UTC.)
BEGIN;
-- Scenario 2: A scheduled raise for Employee 101
-- The old salary record needs to be closed, and a new one created.
-- This requires a transaction to do it correctly.
-- First, close the current salary period.
UPDATE salaries
SET valid_time = TSTZRANGE(LOWER(valid_time), '2023-07-01', '[)')
WHERE employee_id = 101 AND UPPER(valid_time) IS NULL AND UPPER(transaction_time) IS NULL;
-- Then, insert the new salary period.
INSERT INTO salaries (employee_id, salary_amount, valid_time)
VALUES (101, 85000.00, TSTZRANGE('2023-07-01', NULL, '[)'));
COMMIT;
-- Let's check the current state
SELECT id, employee_id, salary_amount, valid_time, transaction_time FROM salaries WHERE employee_id = 101;
At this point, you will see three records for employee 101. The first INSERT created one record. The UPDATE (which was intercepted) closed the transaction time on that first record and created a new one with the updated valid_time. Then the final INSERT created the future salary record. This is a subtle complexity: a change in valid time is not a simple UPDATE but a business process of ending one period and starting another.
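That business process, close the open valid period at the effective date and open a new one, can be sketched in a few lines. A hedged Python illustration (`change_salary` is an invented helper, not part of the SQL above):

```python
def change_salary(periods, new_amount, effective_from):
    """End the currently open [start, None) period at `effective_from`
    and start a new one, keeping the timeline gap-free."""
    current = next(p for p in periods if p["valid_to"] is None)
    current["valid_to"] = effective_from          # close the open period...
    periods.append({"amount": new_amount,         # ...and start the next one
                    "valid_from": effective_from, "valid_to": None})

periods = [{"amount": 80_000, "valid_from": 0, "valid_to": None}]
change_salary(periods, 85_000, effective_from=180)

# Contiguous: the old period's exclusive end equals the new period's inclusive start.
print(periods[0]["valid_to"] == periods[1]["valid_from"])  # True
```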
Now, let's introduce a correction—the most compelling use case for bi-temporal models.
-- Scenario 3: Back-dated correction
-- On Aug 15, 2023, HR realizes the initial salary was wrong. It should have been $82,000.
-- (Read this as running at wall-clock time 2023-08-15 14:30 UTC;
-- now() cannot be pinned in a normal session.)
BEGIN;
-- We must "update" the record that was valid from Jan 1 to July 1.
-- Our trigger handles this seamlessly.
UPDATE salaries
SET salary_amount = 82000.00
WHERE employee_id = 101
AND valid_time = TSTZRANGE('2023-01-01', '2023-07-01', '[)')
AND UPPER(transaction_time) IS NULL; -- Target the currently active record for that period
COMMIT;
Now the table contains the complete history, including the correction. Let's query it.
Query Pattern 1: Current State of the World
This is the simplest query: "What are the current, effective salaries for all employees?"
SELECT employee_id, salary_amount, valid_time
FROM salaries
WHERE UPPER(transaction_time) IS NULL -- The record is currently active in the DB
AND valid_time @> NOW(); -- The validity period includes this moment
Query Pattern 2: Point-in-Time Query ("As Of")
"What did the system believe were the salaries on July 10th, 2023?"
This query ignores the correction made in August because it pins the transaction_time to before the correction occurred.
SELECT employee_id, salary_amount, valid_time
FROM salaries
WHERE transaction_time @> '2023-07-10 00:00:00 UTC'::timestamptz;
@> is the range containment operator. We're asking for all records whose transaction_time range contained the timestamp '2023-07-10'. At that time, the system still believed the salary was $80,000.
Query Pattern 3: Historical State Reconstruction ("As Was")
"What were the actual, real-world effective salaries on February 1st, 2023?"
This query looks at the most up-to-date information (transaction_time is current) for a specific point in valid_time.
SELECT employee_id, salary_amount
FROM salaries
WHERE UPPER(transaction_time) IS NULL
AND valid_time @> '2023-02-01 00:00:00 UTC'::timestamptz;
This will correctly return $82,000, reflecting the correction made in August.
Query Pattern 4: The Full Bi-Temporal Query
This is the most powerful query, combining the two previous patterns.
"On July 10th, 2023, what did we think the salary was for employee 101 on February 1st, 2023?"
SELECT employee_id, salary_amount
FROM salaries
WHERE employee_id = 101
AND transaction_time @> '2023-07-10 00:00:00 UTC'::timestamptz -- As the system knew it on this date
AND valid_time @> '2023-02-01 00:00:00 UTC'::timestamptz; -- For this effective date
This query will return $80,000. It winds back the clock on the database's knowledge (transaction_time) and then asks a question about a specific point in real-world time (valid_time). This capability is impossible to achieve reliably with simpler versioning schemes.
Performance, Indexing, and Edge Cases
Performance Considerations
- Write amplification: every logical change issues an UPDATE (to close the old record) plus an INSERT rather than a single in-place UPDATE or DELETE. This is a deliberate trade-off for auditability; for write-heavy systems it can become a bottleneck. Bulk loading should bypass the trigger temporarily if possible, with history being constructed carefully after the load.
- Partitioning: for very large datasets, consider partitioning the salaries table by employee_id (hash partitioning) or by a dimension of time (e.g., the start of valid_time).

Advanced Indexing
As shown in the schema, a GiST index on (employee_id, valid_time) is non-negotiable. The EXCLUDE constraint builds its own partial GiST index covering only active rows; the explicit full index serves range queries across the entire history, historical rows included. Without it, queries involving valid_time @> 'some_date' would degrade to full table scans.
For a query like ... WHERE employee_id = ? AND valid_time @> ?, a multi-column GiST index is ideal. PostgreSQL's query planner is smart enough to use this index to first narrow down by employee_id and then efficiently search the time ranges within that subset.
Edge Case: Handling Contiguous Periods
A common challenge is ensuring that when one period ends, the next one begins at the exact same moment, leaving no gaps. The sample UPDATE logic in the application code showed this: SET valid_time = TSTZRANGE(LOWER(valid_time), '2023-07-01', '[)'). The subsequent INSERT must use '2023-07-01' as its start time. This logic must be enforced at the application layer or within more complex stored procedures. The EXCLUDE constraint prevents overlaps but does not enforce contiguity.
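Because the database will not enforce contiguity for you, an application-side (or periodic batch) check is worth having. A minimal sketch, assuming an employee's periods arrive as half-open (start, end) pairs sorted by start, with None for unbounded:

```python
def find_gaps(periods):
    """Return (end, next_start) pairs where consecutive half-open
    periods fail to meet exactly. Input must be sorted by start."""
    gaps = []
    for (_, end), (next_start, _) in zip(periods, periods[1:]):
        if end is not None and end != next_start:
            gaps.append((end, next_start))
    return gaps

print(find_gaps([(0, 5), (5, 9), (9, None)]))  # []       -> contiguous
print(find_gaps([(0, 5), (6, 9)]))             # [(5, 6)] -> a hole in the timeline
```

An overlapping pair would be rejected by the EXCLUDE constraint before it ever reached this check; gaps are the failure mode left over.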
Edge Case: Timezone Hell
Using TIMESTAMPTZ and TSTZRANGE is paramount. All timestamps are stored in UTC, and the timezone is handled by the client's connection settings during display. This avoids any ambiguity when dealing with data entered from different geographical locations. Storing local timestamps (TIMESTAMP without time zone) in a bi-temporal system is a recipe for disaster.
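The same discipline applies in application code: keep every timestamp timezone-aware and in UTC internally, converting only at the display edge. A Python sketch (the fixed -04:00 offset is purely illustrative; real code should use a proper zone database such as the stdlib's zoneinfo):

```python
from datetime import datetime, timezone, timedelta

# Store in UTC, always timezone-aware -- the analogue of TIMESTAMPTZ.
stored = datetime(2023, 7, 1, 0, 0, tzinfo=timezone.utc)

# Convert only when rendering for a particular viewer.
# Fixed offset used here for illustration; prefer zoneinfo.ZoneInfo in practice.
eastern = timezone(timedelta(hours=-4))
local = stored.astimezone(eastern)

print(local.isoformat())   # 2023-06-30T20:00:00-04:00
print(local == stored)     # True: same instant, different rendering
```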
Conclusion: The Price and Payoff of Perfect History
Implementing a bi-temporal data model is not a trivial undertaking. It introduces complexity in your write path, requires careful schema design, and demands a new way of thinking about querying data. The cognitive overhead for developers is higher than with traditional CRUD models.
However, for systems where data provenance, auditability, and historical state reconstruction are primary business requirements, the payoff is immense. You gain a database that functions as an immutable ledger, capable of answering complex questions about its own history with absolute certainty. By leveraging advanced features within PostgreSQL, you can build a robust, performant, and logically pure bi-temporal system that provides a level of insight and integrity that simpler models can never achieve.