Scaling Time-Series Ingestion with Advanced pg_partman Strategies

Goh Ling Yong

The Inevitable Scaling Wall of Time-Series Data

Any system handling event logging, IoT metrics, or financial transactions will eventually face the same PostgreSQL challenge: a single, monolithic table growing to terabytes in size. Standard B-Tree indexes become bloated and slow, VACUUM processes take hours or days, and simple queries cause I/O storms that bring the system to its knees. The textbook solution is table partitioning, and for PostgreSQL, the de facto tool for managing this is pg_partman.

However, a basic pg_partman setup is just the beginning. Implementing it naively in a high-throughput production environment can introduce a new class of subtle, yet critical, problems. Write stalls during partition creation, inefficient query plans that fail to prune partitions, and maintenance jobs that lock critical resources are common outcomes.

This article is for engineers who have already hit this wall. We will bypass the fundamentals and dive directly into the advanced strategies required to manage massive, time-based partitioned tables effectively. We will focus on PostgreSQL 12+ native partitioning, as it offers significant performance benefits over the legacy trigger-based approach.

Our focus will be on a canonical example: an events table in a multi-tenant analytics platform, partitioned daily by its created_at timestamp.

sql
-- The monolithic table we're migrating from
CREATE TABLE events_legacy (
    event_id BIGSERIAL PRIMARY KEY,
    tenant_id INT NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    payload JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- The target partitioned structure
CREATE TABLE events (
    event_id BIGINT NOT NULL,
    tenant_id INT NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    payload JSONB,
    created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

-- NOTE: No primary key yet. Any primary key on a partitioned table must
-- include the partition key; we'll address this with the template table below.
CREATE INDEX ON events (tenant_id, created_at);

Section 1: Mastering `pg_partman` Configuration for Production

A default pg_partman installation is insufficient for high-concurrency workloads. The key is in the create_parent function's parameters, which dictate how partitions are created, managed, and indexed.

The Criticality of `p_premake`

The single most important parameter for preventing write stalls is p_premake. Partitions are only created when pg_partman's maintenance job runs; nothing creates them on demand when data arrives. If the maintenance job falls behind and the buffer of future partitions runs dry, new rows land in the default partition (or fail outright if none exists). Recovering from that state means creating the missing partition, DDL that takes an ACCESS EXCLUSIVE lock on the parent table, and then moving the stranded rows back out of the default partition.

The Solution: Set p_premake to a value that provides a comfortable buffer of empty partitions. The default of 4 on a daily partitioned table ensures that there are always at least four days' worth of future partitions ready to accept data; raise it if your maintenance cadence or on-call response time warrants. The maintenance process tops this buffer up asynchronously, making partition creation invisible to the application's write path.
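
To confirm the buffer is actually holding, here is a minimal monitoring sketch. It assumes PostgreSQL 12+'s pg_partition_tree() and pg_partman's default daily suffix naming (events_pYYYY_MM_DD); adjust the name pattern if yours differs.

sql
-- Count future partitions by comparing child names against today's suffix.
-- Alert if this drops below your configured premake value.
SELECT count(*) AS future_partitions
FROM pg_partition_tree('public.events') AS pt
JOIN pg_class c ON c.oid = pt.relid
WHERE pt.isleaf
  AND c.relname > 'events_p' || to_char(now(), 'YYYY_MM_DD');

-- premake can be raised later for an existing setup:
UPDATE partman.part_config SET premake = 8 WHERE parent_table = 'public.events';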

The Template Table Pattern

Managing indexes, constraints, and policies across hundreds or thousands of partitions is a maintenance nightmare. The p_template_table parameter is the correct pattern to solve this. pg_partman will use this template to define the structure of all newly created child partitions.

This pattern solves several problems:

  • Consistent Indexing: All partitions will have the exact same indexes.
  • Constraint Propagation: CHECK constraints are applied uniformly.
  • Simplified Alterations: To add an index to all future partitions, you only need to alter the template table.
Implementation Example:

First, we create our parent table and a detached template table that mirrors its structure exactly. We define all our indexes and constraints on this template table.

sql
-- Parent table (as before)
CREATE TABLE events (
    event_id BIGINT NOT NULL,
    tenant_id INT NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    payload JSONB,
    created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

-- Template table for all future partitions
CREATE TABLE events_template (
    event_id BIGINT NOT NULL,
    tenant_id INT NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    payload JSONB,
    created_at TIMESTAMPTZ NOT NULL
);

-- Define indexes on the template. These will be cloned for each new partition.
-- This index supports lookups by tenant within a time range.
CREATE INDEX ON events_template (tenant_id, created_at DESC);

-- Index for specific event types.
CREATE INDEX ON events_template (event_type);

-- Example GIN index for querying the payload.
CREATE INDEX ON events_template USING GIN (payload);

-- IMPORTANT: A primary key on the partitioned parent must include the
-- partition key. Defining it on the template instead gives every partition
-- a local (event_id, created_at) primary key.
ALTER TABLE events_template ADD PRIMARY KEY (event_id, created_at);

Now, we call create_parent with our advanced configuration.

sql
-- Install the pg_partman extension into its own schema
CREATE SCHEMA partman;
CREATE EXTENSION pg_partman SCHEMA partman;

-- Configure pg_partman to manage our 'events' table
SELECT partman.create_parent(
    p_parent_table := 'public.events',
    p_control := 'created_at',
    p_type := 'native',
    p_interval := '1 day',
    p_premake := 4,  -- Keep 4 days of future partitions ready
    p_start_partition := (now() - interval '3 days')::text, -- Optionally pre-create some past partitions
    p_template_table := 'public.events_template'
);

After this, pg_partman takes over. The events_template table will never hold data; it's purely a structural blueprint.
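
One caveat: template changes only affect partitions created afterwards. To retrofit a new index onto partitions that already exist, you have to build it on each child yourself. A minimal sketch that generates the statements (the (event_type, created_at) column list is just an example; run the emitted statements one at a time, since CREATE INDEX CONCURRENTLY cannot run inside a transaction block):

sql
-- Emit one CREATE INDEX statement per existing child partition of public.events.
SELECT format(
    'CREATE INDEX CONCURRENTLY ON %I.%I (event_type, created_at);',
    n.nspname, c.relname
) AS ddl
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE i.inhparent = 'public.events'::regclass;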


Section 2: Advanced Indexing Beyond B-Trees

With partitioning, your indexing strategy must evolve. While B-Tree indexes are excellent for OLTP-style lookups, time-series data often involves analytical queries over large ranges, where B-Trees can be inefficient.

Local vs. Global Indexes and Partition Pruning

PostgreSQL native partitioning only supports local indexes (indexes created on each partition). There is no concept of a global index spanning all partitions. This is a feature, not a bug, as it keeps partition management (attaching and detaching) a cheap, mostly metadata-level operation.

The query planner relies on Partition Pruning to deliver performance. It inspects the WHERE clause of a query and, if the partition key (created_at in our case) is constrained, it scans only the relevant child partitions.

Example of Successful Pruning:

sql
EXPLAIN ANALYZE
SELECT * FROM events
WHERE tenant_id = 123
  AND created_at >= '2023-10-26 00:00:00'
  AND created_at < '2023-10-27 00:00:00';

The EXPLAIN plan will show that it only scans a single partition, events_p2023_10_26.

Example of Pruning Failure:

sql
EXPLAIN ANALYZE
SELECT * FROM events
WHERE tenant_id = 123
  AND date_trunc('day', created_at) = '2023-10-26';

Wrapping the partition key in a function like date_trunc prevents the planner from identifying the target partitions, forcing a scan of every partition. The fix is to always use direct range comparisons on the key column.
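
Here is the prunable rewrite of the failing query, expressing the same day as a half-open range on the raw column:

sql
-- Same result, but the planner can now prune to a single partition.
EXPLAIN ANALYZE
SELECT * FROM events
WHERE tenant_id = 123
  AND created_at >= DATE '2023-10-26'
  AND created_at <  DATE '2023-10-26' + 1;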

BRIN Indexes for Large-Scale Analytics

For analytical queries that scan large portions of a table (e.g., "calculate the average number of events per tenant for the last month"), B-Tree indexes are suboptimal. They store a pointer to every single row, resulting in large indexes that can be slow to scan.

BRIN (Block Range INdex) indexes are a perfect fit for time-series data. created_at timestamps are naturally correlated with their physical storage location on disk. A BRIN index stores only the minimum and maximum value for each range of table blocks (128 blocks by default).

  • Size: A BRIN index can be 100-1000x smaller than the equivalent B-Tree index.
  • Performance: For range scans, the query planner can use the BRIN index to quickly eliminate huge numbers of blocks from consideration, drastically reducing I/O.

Implementation: Add a BRIN index to your template table.

sql
-- Add to events_template so all future partitions get it.
CREATE INDEX events_created_at_brin_idx ON events_template USING BRIN (created_at);

Benchmark Scenario:

Consider a query to count all events in the last 30 days.

sql
-- Querying a 1TB table with 365 daily partitions.
SELECT count(*) FROM events WHERE created_at > now() - interval '30 days';

  • With a B-Tree on created_at: The planner performs an index scan (or bitmap heap scan) on each of the ~30 surviving partitions, incurring significant random I/O to fetch index pages.
  • With a BRIN on created_at: The planner reads the tiny BRIN metadata for each of those partitions, immediately sees that all blocks match the filter, and proceeds to what is effectively a sequential scan of only those tables. This is far more efficient for large range scans.

Edge Case: BRIN indexes are ineffective if the data is not well-correlated with its physical order on disk. If you backfill old data into a live partition, those rows land wherever free space is available, breaking the correlation. A plain VACUUM will not reorder anything; rewriting the partition with CLUSTER (or an online tool like pg_repack) restores the physical ordering, as sketched below.
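
A sketch of that recovery path, assuming a B-Tree index on created_at exists on the partition (the index names here are hypothetical). CLUSTER takes an ACCESS EXCLUSIVE lock and rewrites the table, so reserve it for detached or idle partitions:

sql
-- Rewrite the partition in created_at order; all of its indexes,
-- BRIN included, are rebuilt as part of the operation.
CLUSTER events_p2023_07_01 USING events_p2023_07_01_created_at_idx;

-- For append-only partitions, freshly filled block ranges can be
-- summarized without any rewrite:
SELECT brin_summarize_new_values('events_p2023_07_01_created_at_brin_idx');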


Section 3: Production Operations and Zero-Downtime Maintenance

Automating partition creation and deletion is pg_partman's primary function, but production systems require more nuance than simply dropping old data.

The `run_maintenance_proc` Job

pg_partman's maintenance is driven by the run_maintenance_proc() procedure. This should be scheduled to run periodically. Using an extension like pg_cron is the most robust way to manage this inside the database.

sql
-- Schedule maintenance to run every hour.
SELECT cron.schedule('partman-maintenance', '0 * * * *', 'CALL partman.run_maintenance_proc()');

This single procedure call will:

  • Create new partitions whenever the p_premake buffer is running low.
  • Drop or detach old partitions according to the retention policy.
  • Apply any additional constraints configured for older partitions (the constraint_cols feature in part_config).

Retention Policy: Never Drop, Always Detach and Archive

Setting a retention policy in pg_partman is done via the partman.part_config table.

sql
UPDATE partman.part_config
SET retention = '90 days',
    retention_keep_table = false -- This means DROP the table
WHERE parent_table = 'public.events';

This is dangerous. Dropping a partition is an irreversible action. A bug in your configuration or a sudden change in compliance requirements could lead to permanent data loss.

A far safer production pattern is to detach old partitions and archive them to cold storage like Amazon S3.

First, configure pg_partman to detach instead of drop:

sql
UPDATE partman.part_config
SET retention = '90 days',
    retention_keep_table = true, -- Detach the partition, leaving it as a regular table
    retention_schema = 'archive' -- Move the detached table to a different schema
WHERE parent_table = 'public.events';

Then implement an archival workflow. After run_maintenance_proc() runs, the archive schema will contain tables like events_p2023_07_01. An external script or a PostgreSQL background worker can then:

  • Use COPY ... TO STDOUT to stream the table's contents to a compressed file (e.g., gzip).
  • Upload the compressed file to S3.
  • Verify the upload was successful.
  • Finally, DROP the detached table from the archive schema.

This pattern provides an air gap, preventing accidental data loss and creating a path for data restoration or offline analysis.

Edge Case: Handling Out-of-Order and Late-Arriving Data

What happens if an event from three days ago arrives now due to a network partition or client-side buffering? Without a catch-all partition, native partitioning throws an error: ERROR: no partition of relation "events" found for row.

pg_partman creates a default partition (events_default) to catch this data. However, leaving data in the default partition is an anti-pattern: it grows indefinitely, and its contents block the creation of any new partition whose range overlaps the rows it holds.
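
pg_partman itself provides a convenient check for this condition. A minimal sketch, assuming pg_partman 4.x:

sql
-- Lists every managed parent whose default partition contains rows.
-- Pass true for an exact count; false uses a cheaper estimate.
SELECT * FROM partman.check_default(false);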

The Production Solution:

  • Monitor the Default Partition: Set up monitoring that alerts whenever events_default contains any rows (the check_default() query above works well here).
  • Create a Re-Insertion Procedure: Develop a stored procedure that:
    a. Locks the rows in the default partition to prevent race conditions.
    b. Reads the rows and determines the distinct partitions they belong to.
    c. Crucially, checks whether those historical partitions still exist. If the data is older than your retention policy, you must decide whether to discard it or move it to a separate logging table.
    d. Inserts the data through the parent table (events), which routes each row to the correct child partition.
    e. Deletes the successfully moved rows from the default partition.
  • Run the procedure periodically, or whenever the alert fires. In practice you rarely need to write it from scratch; see the pg_partman helpers sketched below.
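
pg_partman ships data-movement helpers that implement this loop: they pull rows out of the default partition, create any missing child partitions, and re-insert through the parent in batches. A minimal sketch, assuming pg_partman 4.x:

sql
-- Procedure form: commits between batches, keeping lock windows short.
CALL partman.partition_data_proc('public.events');

-- Function form: processes up to 10 batches and returns the rows moved.
SELECT partman.partition_data_time('public.events', p_batch_count := 10);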


Section 4: Querying and Backfilling at Scale

Your application's query patterns must be partition-aware. Furthermore, backfilling massive amounts of historical data into a partitioned structure requires a different approach than a simple INSERT loop.

Multi-Tenant Query Patterns

In our multi-tenant events table, most queries will be scoped to a tenant_id and a time range. The ideal index on our template table was (tenant_id, created_at DESC).

When querying, the planner first uses partition pruning on created_at to select a subset of tables. Then, within each of those tables, it uses the local B-Tree index on (tenant_id, created_at) to efficiently find the tenant's data.

sql
-- This query is highly efficient.
EXPLAIN ANALYZE
SELECT event_type, count(*)
FROM events
WHERE tenant_id = 456
  AND created_at >= '2023-10-01 00:00:00'
  AND created_at < '2023-11-01 00:00:00'
GROUP BY 1;

The EXPLAIN plan will show it prunes down to the ~31 partitions for October, then performs an efficient Index Scan on each of them. (It cannot be an Index-Only Scan as written, because event_type is not in the index; see the covering variant below.)
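
If you want that plan to become a true Index-Only Scan, the index has to cover every referenced column. A sketch using PostgreSQL 11+'s INCLUDE clause, added to the template so future partitions pick it up:

sql
-- Covering index: event_type rides along in the leaf pages
-- without affecting the sort order of the key columns.
CREATE INDEX ON events_template (tenant_id, created_at DESC) INCLUDE (event_type);

Note that index-only scans also require the visibility map to be current, so recently written partitions may still touch the heap until they are vacuumed.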

The Backfill Challenge

Imagine you need to import 5TB of historical data spanning three years. A naive INSERT INTO events ... will be painfully slow: the database has to route every single row to its target partition, adding significant per-row overhead.

The High-Performance Backfill Pattern:

  • Pre-create All Partitions: Use a script to call partman.create_partition_time() to manually create all the daily partitions for the entire backfill range. Do this in a single transaction if possible.

sql
-- Pre-creation script for the backfill range
DO $$
DECLARE
    v_start_date date := '2020-01-01';
    v_end_date date := '2023-01-01';
    v_current_date date := v_start_date;
BEGIN
    WHILE v_current_date < v_end_date LOOP
        PERFORM partman.create_partition_time('public.events', ARRAY[v_current_date::timestamptz]);
        v_current_date := v_current_date + 1;
    END LOOP;
END;
$$;

  • Bypass the Parent: Load data directly into the child partitions. This is the most critical optimization. Use the COPY command for maximum throughput. You can parallelize this process with multiple workers, each handling a specific month or year.

bash
# Example of loading data for a single day directly into its partition
cat 2022_10_26_events.csv | psql -c "COPY events_p2022_10_26 FROM STDIN WITH (FORMAT csv)"

  • Analyze After Load: After loading data into a child partition, immediately run ANALYZE on it (e.g., ANALYZE events_p2022_10_26;) so the query planner has accurate statistics.

This direct-to-child COPY approach bypasses the partition routing logic entirely and is orders of magnitude faster for large data volumes. Each child still enforces its own range constraint, so a misrouted row fails loudly instead of landing in the wrong partition.
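
Once the loaders and ANALYZE passes finish, a quick sanity check confirms every partition was populated and analyzed. reltuples is only meaningful after ANALYZE (or VACUUM) has run; on PostgreSQL 14+ it is -1 for never-analyzed tables, 0 on older versions:

sql
-- Planner row estimates per child partition; -1 or 0 flags partitions
-- that were never analyzed or never loaded.
SELECT c.relname, c.reltuples::bigint AS estimated_rows
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'public.events'::regclass
ORDER BY c.relname;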

Conclusion

Effectively scaling PostgreSQL for time-series data with pg_partman is a masterclass in production database engineering. It moves beyond simple table creation into a holistic strategy encompassing proactive configuration (p_premake), intelligent indexing (BRIN), safe operational procedures (detach-and-archive), and partition-aware data loading techniques.

By treating partitions not just as a storage mechanism but as a fundamental unit of operation for indexing, maintenance, and querying, you can build a PostgreSQL system that handles petabytes of time-series data with the performance and reliability required by mission-critical applications.
