Production Patterns for PostgreSQL Time-Series Partitioning with pg_partman

Goh Ling Yong

The Inevitable Scaling Problem of Time-Series Data

In any system that generates event, log, or metric data, you will eventually face the challenge of a single, monolithic table growing to an unmanageable size. For time-series data, this inflection point arrives sooner than you'd think. A table with billions of rows, common in observability, IoT, or financial tick data platforms, suffers from several compounding issues:

* Index Inefficiency: B-tree indexes, the workhorse of PostgreSQL, become deep and bloated. Traversing them for queries, even for small time ranges, requires significant I/O. INSERT and UPDATE operations also slow down as they must update these massive indexes.

* VACUUM Overhead: The autovacuum daemon struggles to keep up with dead tuples in a constantly churning, multi-terabyte table. A manual VACUUM can take hours or even days, consuming significant I/O and CPU, often impacting production performance.

* Query Performance Degradation: Queries that don't hit an index perfectly must perform large sequential scans. Even with indexes, the query planner may struggle to generate optimal plans for a table with skewed data distribution over time.

* Maintenance Nightmares: Operations like adding a column, altering a data type, or rebuilding an index become high-risk, long-running procedures that require significant downtime windows.

PostgreSQL's native declarative partitioning (introduced in version 10) provides the fundamental building blocks to solve this by splitting a large table into smaller, more manageable child tables (partitions). However, it is not a complete solution. It provides the what (partitioning), but not the how for ongoing maintenance. You are still responsible for creating new partitions for future data and dropping old partitions to manage data retention. This manual overhead is untenable in a production environment.
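
To make the manual overhead concrete, here is what plain declarative partitioning looks like on its own (an illustrative throwaway table, not the schema used later in this article); every child table, present and future, has to be created and dropped by hand or by home-grown scripts:

sql
-- Plain declarative partitioning: every child must be created (and later dropped) manually
CREATE TABLE metrics (
    recorded_at TIMESTAMPTZ NOT NULL,
    value       DOUBLE PRECISION
) PARTITION BY RANGE (recorded_at);

CREATE TABLE metrics_2024_07_22 PARTITION OF metrics
    FOR VALUES FROM ('2024-07-22') TO ('2024-07-23');
CREATE TABLE metrics_2024_07_23 PARTITION OF metrics
    FOR VALUES FROM ('2024-07-23') TO ('2024-07-24');
-- ...one statement per interval, forever, plus DROP TABLE for expired ranges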

This is where pg_partman becomes an indispensable tool. It's not just a convenience layer; it's a robust automation and management framework for time and serial-based partitioning. This article assumes you understand the basics of why you need partitioning and focuses on the advanced, production-ready patterns for implementing and managing it with pg_partman.


Core Architecture: Beyond `CREATE TABLE ... PARTITION OF`

pg_partman operates by managing a parent partitioned table and its children through a metadata-driven approach. Its two central components are the part_config table and the run_maintenance() function.

* part_config table: This is the control panel. Every parent table you want pg_partman to manage gets a row in this table. This row defines the partitioning strategy: the parent table, the partition key (control column), the partition type (time or serial), the interval (e.g., daily, monthly), and advanced settings for retention and pre-creation.

* run_maintenance() function: This is the engine. When executed, it inspects part_config, identifies which partition sets need attention, creates new partitions for upcoming time intervals, and enforces retention policies by detaching or dropping old partitions. This function is designed to be run periodically, typically via a scheduler like pg_cron.
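
Both components are ordinary database objects, so you can work with them directly. Assuming the extension is installed in a schema named partman (as in the examples below), the engine can also be invoked by hand, which is handy right after registering a table or when debugging a failed scheduled run:

sql
-- Inspect the control panel
SELECT * FROM partman.part_config;

-- Run the engine manually (normally left to a scheduler)
CALL partman.run_maintenance_proc();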

Let's model a realistic scenario: an events table for a high-traffic application, logging user interactions.

sql
-- Our schema for tracking application events
CREATE SCHEMA events_schema;

CREATE TABLE events_schema.events (
    event_id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    event_type TEXT NOT NULL,
    user_id BIGINT,
    payload JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- We must have an index on our partition key
CREATE INDEX idx_events_created_at ON events_schema.events (created_at);

Without partitioning, this table could grow to 10TB+ in a year. Queries for a specific user's activity in the last week would become progressively slower. Let's fix this.

Production Implementation Walkthrough

1. Initial Setup and Configuration

First, install the extension and create the partitioned table that will eventually replace the existing one. For a brand-new system, you'd simply create the table as partitioned from the start; here we'll assume we're migrating an existing events table (the migration itself is covered later).

sql
-- Step 1: Install pg_partman into its own schema
-- (the partman schema is referenced throughout the examples below)
CREATE SCHEMA partman;
CREATE EXTENSION pg_partman SCHEMA partman;

-- Step 2: Create the parent partitioned table
-- NOTE: In a real migration, you'd rename the old table and create a new
-- one with the original name. We'll cover migration later. For now, let's
-- assume we are starting fresh with a partitioned table.

CREATE TABLE events_schema.events_partitioned (
    event_id BIGINT GENERATED BY DEFAULT AS IDENTITY,
    event_type TEXT NOT NULL,
    user_id BIGINT,
    payload JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
)
PARTITION BY RANGE (created_at);

-- IMPORTANT: The primary key (and any unique constraint) must include the
-- partition key. This is a fundamental requirement of PostgreSQL partitioning.
ALTER TABLE events_schema.events_partitioned 
  ADD CONSTRAINT pk_events_partitioned PRIMARY KEY (event_id, created_at);

CREATE INDEX idx_events_partitioned_user_id 
  ON events_schema.events_partitioned (user_id, created_at);

CREATE INDEX idx_events_partitioned_event_type 
  ON events_schema.events_partitioned (event_type, created_at);

Now, we register this table with pg_partman.

sql
-- Step 3: Configure partitioning with pg_partman
SELECT partman.create_parent(
    p_parent_table := 'events_schema.events_partitioned',
    p_control := 'created_at',
    p_type := 'native',
    p_interval := 'daily',
    p_premake := 4  -- Create 4 days of future partitions. Crucial for write availability.
);

Analysis of create_parent Parameters:

* p_parent_table: The table to manage.

* p_control: The column to partition by. Must match the PARTITION BY clause.

* p_type: Set to 'native' to use PostgreSQL's built-in partitioning. The older trigger-based method ('partman') is largely legacy.

* p_interval: 'daily'. The choice of interval is critical. Too granular (e.g., 'hourly') creates too many partitions, stressing the query planner. Too coarse (e.g., 'monthly') results in partitions that are still too large. Daily is a common, balanced choice for high-volume event tables.

* p_premake: This is a critical production setting. It tells pg_partman to maintain a buffer of future partitions. If p_premake is 4 for a daily interval, pg_partman will ensure that partitions for the next 4 days always exist. Without this, if your maintenance job fails to run, writes for a new day could fail with a "no partition for key" error.
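
All of these settings are recorded as a row in partman.part_config, which you can inspect (and, as we'll see with retention, update) directly:

sql
-- Verify the registration
SELECT *
FROM partman.part_config
WHERE parent_table = 'events_schema.events_partitioned';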

2. Automating Maintenance with `pg_cron`

pg_partman doesn't run itself. You need to schedule run_maintenance().

sql
-- Step 4: Schedule the maintenance job
-- Ensure pg_cron is enabled in postgresql.conf: shared_preload_libraries = 'pg_cron'
CREATE EXTENSION pg_cron;

-- Run maintenance every hour. This is frequent enough to catch up if a run fails,
-- but not so frequent as to cause overhead. It's an idempotent operation.
SELECT cron.schedule('pg_partman_maintenance', '0 * * * *', 'CALL partman.run_maintenance_proc()');

-- Note: We use run_maintenance_proc() which is a procedure that can be called
-- and handles its own transaction management. This is the recommended approach.

After running create_parent and the first run_maintenance_proc(), you will see new child tables created:

sql
-- Check the created partitions
-- \d+ events_schema.events_partitioned

                                       Partitioned table "events_schema.events_partitioned"
    Column    |           Type           | Collation | Nullable |                     Default                      | Storage  | Stats target | Description
--------------+--------------------------+-----------+----------+--------------------------------------------------+----------+--------------+-------------
 event_id     | bigint                   |           | not null | generated by default as identity                 | plain    |              |
 event_type   | text                     |           | not null |                                                  | extended |              |
 user_id      | bigint                   |           |          |                                                  | plain    |              |
 payload      | jsonb                    |           |          |                                                  | extended |              |
 created_at   | timestamp with time zone |           | not null | now()                                            | plain    |              |
Partition key: RANGE (created_at)
Indexes:
    "pk_events_partitioned" PRIMARY KEY, btree (event_id, created_at)
    "idx_events_partitioned_event_type" btree (event_type, created_at)
    "idx_events_partitioned_user_id" btree (user_id, created_at)
Partitions: events_partitioned_p2024_07_20 FOR VALUES FROM ('2024-07-20 00:00:00+00') TO ('2024-07-21 00:00:00+00'),
            events_partitioned_p2024_07_21 FOR VALUES FROM ('2024-07-21 00:00:00+00') TO ('2024-07-22 00:00:00+00'),
            events_partitioned_p2024_07_22 FOR VALUES FROM ('2024-07-22 00:00:00+00') TO ('2024-07-23 00:00:00+00'),
            events_partitioned_p2024_07_23 FOR VALUES FROM ('2024-07-23 00:00:00+00') TO ('2024-07-24 00:00:00+00'),
            events_partitioned_p2024_07_24 FOR VALUES FROM ('2024-07-24 00:00:00+00') TO ('2024-07-25 00:00:00+00')
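
As a quick sanity check, insert a row and ask PostgreSQL which child received it via the tableoid system column:

sql
-- Insert a row and confirm which partition it was routed to
INSERT INTO events_schema.events_partitioned (event_type, user_id, payload)
VALUES ('page_view', 42, '{"path": "/"}');

SELECT tableoid::regclass AS partition_name, event_id, created_at
FROM events_schema.events_partitioned
ORDER BY created_at DESC
LIMIT 1;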

Advanced Pattern 1: Implementing Data Retention Policies

Storing data forever is rarely feasible. pg_partman provides a robust mechanism for automatically dropping old data.

Let's configure our events_partitioned table to retain only 90 days of data.

sql
-- Update the configuration to set retention policy
UPDATE partman.part_config
SET
    retention = '90 days',
    retention_keep_table = false, -- If true, detaches the partition; if false, DROPs it.
    retention_keep_index = false
WHERE parent_table = 'events_schema.events_partitioned';

Retention Strategy Deep Dive:

* retention = '90 days': A simple interval. Partitions whose entire range is older than 90 days from the current time will be targeted for removal.

* retention_keep_table = false: This is the critical choice. false means pg_partman will issue a DROP TABLE command for the old partition. This is fast and reclaims disk space immediately. However, it's a destructive action that takes a brief but strong AccessExclusiveLock. true will instead DETACH the partition, turning it into a regular, standalone table. This is useful if you need to archive the data to cold storage (e.g., S3) before deleting it.
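
If you take the detach route, the archival step itself is outside pg_partman's scope. A minimal sketch, assuming server-side file access and using a hypothetical partition name and path:

sql
-- Illustrative only: dump a detached partition to CSV before dropping it.
-- In practice you might use \copy from a client or stream to object storage instead.
COPY events_schema.events_partitioned_p2024_04_20
TO '/var/lib/postgresql/archive/events_2024_04_20.csv'
WITH (FORMAT csv, HEADER);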

Edge Case: The DROP TABLE Lock

Dropping a partition is fast, but the AccessExclusiveLock it requires will block all other queries on the parent table, including SELECTs. While the lock is held for a very short time, on a system with thousands of queries per second, this can cause a noticeable blip or connection pile-up. If this is unacceptable, a two-phase strategy is better:

1. Set retention_keep_table = true. pg_partman detaches the old partition. This is still a fast DDL operation.
2. Have a separate, asynchronous job that finds these detached tables (they keep their predictable names but are no longer attached to the parent) and drops them during a low-traffic maintenance window, as in the script below.

sql
-- Example of a manual cleanup script: drop former partitions that are
-- no longer attached to the parent (i.e., have no pg_inherits entry)
DO $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN
        SELECT c.relname AS tablename
        FROM pg_class c
        JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE n.nspname = 'events_schema'
          AND c.relname LIKE 'events_partitioned_p%'
          AND c.relkind = 'r'
          AND NOT EXISTS (SELECT 1 FROM pg_inherits i WHERE i.inhrelid = c.oid)
    LOOP
        RAISE NOTICE 'Dropping detached partition: %', r.tablename;
        EXECUTE 'DROP TABLE events_schema.' || quote_ident(r.tablename);
    END LOOP;
END$$;

Advanced Pattern 2: Zero-Downtime Migration from a Monolithic Table

This is one of the most challenging database operations. How do you move billions of rows from a live, monolithic events table to a new events_partitioned table without downtime?

The strategy involves a careful, multi-step process:

1. Setup: Create the new partitioned table (events_partitioned) and configure it with pg_partman as shown above.
2. Dual Writes: Set up a trigger on the original events table to mirror all new INSERTs, UPDATEs, and DELETEs to events_partitioned. This ensures no new data is lost during the migration.
3. Backfill: Copy the historical data from events to events_partitioned in small, manageable batches to avoid long-running transactions and excessive I/O load.
4. Cutover: Once the backfill is complete and the tables are in sync, rename the tables within a single transaction to atomically switch application traffic to the new partitioned table.

Here's a code-level look at the most critical parts.

The Dual-Write Trigger

sql
-- A function to mirror DML operations
CREATE OR REPLACE FUNCTION events_schema.mirror_writes_to_partitioned()
RETURNS TRIGGER AS $$
BEGIN
    IF (TG_OP = 'INSERT') THEN
        INSERT INTO events_schema.events_partitioned VALUES (NEW.*);
    ELSIF (TG_OP = 'UPDATE') THEN
        UPDATE events_schema.events_partitioned 
        SET event_type = NEW.event_type, user_id = NEW.user_id, payload = NEW.payload, created_at = NEW.created_at
        WHERE event_id = OLD.event_id AND created_at = OLD.created_at; -- Locate the row by its OLD key; created_at is the partition key
    ELSIF (TG_OP = 'DELETE') THEN
        DELETE FROM events_schema.events_partitioned 
        WHERE event_id = OLD.event_id AND created_at = OLD.created_at;
    END IF;
    RETURN NULL; -- The trigger is an AFTER trigger, so the return value is ignored.
END;
$$ LANGUAGE plpgsql;

-- Attach the trigger to the original table
CREATE TRIGGER mirror_writes
AFTER INSERT OR UPDATE OR DELETE ON events_schema.events
    FOR EACH ROW EXECUTE FUNCTION events_schema.mirror_writes_to_partitioned();

CRITICAL NOTE: This trigger adds overhead to every write on the original table. It must be tested for performance impact before being applied to a production system.

The Backfill Script

Never attempt to copy terabytes of data in a single transaction. It will exhaust memory, fill transaction logs, and cause massive table bloat. A batching approach is required.

sql
-- A PL/pgSQL procedure for batched backfilling
CREATE OR REPLACE PROCEDURE events_schema.backfill_events_to_partitioned(batch_size INT, pause_interval INTERVAL)
LANGUAGE plpgsql AS $$
DECLARE
    min_id BIGINT;
    max_id BIGINT;
    current_id BIGINT;
BEGIN
    SELECT min(event_id), max(event_id) INTO min_id, max_id FROM events_schema.events;
    current_id := min_id;

    WHILE current_id <= max_id LOOP
        RAISE NOTICE 'Backfilling batch starting from event_id: %', current_id;

        INSERT INTO events_schema.events_partitioned (event_id, event_type, user_id, payload, created_at)
        SELECT event_id, event_type, user_id, payload, created_at
        FROM events_schema.events
        WHERE event_id >= current_id AND event_id < current_id + batch_size
        ON CONFLICT DO NOTHING; -- The trigger might have already inserted recent rows

        -- Commit each batch so we never hold one long-running transaction open
        COMMIT;

        current_id := current_id + batch_size;

        -- Pause to reduce load on the primary
        PERFORM pg_sleep_for(pause_interval);
    END LOOP;

    RAISE NOTICE 'Backfill complete.';
END;
$$;

-- To run it:
-- CALL events_schema.backfill_events_to_partitioned(10000, '1 second');

This procedure copies data in chunks of 10,000 rows and pauses for 1 second between batches, preventing I/O saturation. The ON CONFLICT DO NOTHING is a safeguard against race conditions with the mirror trigger.

The Final Cutover

Once the backfill is complete and a final check confirms data consistency, the atomic switch can be performed during a brief maintenance window.
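
One lightweight way to do that final check is to compare per-day row counts between the two tables (a sketch; a real migration might also compare checksums or sample payloads):

sql
-- Rows appear in the output only for days where the counts differ; empty output is good
SELECT date_trunc('day', created_at) AS day, count(*) AS row_count
FROM events_schema.events
GROUP BY 1
EXCEPT
SELECT date_trunc('day', created_at), count(*)
FROM events_schema.events_partitioned
GROUP BY 1;

Once that comes back clean, perform the switch: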

sql
BEGIN;
-- Lock both tables to prevent any writes during the rename
LOCK TABLE events_schema.events IN ACCESS EXCLUSIVE MODE;
LOCK TABLE events_schema.events_partitioned IN ACCESS EXCLUSIVE MODE;

-- Rename tables
ALTER TABLE events_schema.events RENAME TO events_legacy_to_be_dropped;
ALTER TABLE events_schema.events_partitioned RENAME TO events;

COMMIT;
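
One follow-up worth doing immediately after the rename: the identity sequence on the new events table never advanced during the backfill and dual-write phases, because event_id values were copied over explicitly. Bump it past the highest migrated id before new inserts depend on it (a sketch; pg_get_serial_sequence resolves the sequence behind the identity column):

sql
-- Advance the identity sequence past every migrated event_id to avoid duplicate keys
SELECT setval(
    pg_get_serial_sequence('events_schema.events', 'event_id'),
    (SELECT coalesce(max(event_id), 1) FROM events_schema.events)
);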

Performance Tuning and Query Optimization

Partitioning is useless if your queries don't take advantage of it. The key mechanism is Partition Pruning, where the query planner intelligently excludes partitions that cannot possibly contain the data requested by the WHERE clause.

For pruning to work, the WHERE clause must directly constrain the partition key.

Consider this query:

sql
EXPLAIN ANALYZE
SELECT count(*)
FROM events_schema.events
WHERE created_at >= '2024-07-22 00:00:00' AND created_at < '2024-07-23 00:00:00';

Good Query Plan (with Pruning):

text
Finalize Aggregate  (cost=25310.96..25310.97 rows=1 width=8) (actual time=152.361..152.362 rows=1 loops=1)
  ->  Gather  (cost=25310.75..25310.96 rows=2 width=8) (actual time=152.316..152.358 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial Aggregate  (cost=24310.75..24310.76 rows=1 width=8) (actual time=146.591..146.592 rows=1 loops=3)
              ->  Parallel Append  (cost=0.00..23956.91 rows=141533 width=0) (actual time=0.024..142.123 rows=113267 loops=3)
                    ->  Parallel Seq Scan on events_partitioned_p2024_07_22  (cost=0.00..23956.91 rows=141533 width=0) (actual time=0.023..139.544 rows=113267 loops=3)
                          Filter: ((created_at >= '2024-07-22 00:00:00+00'::timestamp with time zone) AND (created_at < '2024-07-23 00:00:00+00'::timestamp with time zone))
Planning Time: 0.452 ms
Execution Time: 152.421 ms

Notice the key line: Parallel Seq Scan on events_partitioned_p2024_07_22. The planner correctly identified that only one partition needs to be scanned.

Now, a common mistake that breaks pruning:

sql
EXPLAIN ANALYZE
SELECT count(*)
FROM events_schema.events
WHERE date_trunc('day', created_at) = '2024-07-22 00:00:00';

Bad Query Plan (No Pruning):

text
Finalize Aggregate  (cost=175234.50..175234.51 rows=1 width=8) (actual time=1204.51..1204.52 rows=1 loops=1)
  ->  Gather  (cost=175234.29..175234.50 rows=2 width=8) (actual time=1204.21..1204.49 rows=3 loops=1)
        ...
        ->  Partial Aggregate  (cost=174234.29..174234.30 rows=1 width=8) (actual time=1199.12..1199.13 rows=1 loops=3)
              ->  Parallel Append  (cost=0.00..174220.83 rows=5342 width=0) (actual time=0.034..1197.89 rows=4270 loops=3)
                    ->  Parallel Seq Scan on events_partitioned_p2024_07_20 ...
                    ->  Parallel Seq Scan on events_partitioned_p2024_07_21 ...
                    ->  Parallel Seq Scan on events_partitioned_p2024_07_22 ...
                    ->  Parallel Seq Scan on events_partitioned_p2024_07_23 ...
                    ->  Parallel Seq Scan on events_partitioned_p2024_07_24 ...

The planner is forced to scan all partitions because it cannot map the output of date_trunc('day', created_at) back to the partition bounds, so it cannot prove that a row in the ..._p2024_07_23 partition won't match the target value. Wrapping the partition key column in a function or expression is the most common way to accidentally disable pruning; always compare the raw column against a range, as in the first query.

Indexing Strategy

PostgreSQL has no true global indexes for partitioned tables: an index created on the parent is a partitioned index, backed by a separate local index on each child. With native partitioning, PostgreSQL itself cascades any index defined on the parent to every new partition, including the ones pg_partman creates (pg_partman additionally maintains a template table for properties that cannot be declared on the parent).

This is why, in our setup, we created the indexes on events_schema.events_partitioned right after defining it as partitioned and before any partitions existed. Every daily partition automatically gets its own smaller, more efficient indexes on user_id and event_type.

This strategy has two major benefits:

* Smaller Indexes: Each index only contains data for its specific partition (e.g., one day's worth of data), making them much smaller, faster to scan, and easier to maintain.
* Maintenance Isolation: Rebuilding an index on one partition does not lock the others, allowing for more granular and less disruptive maintenance operations.
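
You can see this per-partition layout directly in the catalogs:

sql
-- Every daily child carries its own copy of the parent's indexes
SELECT tablename, indexname
FROM pg_indexes
WHERE schemaname = 'events_schema'
  AND tablename LIKE 'events_partitioned_p%'
ORDER BY tablename, indexname;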

Monitoring and Operational Health

Once your partitioning is automated, you must monitor it.

* Monitor pg_partman's logs: Check the PostgreSQL logs for any errors from run_maintenance_proc(). A failure here could halt the creation of new partitions.

* Check for sufficient premake: Ensure you always have a buffer of future partitions.

sql
-- A query to check your partition status.
-- show_partitions() returns each child's schema and table name; since the
-- names embed the boundary date, min/max give the oldest and newest partitions.
SELECT
    parent_table,
    (SELECT min(partition_tablename) FROM partman.show_partitions(parent_table)) AS oldest_partition,
    (SELECT max(partition_tablename) FROM partman.show_partitions(parent_table)) AS newest_partition
FROM partman.part_config;

* Disk Space Monitoring: Monitor disk space growth per partition, not just for the whole table. A sudden spike in one partition can indicate an application bug or abuse.
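
A per-partition size query (a sketch built on the standard catalogs) makes this easy to wire into an existing metrics pipeline:

sql
-- Total on-disk size (heap + indexes + TOAST) per child partition
SELECT c.relname AS partition_name,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
JOIN pg_class p ON p.oid = i.inhparent
WHERE p.relname = 'events_partitioned'
ORDER BY pg_total_relation_size(c.oid) DESC;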

Conclusion

Automating PostgreSQL partitioning with pg_partman is a non-negotiable step for managing time-series data at scale. Implementing robust retention policies, executing zero-downtime migrations, and fine-tuning queries for partition pruning are the hallmarks of a mature, production-grade database architecture. By treating partitioning not as a one-time setup but as a continuous, automated process, you can ensure your PostgreSQL database remains performant and manageable, even as it scales to handle trillions of events.
