Scaling Time-Series Data: Automating PostgreSQL Partitioning with pg_partman

Goh Ling Yong

The Inevitable Crisis of Monolithic Time-Series Tables

In any system that collects time-series data—be it IoT metrics, application logs, or financial transactions—the primary data table is a ticking time bomb. Initially, a single table like events is simple and performant. But as it grows to hundreds of billions of rows and terabytes in size, performance degradation is not linear; it's exponential.

Senior engineers have all felt this pain:

  • Query Performance Collapse: Queries that once took milliseconds now take minutes, even with indexes. The B-tree indexes on columns like timestamp and device_id grow so large that traversing them dominates query time, and their hot pages no longer fit in cache.
  • Maintenance Hell: VACUUM and ANALYZE operations, critical for performance, become monstrously slow and I/O intensive. A VACUUM FULL can take days while holding an exclusive lock, and REINDEX becomes a scheduled-downtime event.
  • Data Archival/Deletion Inefficiency: Deleting old data with DELETE FROM events WHERE event_ts < 'YYYY-MM-DD' is one of the worst operations you can run on a massive table. It generates immense WAL traffic, causes table and index bloat, and can take days to complete, often timing out or filling the disk with transaction logs.
    PostgreSQL's native declarative partitioning is the textbook solution. It allows you to split a logically single table into smaller, physically separate tables (partitions). The query planner is smart enough to scan only the relevant partitions (a technique called partition pruning), which dramatically improves query performance. However, native partitioning only provides the framework, not the management engine: you are still responsible for creating new partitions for future data and for detaching or dropping old ones. This manual process is error-prone and doesn't scale operationally, as the sketch below illustrates.
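    For context, this is roughly the manual routine that native partitioning alone leaves you with; the table and partition names here are illustrative:

    sql
    -- Someone (or some script) must create tomorrow's partition before data for it arrives...
    CREATE TABLE events_2023_10_27 PARTITION OF events
        FOR VALUES FROM ('2023-10-27') TO ('2023-10-28');

    -- ...and detach and drop partitions that have aged out of the retention window.
    ALTER TABLE events DETACH PARTITION events_2023_07_28;
    DROP TABLE events_2023_07_28;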

    This is where pg_partman enters. It's not just a convenience wrapper; it's a sophisticated automation engine for partition management that handles the complex lifecycle of time-series data. This article will focus on production-grade implementation patterns using pg_partman and pg_cron to create a truly self-managing database system.


    Core Architecture: `pg_partman` with `pg_cron`

    pg_partman (Partition Manager) is a PostgreSQL extension that automates the creation and management of table partition sets. Its primary job is to ensure that partitions for upcoming time intervals are always available and that old partitions exceeding a retention policy are automatically dropped.

    pg_cron is a simple cron-based job scheduler that runs inside PostgreSQL as an extension. By combining the two, we can schedule pg_partman's maintenance function to run periodically, creating a closed-loop, automated system.

    Initial Environment Setup (Production-Ready)

    First, ensure both extensions are installed and enabled in your database. This is typically done at the infrastructure level (e.g., via shared_preload_libraries in postgresql.conf and then CREATE EXTENSION).
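    For reference, the server-level configuration typically looks like the sketch below; 'iot_platform' is an illustrative database name, and changing shared_preload_libraries requires a server restart.

    text
    # postgresql.conf
    shared_preload_libraries = 'pg_cron'
    cron.database_name = 'iot_platform'   # the database in which pg_cron stores and runs its jobs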

    sql
    -- It's best practice to keep partman's objects and configuration tables in a dedicated schema,
    -- so create the schema first and install the extension into it.
    CREATE SCHEMA IF NOT EXISTS partman;
    
    -- Ensure the extensions are available. These commands must be run by a superuser.
    CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA partman;
    CREATE EXTENSION IF NOT EXISTS pg_cron;

    Let's define a realistic parent table for an IoT event-sourcing system.

    sql
    CREATE TABLE public.iot_events (
        event_id BIGSERIAL,
        device_id UUID NOT NULL,
        event_ts TIMESTAMPTZ NOT NULL DEFAULT now(),
        payload JSONB,
        PRIMARY KEY (event_id, event_ts) -- Partition key MUST be part of the primary key
    )
    PARTITION BY RANGE (event_ts);

    Critical Note: For range or list partitioning, the partition key (event_ts in this case) must be part of the primary key and any unique constraints. This is a fundamental requirement of PostgreSQL partitioning.
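    To make the requirement concrete, the following variant is rejected by PostgreSQL because the unique constraint (the primary key) does not include the partition column; this block is purely illustrative.

    sql
    -- Fails: a unique constraint on a partitioned table must include all partition key columns
    CREATE TABLE public.iot_events_bad (
        event_id BIGSERIAL PRIMARY KEY,  -- event_ts is missing from the primary key
        device_id UUID NOT NULL,
        event_ts TIMESTAMPTZ NOT NULL DEFAULT now(),
        payload JSONB
    )
    PARTITION BY RANGE (event_ts);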

    Configuring the First Partition Set

    Now, we'll register this table with pg_partman using its create_parent function. This function is the heart of the configuration, and understanding its parameters is crucial.

    sql
    -- Register iot_events for partitioning management
    SELECT partman.create_parent(
        p_parent_table := 'public.iot_events', 
        p_control := 'event_ts', 
        p_type := 'native', 
        p_interval := '1 day', 
        p_premake := 10, 
        p_start_partition := (now() - interval '3 days')::text
    );

    Let's dissect these parameters from a senior engineer's perspective:

  • p_parent_table: The table to manage. Straightforward.
  • p_control: The partitioning column. This is event_ts.
  • p_type: Set to 'native'. pg_partman once supported an older trigger-based method, but since PostgreSQL 10+, native partitioning is the only sane choice.
  • p_interval: '1 day'. This defines the data range for each partition. Choosing the right interval is critical. Too small (e.g., '1 hour') leads to thousands of partitions, increasing planner overhead. Too large (e.g., '1 month') diminishes the benefits. A good rule of thumb is to size partitions so the active one fits comfortably in memory.
  • p_premake: 10. This is a crucial performance and reliability parameter. It tells pg_partman to always keep 10 partitions ahead of the current one. With a daily interval, partitions for the next 10 days always exist, so inserts keep succeeding even if the maintenance job fails to run for a few cycles.
  • p_start_partition: We've set it to 3 days in the past. When create_parent is first run, it retroactively creates partitions from this start date up to premake intervals beyond now(). This is useful for partitioning existing data.

    After running this, you can inspect the created partitions:

    sql
    -- pg_partman's helper lists every child table in the set
    SELECT partition_schemaname, partition_tablename
    FROM partman.show_partitions('public.iot_events');

    You'll see a list of tables like iot_events_p2023_10_26, iot_events_p2023_10_27, etc.
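    It is also worth keeping an eye on per-partition sizes; a quick catalog query is a useful sanity check that the chosen p_interval keeps individual partitions manageable:

    sql
    -- Size of every child partition of iot_events
    SELECT inhrelid::regclass AS partition_name,
           pg_size_pretty(pg_total_relation_size(inhrelid)) AS total_size
    FROM pg_inherits
    WHERE inhparent = 'public.iot_events'::regclass;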

    Automating Maintenance with `run_maintenance()` and `pg_cron`

    The magic of pg_partman is encapsulated in the run_maintenance() function. A single call does all the heavy lifting:

  • Creates new partitions to satisfy the premake configuration.
  • Checks for and attaches any new child partitions if you've created them manually (less common).
  • Enforces retention policies by detaching or dropping old partitions.

    We need to schedule this to run automatically. This is where pg_cron shines.

    sql
    -- Schedule run_maintenance() to execute every hour.
    SELECT cron.schedule(
        'partman-maintenance',
        '0 * * * *', -- Every hour at the 0th minute
        $$SELECT partman.run_maintenance(p_analyze := true, p_jobmon := false)$$
    );

    Why run it hourly for daily partitions?

    This is a key production pattern. Running the job more frequently than the partition interval provides resilience. If the job fails at 2 AM, the 3 AM run will catch up. It ensures the premake buffer is always full. Furthermore, running it frequently means each run_maintenance() call does less work, resulting in shorter transaction times and reduced lock contention.
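    Because this job is the safety net for the whole partition set, its run history deserves monitoring. Recent pg_cron versions record every execution in cron.job_run_details; a check along these lines (the LIKE filter is an assumption about how the command was written) surfaces failed runs:

    sql
    -- Recent executions of the maintenance job; investigate anything not 'succeeded'
    SELECT start_time, end_time, status, return_message
    FROM cron.job_run_details
    WHERE command LIKE '%run_maintenance%'
    ORDER BY start_time DESC
    LIMIT 10;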

    p_analyze := true is recommended. It tells partman to run ANALYZE on newly created partitions, ensuring the query planner has up-to-date statistics.

    Setting up Retention Policies

    Storing data forever is rarely feasible. pg_partman makes retention trivial. Let's configure iot_events to keep only 90 days of data.

    sql
    UPDATE partman.part_config 
    SET 
        retention = '90 days',
        retention_keep_table = false,
        retention_keep_index = false
    WHERE parent_table = 'public.iot_events';

  • retention: Defines the window. Any partition whose entire data range is older than now() - interval '90 days' will be dropped.
  • retention_keep_table: false. This instructs pg_partman to DROP the table entirely, a near-instantaneous metadata operation that does no row-by-row work. If set to true, it would only DETACH the partition, leaving it as a standalone table (useful for manual archival before deletion).

    Now, every time run_maintenance() is called by pg_cron, it will silently and efficiently drop any partitions that have aged out. This is a massive operational win compared to bulk DELETE commands.
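    You can read the effective configuration for the partition set back from part_config at any time (column names shown here match pg_partman 4.x):

    sql
    SELECT parent_table, partition_interval, premake, retention, retention_keep_table
    FROM partman.part_config
    WHERE parent_table = 'public.iot_events';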


    Advanced Scenario 1: Composite Partitioning (Time + ID)

    For multi-tenant systems or applications with a high-cardinality categorical key (like device_id), partitioning by time alone may not be enough. A single day's partition might still be too large. The solution is composite partitioning: partition by time first, then sub-partition each time-based partition by another key.

    Let's evolve our iot_events table. We want to partition by event_ts daily, and then sub-partition each day's data by device_id using a HASH function.

    Step 1: Re-create the Parent Table for Composite Partitioning

    sql
    -- This requires dropping the old table (and all of its data)
    DROP TABLE public.iot_events;
    -- If the table was already registered with pg_partman, clear the stale configuration as well:
    -- DELETE FROM partman.part_config WHERE parent_table = 'public.iot_events';
    
    CREATE TABLE public.iot_events (
        event_id BIGSERIAL,
        device_id UUID NOT NULL,
        event_ts TIMESTAMPTZ NOT NULL DEFAULT now(),
        payload JSONB,
        PRIMARY KEY (event_id, event_ts, device_id) -- All partition keys must be in the PK
    )
    PARTITION BY RANGE (event_ts);

    Notice the device_id is now part of the primary key.

    Step 2: Configure the Primary (Time) Partition with a Template

    This is the critical step. We tell pg_partman that every time it creates a new time partition (e.g., for a new day), it should use a specific template to define its sub-partitioning scheme.

    sql
    -- First, create the template table. It will not store data.
    -- It defines the structure and sub-partitioning for future time partitions.
    CREATE TABLE public.iot_events_template (
        event_id BIGSERIAL,
        device_id UUID NOT NULL,
        event_ts TIMESTAMPTZ NOT NULL,
        payload JSONB,
        PRIMARY KEY (event_id, event_ts, device_id)
    )
    PARTITION BY HASH (device_id);
    
    -- Create the hash partitions FOR THE TEMPLATE. These will be copied for each new day.
    -- Let's create 4 hash partitions.
    CREATE TABLE public.iot_events_template_0 PARTITION OF public.iot_events_template FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    CREATE TABLE public.iot_events_template_1 PARTITION OF public.iot_events_template FOR VALUES WITH (MODULUS 4, REMAINDER 1);
    CREATE TABLE public.iot_events_template_2 PARTITION OF public.iot_events_template FOR VALUES WITH (MODULUS 4, REMAINDER 2);
    CREATE TABLE public.iot_events_template_3 PARTITION OF public.iot_events_template FOR VALUES WITH (MODULUS 4, REMAINDER 3);
    
    -- Now, register the parent with pg_partman, referencing the template table.
    SELECT partman.create_parent(
        p_parent_table := 'public.iot_events', 
        p_control := 'event_ts', 
        p_type := 'native', 
        p_interval := '1 day', 
        p_premake := 10,
        p_template_table := 'public.iot_events_template'
    );

    Now, whenever run_maintenance() creates a new daily partition, for example iot_events_p2023_10_30, it will not be a regular table. It will be a partitioned table itself, with 4 hash sub-partitions for device_id. The query planner can now prune by both time and device ID, offering massive performance gains for queries like:

    sql
    SELECT * FROM iot_events 
    WHERE event_ts >= '2023-10-29 00:00:00' AND event_ts < '2023-10-30 00:00:00'
    AND device_id = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11';

    The planner will immediately discard all time partitions except iot_events_p2023_10_29, and within that partition, it will only scan one of the four hash sub-partitions.


    Advanced Scenario 2: Edge Cases - Default Partitions and Out-of-Range Data

    What happens if you receive data with a timestamp far in the future, beyond your premake window? Or a timestamp older than your oldest partition? By default, native partitioning rejects the insert with an error like ERROR: no partition of relation "iot_events" found for row. Left unhandled, this means failed inserts at the application level and, ultimately, lost data.

    pg_partman provides a robust solution using a default partition.

    Step 1: Create and Attach a Default Partition

    First, create the default partition. Because it is declared with PARTITION OF, it automatically shares the parent's structure.

    sql
    CREATE TABLE public.iot_events_default PARTITION OF public.iot_events DEFAULT;

    Step 2: Inform pg_partman about it

    This is a configuration flag in part_config.

    sql
    UPDATE partman.part_config 
    SET 
        use_default = true
    WHERE parent_table = 'public.iot_events';

    Now, any data that doesn't fit into an existing partition will be silently routed to iot_events_default instead of causing an error.
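    You can watch this routing in action by inserting a row whose timestamp lies beyond the premake window and checking which partition it landed in. gen_random_uuid() is built in from PostgreSQL 13 (or available via pgcrypto); the payload value is illustrative.

    sql
    -- A timestamp a year in the future has no matching partition yet
    INSERT INTO public.iot_events (device_id, event_ts, payload)
    VALUES (gen_random_uuid(), now() + interval '1 year', '{"note": "out of range"}');

    -- tableoid reveals the physical partition each row was stored in
    SELECT tableoid::regclass AS partition_name, count(*)
    FROM public.iot_events
    GROUP BY 1;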

    Operationalizing the Default Partition

    The default partition is a safety net, not a destination. It should be monitored closely. If it contains a significant amount of data, it means one of two things:

  • Your premake is too small, and you're inserting data faster than partitions are being created.
  • You are receiving bad data (e.g., with timestamps from the year 1970 or 2099).

    A production system should have monitoring that alerts when the size of the default partition grows beyond a threshold.

    sql
    -- A query for your monitoring system
    SELECT pg_size_pretty(pg_total_relation_size('public.iot_events_default'));

    You should also have a periodic maintenance job (perhaps a weekly script) to inspect the data in the default partition, move any valid data to the correct partitions if they now exist, and log/discard invalid data.
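    pg_partman can handle the "move valid data back" step for you. Assuming a reasonably recent version, partition_data_proc() (the batched procedure wrapper around partition_data_time()) relocates rows from the default partition into the correct child partitions, creating them where necessary; a minimal sketch:

    sql
    -- Move rows out of iot_events_default into their proper time partitions,
    -- committing in batches to keep locks and transactions small
    CALL partman.partition_data_proc('public.iot_events');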


    Performance Deep Dive: The 'Why' Behind Partitioning

    Let's quantify the impact with EXPLAIN ANALYZE.

    Scenario: A 1TB iot_events table (unpartitioned) vs. a partitioned equivalent with 100 daily partitions of 10GB each. We want to query for one hour of data for a specific device.

    Query on Monolithic Table:

    sql
    EXPLAIN ANALYZE SELECT avg((payload->>'temperature')::numeric) 
    FROM iot_events_mono
    WHERE device_id = '...' AND event_ts BETWEEN '2023-10-29 10:00:00' AND '2023-10-29 11:00:00';

    Simulated EXPLAIN Output (Monolithic):

    text
    Bitmap Heap Scan on iot_events_mono (cost=50123.45..1234567.89 rows=5000 width=32) (actual time=15000.123..25000.456 rows=4890 loops=1)
      Recheck Cond: (device_id = '...') AND (event_ts >= '...') AND (event_ts <= '...')
      -> Bitmap Index Scan on idx_events_device_ts (cost=0.00..50100.00 rows=5000 width=0) (actual time=14900.111..14900.111 rows=4890 loops=1)
            Index Cond: (device_id = '...') AND (event_ts >= '...') AND (event_ts <= '...')
    Planning Time: 0.850 ms
    Execution Time: 25010.123 ms

    The planner uses an index, but it has to traverse a massive 1TB index to find the few thousand relevant rows. The execution time is over 25 seconds.

    Query on Partitioned Table:

    sql
    EXPLAIN ANALYZE SELECT avg((payload->>'temperature')::numeric) 
    FROM iot_events
    WHERE device_id = '...' AND event_ts BETWEEN '2023-10-29 10:00:00' AND '2023-10-29 11:00:00';

    Simulated EXPLAIN Output (Partitioned):

    text
    Aggregate (cost=1234.56..1234.57 rows=1 width=32) (actual time=50.123..50.124 rows=1 loops=1)
      -> Append (cost=100.00..1230.00 rows=4890 width=32) (actual time=10.456..45.789 rows=4890 loops=1)
        -> Index Scan using iot_events_p2023_10_29_device_id_ts_idx on iot_events_p2023_10_29 (cost=100.00..1230.00 rows=4890 width=32) (actual time=10.456..45.789 rows=4890 loops=1)
              Index Cond: (device_id = '...') AND (event_ts >= '...') AND (event_ts <= '...')
    Planning Time: 1.250 ms
    Execution Time: 50.456 ms

    Key observations:

  • Partition Pruning: The planner immediately identified that only the iot_events_p2023_10_29 partition could possibly contain the requested data. It doesn't even consider the other 99 partitions.
  • Smaller Index: The index scan is on the much smaller index of the 10GB partition, not the global 1TB index. It's faster and more efficient.
  • Execution Time: The execution time dropped from 25 seconds to 50 milliseconds—a 500x improvement.
    This is the profound impact of partitioning. The benefits extend to maintenance as well:

  • VACUUM: Vacuuming a 10GB daily partition is trivial. Vacuuming the 1TB monolithic table is a major operation.
  • DROP PARTITION vs. DELETE: Dropping an old 10GB partition is an instantaneous metadata operation. Deleting 10GB of data from the 1TB table could take hours, generate massive WAL, and cause significant bloat; a quick comparison is sketched below.
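    To make that last point concrete, here is the shape of both operations; the partition name is illustrative, and with a retention policy in place pg_partman runs the first pair for you:

    sql
    -- Partitioned: aging out a day of data is a metadata-only operation
    ALTER TABLE public.iot_events DETACH PARTITION public.iot_events_p2023_07_28;
    DROP TABLE public.iot_events_p2023_07_28;

    -- Monolithic: the same cleanup touches every matching row, bloats the table
    -- and its indexes, and generates an enormous volume of WAL
    DELETE FROM public.iot_events_mono
    WHERE event_ts < now() - interval '90 days';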

    Conclusion: From Reactive to Proactive Database Management

    Implementing an automated partitioning strategy with pg_partman and pg_cron transforms time-series data management from a reactive, problematic chore into a proactive, self-managing system. By correctly configuring parent tables, setting up automated maintenance, and planning for edge cases like composite partitioning and default partitions, you can build a PostgreSQL backend that scales gracefully into the terabytes and beyond.

    The initial setup requires careful thought about partition intervals and keys, but the long-term operational payoff is immense. Your queries remain fast, your maintenance windows disappear, and your database becomes a stable, predictable component of your architecture, rather than a constant source of performance firefighting.
