Production Patterns for PostgreSQL Time-Series Partitioning with pg_partman
The Inevitable Crisis of Unbounded Time-Series Tables
In any system that generates event data—be it IoT metrics, application logs, or financial transactions—the primary time-series table is a time bomb. Initially, INSERTs are fast and SELECTs are snappy. But as the table grows from gigabytes to terabytes, query performance plummets, index bloat becomes unmanageable, and maintenance operations like VACUUM or adding a column can lock up the system for hours. The fundamental problem is that a single, monolithic table is an inefficient data structure for time-ordered data at scale.
PostgreSQL's native declarative partitioning (available since v10) is the architectural solution. It allows a logically single table to be physically stored as multiple smaller, more manageable child tables (partitions). This enables powerful optimizations: query planning can prune entire partitions that don't match a query's WHERE clause, and maintenance can be performed on a per-partition basis.
However, native partitioning is not a complete solution; it's a primitive. The responsibility of creating future partitions and dropping old ones falls on you. A missed cron job can lead to catastrophic insertion failures when data arrives for a non-existent partition. This is where pg_partman enters. It's not just a convenience script; it's a robust framework for automating the entire lifecycle of partitioned tables.
This article assumes you understand the basics of PARTITION BY RANGE. We will focus exclusively on the advanced, production-level patterns required to leverage pg_partman effectively for multi-terabyte datasets.
Section 1: Core Architecture and Strategic Setup
Before writing a single line of SQL, we must make critical design decisions. Your choice of partition key and interval has irreversible performance implications.
Partition Key Selection: Beyond the Obvious Timestamp
For time-series data, the obvious partition key is the event timestamp. While often correct, it's not always sufficient. Consider a multi-tenant SaaS platform where you store events for thousands of customers.
A query for one customer's data over the last week (WHERE customer_id = ? AND event_timestamp BETWEEN ? AND ?) on a table partitioned only by event_timestamp would still have to scan every partition covering that week (seven of them, with daily partitioning). While far better than a full table scan, we can do better.
Advanced Pattern: Compound Partitioning for Multi-Tenancy
PostgreSQL allows range partitioning on multiple columns, but pg_partman's automation works best with a single control column. The optimal strategy is usually to partition by time (RANGE) and design your indexes to handle multi-tenancy efficiently.
The most critical aspect is the index structure. An index on (customer_id, event_timestamp) allows the planner to efficiently seek within each relevant partition.
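As a minimal sketch of this pattern (the app_events table and its columns are illustrative, not part of the iot_events example that follows), the table is partitioned by time alone and the composite index does the per-tenant work:
-- Partition by time only; the tenant column lives in the index, not the partition key
CREATE TABLE public.app_events (
    customer_id BIGINT NOT NULL,
    event_timestamp TIMESTAMPTZ NOT NULL,
    event_type TEXT NOT NULL
) PARTITION BY RANGE (event_timestamp);
-- Pruning narrows a query to the partitions covering the requested week; this index
-- then turns each per-partition lookup into a tight range scan for one tenant.
CREATE INDEX idx_app_events_customer_time
    ON public.app_events (customer_id, event_timestamp DESC);
A per-customer, last-seven-days query then touches only the relevant partitions and reads only that customer's index entries within each one.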
The Partitioning Interval Trade-off
pg_partman supports intervals like hourly, daily, weekly, and monthly. The choice is a balance:
* Small Intervals (e.g., hourly):
* Pros: Extremely fast partition pruning for short time ranges. Maintenance operations (dropping, detaching, vacuuming) are very quick on small tables. Ideal for high-velocity data where recent queries are most common.
* Cons: High table count. Can put pressure on PostgreSQL's catalog and planner if you have tens of thousands of partitions. psql's \d command becomes slow.
* Large Intervals (e.g., monthly):
* Pros: Fewer tables to manage. Simpler for the planner.
* Cons: Queries for a single day still have to scan a whole month's partition. Dropping a month's worth of data can be a heavier operation.
Production Guideline: Start with daily partitions. It's a sane default. For extremely high-volume systems (billions of rows per day), consider hourly. For lower-volume analytical systems, weekly might suffice. You can change this later, but it requires a full data migration.
Code Example 1: Initial Table and `pg_partman` Configuration
Let's implement a daily partitioning scheme for an iot_events table. We'll include a device_id for our indexing strategy.
-- Ensure the extension is available
CREATE EXTENSION IF NOT EXISTS pg_partman;
-- 1. Create the parent table with native partitioning
CREATE TABLE public.iot_events (
    event_id BIGINT GENERATED BY DEFAULT AS IDENTITY,
    device_id UUID NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    payload JSONB,
    -- A unique constraint on a partitioned table must include the partition key,
    -- so the primary key is (event_id, created_at) rather than event_id alone.
    PRIMARY KEY (event_id, created_at)
) PARTITION BY RANGE (created_at);
-- 2. Create an index optimized for device-specific time range queries
-- This index will be automatically propagated to all child partitions.
CREATE INDEX idx_iot_events_device_id_created_at
ON public.iot_events (device_id, created_at DESC);
-- 3. Configure pg_partman to manage this table
-- This is the most critical step.
SELECT partman.create_parent(
p_parent_table := 'public.iot_events',
p_control := 'created_at',
p_type := 'native',
p_interval := 'daily',
p_premake := 4, -- Create 4 days of partitions in advance. CRITICAL for preventing failed inserts when data arrives ahead of partition creation.
p_start_partition := (NOW() - INTERVAL '3 days')::text -- Optionally pre-create some past partitions
);
Dissecting create_parent parameters:
* p_parent_table: The table to manage.
* p_control: The column to partition by.
* p_type: Always use 'native' for modern PostgreSQL (11+).
* p_interval: Our chosen frequency.
* p_premake: This is arguably the most important reliability parameter. It tells pg_partman to always maintain a buffer of future partitions. Native partitioning never creates child tables on the fly: if an INSERT arrives for a timestamp with no matching partition, the row either fails outright or lands silently in the default partition, where it hurts pruning and must be moved later. New partitions appear only when pg_partman's maintenance runs, so with p_premake = 4 there are always 4 days' worth of future tables ready, and a delayed or failed maintenance run doesn't immediately break inserts.
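To confirm what pg_partman recorded for this partition set, you can query its configuration table (the part_config columns shown here are present in pg_partman 4.x):
-- Inspect the stored configuration for the iot_events partition set
SELECT parent_table, control, partition_interval, premake
FROM partman.part_config
WHERE parent_table = 'public.iot_events';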
After running this, you can inspect the created partitions:
SELECT parent.relname AS parent_table, child.relname AS partition_name
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
WHERE parent.relname = 'iot_events';
You'll see tables like iot_events_p2023_10_26, iot_events_p2023_10_27, etc.
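To verify pruning is working, a quick sketch with EXPLAIN (the device_id value is just a placeholder UUID) should show only the single daily partition covering the literal range:
-- With a literal time range, plan-time pruning leaves only iot_events_p2023_10_26 in the plan
EXPLAIN (COSTS OFF)
SELECT count(*)
FROM public.iot_events
WHERE device_id = '00000000-0000-0000-0000-000000000000'
  AND created_at >= '2023-10-26 00:00:00+00'
  AND created_at <  '2023-10-27 00:00:00+00';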
Section 2: Automating Maintenance and Retention
Configuration is useless without automation. pg_partman's maintenance procedure, run_maintenance_proc(), is the engine that drives partition creation and deletion.
Scheduling `run_maintenance_proc()`
You have two primary options for scheduling:
1. pg_cron (Recommended): The pg_cron extension allows you to schedule SQL commands directly from within PostgreSQL. It's robust, transactional, and keeps your entire database logic self-contained.
2. An external scheduler: A system cron job (or your orchestrator of choice) running psql -c "CALL partman.run_maintenance_proc();". This works but introduces an external dependency and potential authentication complexities.
Code Example 2: Scheduling with pg_cron
First, ensure pg_cron is enabled in postgresql.conf:
# postgresql.conf
shared_preload_libraries = 'pg_partman_bgw,pg_cron'
cron.database_name = 'your_database_name'
Then, schedule the job from psql:
-- Schedule maintenance to run every hour at the 5-minute mark
-- Running it hourly is good practice even for daily partitions, to ensure the premake buffer is always full
SELECT cron.schedule(
'pg_partman_maintenance',
'5 * * * *', -- Run at 5 minutes past every hour
$$CALL partman.run_maintenance_proc()$$
);
This single command ensures that new partitions are created and old ones are dropped without any manual intervention.
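Because the entire partition lifecycle now depends on this one job, verify that it is actually succeeding. Recent pg_cron versions record run history in cron.job_run_details:
-- Check the last few runs of the maintenance job for failures
SELECT jobid, status, return_message, start_time, end_time
FROM cron.job_run_details
WHERE command LIKE '%run_maintenance_proc%'
ORDER BY start_time DESC
LIMIT 5;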
Implementing a Data Retention and Archiving Strategy
Simply dropping old data is often unacceptable. Production systems require an archive-then-delete workflow. pg_partman excels at this through its retention policy configuration and the concept of detaching partitions.
* Dropping: Drops the child table outright. The data is gone forever.
* Detaching: Removes the child partition's inheritance link from the parent. It becomes a standalone table, invisible to queries against the parent but still present in the database, ready for archiving.
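Detaching is ordinary DDL under the hood; the manual equivalent for a single child (partition name illustrative) is shown below, though with the retention settings that follow, pg_partman issues it for you:
-- The child becomes a standalone table, no longer visible through the parent
ALTER TABLE public.iot_events DETACH PARTITION public.iot_events_p2023_09_25;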
Advanced Pattern: The Detach-and-Archive Workflow
1. Configure pg_partman to detach old partitions instead of dropping them.
2. Have a separate, asynchronous process that scans for detached tables.
3. This process copies the data from the detached table to cold storage (e.g., AWS S3, Google Cloud Storage).
4. After a successful copy, the process drops the detached table.
Code Example 3: Configuring a Retention Policy
Let's configure pg_partman to keep 30 days of data online and detach anything older.
-- Update the configuration for our iot_events table
UPDATE partman.part_config
SET
retention = '30 days',
retention_keep_table = true, -- This is the key! It tells pg_partman to DETACH, not DROP.
retention_keep_index = true
WHERE parent_table = 'public.iot_events';
Now, when run_maintenance_proc() runs, any partition whose data is entirely older than 30 days will be detached. You'll have a standalone table (e.g., iot_events_p2023_09_25) that you can process with a tool like pg_dump or a foreign data wrapper like aws_s3.
A simplified archival script might look like this (conceptual bash script):
#!/bin/bash
DB_NAME="your_database_name"
S3_BUCKET="s3://your-company-archive-bucket/iot_events/"
# Find detached tables (they follow a pattern but are no longer in pg_inherits)
DETACHED_TABLES=$(psql -d $DB_NAME -t -c "\
SELECT table_schema || '.' || table_name \
FROM information_schema.tables \
WHERE table_name ~ '^iot_events_p[0-9_]+$' \
AND table_name NOT IN (\
SELECT child.relname FROM pg_inherits \
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid \
JOIN pg_class child ON pg_inherits.inhrelid = child.oid \
WHERE parent.relname = 'iot_events'\
);")
# Make a failed pg_dump fail the whole dump pipeline instead of being masked by gzip
set -o pipefail
for TABLE in $DETACHED_TABLES; do
  echo "Archiving $TABLE..."
  # Dump to a compressed file and upload to S3; only drop the table if BOTH steps succeed
  if pg_dump -d "$DB_NAME" -t "$TABLE" | gzip > "/tmp/$TABLE.sql.gz" \
     && aws s3 cp "/tmp/$TABLE.sql.gz" "$S3_BUCKET"; then
    echo "Archive successful. Dropping table $TABLE."
    psql -d "$DB_NAME" -c "DROP TABLE $TABLE;"
    rm "/tmp/$TABLE.sql.gz"
  else
    echo "ERROR: Archive failed for $TABLE. Table not dropped."
  fi
done
Section 3: Advanced Indexing with Template Tables
By default, indexes created on the parent table are propagated to all children. But what if you need more sophisticated indexing? For example:
* A BRIN index on a column with high correlation to its physical storage location (like a monotonically increasing session ID).
* A partial index to target a small, frequently queried subset of data (e.g., events with payload->>'type' = 'critical_error').
Adding these indexes to every new partition manually defeats the purpose of automation. The pg_partman solution is the template_table.
You create a separate table with the exact structure and indexes you want for your partitions. Then, you tell pg_partman to use this table as a template for all future child partitions it creates.
Code Example 4: Implementing a Custom Indexing Strategy
Let's create a template table for iot_events that includes a BRIN index on event_id and a partial index for critical alerts.
-- 1. Create the template table. It must have the same column structure.
CREATE TABLE public.iot_events_template (
event_id BIGINT,
device_id UUID,
created_at TIMESTAMPTZ,
payload JSONB
);
-- 2. Add the desired advanced indexes to the template table.
-- BRIN index is great for large, sequentially ordered data.
CREATE INDEX idx_template_event_id_brin ON public.iot_events_template USING BRIN (event_id);
-- Partial index for high-priority alerts, assuming a specific payload structure.
CREATE INDEX idx_template_critical_alerts ON public.iot_events_template ((payload->>'alert_level'))
WHERE payload->>'alert_level' = 'critical';
-- 3. Register this template table with pg_partman
UPDATE partman.part_config
SET
template_table = 'public.iot_events_template'
WHERE parent_table = 'public.iot_events';
That's it. From this point forward, every new daily partition created by run_maintenance_proc() will pick up the structure defined on iot_events_template, meaning it will carry our custom BRIN and partial indexes. (On PostgreSQL 11+ you can often achieve the same effect by creating such indexes directly on the parent, since they propagate natively; the template remains essential for properties native partitioning cannot inherit, such as a unique index that does not include the partition key.) This is an incredibly powerful pattern for maintaining complex, optimized indexing schemes with zero manual effort.
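You can spot-check that a freshly created child picked up the template's indexes (the partition name below is illustrative):
-- List all indexes on a newly created child partition
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename = 'iot_events_p2023_10_30';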
Section 4: Edge Cases and Operational Gotchas
Real-world operations are never clean. Here's how to handle the inevitable complexities.
Edge Case 1: Backfilling Terabytes of Historical Data
Imagine you need to migrate an existing monolithic iot_events_legacy table into your new partitioned structure. A simple INSERT ... SELECT will be painfully slow. Each row insertion has to go through the partitioning logic. For billions of rows, this could take days.
Solution: The partition_data_proc Bulk Path
pg_partman provides a stored procedure designed for this exact scenario. It works by creating the necessary partitions and then performing bulk data movement, which is orders of magnitude faster.
Workflow:
1. Load the legacy data into a standalone staging table (e.g., iot_events_staging).
2. Ensure pg_partman has created all necessary historical partitions. You can use create_partition_time() for this.
3. Call partition_data_proc() to move the data in batches.
-- Assuming iot_events_staging contains billions of rows of historical data
-- Step 1 (Optional, if partitions don't exist): Ensure partitions for the data's time range exist.
-- pg_partman might have only created future/recent partitions.
SELECT partman.create_partition_time('public.iot_events', ARRAY['2022-01-01'::timestamptz, '2022-01-02'::timestamptz, ...]);
-- Step 2: Call the bulk-partitioning procedure.
-- This moves data out of the source table in batches, committing after each batch so locks stay short.
-- Progress (rows moved per batch) is reported via NOTICE messages.
CALL partman.partition_data_proc('public.iot_events', p_source_table := 'public.iot_events_staging');
This procedure is idempotent. You can run it repeatedly, and it will pick up where it left off. It's the only sane way to backfill massive datasets into a partitioned table.
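Once the procedure reports nothing left to move, a quick sanity check before reclaiming the space:
-- The staging table should be empty once partitioning has completed
SELECT count(*) AS rows_remaining FROM public.iot_events_staging;
-- Then drop it to reclaim the space
DROP TABLE public.iot_events_staging;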
Edge Case 2: Monitoring and Alerting
How do you know pg_partman is working? Blind faith is not a strategy.
* Partition Count: Monitor the total number of partitions. A flat line indicates run_maintenance_proc() is failing.
-- Prometheus/Datadog Query
SELECT count(*) FROM pg_inherits WHERE inhparent = 'public.iot_events'::regclass;
* Future Partition Check: More importantly, monitor the premake buffer. The query below should always return at least as many rows as your premake value (e.g., 4). If the count drops, you are at risk of failed inserts (or of rows silently landing in the default partition).
-- show_partitions() lists the children; show_partition_info() returns each child's time boundaries
SELECT p.partition_schemaname || '.' || p.partition_tablename AS partition_name,
       i.child_start_time,
       i.child_end_time
FROM partman.show_partitions('public.iot_events', 'DESC') p,
     LATERAL partman.show_partition_info(p.partition_schemaname || '.' || p.partition_tablename) i
WHERE i.child_start_time > now()
ORDER BY i.child_start_time;
Your alerting should fire if the COUNT of rows from this query is less than your configured premake value.
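As a sketch building on the same helpers, the check can be folded into a single boolean alert condition by comparing the future-partition count against the premake value stored in partman.part_config:
-- True when the future-partition buffer has fallen below the configured premake
SELECT count(*) < (SELECT premake
                   FROM partman.part_config
                   WHERE parent_table = 'public.iot_events') AS premake_buffer_low
FROM partman.show_partitions('public.iot_events') p,
     LATERAL partman.show_partition_info(p.partition_schemaname || '.' || p.partition_tablename) i
WHERE i.child_start_time > now();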
Edge Case 3: Out-of-Range Data and the Default Partition
When pg_partman creates partitions, it defines the range (e.g., FOR VALUES FROM ('2023-10-26 00:00:00+00') TO ('2023-10-27 00:00:00+00')). What happens to data far in the future or far in the past? Without a catch-all, the INSERT fails; if a DEFAULT partition exists (pg_partman creates one for native partition sets unless you disable it), the row silently lands there instead. In most production time-series systems you should treat the default partition as an error bucket and keep it empty: erroneous future timestamps are almost always bad data, rows sitting in the default partition never benefit from pruning, and they will block creating or attaching the partition that would otherwise cover their range. It's better for an INSERT to fail, or for an alert to fire on default-partition rows, than to silently accumulate data in a catch-all you'll never query. (The similarly named infinite_time_partitions setting is unrelated to out-of-range rows; it controls whether maintenance keeps pre-creating future partitions even when no new data is arriving.)
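A simple guard, assuming pg_partman's usual <parent>_default naming for the default partition (verify the exact name in your installation), is to alert whenever it contains any rows:
-- Any rows here are out-of-range data that needs investigation
SELECT count(*) AS rows_in_default
FROM public.iot_events_default;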
Conclusion: From Liability to Asset
A multi-terabyte time-series table is one of the biggest liabilities in a growing system. Without proper management, its performance degradation is not a matter of if but when.
By implementing pg_partman with these advanced, production-oriented patterns, you transform this liability into a highly performant and scalable asset. The key takeaways are to think beyond the basics:
* Automate with pg_cron and run_maintenance_proc to make partition lifecycle management a zero-touch operation.
* Use a template_table to deploy and maintain sophisticated indexing schemes across hundreds or thousands of partitions effortlessly.
* Use partition_data_proc for backfills and migrations to avoid catastrophic performance hits.
* Alert on leading indicators (such as the premake buffer), not just on failures.
Mastering these techniques elevates you from someone who simply uses a database to an engineer who can architect data systems that scale gracefully with business growth.