Dynamic Partial Indexing for Hot Tenants in Multi-Tenant PostgreSQL
The Inevitable Bottleneck of Naive Multi-Tenancy Indexing
In any mature multi-tenant SaaS application running on PostgreSQL, the (tenant_id, ...) composite index is the workhorse. It's the first line of defense, ensuring that queries scoped to a single tenant are reasonably performant. However, as tables grow to hundreds of millions or billions of rows, and data distribution becomes heavily skewed, this pattern reveals its limitations. The Pareto principle is often in full effect: 80% of your data and activity comes from 20% (or less) of your tenants. We call these 'hot tenants'.
When a users or documents table is dominated by a few hyper-active tenants, the B-tree for your (tenant_id, column_name) index grows enormous, packed with entries from thousands of 'cold' tenants. While an index scan on (tenant_id, status) is far better than a sequential scan, the database still has to descend and scan a much larger, colder index structure, which means more I/O, worse cache locality, and non-trivial performance degradation for your most important customers.
This article presents a surgical, dynamic strategy to address this specific problem. We will not re-hash the basics of indexing. Instead, we will design and implement an automated system that identifies hot tenants and creates highly-efficient, specialized partial indexes for them, all without application downtime.
Baseline Analysis: The Composite Index Under Load
Let's establish a concrete scenario. Consider a typical analytics_events table in a SaaS product.
CREATE TABLE analytics_events (
id BIGSERIAL PRIMARY KEY,
tenant_id UUID NOT NULL,
event_type TEXT NOT NULL,
payload JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- The standard, go-to index. created_at is included so the common
-- "latest N events for a tenant" pattern can be served in index order.
CREATE INDEX idx_analytics_events_tenant_id_event_type
ON analytics_events (tenant_id, event_type, created_at);
Now, let's simulate a heavily skewed data distribution. We'll insert 10 million rows and make two tenants, tenant_A and tenant_B, responsible for 80% of the data between them, with the remaining 20% spread across a long tail of cold tenants.
-- Fictional UUIDs for our hot tenants
-- In a real scenario, these would be your actual tenant IDs.
-- '00000000-0000-0000-0000-00000000000a'
-- '00000000-0000-0000-0000-00000000000b'
-- Generate skewed data (conceptual script).
-- random() is computed once per row in the subquery; calling random() inside
-- each CASE branch would re-roll the dice per WHEN and distort the split.
-- gen_random_uuid() is built in since PostgreSQL 13 (earlier: pgcrypto).
INSERT INTO analytics_events (tenant_id, event_type, created_at)
SELECT
    CASE
        WHEN r < 0.4 THEN '00000000-0000-0000-0000-00000000000a'::uuid -- ~40%: tenant_A
        WHEN r < 0.8 THEN '00000000-0000-0000-0000-00000000000b'::uuid -- ~40%: tenant_B
        ELSE gen_random_uuid() -- remaining ~20%: a long tail of cold tenants
    END,
    'page_view',
    NOW() - (s * INTERVAL '1 second')
FROM (SELECT s, random() AS r FROM generate_series(1, 10000000) s) AS gen;
ANALYZE analytics_events;
Let's check the size of our table and index. The exact numbers will vary, but the proportions are what matter.
-- Total relation size (table + indexes + TOAST) might be ~1.5 GB
SELECT pg_size_pretty(pg_total_relation_size('analytics_events'));
-- Index size might be ~400 MB
SELECT pg_size_pretty(pg_relation_size('idx_analytics_events_tenant_id_event_type'));
The index is substantial because it must map every single row in the table. Now, let's query for one of our hot tenants.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, created_at
FROM analytics_events
WHERE tenant_id = '00000000-0000-0000-0000-00000000000a'
AND event_type = 'page_view'
ORDER BY created_at DESC
LIMIT 100;
Sample EXPLAIN Output (Composite Index):
Limit (cost=0.56..241.97 rows=100 width=16) (actual time=0.129..3.894 rows=100 loops=1)
Buffers: shared hit=2841
-> Index Scan Backward using idx_analytics_events_tenant_id_event_type on analytics_events ...
Index Cond: (tenant_id = '00000000-0000-0000-0000-00000000000a'::uuid AND event_type = 'page_view'::text)
Buffers: shared hit=2841
Planning Time: 0.158 ms
Execution Time: 3.921 ms
The planner correctly uses the index, and the execution time isn't terrible. But look at Buffers: shared hit=2841: nearly 3,000 8kB pages had to be touched in shared buffers (each of which could have been a disk read on a cold cache) to satisfy this query. This is the cost of descending and scanning a large B-tree that is mostly filled with entries irrelevant to this query.
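To make the index's physical shape visible (and not just its byte size), the pgstattuple extension offers pgstatindex(). A minimal sketch, assuming the extension is available on your instance:

-- Requires: CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT tree_level,                          -- B-tree depth from root to leaf
       internal_pages,
       leaf_pages,                          -- pages a scan may have to visit
       pg_size_pretty(index_size) AS size,
       avg_leaf_density                     -- % of leaf space actually in use
FROM pgstatindex('idx_analytics_events_tenant_id_event_type');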
The Partial Index Advantage: A Static Demonstration
A partial index is an index built on a subset of a table's rows, defined by a WHERE clause. It's smaller, more compact, and consequently, faster to scan. Let's create one specifically for our hot tenant.
CRITICAL NOTE: In production, always use CREATE INDEX CONCURRENTLY to avoid taking a write lock on your table. For this demonstration, we'll omit it for simplicity, but we will use it in our final automated solution.
-- created_at is included so ORDER BY created_at DESC needs no sort step
CREATE INDEX idx_events_hot_tenant_a_event_type
ON analytics_events (event_type, created_at)
WHERE tenant_id = '00000000-0000-0000-0000-00000000000a';
ANALYZE analytics_events;
Let's check the size of this new index. It will be a fraction of the composite index, perhaps around 150MB, as it only contains data for the ~4 million rows of tenant_A.
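Using the same size check as before:

SELECT pg_size_pretty(pg_relation_size('idx_events_hot_tenant_a_event_type'));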
Now, re-run the exact same query:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, created_at
FROM analytics_events
WHERE tenant_id = '00000000-0000-0000-0000-00000000000a'
AND event_type = 'page_view'
ORDER BY created_at DESC
LIMIT 100;
Sample EXPLAIN Output (Partial Index):
Limit (cost=0.43..93.21 rows=100 width=16) (actual time=0.035..0.451 rows=100 loops=1)
Buffers: shared hit=105
-> Index Scan Backward using idx_events_hot_tenant_a_event_type on analytics_events ...
Index Cond: (event_type = 'page_view'::text)
Buffers: shared hit=105
Planning Time: 0.213 ms
Execution Time: 0.478 ms
The difference is stark:
| Metric | Composite Index | Partial Index | Improvement |
|---|---|---|---|
| Execution Time | 3.921 ms | 0.478 ms | ~8.2x |
| Buffer Hits | 2841 | 105 | ~27x |
We achieved an ~8x speedup in execution time and a ~27x reduction in buffer reads. The query planner is smart enough to see that a smaller, more specific index exists that satisfies the WHERE clause and uses it. This is the power we want to harness, but manually creating these indexes is not a scalable solution.
The Core Strategy: A Dynamic Management System
Our goal is to automate the creation and eventual cleanup of these partial indexes for tenants that cross a certain activity or size threshold. This requires three components: Identification, Management, and Automation.
1. Identification: Finding Your Hot Tenants
First, we need a reliable way to define and find 'hot' tenants. The definition will vary by application—it could be based on row count, query frequency, or revenue. A simple and effective starting point is row count.
We can run a periodic query to find the top N tenants by data volume.
-- Find the top 5 tenants by row count in the analytics_events table
SELECT tenant_id, COUNT(*)
FROM analytics_events
GROUP BY tenant_id
ORDER BY COUNT(*) DESC
LIMIT 5;
This is a heavy query to run frequently on a massive table. For a more sophisticated, lower-overhead approach, you can leverage PostgreSQL's statistics extensions:
* pg_stat_statements: Keep in mind that this extension normalizes queries, replacing literals such as tenant_id values with placeholders ($1), so on its own it cannot tell you which tenants appear most often in WHERE clauses. It is still useful for spotting the hottest query shapes; to attribute traffic to specific tenants, pair it with statement logging (e.g., log_min_duration_statement plus a log analyzer) or application metrics.
* Application-Level Metrics: Your application itself is often the best source. You can maintain counters in a system like Redis for tenant activity and periodically flush the leaders to a database table for the index manager to consume (see the sketch after this list).
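As a sketch of that second option, here is a hypothetical sink table the application could flush its Redis counters into; the table name, columns, and hourly bucket granularity are all illustrative:

-- Hypothetical sink for application-side activity counters.
CREATE TABLE tenant_activity_rollup (
    tenant_id    UUID        NOT NULL,
    bucket_start TIMESTAMPTZ NOT NULL,   -- e.g., hourly buckets
    query_count  BIGINT      NOT NULL DEFAULT 0,
    PRIMARY KEY (tenant_id, bucket_start)
);

-- Periodic flush from the application ($1, $2 bound by the app):
INSERT INTO tenant_activity_rollup (tenant_id, bucket_start, query_count)
VALUES ($1, date_trunc('hour', NOW()), $2)
ON CONFLICT (tenant_id, bucket_start)
DO UPDATE SET query_count = tenant_activity_rollup.query_count + EXCLUDED.query_count;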
For our implementation, we will stick with the row count approach as it is self-contained within the database.
2. Management: A Schema for Tracking Dynamic Indexes
We cannot blindly create indexes. We need a manifest to track which indexes we've created, for which tenants, and on which tables. This prevents duplicate creation and provides a mechanism for cleanup.
CREATE TABLE system_managed_partial_indexes (
id SERIAL PRIMARY KEY,
schema_name TEXT NOT NULL,
table_name TEXT NOT NULL,
index_name TEXT NOT NULL UNIQUE,
index_definition TEXT NOT NULL,
tenant_id UUID NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
last_verified_at TIMESTAMPTZ,
decommissioned_at TIMESTAMPTZ,
-- A unique constraint to prevent creating the same logical index twice
UNIQUE(schema_name, table_name, tenant_id, index_definition)
);
* index_name: We will generate a unique, predictable name for each index.
* index_definition: Stores the core columns of the index (e.g., (event_type, created_at)).
* tenant_id: The tenant this index is for.
* decommissioned_at: A soft-delete flag for our cleanup process.
3. Automation: The PL/pgSQL Management Function
This is where we tie everything together. We'll create a stored procedure that performs the logic of identifying, creating, and logging the indexes. This function can be run periodically by a cron job or a database scheduler like pg_cron.
Here is a PL/pgSQL function to manage this process, designed to be idempotent and safe to re-run. One important wrinkle: CREATE INDEX CONCURRENTLY cannot run inside a transaction block, and every PL/pgSQL function executes inside one. The function therefore runs the DDL over a loopback dblink connection, which executes in its own session in autocommit mode (this requires the dblink extension and appropriate connection permissions).
-- One-time setup: dblink lets us run DDL from a separate autocommit session.
CREATE EXTENSION IF NOT EXISTS dblink;

CREATE OR REPLACE FUNCTION manage_hot_tenant_partial_indexes(
    p_schema_name      TEXT,
    p_table_name       TEXT,
    p_index_definition TEXT,   -- e.g., '(event_type, created_at)'
    p_tenant_threshold BIGINT, -- Minimum row count to be considered 'hot'
    p_max_hot_tenants  INT     -- Max number of tenants to manage
) RETURNS VOID AS $$
DECLARE
    hot_tenant   RECORD;
    v_find_sql   TEXT;
    v_index_name TEXT;
    v_sql        TEXT;
BEGIN
    -- The target table is only known at runtime, so the per-tenant count
    -- query is built with format(); %I gives safe identifier quoting.
    v_find_sql := format(
        'SELECT tenant_id, COUNT(*) AS row_ct
           FROM %I.%I
          GROUP BY tenant_id
         HAVING COUNT(*) >= $1
          ORDER BY row_ct DESC
          LIMIT $2',
        p_schema_name, p_table_name);

    FOR hot_tenant IN EXECUTE v_find_sql USING p_tenant_threshold, p_max_hot_tenants
    LOOP
        -- Skip tenants that already have an active managed index.
        CONTINUE WHEN EXISTS (
            SELECT 1
            FROM system_managed_partial_indexes smpi
            WHERE smpi.schema_name = p_schema_name
              AND smpi.table_name = p_table_name
              AND smpi.tenant_id = hot_tenant.tenant_id
              AND smpi.index_definition = p_index_definition
              AND smpi.decommissioned_at IS NULL
        );

        RAISE NOTICE 'Found hot tenant % for table %.% requiring an index.',
            hot_tenant.tenant_id, p_schema_name, p_table_name;

        -- Generate a unique and descriptive index name
        v_index_name := format('dyn_partial_idx_%s_%s_%s',
            p_table_name,
            -- Sanitize the index definition for use in an identifier
            translate(p_index_definition, '() ,', '___'),
            -- Use first 8 chars of tenant UUID for brevity
            left(hot_tenant.tenant_id::text, 8));
        -- PostgreSQL truncates identifiers at 63 bytes; trim explicitly so the
        -- name we log matches the name the server actually creates.
        v_index_name := left(v_index_name, 63);

        -- Construct the CREATE INDEX CONCURRENTLY statement.
        -- format() with %I and %L handles identifier and literal quoting.
        v_sql := format(
            'CREATE INDEX CONCURRENTLY %I ON %I.%I USING btree %s WHERE tenant_id = %L',
            v_index_name, p_schema_name, p_table_name,
            p_index_definition, hot_tenant.tenant_id);

        RAISE NOTICE 'Executing: %', v_sql;

        BEGIN
            -- CREATE INDEX CONCURRENTLY refuses to run inside a transaction
            -- block, and PL/pgSQL always executes in one, so run the DDL in a
            -- separate autocommit session via a loopback dblink connection.
            PERFORM dblink_exec(format('dbname=%s', current_database()), v_sql);

            -- If successful, log it to our management table.
            INSERT INTO system_managed_partial_indexes
                (schema_name, table_name, index_name, index_definition, tenant_id)
            VALUES
                (p_schema_name, p_table_name, v_index_name, p_index_definition, hot_tenant.tenant_id);

            RAISE NOTICE 'Successfully created and logged index %', v_index_name;
        EXCEPTION
            WHEN duplicate_table OR duplicate_object THEN
                RAISE WARNING 'Index % already exists. It may have been created outside this management script. Consider importing it.', v_index_name;
            WHEN others THEN
                RAISE WARNING 'Failed to create index for tenant %: % (a failed CONCURRENTLY build can leave an INVALID index behind)',
                    hot_tenant.tenant_id, SQLERRM;
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
To run this:
-- Create a partial index on (event_type, created_at) for any tenant with over
-- 2 million rows, but only manage up to 10 such tenants at a time.
SELECT manage_hot_tenant_partial_indexes(
p_schema_name := 'public',
p_table_name := 'analytics_events',
p_index_definition := '(event_type, created_at)',
p_tenant_threshold := 2000000,
p_max_hot_tenants := 10
);
This function is the heart of our system. It is safe to re-run, acts only on new hot tenants, and runs CREATE INDEX CONCURRENTLY from a separate dblink session so the application remains online during index creation.
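To run it on a schedule, pg_cron (mentioned earlier) is a natural fit. A sketch, assuming a recent pg_cron version that supports named jobs; the job name and cadence are illustrative:

-- Run the manager nightly; pg_cron executes the command in its own session.
SELECT cron.schedule(
    'manage-hot-tenant-indexes',  -- job name (illustrative)
    '15 3 * * *',                 -- every day at 03:15
    $$SELECT manage_hot_tenant_partial_indexes(
          'public', 'analytics_events', '(event_type, created_at)', 2000000, 10)$$
);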
Decommissioning: Handling Tenants That Cool Down
What happens when a hot tenant's activity drops? We need a process to clean up unused indexes to reclaim space and reduce write overhead.
An aggressive approach would be to drop indexes for any tenant that falls below the threshold. A more conservative, safer strategy is to drop only indexes that have been unused for a significant period. This requires more sophisticated usage tracking (e.g., from pg_stat_user_indexes, as sketched below).
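The manifest table makes such a check straightforward. The sketch below joins it against pg_stat_user_indexes to surface managed indexes that have never been scanned since the last statistics reset:

-- Managed indexes with zero scans since the last statistics reset.
-- Note: idx_scan counters reset with pg_stat_reset(), so interpret with care.
SELECT smpi.index_name, smpi.tenant_id, s.idx_scan
FROM system_managed_partial_indexes smpi
JOIN pg_stat_user_indexes s
  ON s.schemaname   = smpi.schema_name
 AND s.indexrelname = smpi.index_name
WHERE smpi.decommissioned_at IS NULL
  AND s.idx_scan = 0;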
A balanced approach is to keep an audit trail: drop the index, but record the decommissioning in the manifest rather than deleting the row, so tenants that heat up again are easy to spot. As with creation, DROP INDEX CONCURRENTLY cannot run inside a transaction block, so it too goes through a loopback dblink session.
CREATE OR REPLACE FUNCTION decommission_cold_tenant_indexes(
    p_schema_name      TEXT,
    p_table_name       TEXT,
    p_tenant_threshold BIGINT
) RETURNS VOID AS $$
DECLARE
    cold_tenant_index RECORD;
    v_row_ct          BIGINT;
    v_sql             TEXT;
BEGIN
    -- Walk the active managed indexes and re-check each tenant's row count.
    FOR cold_tenant_index IN
        SELECT smpi.id, smpi.index_name, smpi.tenant_id
        FROM system_managed_partial_indexes smpi
        WHERE smpi.schema_name = p_schema_name
          AND smpi.table_name = p_table_name
          AND smpi.decommissioned_at IS NULL
    LOOP
        -- The table name is dynamic, so the count uses EXECUTE ... USING.
        EXECUTE format('SELECT COUNT(*) FROM %I.%I WHERE tenant_id = $1',
                       p_schema_name, p_table_name)
           INTO v_row_ct
          USING cold_tenant_index.tenant_id;

        CONTINUE WHEN v_row_ct >= p_tenant_threshold;

        RAISE NOTICE 'Tenant % has cooled down. Decommissioning index %.',
            cold_tenant_index.tenant_id, cold_tenant_index.index_name;

        -- Like CREATE, DROP INDEX CONCURRENTLY cannot run inside a transaction
        -- block, so it goes through a loopback dblink session as well.
        v_sql := format('DROP INDEX CONCURRENTLY IF EXISTS %I.%I',
                        p_schema_name, cold_tenant_index.index_name);
        RAISE NOTICE 'Executing: %', v_sql;
        PERFORM dblink_exec(format('dbname=%s', current_database()), v_sql);

        -- Mark the manifest row only after the drop has actually succeeded.
        UPDATE system_managed_partial_indexes
        SET decommissioned_at = NOW()
        WHERE id = cold_tenant_index.id;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
This function can be run less frequently than the creation function (e.g., weekly vs. daily) to avoid thrashing—creating and dropping indexes for a tenant whose activity fluctuates around the threshold.
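Beyond cadence, a simple hysteresis guard also helps: decommission at a noticeably lower threshold than the one used for creation, so a tenant hovering near the creation threshold doesn't flip-flop. A sketch, again with pg_cron and illustrative numbers:

-- Weekly cleanup with a lower threshold than creation (2M to create, 1M to
-- drop), so borderline tenants don't oscillate between the two states.
SELECT cron.schedule(
    'decommission-cold-tenant-indexes',  -- job name (illustrative)
    '30 4 * * 0',                        -- Sundays at 04:30
    $$SELECT decommission_cold_tenant_indexes('public', 'analytics_events', 1000000)$$
);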
Advanced Considerations and Edge Cases
This strategy is powerful, but it's not a magic bullet. Senior engineers must consider the trade-offs.
* Write Performance Overhead: Every additional index incurs a penalty on INSERT, UPDATE, and DELETE operations for the affected rows. For a hot tenant, this means writes become slightly slower. You must benchmark this impact. If the table is extremely write-heavy (e.g., logging), the benefit to read queries may not justify the write penalty.
* Query Planner Complexity: With more available indexes, the query planner has more choices. In rare cases, it might make a suboptimal choice. Regularly monitor query performance for your hot tenants using pg_stat_statements to ensure the correct partial indexes are being used. Tools like pg_hint_plan are a last resort for forcing a plan in pathological cases.
* The CONCURRENTLY Caveat: CREATE INDEX CONCURRENTLY is not atomic. It can fail midway, leaving behind an invalid index. Your management script must be able to handle this, and you should have monitoring in place for invalid indexes (see the sketch after this list). Similarly, DROP INDEX CONCURRENTLY can also fail, requiring manual intervention.
* Autovacuum and Statistics: Plain column indexes can be used by the planner as soon as they are valid, but expression indexes need an ANALYZE before the planner has statistics on the indexed expression, and fresh table statistics help it cost all the alternatives correctly. A well-tuned autovacuum daemon (which also triggers auto-analyze) is essential for this strategy to work effectively.
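For the CONCURRENTLY caveat above, a minimal monitoring query for invalid indexes:

-- Invalid indexes left behind by failed CONCURRENTLY builds.
SELECT n.nspname AS schema_name,
       c.relname AS index_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;
-- Remediation: DROP INDEX CONCURRENTLY the invalid index and let the
-- manager recreate it on its next run.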
Alternative Strategy: When to Use Table Partitioning
Dynamic partial indexing is a surgical optimization. The heavyweight alternative is Table Partitioning. Using PARTITION BY LIST (tenant_id), you can create a separate physical table for each tenant (or groups of tenants).
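For concreteness, a minimal sketch of that layout (table and partition names are illustrative; the DEFAULT partition requires PostgreSQL 11+):

-- A list-partitioned variant of the events table, one partition per hot tenant.
CREATE TABLE analytics_events_partitioned (
    id BIGSERIAL,
    tenant_id UUID NOT NULL,
    event_type TEXT NOT NULL,
    payload JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY LIST (tenant_id);

CREATE TABLE analytics_events_tenant_a
    PARTITION OF analytics_events_partitioned
    FOR VALUES IN ('00000000-0000-0000-0000-00000000000a');

-- Everyone else lands in the default partition.
CREATE TABLE analytics_events_default
    PARTITION OF analytics_events_partitioned DEFAULT;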
| Feature | Dynamic Partial Indexing | Table Partitioning |
|---|---|---|
| Granularity | Optimizes queries for specific tenants within one large table. | Physically isolates data for each tenant. |
| Implementation | Can be added to an existing system with minimal disruption. | Major architectural change; best done from the start. |
| Primary Use Case | Read-heavy workloads with skewed data distribution. | Tenant data isolation, bulk deletion (DROP PARTITION). |
| Overhead | Moderate write overhead, small planner overhead. | High planner overhead, tricky foreign keys, DDL complexity. |
| Flexibility | Easy to add/remove optimizations for individual tenants. | Rigid structure. Adding a new tenant requires adding a partition. |
Heuristic: Use dynamic partial indexing when you need to surgically improve query performance for a subset of tenants on an existing, monolithic table. Use table partitioning when tenant data isolation is a core architectural requirement from day one, and you anticipate needing to manage data at the tenant level (e.g., tenant-specific backup/restore, easy data deletion for GDPR, etc.).
Conclusion
The standard (tenant_id, ...) composite index is the foundation of multi-tenant application performance, but it is a blunt instrument. For systems operating at scale, data skew is not an edge case; it is the norm. By implementing a robust, automated system for managing partial indexes, you can create a two-tiered performance model: a solid baseline for all customers provided by the composite index, and a supercharged, optimized experience for your largest and most active customers.
This dynamic partial indexing strategy is an advanced technique that requires careful monitoring and an understanding of its trade-offs. However, when applied correctly, it provides a powerful, targeted solution to one of the most common and challenging performance problems in multi-tenant architecture, allowing your database to scale gracefully with the success of your customers.