Scaling Multi-Tenant RLS in PostgreSQL with Partial Indexes
The Performance Cliff of Multi-Tenant RLS at Scale
In a mature multi-tenant SaaS application running on PostgreSQL, Row-Level Security (RLS) is the canonical solution for enforcing data isolation. It's declarative, robust, and lives within the database, ensuring that no matter how a user connects, they can only see their own tenant's data. The typical implementation involves a tenant_id column on every relevant table and a policy that filters rows based on a session variable, often set using SET app.tenant_id = '...'.
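In practice the application usually sets this variable at the start of each transaction so that pooled connections never leak another tenant's context. A minimal sketch of the per-request pattern, assuming the app.tenant_id setting used throughout this article:

-- Executed by the application for every request/transaction
BEGIN;
SET LOCAL app.tenant_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d';
-- ... application queries run here, constrained by the RLS policy ...
COMMIT;  -- SET LOCAL reverts automatically at commit or rollback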
The problem arises not from the security model itself, but from the query performance under load. A naive but common approach is to place a standard B-tree index on the tenant_id column:
CREATE INDEX idx_orders_tenant_id ON orders (tenant_id);
On a table with billions of rows spread across tens of thousands of tenants, this index becomes a performance liability. A single tenant's predicate is actually highly selective in terms of rows (say 100,000 orders out of 2 billion), but those rows are interleaved with every other tenant's data and therefore scattered across a huge number of heap pages. Seeing that, the planner often favors a Bitmap Heap Scan over a more direct Index Scan. While better than a full sequential scan, it's far from optimal and scales poorly as the table grows.
The RLS policy application adds another layer of complexity. The planner must fold the policy's WHERE clause into every query it plans, and its ability to use the large global index efficiently is limited when the matching entries point all over the heap. The result is query latency that creeps up steadily as the table grows, until it becomes a critical production issue.
This article presents a solution: abandoning the single global tenant_id index in favor of a highly-targeted, per-tenant partial index strategy. This pattern dramatically improves query performance for individual tenants by creating small, dense, and highly-selective indexes that the query planner can use with extreme efficiency.
Baseline Scenario: Demonstrating the Problem
Let's establish a concrete schema and data model to analyze. Assume we have a system managing orders for multiple tenants.
Schema:
-- Tenant management table
CREATE TABLE tenants (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Orders table with a foreign key to tenants
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
tenant_id UUID NOT NULL REFERENCES tenants(id),
customer_id BIGINT NOT NULL,
order_date TIMESTAMPTZ NOT NULL DEFAULT NOW(),
amount NUMERIC(10, 2) NOT NULL,
status TEXT NOT NULL DEFAULT 'pending'
);
-- A standard, global index on tenant_id
CREATE INDEX idx_orders_tenant_id ON orders (tenant_id);
-- An index for a common query pattern
CREATE INDEX idx_orders_tenant_date ON orders (tenant_id, order_date DESC);
Data Population (Simulated):
Let's populate this with 1,000 tenants and 100 million orders, giving each tenant roughly 100,000 orders.
-- This is a conceptual script; running it would take time.
-- INSERT INTO tenants (id, name) SELECT gen_random_uuid(), 'Tenant ' || i FROM generate_series(1, 1000) i;
-- INSERT INTO orders (tenant_id, customer_id, order_date, amount, status)
-- SELECT
-- t.id,
-- (random() * 10000)::int,
-- NOW() - (random() * 365)::int * '1 day'::interval,
-- (random() * 500)::numeric(10,2),
-- 'completed'
-- FROM tenants t, generate_series(1, 100000);
RLS Implementation:
-- Enable RLS on the orders table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
-- Create a policy that checks a session variable
CREATE POLICY tenant_isolation_policy ON orders
FOR ALL
USING (tenant_id = current_setting('app.tenant_id')::UUID)
WITH CHECK (tenant_id = current_setting('app.tenant_id')::UUID);
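Note that RLS policies are not applied to superusers or, by default, to the table's owner. If the application connects as the owning role, the policy has to be forced; a minimal sketch, where app_user is a hypothetical application role:

-- Apply the policy even to the table owner
ALTER TABLE orders FORCE ROW LEVEL SECURITY;

-- The application role gets ordinary DML privileges and is subject to the policy
GRANT SELECT, INSERT, UPDATE, DELETE ON orders TO app_user;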
Now, let's analyze a typical query to fetch recent orders for a specific tenant.
-- Set the session context for our target tenant
-- Let's assume this tenant's UUID is 'a1b2c3d4-...'
SET app.tenant_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d';
-- Analyze the query plan
EXPLAIN ANALYZE
SELECT id, order_date, amount, status
FROM orders
WHERE order_date > NOW() - '30 days'::interval
ORDER BY order_date DESC
LIMIT 100;
Expected EXPLAIN ANALYZE Output (with global index):
Limit  (cost=12345.67..12346.92 rows=100 width=34) (actual time=150.123..150.456 rows=100 loops=1)
  ->  Sort  (cost=12345.67..12389.12 rows=17380 width=34) (actual time=150.120..150.300 rows=100 loops=1)
        Sort Key: order_date DESC
        Sort Method: top-N heapsort  Memory: 35kB
        ->  Bitmap Heap Scan on orders  (cost=456.78..12100.23 rows=17380 width=34) (actual time=10.123..140.987 rows=8345 loops=1)
              Recheck Cond: ((tenant_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d'::uuid) AND (order_date > (now() - '30 days'::interval)))
              Heap Blocks: exact=7980
              ->  Bitmap Index Scan on idx_orders_tenant_date  (cost=0.00..452.43 rows=17380 width=0) (actual time=8.123..8.123 rows=8345 loops=1)
                    Index Cond: ((tenant_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d'::uuid) AND (order_date > (now() - '30 days'::interval)))
Planning Time: 0.543 ms
Execution Time: 151.012 ms
The key takeaway here is the Bitmap Heap Scan. The planner uses the idx_orders_tenant_date index to create a bitmap of pages to visit, then visits each of those pages (Heap Blocks: exact=7980) to fetch the rows and re-check the condition. While the index scan itself is fast (8ms), the subsequent heap scan is slow (130ms) because the tenant's data is scattered across thousands of pages. The total execution time of ~150ms is unacceptable for a primary application query.
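The underlying cause is physical scattering: a tenant's rows are interleaved with every other tenant's rows throughout the heap. One quick diagnostic (a sketch, not part of the pattern itself) is the planner's correlation statistic, which sits near zero for columns whose values are spread randomly across the table's pages:

SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'orders'
  AND attname IN ('tenant_id', 'order_date');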
The Partial Index Solution
A partial index is an index built over a subset of a table's rows, defined by a WHERE clause. The core idea of our pattern is to create a dedicated index for each tenant.
Let's create a partial index for our specific tenant and the same query pattern.
-- Create a specific index just for our target tenant
CREATE INDEX idx_orders_tenant_a1b2c3d4_date
ON orders (order_date DESC)
WHERE tenant_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d';
Notice two crucial differences from the composite index:
* tenant_id has moved into the WHERE clause; it is no longer part of the index key.
* The index key is just (order_date DESC), which perfectly matches our ORDER BY clause.
This index is tiny compared to the global one: it contains entries only for this single tenant. Now, let's re-run the EXPLAIN ANALYZE.
-- Ensure we are using the new index (Postgres planner is smart, but let's be sure)
SET enable_bitmapscan = off; -- For demonstration purposes
EXPLAIN ANALYZE
SELECT id, order_date, amount, status
FROM orders
WHERE order_date > NOW() - '30 days'::interval
ORDER BY order_date DESC
LIMIT 100;
SET enable_bitmapscan = on; -- Reset
Expected EXPLAIN ANALYZE Output (with partial index):
Limit  (cost=0.43..12.34 rows=100 width=34) (actual time=0.045..0.567 rows=100 loops=1)
  ->  Index Scan using idx_orders_tenant_a1b2c3d4_date on orders  (cost=0.43..1012.45 rows=8345 width=34) (actual time=0.043..0.432 rows=100 loops=1)
        Index Cond: (order_date > (now() - '30 days'::interval))
Planning Time: 0.210 ms
Execution Time: 0.615 ms
The difference is staggering. The execution time has dropped from ~150ms to ~0.6ms—a >200x improvement.
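The size difference is just as dramatic: the partial index contains entries for one tenant, while the global composite index covers every tenant. A quick way to compare them on your own data (sketch):

SELECT relname AS index_name,
       pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('idx_orders_tenant_date',
                  'idx_orders_tenant_a1b2c3d4_date');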
Why is it so much faster?
* The RLS policy injects WHERE tenant_id = 'a1b2c3d4-...', which exactly matches the index's WHERE clause, so the planner is allowed to use the partial index at all.
* The plan is a plain Index Scan. Because the index key is order_date DESC and the query has ORDER BY order_date DESC, the planner can simply walk the index in order and stop after finding 100 rows. It doesn't need to fetch all 8,345 matching rows and sort them in memory.
Productionizing the Pattern: Automated Index Management
Manually creating an index for every tenant is not a viable strategy. We need an automated, robust system for managing the lifecycle of these partial indexes.
A naive approach might use a trigger on the tenants table to create the index inline. However, a plain CREATE INDEX takes a SHARE lock on the table, blocking writes for the duration of the build. Running it inside the tenant creation transaction is a recipe for deadlocks and timeouts. The correct production pattern uses an asynchronous worker.
The Asynchronous Worker Pattern
* When a new tenant is created, an event is published for the worker to pick up (via a message queue, or directly from the database with NOTIFY).
* Upon receiving a new_tenant message, the service connects to the database and executes CREATE INDEX CONCURRENTLY .... The CONCURRENTLY keyword is critical: it builds the index without taking locks that would block writes to the table.
Step 1: Notifying the Worker
We can use a simple trigger with pg_notify for a purely PostgreSQL-based solution. This avoids external message queue dependencies for simpler setups.
-- A function to send a notification on new tenant creation
CREATE OR REPLACE FUNCTION notify_new_tenant()
RETURNS TRIGGER AS $$
BEGIN
PERFORM pg_notify('new_tenants', NEW.id::text);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- A trigger on the tenants table
CREATE TRIGGER new_tenant_trigger
AFTER INSERT ON tenants
FOR EACH ROW
EXECUTE FUNCTION notify_new_tenant();
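Before wiring up the worker, the trigger can be sanity-checked by hand from two psql sessions (illustrative only):

-- Session A: subscribe to the channel
LISTEN new_tenants;

-- Session B: create a tenant
INSERT INTO tenants (name) VALUES ('Acme Corp');

-- Session A now receives an asynchronous notification on channel 'new_tenants'
-- whose payload is the new tenant's UUID.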
Step 2: The Indexing Worker (Python Example)
This Python service would run continuously, listening for notifications.
import psycopg2
import psycopg2.extensions
import select
import time
import logging
# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
DB_CONN_STRING = "dbname='yourdb' user='youruser' password='yourpass' host='localhost'"
# Define the indexes you want to create for each tenant.
# Using a template makes it easy to manage multiple partial indexes.
INDEX_DEFINITIONS = [
{
"name_template": "idx_orders_tenant_{tenant_id_safe}_date",
"sql_template": """
CREATE INDEX CONCURRENTLY IF NOT EXISTS {index_name}
ON orders (order_date DESC)
WHERE tenant_id = '{tenant_id}';
"""
},
{
"name_template": "idx_orders_tenant_{tenant_id_safe}_status_date",
"sql_template": """
CREATE INDEX CONCURRENTLY IF NOT EXISTS {index_name}
ON orders (status, order_date DESC)
WHERE tenant_id = '{tenant_id}';
"""
}
]
def create_partial_indexes_for_tenant(tenant_id):
"""Creates all defined partial indexes for a given tenant_id."""
logging.info(f"Processing request to create indexes for tenant: {tenant_id}")
# Sanitize tenant_id for use in index names (replace hyphens)
tenant_id_safe = tenant_id.replace('-', '')
try:
with psycopg2.connect(DB_CONN_STRING) as conn:
conn.autocommit = True # Required for CREATE INDEX CONCURRENTLY outside a transaction block
with conn.cursor() as cur:
for index_def in INDEX_DEFINITIONS:
index_name = index_def["name_template"].format(tenant_id_safe=tenant_id_safe)
sql = index_def["sql_template"].format(index_name=index_name, tenant_id=tenant_id)
logging.info(f"Executing: {sql}")
start_time = time.time()
cur.execute(sql)
end_time = time.time()
logging.info(f"Index '{index_name}' created or already exists. Took {end_time - start_time:.2f}s.")
except psycopg2.Error as e:
logging.error(f"Database error while creating indexes for tenant {tenant_id}: {e}")
def listen_for_tenants():
"""Main listener loop."""
conn = psycopg2.connect(DB_CONN_STRING)
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
curs = conn.cursor()
curs.execute("LISTEN new_tenants;")
logging.info("Worker started. Listening for new_tenants notifications...")
while True:
if select.select([conn], [], [], 60) == ([], [], []):
logging.info("Timeout, no new notifications. Pinging connection.")
curs.execute('SELECT 1;') # Keep a heartbeat
else:
conn.poll()
while conn.notifies:
notify = conn.notifies.pop(0)
logging.info(f"Received notification: {notify.channel}, payload: {notify.payload}")
create_partial_indexes_for_tenant(notify.payload)
if __name__ == "__main__":
listen_for_tenants()
This worker is robust: it uses CREATE INDEX CONCURRENTLY IF NOT EXISTS to be idempotent and non-blocking. It's template-driven, allowing you to easily add new partial index definitions as your application's query patterns evolve.
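One caveat worth automating around: if CREATE INDEX CONCURRENTLY fails partway through (the worker is killed, the connection drops), PostgreSQL leaves behind an INVALID index that still incurs write overhead but is never used by the planner. A periodic sweep can surface these so they can be dropped and rebuilt; a sketch:

SELECT c.relname AS invalid_index
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_class t ON t.oid = i.indrelid
WHERE t.relname = 'orders'
  AND NOT i.indisvalid;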
Advanced Considerations and Edge Cases
Implementing this pattern in a large-scale production environment requires considering several edge cases.
1. Index Proliferation and Planner Overhead
With 50,000 tenants and 2 partial indexes per tenant, you now have 100,000 indexes on the orders table. How does this affect PostgreSQL?
* pg_class and System Catalogs: Each index is a row in system catalogs like pg_class and pg_index. This increases the size of these catalogs, which can slightly slow down planning time for all queries against the table, as the planner has more potential indexes to evaluate.
* Planner Cost: For a given query, the planner must consider each potentially applicable index, checking for every partial index whether its WHERE clause is implied by the query's predicates (e.g., tenant_id = '...'). Indexes whose predicate does not match are discarded quickly, but the check itself grows with the number of indexes. In practice, the overhead is minimal for up to tens of thousands of indexes; performance degradation typically becomes a concern only when approaching hundreds of thousands of indexes on a single table.
* Monitoring: You need a way to track these indexes. A simple query can help:
SELECT indexname, indexdef
FROM pg_indexes
-- Note: pg_get_indexdef() parenthesizes the predicate, hence the '(' in the pattern
WHERE tablename = 'orders' AND indexdef LIKE '%WHERE (tenant_id =%';
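To track the aggregate footprint, pg_index's indpred column (non-NULL only for partial indexes) makes it easy to count them and sum their size; a sketch:

SELECT count(*) AS partial_index_count,
       pg_size_pretty(sum(pg_relation_size(i.indexrelid))) AS total_size
FROM pg_index i
JOIN pg_class t ON t.oid = i.indrelid
WHERE t.relname = 'orders'
  AND i.indpred IS NOT NULL;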
2. Tenant Offboarding: Index Cleanup
When a tenant is deleted or deactivated, their partial indexes become dead weight, consuming disk space and adding to planner overhead. The offboarding process must include a step to drop them.
This should also be handled asynchronously. The offboarding request can trigger an event for a cleanup worker.
# In your cleanup worker
def drop_partial_indexes_for_tenant(tenant_id):
logging.info(f"Processing request to drop indexes for tenant: {tenant_id}")
tenant_id_safe = tenant_id.replace('-', '')
try:
with psycopg2.connect(DB_CONN_STRING) as conn:
conn.autocommit = True
with conn.cursor() as cur:
for index_def in INDEX_DEFINITIONS:
index_name = index_def["name_template"].format(tenant_id_safe=tenant_id_safe)
sql = f"DROP INDEX CONCURRENTLY IF EXISTS {index_name};"
logging.info(f"Executing: {sql}")
cur.execute(sql)
except psycopg2.Error as e:
logging.error(f"Database error while dropping indexes for tenant {tenant_id}: {e}")
DROP INDEX CONCURRENTLY is again crucial to avoid locking the table during cleanup.
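A complementary safety net is a periodic sweep for orphaned indexes, i.e. per-tenant indexes whose tenant row no longer exists. A sketch that assumes the naming convention used by the worker above:

SELECT i.indexname
FROM pg_indexes i
WHERE i.tablename = 'orders'
  AND i.indexname LIKE 'idx_orders_tenant_%'
  AND substring(i.indexname from '^idx_orders_tenant_([0-9a-f]{32})') NOT IN (
        SELECT replace(id::text, '-', '') FROM tenants
      );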
3. The Alternative: Declarative Partitioning
For systems with extremely high tenant counts (hundreds of thousands to millions) or very high write throughput per tenant, the partial index pattern can reach its limits. The ultimate solution at this scale is declarative partitioning, typically LIST partitioning on tenant_id (or HASH partitioning on tenant_id when one partition per tenant would mean an unmanageable number of partitions).
-- Conceptual example
CREATE TABLE orders_partitioned (
-- columns...
tenant_id UUID NOT NULL
) PARTITION BY LIST (tenant_id);
CREATE TABLE orders_tenant_a1b2c3d4 PARTITION OF orders_partitioned
FOR VALUES IN ('a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d');
-- Each partition gets its own local indexes
CREATE INDEX idx_orders_tenant_a1b2c3d4_date ON orders_tenant_a1b2c3d4 (order_date DESC);
* Pros: This is the most scalable solution. Each partition is effectively a separate table with its own local indexes. The planner can completely ignore all other partitions (partition pruning), which is even more efficient than ignoring partial indexes. It also physically isolates tenant data, which can simplify vacuuming and maintenance.
* Cons: It introduces significant operational complexity. You need a robust process for creating new partitions when tenants are onboarded. Primary keys and unique constraints must include the partition key, which makes foreign keys referencing the table more awkward to manage. Queries that need to span all tenants (e.g., for administrative analytics) must touch every partition and become slower and harder to plan.
Conclusion on Partitioning: Partial indexes are a powerful, less complex stepping stone. They provide most of the performance benefits of partitioning for read-heavy workloads without the full operational overhead. For many SaaS applications, the partial index pattern is the optimal balance of performance and complexity.
4. Backfilling Indexes for Existing Tenants
If you're implementing this pattern on an existing system, you'll need a one-time script to create indexes for all existing tenants.
# A one-off backfill script
def backfill_all_tenants():
with psycopg2.connect(DB_CONN_STRING) as conn:
with conn.cursor() as cur:
cur.execute("SELECT id FROM tenants;")
tenant_ids = [row[0] for row in cur.fetchall()]
# Use a thread pool to parallelize the work
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=8) as executor:
executor.map(create_partial_indexes_for_tenant, tenant_ids)
Running this with a thread pool can significantly speed up the backfill process, but be mindful of the load it places on your database server.
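Once the backfill finishes, a quick completeness check can list any tenants still missing their date index (this assumes the same name template the worker uses):

SELECT t.id
FROM tenants t
WHERE NOT EXISTS (
    SELECT 1
    FROM pg_indexes
    WHERE tablename = 'orders'
      AND indexname = 'idx_orders_tenant_' || replace(t.id::text, '-', '') || '_date'
);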
Final Thoughts: A Strategic Trade-off
Adopting a per-tenant partial index strategy is a conscious architectural decision. It trades increased management complexity (automating index creation/deletion) for a massive, scalable improvement in query performance for RLS-protected, multi-tenant systems.
This is not a solution for every application. If your tenant count is low, or your tables are small, a standard composite index on (tenant_id, ...) is perfectly adequate. But if you are on a high-growth trajectory and your orders table is heading towards billions of rows, this pattern can be the difference between a responsive application and one that grinds to a halt under its own weight.
By leveraging asynchronous workers and CONCURRENTLY operations, you can implement this advanced strategy safely in a production environment, ensuring your application's data layer remains fast and efficient as you scale.