PostgreSQL Partial Indexes for Multi-Tenant SaaS RLS Optimization
The Performance Cliff of Row-Level Security in Multi-Tenant Architectures
In a multi-tenant SaaS environment, Row-Level Security (RLS) is the canonical solution for ensuring strict data isolation within a shared PostgreSQL database. It is elegant and robust, and it centralizes security logic at the database layer. However, as data volume scales into the hundreds of millions or billions of rows, a common and insidious performance problem emerges: RLS policies can effectively poison the query planner's ability to use standard indexes, leading to catastrophic performance degradation.
Consider a typical SaaS schema for a project management tool:
-- A simplified schema for our multi-tenant application
CREATE TABLE organizations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
plan_type TEXT NOT NULL DEFAULT 'standard',
is_active BOOLEAN NOT NULL DEFAULT TRUE
);
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organization_id UUID NOT NULL REFERENCES organizations(id),
email TEXT NOT NULL UNIQUE
);
CREATE TABLE projects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organization_id UUID NOT NULL REFERENCES organizations(id),
name TEXT NOT NULL
);
CREATE TABLE tasks (
id BIGSERIAL PRIMARY KEY,
project_id UUID NOT NULL REFERENCES projects(id),
organization_id UUID NOT NULL REFERENCES organizations(id), -- Denormalized for simpler RLS
title TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'todo', -- todo, in_progress, done
archived BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Standard B-Tree index on the foreign key
CREATE INDEX idx_tasks_organization_id ON tasks(organization_id);
To enforce tenancy, we enable RLS and apply a policy. The tenant ID is typically set as a runtime parameter within the transaction scope.
-- Enable RLS on the tasks table
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
-- Policy to ensure users only see tasks from their organization
CREATE POLICY tasks_isolation_policy ON tasks
FOR ALL
USING (organization_id = current_setting('app.current_organization_id')::UUID);
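In practice, the application should scope that setting to a single transaction so it cannot leak across pooled connections. A minimal sketch (the GUC name matches the policy above; the UUID is a placeholder):
-- Scope the tenant setting to one transaction; SET LOCAL reverts at COMMIT/ROLLBACK.
BEGIN;
SET LOCAL app.current_organization_id = '...'; -- tenant UUID supplied by the application
SELECT count(*) FROM tasks;                    -- RLS now filters to that tenant
COMMIT;
-- From a driver or ORM, the equivalent call is:
-- SELECT set_config('app.current_organization_id', $1, true);  -- true = local to the transaction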
Now, let's simulate a production environment. We'll populate the tasks table with 20 million rows across 10,000 organizations. The data distribution is highly skewed, mimicking a real-world scenario: a few "whale" tenants have millions of tasks, while thousands of smaller tenants have only a few hundred.
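One way to approximate that skew locally is a scaled-down generate_series load. This sketch is purely illustrative and assumes one project per organization:
-- Run as the table owner (RLS does not apply to the owner unless FORCE ROW LEVEL SECURITY is set).
INSERT INTO organizations (name)
SELECT 'org_' || g FROM generate_series(1, 10000) AS g;
INSERT INTO projects (organization_id, name)
SELECT id, 'default project' FROM organizations;
-- One "whale" tenant with a couple of million tasks...
INSERT INTO tasks (project_id, organization_id, title)
SELECT p.id, p.organization_id, 'task ' || g
FROM (SELECT id, organization_id FROM projects ORDER BY id LIMIT 1) AS p,
     generate_series(1, 2000000) AS g;
-- ...and a long tail of small tenants with a few hundred tasks each.
INSERT INTO tasks (project_id, organization_id, title)
SELECT p.id, p.organization_id, 'task ' || g
FROM (SELECT id, organization_id FROM projects ORDER BY id OFFSET 1) AS p,
     generate_series(1, 200) AS g;
ANALYZE organizations, projects, tasks;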
When a user from a small organization (e.g., organization_id = '...' with only 500 tasks) queries their data, the expectation is a near-instant response, thanks to idx_tasks_organization_id. However, the reality is often grim.
-- Set the current tenant for the session
SET app.current_organization_id = '...'; -- UUID of a small tenant
-- Run a simple query to fetch all tasks for this tenant
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tasks;
The Disappointing Query Plan
Instead of a fast Index Scan, you might see something like this:
Gather (cost=1000.00..293810.54 rows=100 width=85) (actual time=15.333..1850.432 rows=500 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=124018
-> Parallel Seq Scan on tasks (cost=0.00..292800.54 rows=42 width=85) (actual time=1835.120..1835.122 rows=167 loops=3)
Filter: (organization_id = current_setting('app.current_organization_id')::uuid)
Rows Removed by Filter: 6666500
Buffers: shared hit=124018
Planning Time: 0.152 ms
Execution Time: 1850.598 ms
Analysis: A 1.8-second query to retrieve 500 rows is unacceptable. The planner has chosen a Parallel Sequential Scan, reading the entire 20-million-row table from disk (or buffer cache) and filtering out rows that don't match the RLS policy. It completely ignored our idx_tasks_organization_id.
Why does this happen? current_setting(...) is a STABLE function, not an IMMUTABLE one, so the planner cannot fold its result into a constant at planning time, and with prepared statements or generic plans there may be no session value to peek at anyway. Without a concrete organization_id to work with during planning, it cannot accurately estimate the selectivity of the filter. Faced with this uncertainty, it falls back to a default estimate and often chooses the most conservative plan: a full table scan, assuming it might have to retrieve a large portion of the table.
This is the RLS performance cliff. As your tasks table grows, every single query, even for your smallest tenants, gets progressively slower.
Strategic Rescue with Partial Indexes
Partial indexes provide a sophisticated solution by allowing us to create highly specialized indexes on a subset of a table's data. The WHERE clause in a CREATE INDEX statement is the key. It allows us to build an index that is smaller, more efficient, and tailored to specific query patterns that the planner can reliably identify.
Let's move beyond generic advice and implement concrete, production-ready patterns.
Pattern 1: The "Hot/Cold" Data Subset Index
In many applications, a vast majority of queries target "hot" or active data. For our tasks table, this is typically non-archived data. Archived tasks might represent 95% of the table volume but are accessed less than 1% of the time.
We can create a partial index that includes only the active, unarchived tasks.
-- Drop the generic index, it's not helping us here.
DROP INDEX idx_tasks_organization_id;
-- Create a partial index for active tasks
CREATE INDEX idx_tasks_unarchived_organization_id
ON tasks (organization_id, status)
WHERE archived = FALSE;
This index is significantly smaller than a full index on the table, containing entries only for the 5% of rows where archived is FALSE. Now, we must ensure our queries can leverage it. This requires a symbiotic relationship between the RLS policy and the application query.
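You can verify the size difference directly (keep the original index around temporarily if you want the comparison; exact sizes depend on your data):
SELECT c.relname AS index_name,
       pg_size_pretty(pg_relation_size(c.oid)) AS on_disk_size
FROM pg_class c
WHERE c.relname IN ('idx_tasks_organization_id', 'idx_tasks_unarchived_organization_id');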
Modified RLS Policy:
We can create different policies for different user roles. For a standard user, we can enforce that they can only see unarchived tasks by default.
-- Drop the old policy
DROP POLICY tasks_isolation_policy ON tasks;
-- Policy for standard users, restricting them to their org AND unarchived tasks.
-- The TO clause matters: permissive policies for the same command are OR-combined,
-- so without role targeting the admin policy below would let everyone see archived
-- rows again. (app_user and app_admin are assumed role names.)
CREATE POLICY tasks_user_access ON tasks
FOR SELECT
TO app_user
USING (organization_id = current_setting('app.current_organization_id')::UUID AND archived = FALSE);
-- A more permissive policy for admins who might need to see archived data
CREATE POLICY tasks_admin_access ON tasks
FOR SELECT
TO app_admin
USING (organization_id = current_setting('app.current_organization_id')::UUID);
The Query and its Optimized Plan:
When a standard user (whose role activates the tasks_user_access policy) runs their query, the RLS system implicitly adds AND archived = FALSE to the WHERE clause.
-- As a standard user for the same small tenant
SET app.current_organization_id = '...';
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tasks WHERE status = 'in_progress';
The effective query becomes SELECT * FROM tasks WHERE status = 'in_progress' AND (organization_id = '...' AND archived = FALSE). The planner now sees the explicit archived = FALSE predicate, which perfectly matches our partial index's WHERE clause.
Bitmap Heap Scan on tasks (cost=12.29..356.12 rows=50 width=85) (actual time=0.150..0.245 rows=45 loops=1)
Recheck Cond: (organization_id = '...'::uuid)
Filter: (status = 'in_progress'::text)
Rows Removed by Filter: 15
Heap Blocks: exact=55
Buffers: shared hit=60
-> Bitmap Index Scan on idx_tasks_unarchived_organization_id (cost=0.00..12.28 rows=60 width=0) (actual time=0.121..0.122 rows=60 loops=1)
Index Cond: (organization_id = '...'::uuid)
Buffers: shared hit=5
Planning Time: 0.280 ms
Execution Time: 0.315 ms
Analysis: The execution time has plummeted from 1850ms to 0.3ms. This is a >5000x improvement.
* Bitmap Index Scan: The planner uses our partial index (idx_tasks_unarchived_organization_id) to quickly find all unarchived tasks for the given organization_id.
* Buffers: The number of shared buffers hit dropped from 124,018 to just 60. We are reading vastly less data.
Edge Case Handling: What if an admin needs to query archived data? Their query, governed by the tasks_admin_access policy, will not include the archived = FALSE predicate. The planner will correctly ignore our partial index. For this rare use case, we might create a separate, full index or accept that the query will be slower.
-- An index for the less frequent admin queries
CREATE INDEX idx_tasks_archived_org_id ON tasks (organization_id) WHERE archived = TRUE;
This approach segments our indexing strategy to match our data access patterns, keeping the most critical path (hot data access) hyper-optimized.
Pattern 2: The "Active Tenant" Partial Index
Another common multi-tenant scenario is a power-law distribution of tenant activity. A handful of enterprise customers might generate 80% of the daily traffic, while thousands of free-tier or dormant accounts generate very little.
We can exploit this by creating a partial index that only covers these highly active, high-value tenants. This requires a mechanism to flag those tenants; here we reuse the is_active boolean that already exists on the organizations table.
-- We need to denormalize this flag onto the tasks table for the index
-- This can be maintained via triggers or application logic.
ALTER TABLE tasks ADD COLUMN organization_is_active BOOLEAN;
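-- Backfill the flag for existing rows before creating the partial index below;
-- rows left NULL would be silently excluded from it.
UPDATE tasks t
SET    organization_is_active = o.is_active
FROM   organizations o
WHERE  o.id = t.organization_id;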
-- Create the partial index for active organizations
CREATE INDEX idx_tasks_active_orgs
ON tasks (organization_id, created_at DESC)
WHERE organization_is_active = TRUE;
This index is dramatically smaller than a full index, as it only includes data for the subset of tenants we've flagged as active.
The Challenge: How do we make the query planner use this? The RLS policy itself doesn't know if a tenant is active. This pattern requires a subtle but powerful change in our RLS policy logic.
-- A more advanced RLS policy
-- (There is no CREATE OR REPLACE POLICY; drop any existing policy first.)
DROP POLICY IF EXISTS tasks_isolation_policy ON tasks;
CREATE POLICY tasks_isolation_policy ON tasks
FOR ALL
USING (
organization_id = current_setting('app.current_organization_id')::UUID
-- This is the key: we add a predicate that the planner can match to our index
AND (
-- For active orgs, this clause is TRUE and matches the index
(organization_is_active = TRUE)
OR
-- For inactive orgs, this clause is TRUE and they fall back to another plan
(organization_is_active = FALSE)
)
);
This looks redundant. Why add (flag = TRUE OR flag = FALSE), which is always true as long as the column is backfilled and never NULL? Because it gives the planner predicates it can match against the partial index. When PostgreSQL builds the plan, it can consider the two branches of the OR separately:
* organization_is_active = TRUE: this branch implies the predicate of idx_tasks_active_orgs, so the planner can use that index.
* organization_is_active = FALSE: this branch cannot use that index and falls back to another index (or a sequential scan if none is available).
When a query is executed for a high-traffic tenant (where organization_is_active is TRUE for all of their rows), the planner can choose the highly efficient path through the partial index.
Query Plan for an Active Tenant:
Index Scan using idx_tasks_active_orgs on tasks ... (actual time=0.05..1.5ms ...)
Index Cond: (organization_id = '...'::uuid)
Query Plan for an Inactive Tenant:
Seq Scan on tasks ... (actual time=5.0..10.0ms ...)
Filter: (organization_id = '...'::uuid AND organization_is_active = FALSE)
For the inactive tenant, a sequential scan might even be faster if they have very few rows, as it avoids the overhead of index lookups. We have successfully created separate optimal plans for different data subsets, controlled by the data itself.
Production Gotcha: Maintaining the organization_is_active flag is critical. This denormalized column must be kept in sync. A trigger on the organizations table is a robust way to handle this:
CREATE OR REPLACE FUNCTION sync_organization_active_flag()
RETURNS TRIGGER AS $$
BEGIN
UPDATE tasks
SET organization_is_active = NEW.is_active
WHERE organization_id = NEW.id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trigger_sync_org_active
AFTER UPDATE OF is_active ON organizations
FOR EACH ROW
EXECUTE FUNCTION sync_organization_active_flag();
This trigger introduces write overhead, a classic performance trade-off that must be benchmarked and deemed acceptable for the massive read performance gains.
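A quick way to gauge that overhead is to time a flag flip for a single organization; EXPLAIN ANALYZE reports the trigger's execution time on its own line (roll the transaction back if you do not want to keep the change):
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE organizations SET is_active = FALSE WHERE id = '...'; -- UUID of one organization
-- The output ends with something like: Trigger trigger_sync_org_active: time=... calls=1
ROLLBACK;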
Pattern 3: Partial BRIN Indexes for Massive Time-Series Data
Imagine a table storing IoT events, logging, or analytics, easily reaching terabytes in size. B-Tree indexes, even partial ones, can become prohibitively large. This is where BRIN (Block Range INdex) indexes shine. A BRIN index stores the minimum and maximum value for a block of pages (a "block range"), resulting in an index that is orders of magnitude smaller than a B-Tree.
BRIN indexes work best on data with high natural correlation between its physical storage order and its values, like a created_at timestamp in an append-only table.
The problem is that a standard BRIN index on (organization_id, created_at) is not very effective. The organization_id values are likely not well-correlated with the physical storage, so the min/max values in each block range will be very wide, offering poor filtering.
We can combine the power of partial indexes with BRIN to solve this. Let's create a partial BRIN index only for recent, "hot" data.
-- Table with billions of rows
CREATE TABLE device_readings (
id BIGSERIAL PRIMARY KEY,
organization_id UUID NOT NULL,
device_id UUID NOT NULL,
payload JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Create a partial BRIN index for "recent" data. Functions in an index predicate
-- must be IMMUTABLE, so NOW() is not allowed here; pin a literal cutoff instead
-- (the date is illustrative) and roll it forward periodically, e.g. by creating a
-- fresh index CONCURRENTLY from a scheduled job and dropping the old one.
CREATE INDEX idx_device_readings_recent_brin
ON device_readings USING BRIN (organization_id, created_at)
WHERE created_at > '2024-04-01 00:00:00+00';
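BRIN only pays off if created_at really does track the physical row order. Once data is loaded and ANALYZEd, pg_stats exposes the planner's per-column correlation estimate (values near 1.0 mean near-perfect physical ordering):
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'device_readings'
  AND attname IN ('organization_id', 'created_at');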
How It Works:
* WHERE Clause Matching: When a user queries for recent data (e.g., a dashboard showing the last 7 days), the application supplies the time filter as a literal the planner can see, such as created_at > '2024-06-24'. Partial-index matching happens at planning time, so the planner can prove that this predicate implies the index predicate; a filter written with NOW() cannot be proven and would not use the index.
* Correlation: Although organization_id values are effectively random, created_at closely tracks physical row order in an append-only table. The planner can use the BRIN index to quickly prove that entire block ranges of pages fall outside the query's time window, skipping massive amounts of I/O.
Query and Plan Analysis:
Let's apply an RLS policy and query the data for a specific tenant within the last week.
-- RLS Policy is standard
ALTER TABLE device_readings ENABLE ROW LEVEL SECURITY;
CREATE POLICY device_readings_isolation ON device_readings
USING (organization_id = current_setting('app.current_organization_id')::UUID);
-- Query for recent data
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM device_readings
WHERE created_at > '2024-06-24 00:00:00+00' -- last 7 days; cutoff computed by the application as a literal
ORDER BY created_at DESC LIMIT 100;
The Plan without the Partial BRIN Index would likely be a full sequential scan over the billions of rows, filtered by the RLS policy, which would be unacceptably slow.
The Plan with the Partial BRIN Index:
Limit (cost=...)
-> Bitmap Heap Scan on device_readings (cost=...)
Recheck Cond: (created_at > '2024-06-24 00:00:00+00'::timestamptz)
Rows Removed by Index Recheck: ...
Filter: (organization_id = current_setting('...')::uuid)
Heap Blocks: lossy=15360
Buffers: shared hit=16020
-> Bitmap Index Scan on idx_device_readings_recent_brin (cost=...)
Index Cond: (created_at > '2024-06-24 00:00:00+00'::timestamptz)
Buffers: shared hit=660
Analysis:
* The planner selects our partial BRIN index.
* The Bitmap Index Scan on the tiny BRIN index quickly identifies a candidate set of block ranges (lossy=15360 means whole blocks, not individual rows, were returned, so every row in those blocks must be rechecked and non-matching rows are filtered out).
* The Bitmap Heap Scan then visits only these candidate blocks, instead of the entire table, and applies the RLS filter.
* This transforms a query that would take minutes or hours into one that completes in seconds or less.
Final Advanced Considerations
* Planner Statistics are Paramount: Partial indexes are highly dependent on accurate table statistics. The planner must know the data distribution within the indexed subset. Frequent and targeted ANALYZE operations on tables with partial indexes are non-negotiable (a minimal example follows this list).
* RLS Policy Complexity: Keep your USING clauses simple and planner-friendly. Avoid subqueries and volatile functions in the RLS policy itself (current_setting is STABLE, which is fine). The goal is to present the planner with predicates it can directly match against an index's WHERE clause.
* Index Maintenance Overhead: Partial indexes are not free. They still incur maintenance costs on INSERT, UPDATE, and DELETE operations for rows that fall within their predicate. The write cost must be measured against the read performance gain. For a write-heavy table with uniform read patterns, partial indexes may not be the right tool.
* The IMMUTABLE Function Trick: For more complex RLS logic, you can sometimes wrap it in a custom IMMUTABLE function. This tells the planner it can treat the function's output as a constant during planning. This is a powerful but dangerous tool; if the function is not truly immutable, it can lead to incorrect query results. Use with extreme caution and only when the underlying logic is guaranteed not to change within a query.
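To make the statistics point above concrete, here is a minimal example; the column and target value are illustrative and should be tuned to your own skew:
-- Sample the skewed tenant column in more detail, then refresh the statistics.
ALTER TABLE tasks ALTER COLUMN organization_id SET STATISTICS 1000;
ANALYZE tasks;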
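And for the IMMUTABLE function trick, a heavily caveated sketch of what such a wrapper might look like (current_org_id is a hypothetical name, and declaring it IMMUTABLE is the deliberate lie described above):
-- WARNING: the planner may pre-evaluate this call and bake the tenant ID into a
-- cached plan. Only consider this where plans are never reused across tenants.
CREATE FUNCTION current_org_id() RETURNS uuid
LANGUAGE sql IMMUTABLE AS
$$ SELECT current_setting('app.current_organization_id')::uuid $$;
-- The RLS policy would then compare organization_id against current_org_id()
-- instead of calling current_setting() directly.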
By moving beyond generic tenant_id indexes and embracing these advanced partial index patterns, you can surgically resolve RLS-induced performance bottlenecks. The key is to deeply understand your data's shape and access patterns, and then build a symbiotic relationship between your schema, your indexes, and your RLS policies to guide the query planner to its most efficient path.