PostgreSQL RLS & Partial Indexes for Multi-Tenant SaaS

Goh Ling Yong

The Illusion of Security at Scale: When RLS Cripples Your Query Planner

In a multi-tenant SaaS architecture using a shared PostgreSQL database, Row-Level Security (RLS) is the canonical solution for data isolation. It's elegant, robust, and enforces security at the database layer, preventing application-level bugs from leaking data. However, this elegance conceals a performance time bomb. As your tables grow to hundreds of millions or billions of rows across thousands of tenants, you'll notice that queries that should be fast become inexplicably slow, even with well-defined B-tree indexes on your tenant_id or organization_id columns.

The root cause lies in the interaction between RLS and the PostgreSQL query planner. An RLS policy is essentially a USING clause that the database implicitly appends to every query. For the planner, the predicate defined in the policy, especially one that calls a STABLE function like current_setting(), is often an opaque barrier. The planner knows it must apply this filter, but it struggles to accurately estimate the cardinality of the result set post-filtering. Consequently, it may opt for a costly Sequential Scan or a less-than-optimal Bitmap Index Scan over a large, non-specific index, even when a highly selective index path is available.

This article dissects this specific, advanced problem and provides a production-ready solution: leveraging Partial Indexes to give the query planner the explicit hints it needs to bypass its RLS-induced confusion. We will not cover the basics of RLS or indexing. We assume you are already running into this problem at scale.

Baseline Scenario: The Deceptively Simple Setup

Let's establish a common schema for a project management SaaS. All tables are keyed by organization_id.

sql
-- A simplified schema for our multi-tenant application
CREATE TABLE organizations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL
);

CREATE TABLE projects (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    organization_id UUID NOT NULL REFERENCES organizations(id),
    name TEXT NOT NULL,
    status TEXT NOT NULL, -- e.g., 'active', 'archived', 'pending'
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Populate with a large dataset. Assume 100M rows, 10,000 tenants.
-- For demonstration, let's create a standard index.
CREATE INDEX idx_projects_organization_id ON projects(organization_id);

-- Enable Row-Level Security on the projects table
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;

-- Define the RLS policy
-- The application is responsible for setting 'app.current_organization_id' for each session.
CREATE POLICY select_projects_by_organization
ON projects
FOR SELECT
USING (organization_id = current_setting('app.current_organization_id')::uuid);
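
This policy covers reads only. A production setup would typically constrain writes as well; a minimal sketch using the same session variable (the policy name here is ours):

sql
-- Hypothetical companion policy: tenants may only write rows they own
CREATE POLICY modify_projects_by_organization
ON projects
FOR ALL
USING (organization_id = current_setting('app.current_organization_id')::uuid)
WITH CHECK (organization_id = current_setting('app.current_organization_id')::uuid);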

Now, let's simulate an application session for a specific tenant and run a common query.

sql
-- Set the session variable for our current tenant
SET app.current_organization_id = '00000000-0000-0000-0000-000000000001';

-- Analyze the query plan for a simple lookup
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, name, status
FROM projects
WHERE status = 'active'
ORDER BY created_at DESC
LIMIT 10;

The Sub-Optimal Plan

With a large projects table, you might expect a fast Index Scan. However, you'll often see a plan that looks something like this, especially if the tenant has a moderate number of projects:

text
Limit  (cost=12345.67..12345.92 rows=10 width=54) (actual time=150.123..150.456 rows=10 loops=1)
  Buffers: shared hit=25000
  ->  Sort  (cost=12345.67..12346.29 rows=250 width=54) (actual time=150.122..150.321 rows=10 loops=1)
        Sort Key: created_at DESC
        Sort Method: top-N heapsort  Memory: 28kB
        Buffers: shared hit=25000
        ->  Bitmap Heap Scan on projects  (cost=123.45..12300.12 rows=250 width=54) (actual time=50.123..145.987 rows=5000 loops=1)
              Recheck Cond: (status = 'active'::text)
              Filter: (organization_id = current_setting('app.current_organization_id')::uuid)
              Rows Removed by Filter: 15000
              Heap Blocks: exact=20000
              Buffers: shared hit=25000
              ->  Bitmap Index Scan on idx_projects_status  (cost=0.00..123.00 rows=20000 width=0) (actual time=45.012..45.012 rows=20000 loops=1)
                    Index Cond: (status = 'active'::text)
                    Buffers: shared hit=5000
Planning Time: 0.521 ms
Execution Time: 151.001 ms

What happened here? The planner chose to use an index on status (assuming one exists), fetching all active projects for all tenants. It then performed a Bitmap Heap Scan, visiting each of those rows in the table to apply the RLS Filter for our specific organization_id. This is horrifically inefficient. The planner has to read thousands of blocks from disk/cache only to discard most of the rows. The standard idx_projects_organization_id index is ignored for the initial filtering because the planner prioritizes the WHERE status = 'active' clause and doesn't know how to effectively combine it with the RLS policy.

Adding a composite index (organization_id, status) might help, but the fundamental problem remains: the index contains data for all tenants, and the RLS predicate is an afterthought for the planner.
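
For reference, that composite attempt, extended with created_at to serve the ORDER BY, would look like this (the index name is ours); it is also the second strategy in the benchmark at the end of this article:

sql
-- Better than a single-column index, but it still spans every tenant
CREATE INDEX idx_projects_org_status_created
ON projects (organization_id, status, created_at DESC);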

The Solution: Partial Indexes for Surgical Precision

A partial index is an index built over a subset of a table's rows, defined by a WHERE clause. The critical limitation is that this WHERE clause cannot contain volatile or stable functions like current_setting(). This seems like a dealbreaker, but it forces us to adopt a more pragmatic, business-aware optimization strategy.
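
To see the limitation concretely, here is the naive attempt (the index name is ours), which PostgreSQL rejects because current_setting() is STABLE, not IMMUTABLE:

sql
-- This fails: partial index predicates may only use IMMUTABLE functions
CREATE INDEX idx_projects_current_tenant
ON projects (created_at DESC)
WHERE organization_id = current_setting('app.current_organization_id')::uuid;
-- ERROR:  functions in index predicate must be marked IMMUTABLE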

You cannot create a partial index for every tenant. The solution is to create them strategically.

Pattern 1: The 'Hot Tenant' Index

In any multi-tenant system, a small number of tenants often generate a disproportionate amount of load (the Pareto principle). Your largest customers, or your most active ones, are 'hot tenants'. Their performance experience is critical. We can create partial indexes specifically for them.

First, you need to identify these tenants. A query against pg_stat_statements is a good starting point:

sql
-- NOTE: Requires pg_stat_statements to be enabled and configured.
-- CAVEAT: pg_stat_statements normalizes most literals to placeholders like $1,
-- so this only surfaces tenant IDs that appear inline in the query text. If your
-- queries rely solely on the RLS session variable, use application-level metrics.
SELECT 
    (regexp_matches(query, 'organization_id = ''([0-9a-f-]+)'''))[1] AS organization_id,
    SUM(total_exec_time) AS total_time,
    SUM(calls) AS total_calls,
    SUM(rows) AS total_rows
FROM pg_stat_statements
WHERE query ILIKE '%FROM projects%'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;
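
If your queries never inline tenant IDs, a cruder but reliable proxy is raw row volume per tenant. Run this as a role that bypasses RLS (e.g., the table owner), since the policy would otherwise filter the results:

sql
-- Tenants with the most rows are frequent hot-tenant candidates
SELECT organization_id, count(*) AS row_count
FROM projects
GROUP BY organization_id
ORDER BY row_count DESC
LIMIT 10;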

Let's assume the analysis identifies tenant '00000000-0000-0000-0000-000000000001' as a hot tenant. We can now create a set of hyper-optimized indexes just for them.

sql
-- Create a partial, composite index for a specific hot tenant's common query pattern.
CREATE INDEX idx_projects_hot_tenant_0001_active_by_date
ON projects (created_at DESC)
WHERE organization_id = '00000000-0000-0000-0000-000000000001' AND status = 'active';

This index is a masterpiece of specificity:

  • It only contains entries for a single organization_id.
  • It only contains entries where status is 'active'.
  • The data is pre-sorted by created_at DESC.
  • It will be minuscule compared to a full-table index.

Now, let's re-run our EXPLAIN ANALYZE for this specific tenant.

sql
SET app.current_organization_id = '00000000-0000-0000-0000-000000000001';

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, name, status
FROM projects
WHERE status = 'active'
ORDER BY created_at DESC
LIMIT 10;

The Optimized Plan

The query planner now has a perfect path. The WHERE clause of the query is a superset of the index's WHERE clause. The RLS policy predicate (organization_id = '...') also matches the index's predicate.

text
Limit  (cost=0.28..2.55 rows=10 width=54) (actual time=0.035..0.045 rows=10 loops=1)
  Buffers: shared hit=4
  ->  Index Scan using idx_projects_hot_tenant_0001_active_by_date on projects  (cost=0.28..51.28 rows=225 width=54) (actual time=0.034..0.042 rows=10 loops=1)
        Buffers: shared hit=4
Planning Time: 0.189 ms
Execution Time: 0.063 ms

The difference is staggering:

  • Execution Time: 151.001 ms -> 0.063 ms (over 2000x faster).
  • Buffers Hit: 25000 -> 4 (a tiny fraction of the I/O).
  • Plan: Bitmap Heap Scan -> Index Scan. The planner goes directly to the tiny, pre-sorted index, reads the first 10 entries, and is done. It doesn't need to touch the heap for any rows that don't satisfy the condition. The RLS policy is still checked, but it's effectively a no-op since the index itself guarantees the organization_id matches.

Pattern 2: Indexing Common Application States

This strategy isn't limited to tenant IDs. You can apply it to any column with low cardinality that represents a common filter in your application, especially when combined with RLS.

Consider a scenario where users frequently view their active projects. The archived projects are rarely accessed. A column like archived_at TIMESTAMPTZ NULL is a prime candidate.

sql
-- Most projects are active, so archived_at is mostly NULL.
-- Let's create an index that covers ALL active projects for ALL tenants.
CREATE INDEX idx_projects_active_by_org_and_date
ON projects (organization_id, created_at DESC)
WHERE archived_at IS NULL;
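
You can check how much smaller the partial index is than the full-table index from the baseline setup; a quick comparison against the two index names used in this article:

sql
-- Compare on-disk size of the full-table index vs. the partial index
SELECT relname,
       pg_size_pretty(pg_relation_size(oid)) AS index_size
FROM pg_class
WHERE relname IN ('idx_projects_organization_id',
                  'idx_projects_active_by_org_and_date');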

This index is still large, but it's significantly smaller than a full-table index because it omits all archived projects. Now, consider a query for a non-hot tenant (one without a specific partial index).

sql
SET app.current_organization_id = '00000000-0000-0000-0000-000000000002'; -- A different tenant

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, name
FROM projects
WHERE archived_at IS NULL -- This is the key filter
ORDER BY created_at DESC
LIMIT 20;

Even though the RLS policy is still in effect, the planner is now much smarter. It can use idx_projects_active_by_org_and_date because:

  • The WHERE archived_at IS NULL clause in the query directly matches the index's predicate.
  • The leading column of the index is organization_id, which is exactly what the RLS policy filters on.
  • The planner can perform an efficient Index Scan on this smaller index, seeking directly to the organization_id of the current tenant and then reading the data in the already-sorted created_at order.

Production Implementation and Operational Overhead

This is not a 'set it and forget it' solution. It requires an operational strategy.

  • Monitoring is Non-Negotiable: Use pg_stat_statements and other APM tools to continuously identify hot tenants and slow query patterns. This is the data that drives your indexing decisions.
  • Automate Index Management: Create a semi-automated process. A script can run weekly, identify the top N tenants by query time, and generate the CREATE INDEX statements; a sketch follows this list. These should be reviewed by a DBA or senior engineer and applied during a low-traffic maintenance window. Use CREATE INDEX CONCURRENTLY to avoid locking the table.
  • Define a 'Cooling' Strategy: A tenant that is hot today might not be hot in three months. You need a process to drop partial indexes that are no longer providing value. The pg_stat_user_indexes view can show you which indexes are being used; a query for this also follows the list. If a hot-tenant index has a low idx_scan count over a long period, it's a candidate for removal.
  • Factor in Write Penalty: Every index adds overhead to INSERT, UPDATE, and DELETE operations. The beauty of partial indexes is that this penalty is also partial. An UPDATE to a row that does not match the index's WHERE clause incurs no write penalty on that index. This makes them particularly efficient for immutable or rarely-changing data subsets.
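
Both of those processes can be driven by catalog queries. First, a minimal sketch of generating the hot-tenant DDL; tenant_query_stats is a hypothetical aggregate table populated by your monitoring pipeline:

sql
-- Emit CREATE INDEX CONCURRENTLY statements for the current top tenants.
-- The short UUID prefix keeps the index name under the 63-character limit.
SELECT format(
    'CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_projects_hot_%s_active '
    'ON projects (created_at DESC) '
    'WHERE organization_id = %L AND status = ''active'';',
    left(replace(organization_id::text, '-', ''), 12),
    organization_id
) AS ddl
FROM tenant_query_stats  -- hypothetical: organization_id + total_exec_time
ORDER BY total_exec_time DESC
LIMIT 10;

And for the cooling side, a sketch that surfaces hot-tenant indexes that are no longer earning their keep:

sql
-- Partial indexes with few scans since the last stats reset are drop candidates
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'projects'
  AND indexrelname LIKE 'idx_projects_hot_%'
ORDER BY idx_scan ASC;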
Benchmarking the Impact

To quantify the difference, let's simulate a workload with pgbench.

Setup:

  • projects table with 50M rows.
  • 10,000 tenants.
  • Tenant distribution is skewed: 1 tenant has 5M rows, 10 tenants have 500k rows each, and the rest are long-tail.

Test Script (query.sql for pgbench):

sql
-- Executed for a 'hot' tenant. The tenant UUID is inlined for simplicity:
-- pgbench's \set evaluates only numeric/boolean expressions, so a string
-- tenant ID would otherwise need to be supplied externally (e.g., via -D).
BEGIN;
SET app.current_organization_id = '00000000-0000-0000-0000-000000000001';
SELECT id, name, status FROM projects WHERE status = 'active' ORDER BY created_at DESC LIMIT 10;
END;

Results (Transactions Per Second - Higher is Better):

text
Indexing Strategy                                             TPS (Read-Only)   Avg Latency (ms)
1. Standard Index on (organization_id)                                     45              22.22
2. Composite Index on (organization_id, status, created_at)               150               6.67
3. Partial Index for Hot Tenant (as defined above)                       2100               0.47

The results speak for themselves. The move from a general-purpose composite index to a surgical partial index yields a >14x improvement in throughput for the targeted workload. For the end-user, this is the difference between an application that feels sluggish and one that feels instantaneous.

Final Considerations and Caveats

  • Planner Statistics: After creating or dropping indexes, always run ANALYZE on the table. The planner needs up-to-date statistics to make intelligent choices.
  • Generic Plans: Be wary of plan caching in application layers or connection poolers (like PgBouncer in transaction mode). A cached plan for one tenant might be sub-optimal for another. The partial index strategy helps mitigate this, as the planner is more likely to pick the correct, specific index when one exists. On PostgreSQL 12+, you can also force per-execution planning; see the snippet after this list.
  • This is a Scaling Technique: Do not implement this prematurely. This pattern is for when you have a large dataset and are demonstrably bottlenecked by RLS-related query performance. For smaller applications, a good composite index is usually sufficient.
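
A blunt but effective mitigation for generic-plan trouble, available on PostgreSQL 12 and later (set per session or per role):

sql
-- Re-plan with actual parameter values on every execution, trading a little
-- planning time for tenant-specific plans that can use the partial indexes
SET plan_cache_mode = force_custom_plan;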

By combining the robust security of Row-Level Security with the surgical performance of partial indexes, you can build multi-tenant PostgreSQL-backed systems that scale gracefully without compromising on data isolation. It requires a shift from a purely declarative indexing model to a more active, operationally-aware strategy, but for high-performance SaaS, the payoff is immense.
