High-Performance Multi-Tenancy with PostgreSQL RLS and Partial Indexes

Goh Ling Yong

The Multi-Tenancy Conundrum: Security vs. Performance

In modern SaaS architecture, multi-tenancy is the default. The shared-schema, single-database model—where a tenant_id column discriminates data—is popular for its operational simplicity and cost-effectiveness. The canonical tool for enforcing data isolation in this model within PostgreSQL is Row-Level Security (RLS). It provides a robust, transparent, and application-agnostic layer of security, ensuring that tenant_A can never, under any circumstances, see data belonging to tenant_B.

However, what is often a footnote in introductory tutorials becomes a primary operational crisis at scale: RLS can introduce significant, non-obvious performance degradation. Senior engineers who have deployed RLS in high-throughput environments have likely encountered queries that, despite seemingly perfect indexing on tenant_id, result in slow sequential scans or inefficient bitmap index scans. The root cause lies in the complex interplay between the RLS policy predicate and the PostgreSQL query planner.

This article dissects this exact problem. We will bypass introductory explanations of RLS and indexing, assuming you are already familiar with them. Instead, we will focus on a production-proven, advanced pattern that marries RLS with a dynamic partial index strategy to achieve both ironclad security and high performance, even with highly skewed tenant data distributions.

The Setup: A Typical SaaS Schema

Let's establish a baseline schema representing a project management SaaS. We have tenants, users who belong to those tenants, and projects that also belong to tenants.

sql
-- Enable UUIDs for primary keys
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Tenants table
CREATE TABLE tenants (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    name TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Users table
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    email TEXT NOT NULL UNIQUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- A large table where performance matters
CREATE TABLE projects (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'active',
    due_date DATE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Standard B-tree index on the foreign key
CREATE INDEX idx_projects_tenant_id ON projects(tenant_id);
CREATE INDEX idx_users_tenant_id ON users(tenant_id);

This is a standard, normalized schema. The idx_projects_tenant_id index is what most developers would add immediately, assuming it will solve all performance issues related to tenant-specific queries.

The Naive RLS Implementation: A Performance Trap

Now, let's implement RLS. The goal is to ensure that any query executed by an application user can only access data associated with their tenant_id. We'll use a runtime parameter, app.current_tenant_id, to securely pass the tenant context to PostgreSQL.

sql
-- Enable RLS on the projects table
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;

-- Policy: Users can only see projects within their tenant
CREATE POLICY projects_isolation_policy ON projects
FOR ALL
USING (tenant_id = current_setting('app.current_tenant_id')::uuid)
WITH CHECK (tenant_id = current_setting('app.current_tenant_id')::uuid);

This policy is simple and effective from a security standpoint. When an application connects to the database for a specific tenant, it first executes SET app.current_tenant_id = '...'. Every subsequent query on the projects table will have the USING clause implicitly appended as a WHERE condition.
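
One common refinement is to scope the setting to a single transaction with SET LOCAL, so that pooled connections cannot leak a previous tenant's context. A minimal sketch of the SQL the application would issue (the UUID here is illustrative):

sql
BEGIN;
-- SET LOCAL confines the setting to this transaction; it resets on COMMIT or ROLLBACK
SET LOCAL app.current_tenant_id = 'b1eebc99-9c0b-4ef8-bb6d-6bb9bd380a12';

-- All statements in this transaction now see only this tenant's rows
SELECT count(*) FROM projects;
COMMIT;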

Let's populate the database with a skewed data distribution, which is highly realistic in a SaaS environment. We'll create one "hot tenant" with millions of records and many "small tenants" with only a few thousand.

sql
-- Generate sample data
-- (This is a simplified version; in a real test, use a script)

-- Create a hot tenant and a small tenant
INSERT INTO tenants (id, name) VALUES ('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 'Hot Tenant Inc.');
INSERT INTO tenants (id, name) VALUES ('b1eebc99-9c0b-4ef8-bb6d-6bb9bd380a12', 'Small Tenant LLC');

-- Populate with a large number of projects for the hot tenant
INSERT INTO projects (tenant_id, name, status)
SELECT 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 'Project ' || g.i, 'active'
FROM generate_series(1, 5000000) AS g(i);

-- Populate with a small number of projects for the small tenant
INSERT INTO projects (tenant_id, name, status)
SELECT 'b1eebc99-9c0b-4ef8-bb6d-6bb9bd380a12', 'Project ' || g.i, 'active'
FROM generate_series(1, 1000) AS g(i);

-- Analyze the table to update statistics
ANALYZE projects;

Now, let's simulate a query for the small tenant and inspect the query plan.

sql
-- Simulate application session for the small tenant
SET app.current_tenant_id = 'b1eebc99-9c0b-4ef8-bb6d-6bb9bd380a12';

-- Run a typical query
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM projects WHERE status = 'active' ORDER BY created_at DESC LIMIT 10;

Here is a likely query plan you would see:

text
Limit  (cost=139851.32..139851.34 rows=10 width=73) (actual time=235.484..235.487 rows=10 loops=1)
  Buffers: shared hit=16401
  ->  Sort  (cost=139851.32..140488.82 rows=255000 width=73) (actual time=235.482..235.484 rows=10 loops=1)
        Sort Key: created_at DESC
        Sort Method: top-N heapsort  Memory: 26kB
        Buffers: shared hit=16401
        ->  Bitmap Heap Scan on projects  (cost=5789.29..138576.32 rows=255000 width=73) (actual time=34.918..234.908 rows=1000 loops=1)
              Recheck Cond: (status = 'active'::text)
              Filter: (tenant_id = 'b1eebc99-9c0b-4ef8-bb6d-6bb9bd380a12'::uuid)
              Rows Removed by Filter: 4999000
              Heap Blocks: exact=16384
              Buffers: shared hit=16401
              ->  Bitmap Index Scan on idx_projects_status  (cost=0.00..5725.54 rows=5000000 width=0) (actual time=33.689..33.689 rows=5000000 loops=1)
                    Index Cond: (status = 'active'::text)
                    Buffers: shared hit=17
Planning Time: 0.215 ms
Execution Time: 235.549 ms

(Note: your exact plan may vary with PostgreSQL version, configuration, and statistics. This example assumes an additional index on status exists, e.g. CREATE INDEX idx_projects_status ON projects(status), giving the planner a tempting but tenant-agnostic access path.)

Analysis of the Bad Plan:

  • High Cost & Row Estimate: The planner estimates it will need to sort 255,000 rows, a wildly inaccurate number. It derived this from general table statistics, not tenant-specific ones.
  • Filter, Not Index Condition: The crucial line is Filter: (tenant_id = '...'::uuid). The tenant_id check is happening as a filter on the rows returned by the Bitmap Heap Scan, not as part of the initial index lookup.
  • Rows Removed by Filter: It found 5 million active projects and then had to discard 4,999,000 of them because they didn't match the tenant_id from the RLS policy. This is incredibly inefficient.
  • Execution Time: Over 200ms for a query that should be instantaneous, as it's only fetching 10 rows from a tenant with 1000 total records.
Why does this happen? The tenant predicate reaches the planner as a comparison against current_setting(), not as a literal value it can reason about precisely, and the statistics it falls back on describe the whole table, which the hot tenant dominates. Its estimate of how selective the tenant_id condition is therefore comes out far too pessimistic for a small tenant, so it prefers the index on status and applies the tenant check only as a filter on the rows fetched, instead of driving the scan through the far more selective idx_projects_tenant_id. In effect, the hot tenant has poisoned the statistics that every other tenant's queries are planned with.
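
To see the skew the planner is working from, you can inspect the tenant_id column statistics directly (pg_stats is a standard system view; ANALYZE must have run, as it did above):

sql
-- The hot tenant dominates the most-common-values list for tenant_id
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'projects' AND attname = 'tenant_id';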

    The Solution: Partial Indexes Aligned with RLS

    A partial index is an index built over only the subset of a table's rows that satisfies a WHERE clause. The planner can use it whenever it can prove that a query's conditions imply the index's WHERE clause. When that alignment occurs, it gets to work with a much smaller, highly specific index.

    The core idea of our advanced pattern is this: Instead of one large index on tenant_id, we will create a separate index for each tenant.

    At first, this sounds unmanageable and potentially insane. But let's see it in action first, and then we'll address the management aspect.

    Let's create a partial index specifically for our 'Small Tenant LLC'.

    sql
    -- Create a composite partial index for the small tenant's active projects
    CREATE INDEX idx_projects_small_tenant_b1_active_sorted 
    ON projects (created_at DESC) 
    WHERE tenant_id = 'b1eebc99-9c0b-4ef8-bb6d-6bb9bd380a12' AND status = 'active';
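
    The difference in footprint is easy to verify. A quick size comparison using pg_size_pretty (index names as defined above):

    sql
    SELECT pg_size_pretty(pg_relation_size('idx_projects_tenant_id')) AS global_index_size,
           pg_size_pretty(pg_relation_size('idx_projects_small_tenant_b1_active_sorted')) AS partial_index_size;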

    This index is tiny. It contains entries only for the 1,000 active projects belonging to this specific tenant. Now, let's run the exact same EXPLAIN ANALYZE command again, with the same app.current_tenant_id set.

    sql
    EXPLAIN (ANALYZE, BUFFERS) 
    SELECT * FROM projects WHERE status = 'active' ORDER BY created_at DESC LIMIT 10;

    The new query plan will be dramatically different and vastly superior:

    text
    Limit  (cost=0.28..1.24 rows=10 width=73) (actual time=0.031..0.037 rows=10 loops=1)
      Buffers: shared hit=4
      ->  Index Scan using idx_projects_small_tenant_b1_active_sorted on projects  (cost=0.28..96.68 rows=1000 width=73) (actual time=0.029..0.034 rows=10 loops=1)
            Buffers: shared hit=4
    Planning Time: 0.358 ms
    Execution Time: 0.059 ms

    Analysis of the Good Plan:

  • Planner Perfection: The planner immediately identified idx_projects_small_tenant_b1_active_sorted. The RLS policy's implicit WHERE tenant_id = 'b1eebc99-9c0b-4ef8-bb6d-6bb9bd380a12' combined with the query's WHERE status = 'active' perfectly matched the partial index's definition.
  • Index Scan: It's using a highly efficient Index Scan. Because the index is already sorted by created_at DESC, it doesn't even need to perform a sort operation. It simply walks the index and picks off the first 10 entries.
  • Cost and Row Estimates: The cost is minuscule (0.28..1.24), and the row estimate (1000) is now perfectly accurate for this tenant.
  • Execution Time: The query finished in 0.059 ms, a ~4000x improvement over the previous 235 ms. The number of buffers hit dropped from over 16,000 to just 4.
    This demonstrates true performance isolation: query performance for the small tenant is now completely independent of the 5 million rows belonging to the hot tenant.

    One caveat to verify on your own PostgreSQL version: a partial index is only chosen when the planner can prove, at plan time, that the query's conditions imply the index's WHERE clause. Because current_setting() is a stable function rather than a plan-time constant, the planner may not make that proof from the RLS qual alone, in which case the partial index is skipped. A common belt-and-suspenders remedy is to keep RLS as the security guarantee and have the application also include an explicit tenant_id filter in performance-critical queries, as sketched below.
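
    A sketch of that query shape, reusing the small tenant's id from the sample data (the explicit filter duplicates what the RLS policy already enforces):

    sql
    -- Redundant with RLS, but gives the planner a plan-time constant
    -- that provably implies the partial index's predicate
    SELECT *
    FROM projects
    WHERE tenant_id = 'b1eebc99-9c0b-4ef8-bb6d-6bb9bd380a12'
      AND status = 'active'
    ORDER BY created_at DESC
    LIMIT 10;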

    Production Implementation: Dynamic Index Management

    Creating indexes manually for each tenant is not a viable strategy. The solution is to automate this process using database triggers and functions.

    We will create a function that, given a tenant_id, generates and executes the CREATE INDEX statements for all performance-critical queries. This function will then be called by a trigger whenever a new tenant is created.

    sql
    -- A function to create all necessary partial indexes for a given tenant
    CREATE OR REPLACE FUNCTION create_partial_indexes_for_tenant(p_tenant_id UUID)
    RETURNS void AS $$
    DECLARE
        index_name TEXT;
        table_name TEXT := 'projects';
    BEGIN
        -- Example 1: Index for fetching active projects, sorted by creation date
        -- Matches query: SELECT ... FROM projects WHERE status = 'active' ORDER BY created_at DESC
        -- Keep generated names within PostgreSQL's 63-byte identifier limit
        index_name := table_name || '_' || replace(p_tenant_id::text, '-', '_') || '_active_created';
        IF NOT EXISTS (SELECT 1 FROM pg_class WHERE relname = index_name) THEN
            RAISE NOTICE 'Creating index: %', index_name;
            EXECUTE format('CREATE INDEX %I ON %I (created_at DESC) WHERE tenant_id = %L AND status = %L',
                index_name, table_name, p_tenant_id, 'active');
        END IF;
    
        -- Example 2: Index for searching projects by name (case-insensitive)
        -- Matches query: SELECT ... FROM projects WHERE lower(name) LIKE 'search_term%'
        index_name := table_name || '_' || replace(p_tenant_id::text, '-', '_') || '_name_lower';
        IF NOT EXISTS (SELECT 1 FROM pg_class WHERE relname = index_name) THEN
            RAISE NOTICE 'Creating index: %', index_name;
            EXECUTE format('CREATE INDEX %I ON %I (lower(name) text_pattern_ops) WHERE tenant_id = %L',
                index_name, table_name, p_tenant_id);
        END IF;
    
        -- Add more index creation statements here for other critical queries
    
    END;
    $$ LANGUAGE plpgsql;
    
    -- A trigger that fires after a new tenant is inserted
    CREATE OR REPLACE FUNCTION trg_after_insert_tenant()
    RETURNS TRIGGER AS $$
    BEGIN
        -- Call the index creation function for the new tenant
        PERFORM create_partial_indexes_for_tenant(NEW.id);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    
    -- Attach the trigger to the tenants table
    CREATE TRIGGER after_insert_tenant_trigger
    AFTER INSERT ON tenants
    FOR EACH ROW
    EXECUTE FUNCTION trg_after_insert_tenant();

    With this automation in place, every time you run INSERT INTO tenants (name) VALUES ('New Customer'), the necessary high-performance indexes are created automatically. The application code remains completely unaware of this underlying optimization.
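
    A quick way to confirm the automation works after onboarding a tenant is to check pg_indexes for the generated names (the LIKE pattern assumes the naming convention used in the function above):

    sql
    INSERT INTO tenants (name) VALUES ('New Customer');

    SELECT indexname
    FROM pg_indexes
    WHERE tablename = 'projects'
      AND indexname LIKE 'projects_%_active_created';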

    Similarly, you need a cleanup process for when a tenant is removed.

    sql
    -- A function to drop all partial indexes for a given tenant
    CREATE OR REPLACE FUNCTION drop_partial_indexes_for_tenant(p_tenant_id UUID)
    RETURNS void AS $$
    DECLARE
        r RECORD;
    BEGIN
        -- Find all indexes on the 'projects' table that reference this tenant_id in their definition
        FOR r IN 
            SELECT c.relname AS index_name
            FROM pg_class c
            JOIN pg_namespace n ON n.oid = c.relnamespace
            JOIN pg_index i ON i.indexrelid = c.oid
            WHERE c.relkind = 'i'
              AND i.indrelid = 'projects'::regclass
              AND pg_get_expr(i.indpred, i.indrelid) LIKE '%' || p_tenant_id::text || '%'
        LOOP
            RAISE NOTICE 'Dropping index: %', r.index_name;
            EXECUTE format('DROP INDEX IF EXISTS %I', r.index_name);
        END LOOP;
    
        -- Add loops for other tables here
    
    END;
    $$ LANGUAGE plpgsql;
    
    -- You can call this manually before deleting a tenant or use a trigger
    -- Example manual call:
    -- SELECT drop_partial_indexes_for_tenant('tenant-uuid-to-delete');
    -- DELETE FROM tenants WHERE id = 'tenant-uuid-to-delete';
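
    If you prefer the trigger route mentioned above, a minimal sketch that wires the cleanup function to tenant deletion might look like this:

    sql
    -- Drop the tenant's partial indexes just before the tenant row is deleted
    CREATE OR REPLACE FUNCTION trg_before_delete_tenant()
    RETURNS TRIGGER AS $$
    BEGIN
        PERFORM drop_partial_indexes_for_tenant(OLD.id);
        RETURN OLD;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER before_delete_tenant_trigger
    BEFORE DELETE ON tenants
    FOR EACH ROW
    EXECUTE FUNCTION trg_before_delete_tenant();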

    Advanced Scenarios and Edge Cases

    This pattern is powerful but introduces its own set of complexities that senior engineers must consider.

    1. The "Many Small Tenants" Problem

    What happens if you have 100,000 tenants? Creating multiple indexes per tenant could lead to millions of indexes. This can bloat the pg_class system catalog and potentially slow down the query planner itself, as it has more indexes to evaluate for each query.
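
    If you adopt the pattern broadly, it is worth monitoring how many indexes the automation has produced. A simple per-table count from pg_indexes is usually enough to spot runaway growth:

    sql
    SELECT tablename, count(*) AS index_count
    FROM pg_indexes
    WHERE schemaname = 'public'
    GROUP BY tablename
    ORDER BY index_count DESC;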

    Solution: A Hybrid Approach. This strategy provides the most benefit for medium-to-large tenants. For the long tail of very small or inactive tenants, the overhead of dedicated indexes may not be worthwhile. You can implement a hybrid model:

  • Tier-based Indexing: Only create partial indexes for tenants on a "Premium" or "Enterprise" plan.

  • Activity-based Indexing: Run a background job that identifies tenants with high query volume or data size and creates partial indexes for them retroactively, as sketched below. Small tenants can continue to use the global idx_projects_tenant_id index; it is not optimal for them, but their small data volumes keep the impact acceptable.
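
    One way such a background job might pick candidates: find tenants above a row-count threshold that do not yet have the dedicated index. The threshold and the index naming convention below follow the earlier function and are adjustable assumptions:

    sql
    -- Run as a role not subject to RLS (e.g. the table owner) so all rows are visible
    SELECT t.id, count(*) AS project_count
    FROM tenants t
    JOIN projects p ON p.tenant_id = t.id
    GROUP BY t.id
    HAVING count(*) > 100000
       AND NOT EXISTS (
            SELECT 1 FROM pg_indexes
            WHERE tablename = 'projects'
              AND indexname = 'projects_' || replace(t.id::text, '-', '_') || '_active_created'
       );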

    2. Cross-Tenant Reporting and Super-Admin Access

    RLS is designed to prevent cross-tenant access. But internal analytics or super-admin tools often require it.

    Solution: Dedicated Reporting Roles. Create a specific database role for these purposes and grant it the BYPASSRLS attribute.

    sql
    CREATE ROLE reporting_user LOGIN PASSWORD '...';
    ALTER ROLE reporting_user BYPASSRLS;

    When your application connects as reporting_user, RLS policies will not apply. This is a powerful and dangerous privilege that must be strictly controlled. Connections using this role should only be available to trusted internal services, never to the general application backend that serves user traffic.
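
    You can confirm the attribute took effect by checking pg_roles:

    sql
    SELECT rolname, rolbypassrls
    FROM pg_roles
    WHERE rolname = 'reporting_user';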

    3. Schema Migrations and Index Management

    When you add a new column to projects and want to include it in a partial index, you now have a new challenge. You can't just run a single ALTER INDEX command.

    Solution: Versioned Index Management. Your create_partial_indexes_for_tenant function should be versioned. When you need to change the index structure, you would:

  • Create a new function, e.g., create_partial_indexes_for_tenant_v2.
  • Update the trigger on the tenants table to call this new function.
  • Write a one-off migration script that iterates through all existing tenants and calls the v2 function for them. This script can also drop the old, v1 indexes.
    This requires disciplined DevOps and migration practices.
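
    A minimal sketch of the one-off backfill from the third step, assuming the create_partial_indexes_for_tenant_v2 function described above is already in place:

    sql
    -- Runs in a single transaction, so plain CREATE INDEX is used inside the function;
    -- see the next section for the CONCURRENTLY alternative on busy tables.
    DO $$
    DECLARE
        t RECORD;
    BEGIN
        FOR t IN SELECT id FROM tenants ORDER BY created_at LOOP
            PERFORM create_partial_indexes_for_tenant_v2(t.id);
        END LOOP;
    END;
    $$;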

    4. The CREATE INDEX ... CONCURRENTLY Consideration

    The CREATE INDEX command takes a lock that can block writes to the table. For a new tenant with no data, this is instantaneous. But if you are retroactively creating an index for an existing, active tenant, this is a problem.

    Solution: Your background jobs or migration scripts that create indexes for existing tenants must use CREATE INDEX CONCURRENTLY. This takes longer and consumes more resources but does not block writes. Note that CREATE INDEX CONCURRENTLY cannot be run inside a transaction block, so your management functions need to be designed accordingly.
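
    One way to drive this from outside a transaction block is to generate the statements and execute them one at a time, for example with psql's \gexec. A sketch that reuses the naming convention from the earlier function:

    sql
    -- In psql, run the SELECT and then type \gexec (instead of relying on the semicolon)
    -- so each generated statement executes as its own top-level command.
    SELECT format(
        'CREATE INDEX CONCURRENTLY IF NOT EXISTS %I ON projects (created_at DESC) WHERE tenant_id = %L AND status = %L',
        'projects_' || replace(id::text, '-', '_') || '_active_created',
        id,
        'active'
    )
    FROM tenants;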

    Final Considerations

    The pattern of combining Row-Level Security with dynamically managed partial indexes is a powerful solution to a difficult problem in scalable multi-tenant application design. It moves beyond generic best practices to address the specific, nuanced behavior of the PostgreSQL query planner under RLS.

    By creating indexes that perfectly align with the RLS policy predicates for each tenant, you achieve true performance isolation. The query performance for a tenant with 1,000 rows is unaffected by another tenant with 100 million rows in the same table. This prevents the "hot tenant" or "noisy neighbor" problem at the database level.

    This is not a zero-cost abstraction. It requires a mature approach to database management, including automation for index lifecycle, careful handling of schema migrations, and a strategy for managing the potential proliferation of indexes. However, for SaaS platforms where performance and security are paramount, the investment is often justified by the dramatic gains in query speed and system stability.
