Scaling Multi-Tenant RLS in PostgreSQL with Partial Indexes

Goh Ling Yong

The Performance Cliff of Multi-Tenant RLS at Scale

In a mature multi-tenant SaaS application running on PostgreSQL, Row-Level Security (RLS) is the canonical solution for enforcing data isolation. It's declarative, robust, and lives within the database, ensuring that no matter how a user connects, they can only see their own tenant's data. The typical implementation involves a tenant_id column on every relevant table and a policy that filters rows based on a session variable, often set using SET app.tenant_id = '...'.
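
One practical note on setting that variable: if your application runs behind a transaction-mode connection pooler such as PgBouncer, a plain SET lingers on the pooled session. Scoping the setting to the transaction avoids tenant context leaking between requests; a minimal sketch:

sql
BEGIN;
SET LOCAL app.tenant_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d';
-- ... run tenant-scoped queries here ...
COMMIT;  -- SET LOCAL is discarded at transaction end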

The problem arises not from the security model itself, but from the query performance under load. A naive but common approach is to place a standard B-tree index on the tenant_id column:

sql
CREATE INDEX idx_orders_tenant_id ON orders (tenant_id);

On a table with billions of rows spread across tens of thousands of tenants, this index becomes a performance liability. Even though a single tenant's predicate is selective in relative terms, it still matches a large number of rows in absolute terms. If a tenant has 100,000 orders in a 2-billion-row table, those rows are scattered across tens of thousands of heap pages, so a plain Index Scan would ping-pong randomly around the heap. This often leads PostgreSQL to favor a Bitmap Heap Scan instead. While better than a full sequential scan, it's far from optimal and scales poorly as the table grows.

The RLS policy application adds another layer of complexity. The planner must incorporate the policy's WHERE clause into its calculations, and a single large global index, whose entries for any one tenant are interleaved with those of every other tenant, gives it little room to do so efficiently. The result is query latency that creeps up steadily until it becomes a critical production issue.

This article presents a solution: abandoning the single global tenant_id index in favor of a highly-targeted, per-tenant partial index strategy. This pattern dramatically improves query performance for individual tenants by creating small, dense, and highly-selective indexes that the query planner can use with extreme efficiency.


Baseline Scenario: Demonstrating the Problem

Let's establish a concrete schema and data model to analyze. Assume we have a system managing orders for multiple tenants.

Schema:

sql
-- Tenant management table
CREATE TABLE tenants (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Orders table with a foreign key to tenants
CREATE TABLE orders (
  id BIGSERIAL PRIMARY KEY,
  tenant_id UUID NOT NULL REFERENCES tenants(id),
  customer_id BIGINT NOT NULL,
  order_date TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  amount NUMERIC(10, 2) NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending'
);

-- A standard, global index on tenant_id
CREATE INDEX idx_orders_tenant_id ON orders (tenant_id);

-- An index for a common query pattern
CREATE INDEX idx_orders_tenant_date ON orders (tenant_id, order_date DESC);

Data Population (Simulated):

Let's populate this with 1,000 tenants and 100 million orders, giving each tenant roughly 100,000 orders.

sql
-- Conceptual population script; expect it to run for a while at this scale.
INSERT INTO tenants (id, name)
SELECT gen_random_uuid(), 'Tenant ' || i FROM generate_series(1, 1000) i;

INSERT INTO orders (tenant_id, customer_id, order_date, amount, status)
SELECT
  t.id,
  (random() * 10000)::int,
  NOW() - (random() * 365)::int * '1 day'::interval,
  (random() * 500)::numeric(10,2),
  'completed'
FROM tenants t, generate_series(1, 100000);
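
After a bulk load like this, make sure the planner has fresh statistics before judging any query plans:

sql
ANALYZE orders;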

RLS Implementation:

sql
-- Enable RLS on the orders table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- Create a policy that checks a session variable
CREATE POLICY tenant_isolation_policy ON orders
FOR ALL
USING (tenant_id = current_setting('app.tenant_id')::UUID)
WITH CHECK (tenant_id = current_setting('app.tenant_id')::UUID);
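
A quick hardening note: current_setting('app.tenant_id') raises an error if the variable was never set in the session. If you would rather have such sessions simply see zero rows, a common variant passes the missing_ok flag and guards against the empty string (sketched here as an alternative policy definition):

sql
-- Alternative: an unset or empty app.tenant_id matches no rows instead of erroring
CREATE POLICY tenant_isolation_policy ON orders
FOR ALL
USING (tenant_id = NULLIF(current_setting('app.tenant_id', true), '')::UUID)
WITH CHECK (tenant_id = NULLIF(current_setting('app.tenant_id', true), '')::UUID);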

Now, let's analyze a typical query to fetch recent orders for a specific tenant.

sql
-- Set the session context for our target tenant
-- Let's assume this tenant's UUID is 'a1b2c3d4-...' 
SET app.tenant_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d';

-- Analyze the query plan
EXPLAIN ANALYZE
SELECT id, order_date, amount, status
FROM orders
WHERE order_date > NOW() - '30 days'::interval
ORDER BY order_date DESC
LIMIT 100;

Expected EXPLAIN ANALYZE Output (with global index):

text
Limit  (cost=12345.67..12346.92 rows=100 width=34) (actual time=150.123..150.456 rows=100 loops=1)
  ->  Sort  (cost=12345.67..12389.12 rows=17380 width=34) (actual time=150.120..150.300 rows=100 loops=1)
        Sort Key: order_date DESC
        Sort Method: top-N heapsort  Memory: 35kB
        ->  Bitmap Heap Scan on orders  (cost=456.78..12100.23 rows=17380 width=34) (actual time=10.123..140.987 rows=8345 loops=1)
              Recheck Cond: (tenant_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d'::uuid AND order_date > (now() - '30 days'::interval))
              Heap Blocks: exact=7980
              ->  Bitmap Index Scan on idx_orders_tenant_date  (cost=0.00..452.43 rows=17380 width=0) (actual time=8.123..8.123 rows=8345 loops=1)
                    Index Cond: (tenant_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d'::uuid AND order_date > (now() - '30 days'::interval))
Planning Time: 0.543 ms
Execution Time: 151.012 ms

The key takeaway here is the Bitmap Heap Scan. The planner uses the idx_orders_tenant_date index to create a bitmap of pages to visit, then visits each of those pages (Heap Blocks: exact=7980) to fetch the rows and re-check the condition. While the index scan itself is fast (8ms), the subsequent heap scan is slow (130ms) because the tenant's data is scattered across thousands of pages. The total execution time of ~150ms is unacceptable for a primary application query.
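
You can verify the scattering directly. One quick (if crude) way is to count the distinct heap pages holding the tenant's recent rows via ctid; the result should line up with the Heap Blocks figure above:

sql
-- Count distinct heap pages containing this tenant's last 30 days of orders
SELECT count(DISTINCT (ctid::text::point)[0]) AS heap_pages
FROM orders
WHERE tenant_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d'
  AND order_date > NOW() - '30 days'::interval;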


The Partial Index Solution

A partial index is an index built over a subset of a table's rows, defined by a WHERE clause. The core idea of our pattern is to create a dedicated index for each tenant.

Let's create a partial index for our specific tenant and the same query pattern.

sql
-- Create a specific index just for our target tenant
CREATE INDEX idx_orders_tenant_a1b2c3d4_date 
ON orders (order_date DESC) 
WHERE tenant_id = 'a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d';

Notice two crucial differences from the composite index:

  • The tenant_id is now in the WHERE clause, not the index key itself.
  • The index key is just (order_date DESC), which perfectly matches our ORDER BY clause.

The result is an index that is tiny compared to the global one: it contains entries only for this single tenant. Now, let's re-run the EXPLAIN ANALYZE.

sql
-- Ensure we are using the new index (the planner is usually smart enough, but let's be sure)
SET enable_bitmapscan = off; -- For demonstration purposes only

EXPLAIN ANALYZE
SELECT id, order_date, amount, status
FROM orders
WHERE order_date > NOW() - '30 days'::interval
ORDER BY order_date DESC
LIMIT 100;

SET enable_bitmapscan = on; -- Reset

Expected EXPLAIN ANALYZE Output (with partial index):

text
Limit  (cost=0.43..12.34 rows=100 width=34) (actual time=0.045..0.567 rows=100 loops=1)
  ->  Index Scan using idx_orders_tenant_a1b2c3d4_date on orders  (cost=0.43..1012.45 rows=8345 width=34) (actual time=0.043..0.432 rows=100 loops=1)
        Index Cond: (order_date > (now() - '30 days'::interval))
Planning Time: 0.210 ms
Execution Time: 0.615 ms

The difference is staggering. The execution time has dropped from ~150ms to ~0.6ms—a >200x improvement.

Why is it so much faster?

  • Exact index match: The planner immediately selects the highly specific partial index. Because the RLS policy injects WHERE tenant_id = 'a1b2c3d4-...', the query's predicate implies the index's WHERE clause, and the index contains exactly this tenant's rows.
  • No bitmap, no sort: Because the index key is order_date DESC and the query has ORDER BY order_date DESC, the planner can simply walk the index in order and stop after finding 100 rows. It fetches only those ~100 heap tuples instead of visiting thousands of scattered pages and sorting 8,345 rows in memory.
  • Index Size & Caching: The partial index for one tenant is extremely small and will almost certainly reside entirely in memory, making access near-instantaneous.
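
To make the caching point concrete, compare the on-disk sizes of the two indexes (the exact numbers will vary with your data):

sql
-- Global composite index vs. one tenant's partial index
SELECT pg_size_pretty(pg_relation_size('idx_orders_tenant_date'))          AS global_idx,
       pg_size_pretty(pg_relation_size('idx_orders_tenant_a1b2c3d4_date')) AS partial_idx;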

Productionizing the Pattern: Automated Index Management

Manually creating an index for every tenant is not a viable strategy. We need an automated, robust system for managing the lifecycle of these partial indexes.

A naive approach might use a trigger on the tenants table. However, a plain CREATE INDEX takes a SHARE lock on the table, blocking writes for the duration of the build. Running this inside the tenant creation transaction is a recipe for deadlocks and timeouts. The correct production pattern uses an asynchronous worker.

The Asynchronous Worker Pattern

  1. Tenant Creation: When a new tenant is created, the application logic (or a database trigger) sends a message to a queue (e.g., RabbitMQ, SQS, or even PostgreSQL's own NOTIFY).
  2. Indexing Service: A separate, long-running background service listens to this queue.
  3. Concurrent Index Creation: Upon receiving a new_tenant message, the service connects to the database and executes CREATE INDEX CONCURRENTLY .... The CONCURRENTLY keyword is critical—it builds the index without taking locks that would block writes to the table.

Step 1: Notifying the Worker

We can use a simple trigger with pg_notify for a purely PostgreSQL-based solution. This avoids external message queue dependencies for simpler setups.

sql
-- A function to send a notification on new tenant creation
CREATE OR REPLACE FUNCTION notify_new_tenant() 
RETURNS TRIGGER AS $$
BEGIN
  PERFORM pg_notify('new_tenants', NEW.id::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- A trigger on the tenants table
CREATE TRIGGER new_tenant_trigger
AFTER INSERT ON tenants
FOR EACH ROW
EXECUTE FUNCTION notify_new_tenant();
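
Before wiring up the worker, you can verify the plumbing from two psql sessions: LISTEN in one, insert a tenant in the other, then run any statement in the first session (psql surfaces pending notifications after a command completes):

sql
-- Session 1
LISTEN new_tenants;

-- Session 2
INSERT INTO tenants (name) VALUES ('Acme Corp');

-- Session 1, after its next command, prints something like:
-- Asynchronous notification "new_tenants" with payload "<new tenant uuid>" received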

Step 2: The Indexing Worker (Python Example)

This Python service would run continuously, listening for notifications.

python
import psycopg2
import psycopg2.extensions
import select
import time
import logging
import uuid

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

DB_CONN_STRING = "dbname='yourdb' user='youruser' password='yourpass' host='localhost'"

# Define the indexes you want to create for each tenant.
# Using a template makes it easy to manage multiple partial indexes.
INDEX_DEFINITIONS = [
    {
        "name_template": "idx_orders_tenant_{tenant_id_safe}_date",
        "sql_template": """
            CREATE INDEX CONCURRENTLY IF NOT EXISTS {index_name}
            ON orders (order_date DESC)
            WHERE tenant_id = '{tenant_id}';
        """
    },
    {
        "name_template": "idx_orders_tenant_{tenant_id_safe}_status_date",
        "sql_template": """
            CREATE INDEX CONCURRENTLY IF NOT EXISTS {index_name}
            ON orders (status, order_date DESC)
            WHERE tenant_id = '{tenant_id}';
        """
    }
]

def create_partial_indexes_for_tenant(tenant_id):
    """Creates all defined partial indexes for a given tenant_id."""
    logging.info(f"Processing request to create indexes for tenant: {tenant_id}")

    # The index name and predicate are built via string formatting, so refuse
    # any payload that is not a well-formed UUID (guards against injection).
    try:
        tenant_id = str(uuid.UUID(tenant_id))
    except ValueError:
        logging.error(f"Ignoring malformed tenant id payload: {tenant_id!r}")
        return

    # Sanitize tenant_id for use in index names (strip hyphens)
    tenant_id_safe = tenant_id.replace('-', '')

    try:
        with psycopg2.connect(DB_CONN_STRING) as conn:
            # CREATE INDEX CONCURRENTLY cannot run inside a transaction block
            conn.autocommit = True
            with conn.cursor() as cur:
                for index_def in INDEX_DEFINITIONS:
                    index_name = index_def["name_template"].format(tenant_id_safe=tenant_id_safe)
                    sql = index_def["sql_template"].format(index_name=index_name, tenant_id=tenant_id)

                    logging.info(f"Executing: {sql}")
                    start_time = time.time()
                    cur.execute(sql)
                    end_time = time.time()
                    logging.info(f"Index '{index_name}' created or already exists. Took {end_time - start_time:.2f}s.")

    except psycopg2.Error as e:
        logging.error(f"Database error while creating indexes for tenant {tenant_id}: {e}")

def listen_for_tenants():
    """Main listener loop."""
    conn = psycopg2.connect(DB_CONN_STRING)
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

    curs = conn.cursor()
    curs.execute("LISTEN new_tenants;")
    logging.info("Worker started. Listening for new_tenants notifications...")

    while True:
        if select.select([conn], [], [], 60) == ([], [], []):
            logging.info("Timeout, no new notifications. Pinging connection.")
            curs.execute('SELECT 1;')  # Heartbeat to keep the connection alive
        else:
            conn.poll()
            while conn.notifies:
                notify = conn.notifies.pop(0)
                logging.info(f"Received notification: {notify.channel}, payload: {notify.payload}")
                create_partial_indexes_for_tenant(notify.payload)

if __name__ == "__main__":
    listen_for_tenants()

This worker is robust: it uses CREATE INDEX CONCURRENTLY IF NOT EXISTS to be idempotent and non-blocking. It's template-driven, allowing you to easily add new partial index definitions as your application's query patterns evolve.
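
One failure mode worth monitoring: if a CONCURRENTLY build is interrupted (worker crash, statement timeout), PostgreSQL leaves behind an INVALID index that consumes space and write overhead without serving queries. A periodic check like this, assuming the naming convention above, finds leftovers to drop and rebuild:

sql
-- Indexes left INVALID by interrupted CREATE INDEX CONCURRENTLY runs
SELECT c.relname AS index_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE NOT i.indisvalid
  AND c.relname LIKE 'idx_orders_tenant_%';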


Advanced Considerations and Edge Cases

Implementing this pattern in a large-scale production environment requires considering several edge cases.

1. Index Proliferation and Planner Overhead

With 50,000 tenants and 2 partial indexes per tenant, you now have 100,000 indexes on the orders table. How does this affect PostgreSQL?

  • pg_class and System Catalogs: Each index is a row in system catalogs like pg_class and pg_index. This increases the size of these catalogs, which can slightly slow down planning time for all queries against the table, as the planner has more potential indexes to evaluate.

  • Planner Cost: For a given query, the planner must evaluate each potentially applicable index. With thousands of partial indexes, its predicate-matching logic is still effective: it quickly discards every index whose WHERE clause is not implied by the query's WHERE clause (e.g., tenant_id = '...'). In practice, the overhead is minimal for up to tens of thousands of indexes. Performance degradation typically becomes a concern only when approaching hundreds of thousands of indexes on a single table.

  • Monitoring: You need a way to track these indexes. A simple query can help:

sql
-- Note: pg_get_indexdef parenthesizes the predicate, hence 'WHERE (tenant_id'
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'orders'
  AND indexdef LIKE '%WHERE (tenant_id =%';
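
Beyond listing indexes, a reconciliation query (assuming the worker's naming convention above) catches tenants whose indexes were never created, e.g. because a notification was lost:

sql
-- Tenants missing their per-tenant date index
SELECT t.id
FROM tenants t
WHERE NOT EXISTS (
  SELECT 1
  FROM pg_indexes i
  WHERE i.tablename = 'orders'
    AND i.indexname = 'idx_orders_tenant_' || replace(t.id::text, '-', '') || '_date'
);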

2. Tenant Offboarding: Index Cleanup

When a tenant is deleted or deactivated, their partial indexes become dead weight, consuming disk space and adding to planner overhead. The offboarding process must include a step to drop them.

This should also be handled asynchronously. The offboarding request can trigger an event for a cleanup worker.

python
# In your cleanup worker (reuses DB_CONN_STRING and INDEX_DEFINITIONS from above)
def drop_partial_indexes_for_tenant(tenant_id):
    logging.info(f"Processing request to drop indexes for tenant: {tenant_id}")
    tenant_id_safe = tenant_id.replace('-', '')

    try:
        with psycopg2.connect(DB_CONN_STRING) as conn:
            # DROP INDEX CONCURRENTLY also cannot run inside a transaction block
            conn.autocommit = True
            with conn.cursor() as cur:
                for index_def in INDEX_DEFINITIONS:
                    index_name = index_def["name_template"].format(tenant_id_safe=tenant_id_safe)
                    sql = f"DROP INDEX CONCURRENTLY IF EXISTS {index_name};"
                    logging.info(f"Executing: {sql}")
                    cur.execute(sql)
    except psycopg2.Error as e:
        logging.error(f"Database error while dropping indexes for tenant {tenant_id}: {e}")

DROP INDEX CONCURRENTLY is again crucial to avoid locking the table during cleanup.

3. The Alternative: Declarative Partitioning

For systems with extremely high tenant counts (hundreds of thousands to millions) or very high write throughput per tenant, the partial index pattern can reach its limits. The ultimate solution for this scale is declarative partitioning, typically using LIST partitioning on tenant_id.

sql
-- Conceptual example
CREATE TABLE orders_partitioned (
  -- columns...
  tenant_id UUID NOT NULL
) PARTITION BY LIST (tenant_id);

CREATE TABLE orders_tenant_a1b2c3d4 PARTITION OF orders_partitioned
FOR VALUES IN ('a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d');

-- Each partition gets its own local indexes
CREATE INDEX idx_orders_tenant_a1b2c3d4_date ON orders_tenant_a1b2c3d4 (order_date DESC);
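
In practice you would also add a DEFAULT partition (available since PostgreSQL 11) so inserts for a tenant whose partition hasn't been created yet don't fail outright:

sql
CREATE TABLE orders_unassigned PARTITION OF orders_partitioned DEFAULT;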

  • Pros: This is the most scalable solution. Each partition is effectively a separate table with its own local indexes. The planner can completely ignore all other partitions (partition pruning), which is even more efficient than ignoring partial indexes. It also physically isolates tenant data, which can simplify vacuuming and maintenance.

  • Cons: It introduces significant operational complexity. You need a robust process for creating new partitions when tenants are onboarded. Foreign keys become more complex to manage. Queries that need to span all tenants (e.g., for administrative analytics) become more difficult.

Conclusion on Partitioning: Partial indexes are a powerful, less complex stepping stone. They provide most of the performance benefits of partitioning for read-heavy workloads without the full operational overhead. For many SaaS applications, the partial index pattern is the optimal balance of performance and complexity.

4. Backfilling Indexes for Existing Tenants

If you're implementing this pattern on an existing system, you'll need a one-time script to create indexes for all existing tenants.

python
# A one-off backfill script (reuses create_partial_indexes_for_tenant from the worker)
from concurrent.futures import ThreadPoolExecutor

def backfill_all_tenants():
    with psycopg2.connect(DB_CONN_STRING) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT id FROM tenants;")
            tenant_ids = [row[0] for row in cur.fetchall()]

    # Use a thread pool to parallelize the work across tenants
    with ThreadPoolExecutor(max_workers=8) as executor:
        # Drain the iterator; the with-block then waits for all builds to finish
        list(executor.map(create_partial_indexes_for_tenant, tenant_ids))

Running this with a thread pool can significantly speed up the backfill, but be mindful of the load it places on your database server: each CREATE INDEX CONCURRENTLY performs two full scans of the table.
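
Note as well that each concurrent build can use up to maintenance_work_mem for its sort phase, so eight parallel builds multiply that footprint. If memory is tight, have each worker connection lower it before building:

sql
SET maintenance_work_mem = '128MB';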


Final Thoughts: A Strategic Trade-off

Adopting a per-tenant partial index strategy is a conscious architectural decision. It trades increased management complexity (automating index creation and deletion) for a massive, scalable improvement in query performance for RLS-protected, multi-tenant systems.

This is not a solution for every application. If your tenant count is low, or your tables are small, a standard composite index on (tenant_id, ...) is perfectly adequate. But if you are on a high-growth trajectory and your orders table is heading towards billions of rows, this pattern can be the difference between a responsive application and one that grinds to a halt under its own weight.

By leveraging asynchronous workers and CONCURRENTLY operations, you can implement this advanced strategy safely in a production environment, ensuring your application's data layer remains fast and efficient as you scale.
