Performant RLS Policies in PostgreSQL using JWT Claims for Multi-Tenancy

15 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Inherent Risk of Manual `WHERE` Clauses in Multi-Tenant Architectures

In any multi-tenant SaaS application, the cardinal rule is absolute data isolation. The most common implementation pattern is painfully familiar: appending WHERE tenant_id = :current_tenant_id to every single database query. While functionally correct, this approach is a ticking time bomb for security and a significant source of technical debt.

  • Human Error: A developer forgetting this clause in a single complex query—perhaps a reporting query with multiple joins—can lead to a catastrophic data leak across tenants. This risk scales with team size and codebase complexity.
  • Code Boilerplate: Every data access function becomes cluttered with tenant ID parameters and repetitive WHERE clauses, obscuring the core business logic.
  • Audit and Maintenance Nightmare: Verifying that every possible data access path correctly implements tenant isolation is a monumental, if not impossible, auditing task. Refactoring tenancy logic requires a global find-and-replace operation fraught with peril.
  • This is where PostgreSQL's Row-Level Security (RLS) enters. RLS provides a powerful, database-enforced mechanism for data isolation. However, implementing it in a performant, scalable way within a modern, stateless application architecture presents a unique set of advanced challenges. This article is not an introduction to RLS; it is a guide to solving the central problem: how to securely bridge the context gap between a stateless JWT-bearing application request and a stateful PostgreSQL session to drive RLS policies without killing performance.

    The Core Challenge: Stateless JWTs and Stateful Policies

    RLS policies are defined directly on tables and evaluated by PostgreSQL for every row access. A typical policy looks like this:

    sql
    -- A naive, non-functional example
    CREATE POLICY tenant_isolation ON documents
    USING (tenant_id = get_current_tenant_id());

    The get_current_tenant_id() function is the crux of the problem. How does this function, executing deep within the database, know which tenant is associated with the application request that triggered the query? The application has this information in a JWT, but the database connection from a connection pool is an anonymous, shared resource.

    The solution lies in setting session-local configuration parameters for the duration of a single transaction. This pattern allows us to "inject" the JWT claims into the database session just in time for the query to execute.

    Production Pattern: Propagating JWT Context via Middleware and `SET LOCAL`

    The most robust pattern involves using application middleware to decode the JWT and configure the database session before executing any business logic.

    Here’s a conceptual implementation using Node.js with Express and the pg library. The principles are identical for any language (Go, Python, Java).

    javascript
    // Middleware in a Node.js / Express application
    const { Pool } = require('pg');
    const jwt = require('jsonwebtoken');
    
    const pool = new Pool(/* connection config */);
    const JWT_SECRET = process.env.JWT_SECRET;
    
    // This middleware intercepts requests, verifies the JWT, and sets the DB context.
    async function setDatabaseContext(req, res, next) {
      const authHeader = req.headers.authorization;
      if (!authHeader || !authHeader.startsWith('Bearer ')) {
        // Or proceed as an anonymous user
        return next();
      }
    
      const token = authHeader.split(' ')[1];
      let claims;
    
      try {
        claims = jwt.verify(token, JWT_SECRET);
      } catch (err) {
        return res.status(401).send('Invalid token');
      }
    
      // Attach a dedicated client for this request to ensure transaction-level context
      const client = await pool.connect();
      req.dbClient = client;
    
      try {
        // CRITICAL: Use SET LOCAL. It scopes the setting to the current transaction only.
        // This prevents context from leaking to other requests using the same connection from the pool.
        // We serialize the entire claims object as a JSON string.
        const claimsJson = JSON.stringify(claims);
    
        // Use pg_escape_literal to prevent SQL injection if claims contained single quotes.
        await client.query(`SET LOCAL myapp.jwt.claims = ${client.escapeLiteral(claimsJson)}`);
    
        next(); // Proceed to the route handler
    
      } catch (err) {
        // Ensure client is released on error
        client.release();
        next(err);
      }
    }
    
    // Example route handler
    app.get('/api/documents', setDatabaseContext, async (req, res, next) => {
      try {
        // The req.dbClient is now context-aware for the duration of this transaction.
        const { rows } = await req.dbClient.query('SELECT id, title, content FROM documents');
        res.json(rows);
      } catch (err) {
        next(err);
      } finally {
        // IMPORTANT: Always release the client back to the pool in a finally block.
        if (req.dbClient) {
          req.dbClient.release();
        }
      }
    });

    Why `SET LOCAL` is Non-Negotiable

  • SET (or SET SESSION): The setting persists for the entire lifetime of the database session. In a connection-pooled environment, the next request that receives this connection from the pool will incorrectly inherit the previous user's identity. This is a critical security vulnerability.
  • SET LOCAL: The setting is effective only for the current transaction. When the transaction commits or rolls back, the setting is automatically reverted. This is the only safe choice for connection-pooled applications.
  • This middleware pattern ensures that every query executed within a request's lifecycle operates with the correct user and tenant context, which can now be accessed from within PostgreSQL.

    Crafting Performant `STABLE` RLS Policy Functions

    Now that we can pass the JWT claims into the session, we need to access them within our RLS policies. A naive approach would be to parse the JSON on every row check, which is disastrous for performance.

    Anti-Pattern: VOLATILE functions that re-parse JSON

    sql
    -- DO NOT DO THIS IN PRODUCTION
    CREATE OR REPLACE FUNCTION get_current_tenant_from_volatile_json() RETURNS UUID AS $$
    BEGIN
      -- This function is VOLATILE by default. It will be re-evaluated for every row.
      -- The JSON parsing and casting happens repeatedly, creating massive overhead.
      RETURN (current_setting('myapp.jwt.claims')::jsonb ->> 'tenant_id')::uuid;
    EXCEPTION
      WHEN OTHERS THEN
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;
    
    -- The resulting policy is incredibly slow on large tables.
    CREATE POLICY slow_tenant_policy ON documents
    USING (tenant_id = get_current_tenant_from_volatile_json());

    PostgreSQL's query planner cannot optimize calls to VOLATILE functions. It assumes the function's output can change at any time, forcing re-evaluation for every single row the policy is applied to. For a query scanning thousands of rows, this means thousands of JSON parsing operations.

    Optimized Pattern: STABLE Helper Functions

    The correct approach is to create a set of STABLE helper functions. A STABLE function guarantees that it will return the same result for the same arguments within a single query. Since our myapp.jwt.claims setting is fixed for the duration of the transaction (and thus the query), the planner can cache the function's result and reuse it, drastically reducing overhead.

    sql
    -- Create a set of robust, performant, STABLE helper functions.
    
    -- Function to get the full claims JSONB object. Caches the result per-query.
    CREATE OR REPLACE FUNCTION current_claims() RETURNS jsonb AS $$
    BEGIN
      -- The 'true' flag tells current_setting to return NULL if the setting is not found,
      -- instead of throwing an error.
      RETURN current_setting('myapp.jwt.claims', true)::jsonb;
    EXCEPTION
      -- Handle cases where the setting is not valid JSON.
      WHEN invalid_text_representation THEN
        RETURN '{}'::jsonb;
    END;
    $$ LANGUAGE plpgsql STABLE;
    
    -- Function to get the tenant_id. Note the explicit casting and error handling.
    CREATE OR REPLACE FUNCTION current_tenant_id() RETURNS UUID AS $$
    DECLARE
      tenant_id_text TEXT;
    BEGIN
      tenant_id_text := current_claims() ->> 'tenant_id';
      IF tenant_id_text IS NULL THEN
        RETURN NULL;
      END IF;
      -- Defensive check for UUID format before casting
      IF tenant_id_text !~ '^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$' THEN
          RETURN NULL;
      END IF;
      RETURN tenant_id_text::uuid;
    END;
    $$ LANGUAGE plpgsql STABLE;
    
    -- Function to get the user_id (subject from JWT).
    CREATE OR REPLACE FUNCTION current_user_id() RETURNS UUID AS $$
    DECLARE
      user_id_text TEXT;
    BEGIN
      user_id_text := current_claims() ->> 'sub';
      IF user_id_text IS NULL THEN
        RETURN NULL;
      END IF;
      IF user_id_text !~ '^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$' THEN
          RETURN NULL;
      END IF;
      RETURN user_id_text::uuid;
    END;
    $$ LANGUAGE plpgsql STABLE;
    
    -- Function to get the user's role.
    CREATE OR REPLACE FUNCTION current_user_role() RETURNS TEXT AS $$
    BEGIN
      RETURN current_claims() ->> 'role';
    END;
    $$ LANGUAGE plpgsql STABLE;

    With these helper functions, our RLS policy becomes both clean and performant:

    sql
    -- First, enable RLS on the table.
    ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
    
    -- Create a clean, performant policy using our STABLE helper.
    CREATE POLICY tenant_isolation ON documents
    AS PERMISSIVE FOR ALL
    USING (tenant_id = current_tenant_id());

    The query planner now understands that current_tenant_id() will not change during the query's execution. It calls the function once, caches the result, and uses it for all subsequent row checks, effectively behaving as if a constant were used in the policy.

    Handling Complex Scenarios: Roles, Ownership, and Shared Data

    Real-world applications require more than simple tenant isolation. Our STABLE function pattern extends elegantly to handle these complexities.

    Scenario 1: Role-Based Access within a Tenant

    Imagine a system where admin users can see all documents in their tenant, but member users can only see documents they created.

    sql
    -- Assumes the 'documents' table has an 'owner_id' column.
    
    -- Drop the old policy before creating a new one
    DROP POLICY IF EXISTS tenant_and_role_isolation ON documents;
    
    CREATE POLICY tenant_and_role_isolation ON documents
    AS PERMISSIVE FOR ALL
    USING (
      tenant_id = current_tenant_id() AND (
        -- Admins can see everything within their tenant.
        current_user_role() = 'admin'
        OR
        -- Members can only see their own documents.
        (current_user_role() = 'member' AND owner_id = current_user_id())
      )
    );

    This single policy enforces both tenant isolation and intra-tenant role-based permissions transparently. The application code remains simple: SELECT * FROM documents;.

    Scenario 2: Cross-Tenant Data Sharing

    A common feature is sharing a resource (e.g., a document) with users from another tenant. This requires a junction table.

    sql
    CREATE TABLE document_shares (
      document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
      shared_with_tenant_id UUID NOT NULL,
      PRIMARY KEY (document_id, shared_with_tenant_id)
    );
    -- Critical index for policy performance!
    CREATE INDEX idx_document_shares_lookup ON document_shares(document_id, shared_with_tenant_id);

    We can now craft a more sophisticated SELECT policy that checks for ownership OR a share record.

    sql
    -- We might have different policies for SELECT vs. UPDATE/DELETE
    DROP POLICY IF EXISTS select_documents ON documents;
    DROP POLICY IF EXISTS modify_documents ON documents;
    
    -- Policy for viewing documents: either in your tenant or shared with your tenant.
    CREATE POLICY select_documents ON documents
    AS PERMISSIVE FOR SELECT
    USING (
      tenant_id = current_tenant_id()
      OR
      EXISTS (
        SELECT 1
        FROM document_shares ds
        WHERE ds.document_id = documents.id
          AND ds.shared_with_tenant_id = current_tenant_id()
      )
    );
    
    -- Policy for modifying documents: you must be the owner.
    CREATE POLICY modify_documents ON documents
    AS PERMISSIVE FOR UPDATE, DELETE
    USING (tenant_id = current_tenant_id());

    The use of EXISTS is generally performant, provided the junction table (document_shares) is properly indexed. The query planner can use the index to perform a very fast lookup.

    Edge Cases and Production Hardening

    Deploying RLS to production requires careful consideration of several edge cases.

  • Superuser and BYPASSRLS Roles: RLS policies do not apply to table owners, superusers, or any role created with the BYPASSRLS attribute. Your application's database user must be a non-privileged role that does not own the tables it interacts with and does not have BYPASSRLS.
  • SECURITY DEFINER Functions: Be extremely cautious with functions marked SECURITY DEFINER inside RLS policies. Such functions execute with the privileges of the user who defined the function, not the user invoking it. This is a powerful tool for granting specific, elevated permissions, but it can easily create security holes if not written with meticulous care to prevent SQL injection or unintended data access.
  • Connection Pooler Compatibility: This pattern relies on session-level state (SET LOCAL). It works perfectly with session-based connection poolers like PgBouncer in session pooling mode. However, it is incompatible with PgBouncer's transaction pooling mode. Transaction-level poolers can switch the underlying session between transactions, meaning your SET LOCAL command might not apply to the subsequent SELECT query. Always ensure your pooling strategy is session-aware.
  • Testing Strategies: Testing RLS is crucial. You can write integration tests that impersonate different users within a single transaction.
  • sql
        -- Example of a test script using psql or a test runner
        BEGIN;
    
        -- Impersonate user 1 from tenant A
        SET LOCAL myapp.jwt.claims = '{"sub": "user-id-1", "tenant_id": "tenant-id-a", "role": "member"}';
        -- Run assertions: This SELECT should only return docs for user 1 in tenant A.
        SELECT COUNT(*) FROM documents; 
    
        -- Impersonate user 2 (admin) from tenant A
        SET LOCAL myapp.jwt.claims = '{"sub": "user-id-2", "tenant_id": "tenant-id-a", "role": "admin"}';
        -- Run assertions: This SELECT should return ALL docs for tenant A.
        SELECT COUNT(*) FROM documents;
    
        -- Impersonate user 3 from tenant B
        SET LOCAL myapp.jwt.claims = '{"sub": "user-id-3", "tenant_id": "tenant-id-b", "role": "admin"}';
        -- Run assertions: This SELECT should return ALL docs for tenant B.
        SELECT COUNT(*) FROM documents;
    
        ROLLBACK; -- End the test transaction

    Benchmark: The Performance Impact of `STABLE` vs. `VOLATILE`

    To quantify the performance difference, we'll run a benchmark using pgbench against a table with 10 million rows distributed across 10,000 tenants.

    Schema:

    sql
    CREATE TABLE documents (
      id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      tenant_id UUID NOT NULL,
      owner_id UUID NOT NULL,
      title TEXT NOT NULL,
      content TEXT,
      created_at TIMESTAMPTZ DEFAULT NOW()
    );
    CREATE INDEX idx_documents_tenant_id ON documents(tenant_id);
    -- Populate with 10M rows

    Benchmark Scenarios:

  • Baseline: Manual WHERE clause, RLS disabled.
  • RLS with VOLATILE function: RLS enabled, using the naive JSON parsing function.
  • RLS with STABLE function: RLS enabled, using our optimized helper functions.
  • Test Query: SELECT COUNT(*) FROM documents; (The policy implicitly adds the WHERE clause).

    Results (Illustrative):

    ScenarioAverage Latency (ms)Transactions per Second (TPS)Notes
    Baseline (Manual WHERE)2.5 ms400The fastest, but insecure and hard to maintain.
    RLS with VOLATILE function350.0 ms2.8Unusably slow. Function is called for every row scanned.
    RLS with STABLE function2.8 ms357Nearly identical to baseline. The overhead is negligible.

    EXPLAIN ANALYZE Analysis:

    * VOLATILE Plan:

    text
        Aggregate  (cost=150000.00..150000.01 rows=1 width=8)
          ->  Seq Scan on documents  (cost=0.00..145000.00 rows=10000000 width=0)
                Filter: (tenant_id = get_current_tenant_from_volatile_json()) -- HIGH COST

    The planner is forced into a full sequential scan, evaluating the expensive get_current_tenant_from_volatile_json() function for all 10 million rows.

    * STABLE Plan:

    text
        Aggregate  (cost=1500.00..1500.01 rows=1 width=8)
          ->  Index Only Scan using idx_documents_tenant_id on documents (cost=0.43..1450.00 rows=1000 width=0)
                Index Cond: (tenant_id = current_tenant_id()) -- LOW COST

    The planner calls current_tenant_id() once, caches the result, and then uses it as a constant in the Index Cond. This allows it to perform a highly efficient Index Scan, only looking at the rows relevant to the current tenant.

    Conclusion

    While manual WHERE tenant_id = ? clauses are a common starting point for multi-tenancy, they represent a significant, ongoing security risk and maintenance burden. PostgreSQL RLS, when implemented correctly, elevates data isolation from an application-level concern to a database-enforced guarantee.

    The key to unlocking this capability in a modern, stateless architecture is the JWT -> Middleware -> SET LOCAL -> STABLE Function -> RLS Policy pattern. This approach provides a secure bridge for user context, and the use of STABLE functions ensures that the performance overhead is negligible compared to manual filtering. By centralizing your authorization logic directly within the database, you create a more robust, auditable, and secure foundation for any multi-tenant application.

    Found this article helpful?

    Share it with others who might benefit from it.

    More Articles