Production-Grade PostgreSQL RLS with JWTs for Serverless APIs

Goh Ling Yong

The Fallacy of Application-Layer Tenant Isolation

In multi-tenant SaaS architectures, particularly within a serverless paradigm, the challenge of strict data isolation is paramount. The conventional approach involves embedding tenant-scoping logic within the application layer. Every database query built by your ORM or query builder must include a WHERE tenant_id = ? clause. While functional, this pattern is fraught with peril for senior engineers scaling a system:

  • High Risk of Human Error: A developer forgetting a single WHERE clause on a complex query can lead to a catastrophic data leak across tenants. Static analysis and code reviews can only mitigate, not eliminate, this risk.
  • Codebase Complexity: Authorization logic becomes scattered across services, repositories, and ORM hooks. It's difficult to audit and maintain, leading to security logic drift.
  • Performance Overhead: For complex joins or reports, the application may need to fetch more data than necessary before filtering, or construct convoluted queries that the database planner struggles to optimize.
Moving authorization to the database layer via PostgreSQL's Row-Level Security (RLS) isn't just a security enhancement; it's an architectural shift. By defining data access policies directly on the tables themselves, we create an impenetrable wall of isolation. The database, as the ultimate arbiter of data, guarantees that no query, whether ad-hoc or application-generated, can access data outside its permitted scope. This article details the advanced patterns required to implement this robustly in a serverless environment using JWTs as the source of truth for user identity and permissions.

    The Architectural Blueprint: JWT Claims to RLS Policies

    Our target architecture follows a precise, secure flow for every API request:

  • Authentication: The client authenticates with an identity provider (e.g., Auth0, Cognito, or a custom service) and receives a JSON Web Token (JWT).
  • JWT Claims: The JWT payload is enriched with custom claims essential for authorization, such as tenant_id, user_id, and an application-specific role.
  • API Request: The client makes a request to a serverless function (e.g., AWS Lambda, Google Cloud Function) with the JWT in the Authorization header.
  • Connection & Context Assertion: The serverless function verifies the JWT. Instead of connecting to PostgreSQL with a single, highly privileged application user, it performs a carefully orchestrated sequence:

    a. It connects as the low-privilege api_authenticator role.

    b. Within a single transaction, it uses SET LOCAL to create transaction-scoped variables from the JWT claims.

    c. It then executes SET LOCAL ROLE to switch to the more privileged tenant_user role for the duration of the transaction.

  • RLS Enforcement: The function executes its business logic query (e.g., SELECT * FROM projects). PostgreSQL transparently appends the RLS policies to the query, using the session variables to filter the results. The application code remains blissfully unaware of multi-tenancy.
  • Transaction Teardown: The transaction completes, and the connection is returned to the pool. All SET LOCAL settings are automatically discarded, ensuring the next use of the connection is clean and unprivileged.
    This pattern centralizes security logic, simplifies application code, and leverages the database's query planner for optimal performance.
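Put together, the statements a single request issues over the wire look roughly like this. This is a sketch: the claim values shown inline are placeholders, real code should bind them as parameters via set_config(), and the app.jwt.claims.* parameter names are simply this article's convention.

```sql
BEGIN;

-- Switch from the connecting role to the application role
SET LOCAL ROLE tenant_user;

-- Expose the verified JWT claims to RLS policies, transaction-scoped
SELECT set_config('app.jwt.claims.tenant_id', '<tenant-uuid>', true);
SELECT set_config('app.jwt.claims.user_id',   '<user-uuid>',   true);
SELECT set_config('app.jwt.claims.role',      'member',        true);

-- Business query; RLS filters rows using the settings above
SELECT id, name FROM projects;

COMMIT;  -- the role switch and all local settings are discarded automatically
```

Everything between BEGIN and COMMIT, including the assumed role, evaporates when the transaction ends, which is what makes this safe on a pooled connection.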

    Part 1: The Foundation - Database Schema and Roles

    Before writing policies, we must establish a solid foundation of roles and a multi-tenant schema. Our roles are designed around the principle of least privilege.

    Sample Multi-Tenant Schema

    Let's assume a simple SaaS schema where tenants have users, and users create projects.

    sql
    -- Enable necessary extensions
    CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
    
    -- Tenants table
    CREATE TABLE tenants (
        id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
        name TEXT NOT NULL,
        created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
    );
    
    -- Users table, linked to a tenant
    CREATE TABLE users (
        id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
        tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
        email TEXT NOT NULL UNIQUE,
        created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
    );
    
    -- Projects table, linked to a tenant and a creator
    CREATE TABLE projects (
        id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
        tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
        creator_id UUID NOT NULL REFERENCES users(id),
        name TEXT NOT NULL,
        created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
    );
    
    -- Create indexes for foreign keys
    CREATE INDEX ON users(tenant_id);
    CREATE INDEX ON projects(tenant_id);
    CREATE INDEX ON projects(creator_id);

    The Role Hierarchy

    We need a few distinct roles. The serverless function will transition between them during a request.

  • api_authenticator: The role the serverless function uses for its initial connection. It has virtually no privileges beyond connecting to the database and assuming other roles. It cannot read or write any table data directly.
  • tenant_user: The role assumed after authentication. This role is granted standard SELECT, INSERT, UPDATE, DELETE permissions on the application tables. RLS policies will be applied to this role.
  • migration_runner: A highly privileged role (with BYPASSRLS) used exclusively by your migration tool (e.g., Flyway, Postgrator), which needs to bypass RLS to manage the schema.
    Here is the SQL to create and configure these roles:

    sql
    -- 1. Create a role for the serverless function to connect with.
    --    It can't do anything except connect and switch roles.
    CREATE ROLE api_authenticator WITH LOGIN PASSWORD 'your_secure_password';
    
    -- 2. Create the role that authenticated users will operate as.
    --    It has no login privileges itself; it must be assumed.
    CREATE ROLE tenant_user;
    
    -- 3. The authenticator is allowed to switch to the tenant_user role.
    GRANT tenant_user TO api_authenticator;
    
    -- 4. Grant basic CRUD permissions to the tenant_user role.
    --    Note: We grant permissions on the tables, but RLS will ultimately decide what rows are visible.
    GRANT SELECT, INSERT, UPDATE, DELETE ON tenants, users, projects TO tenant_user;
    GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO tenant_user;
    
    -- 5. Create the migration role.
    CREATE ROLE migration_runner WITH LOGIN PASSWORD 'another_secure_password';
    -- This is critical: migrations must not be subject to RLS.
    ALTER ROLE migration_runner BYPASSRLS;
    
    -- Grant all privileges to the migration runner to manage the schema.
    GRANT ALL ON SCHEMA public TO migration_runner;
    GRANT ALL ON ALL TABLES IN SCHEMA public TO migration_runner;
    GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO migration_runner;

    Part 2: Crafting Granular and Performant RLS Policies

    With the schema and roles in place, we can now define the access rules. RLS policies are the heart of this architecture. They are essentially WHERE clauses that PostgreSQL automatically appends to queries against a table.

    First, we must enable RLS on each table we wish to protect. Note that enabling RLS with no policies defined means default-deny: until you also add policies for tenants and users, the tenant_user role will see no rows in those tables at all.

    sql
    ALTER TABLE tenants ENABLE ROW LEVEL SECURITY;
    ALTER TABLE users ENABLE ROW LEVEL SECURITY;
    ALTER TABLE projects ENABLE ROW LEVEL SECURITY;

    The `current_setting` Trick

    How do policies know about the current user's tenant_id? We use PostgreSQL's current_setting() function. Our application sets a custom configuration parameter, like app.jwt.claims.tenant_id, from the JWT (transaction-scoped via SET LOCAL or set_config(..., true)). The policies then read this parameter back.
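The mechanics are easy to verify in psql. set_config(name, value, is_local) with is_local = true is the function-call equivalent of SET LOCAL, and current_setting(name, missing_ok) with missing_ok = true returns NULL instead of raising an error when the parameter has never been set (a sketch):

```sql
-- Without the second argument, reading an unset custom parameter errors out:
SELECT current_setting('app.jwt.claims.tenant_id');
-- ERROR: unrecognized configuration parameter

-- With missing_ok = true it returns NULL instead:
SELECT current_setting('app.jwt.claims.tenant_id', true);

BEGIN;
-- Function-call form of SET LOCAL; the third argument makes it transaction-local
SELECT set_config('app.jwt.claims.tenant_id',
                  '11111111-1111-1111-1111-111111111111', true);
SELECT current_setting('app.jwt.claims.tenant_id', true);  -- the UUID above
COMMIT;
```

The missing_ok behavior is exactly what the helper functions below rely on.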

    sql
    -- Helper function to get a claim, handling missing values and casting.
    -- current_setting's second argument (missing_ok = true) returns NULL
    -- instead of raising an error if the setting is not found.
    CREATE OR REPLACE FUNCTION get_jwt_claim(claim_name TEXT) RETURNS TEXT AS $$
    BEGIN
        RETURN current_setting('app.jwt.claims.' || claim_name, true);
    END;
    $$ LANGUAGE plpgsql STABLE;
    
    -- A more specific helper for UUID claims to reduce boilerplate in policies.
    CREATE OR REPLACE FUNCTION get_jwt_claim_uuid(claim_name TEXT) RETURNS UUID AS $$
    BEGIN
        RETURN get_jwt_claim(claim_name)::UUID;
    EXCEPTION
        WHEN OTHERS THEN
            RETURN NULL;
    END;
    $$ LANGUAGE plpgsql STABLE;

    Defining the Policies

    Now we create the policies for the tenant_user role.

    Policy 1: Tenant Isolation on projects

    This is the most fundamental policy. It ensures users can only interact with projects within their own tenant.

    sql
    CREATE POLICY tenant_isolation_policy ON projects
    AS PERMISSIVE -- 'PERMISSIVE' means policies are combined with OR
    FOR ALL -- Applies to SELECT, INSERT, UPDATE, DELETE
    TO tenant_user
    USING (tenant_id = get_jwt_claim_uuid('tenant_id'))
    WITH CHECK (tenant_id = get_jwt_claim_uuid('tenant_id'));
  • USING (...): This clause is applied for read operations (SELECT). A row is visible only if this expression evaluates to true.
  • WITH CHECK (...): This clause is applied for write operations (INSERT, UPDATE). An operation is allowed only if the new or updated row satisfies this expression. This is critical to prevent a user from creating a project for another tenant or moving a project between tenants.

    Policy 2: User-Specific Access on projects

    What if we want to restrict access further, so users can only see projects they created? We could add another policy, but because policies are PERMISSIVE by default, a row is visible if any policy's USING clause is true for it. That is the opposite of what we want here: we need conditions combined with AND. We can achieve this either by folding both conditions into a single policy, or (on PostgreSQL 10+) by declaring the tenant check AS RESTRICTIVE so it is ANDed with every permissive policy.
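On PostgreSQL 10 and later, the AND-combination can also be expressed as a RESTRICTIVE policy. This sketch uses the article's helper functions; note that restrictive policies grant nothing on their own, so at least one permissive policy must still exist on the table for any rows to be visible:

```sql
-- ANDed with every permissive policy on projects
CREATE POLICY tenant_fence ON projects
AS RESTRICTIVE
FOR ALL
TO tenant_user
USING (tenant_id = get_jwt_claim_uuid('tenant_id'))
WITH CHECK (tenant_id = get_jwt_claim_uuid('tenant_id'));
```

This keeps the tenant boundary as a non-negotiable fence, while per-role visibility rules live in separate permissive policies.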

    Let's refine our requirements: an 'admin' can see all projects in the tenant, while a 'member' can only see their own.

    sql
    -- Drop the old policy first
    DROP POLICY IF EXISTS tenant_isolation_policy ON projects;
    
    -- Create a new, role-aware policy
    CREATE POLICY role_based_project_access ON projects
    AS PERMISSIVE
    FOR ALL
    TO tenant_user
    USING (
        tenant_id = get_jwt_claim_uuid('tenant_id') AND
        (
            get_jwt_claim('role') = 'admin' OR
            creator_id = get_jwt_claim_uuid('user_id')
        )
    )
    WITH CHECK (
        tenant_id = get_jwt_claim_uuid('tenant_id')
    );

    This single policy now enforces two rules:

  • The project must always belong to the user's tenant (the outer AND condition).
  • Within the tenant, access is granted if the user's role is 'admin' OR if they are the creator of the project.

    The WITH CHECK clause remains simple, only ensuring the tenant_id is correct on writes. We assume the creator_id is set correctly by the application and is immutable.
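If you would rather not rely on that application-level assumption, the write side can be tightened so non-admins may only insert rows attributed to themselves. This is a sketch: it splits the policy by command, so the FOR ALL policy above would be replaced by separate read and write policies rather than supplemented (a matching FOR SELECT policy, not shown, would cover reads).

```sql
-- Writes must stay in-tenant AND be attributed to the caller (admins exempt)
CREATE POLICY project_insert_own ON projects
FOR INSERT
TO tenant_user
WITH CHECK (
    tenant_id = get_jwt_claim_uuid('tenant_id') AND
    (
        get_jwt_claim('role') = 'admin' OR
        creator_id = get_jwt_claim_uuid('user_id')
    )
);
```

With this in place, a 'member' who tries to insert a project with someone else's creator_id gets a policy violation at the database, regardless of what the application sends.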

    Part 3: Serverless Function Implementation (Node.js/TypeScript)

    This is where we bridge the gap between the stateless serverless world and stateful PostgreSQL sessions. The key is to manage the database connection's context on a per-request basis.

    We'll use AWS Lambda with the node-postgres (pg) library. The same principles apply to any serverless platform.

    The Critical Connection Logic

    Standard connection pooling can be dangerous here. If one function invocation sets session variables and another reuses that same connection without cleaning up, it could inherit the previous user's session, leading to a massive security breach. The solution is to use transaction-scoped settings with SET LOCAL.

    Here's a complete TypeScript example of a Lambda handler that safely queries the database.

    typescript
    import { Pool, PoolClient } from 'pg';
    import * as jwt from 'jsonwebtoken'; // Using jsonwebtoken library
    
    // Initialize the pool outside the handler for reuse across invocations
    const pool = new Pool({
        user: 'api_authenticator', // Connect as the low-privilege role
        password: process.env.DB_PASSWORD,
        host: process.env.DB_HOST,
        database: process.env.DB_NAME,
        port: 5432,
        ssl: { rejectUnauthorized: false }, // WARNING: disables certificate validation; use a proper CA bundle in production
    });
    
    interface JwtClaims {
        tenant_id: string;
        user_id: string;
        role: 'admin' | 'member';
        // ... other claims
    }
    
    // A type-safe JWT verification function
    function verifyAndDecodeJwt(token: string): JwtClaims | null {
        try {
            // In production, use a more secure secret management strategy
            const decoded = jwt.verify(token, process.env.JWT_SECRET!) as any;
            return {
                tenant_id: decoded.tenant_id,
                user_id: decoded.user_id,
                role: decoded.role,
            };
        } catch (error) {
            console.error('JWT verification failed:', error);
            return null;
        }
    }
    
    // The main Lambda handler
    export const handler = async (event: any) => {
        // API Gateway v2 lower-cases header names; handle both forms
        const authHeader = event.headers?.authorization ?? event.headers?.Authorization;
        if (!authHeader || !authHeader.startsWith('Bearer ')) {
            return { statusCode: 401, body: 'Unauthorized' };
        }
    
        const token = authHeader.split(' ')[1];
        const claims = verifyAndDecodeJwt(token);
    
        if (!claims) {
            return { statusCode: 403, body: 'Forbidden: Invalid token' };
        }
    
        let client: PoolClient | null = null;
        try {
            client = await pool.connect();
    
            // THIS IS THE CORE LOGIC. ALL OPERATIONS ARE IN A SINGLE TRANSACTION.
            await client.query('BEGIN');
    
            // 1. Set the role for the duration of the transaction.
            //    The role name here is a fixed identifier, not user input.
            await client.query('SET LOCAL ROLE tenant_user');

            // 2. Set the JWT claims as transaction-scoped variables.
            //    SET LOCAL cannot take bind parameters, so use set_config()
            //    with is_local = true. Never interpolate claim values into
            //    the SQL string: that is a SQL injection vector.
            await client.query(
                "SELECT set_config('app.jwt.claims.tenant_id', $1, true)",
                [claims.tenant_id]
            );
            await client.query(
                "SELECT set_config('app.jwt.claims.user_id', $1, true)",
                [claims.user_id]
            );
            await client.query(
                "SELECT set_config('app.jwt.claims.role', $1, true)",
                [claims.role]
            );
    
            // 3. Now, execute the business logic query.
            //    RLS is enforced automatically by PostgreSQL.
            const { rows } = await client.query('SELECT id, name FROM projects');
    
            await client.query('COMMIT');
    
            return {
                statusCode: 200,
                body: JSON.stringify(rows),
            };
        } catch (error) {
            console.error('Database transaction failed:', error);
            if (client) {
                // If anything fails, roll back the transaction
                await client.query('ROLLBACK');
            }
            return { statusCode: 500, body: 'Internal Server Error' };
        } finally {
            if (client) {
                // Release the client back to the pool.
                // The transaction is over, so all LOCAL settings are gone.
                client.release();
            }
        }
    };

    This pattern is robust because SET LOCAL guarantees that the settings are automatically discarded when the transaction ends (COMMIT or ROLLBACK). This makes it safe to use with a connection pooler like PgBouncer (in transaction mode) or the built-in pool of node-postgres.

    Part 4: Performance Considerations and Benchmarking

    While RLS is incredibly powerful, it's not a zero-cost abstraction. Senior engineers must understand the performance implications.

    The Overhead of `current_setting()`

    The current_setting() function is generally fast, but it's not free. When used in a policy, it will be executed for each row being considered. For queries scanning millions of rows, this can add up.

    Let's analyze a query plan. Assume we run SELECT * FROM projects as a user with the 'member' role.

    sql
    EXPLAIN ANALYZE SELECT * FROM projects;

    Without RLS:

    text
                                                     QUERY PLAN
    ------------------------------------------------------------------------------------------------------------
     Seq Scan on projects  (cost=0.00..155.00 rows=5000 width=60) (actual time=0.010..0.500 rows=5000 loops=1)
     Planning Time: 0.050 ms
     Execution Time: 0.800 ms

    With RLS:

    The query plan will now show a Filter condition that includes our policy.

    text
                                                              QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------
     Seq Scan on projects  (cost=0.00..180.00 rows=5 width=60) (actual time=0.050..1.200 rows=10 loops=1)
       Filter: ((tenant_id = get_jwt_claim_uuid('tenant_id'::text)) AND ((get_jwt_claim('role'::text) = 'admin'::text) OR (creator_id = get_jwt_claim_uuid('user_id'::text))))
       Rows Removed by Filter: 4990
     Planning Time: 0.150 ms
     Execution Time: 1.500 ms

    Notice a few things:

  • Cost Increase: The estimated cost has increased slightly.
  • Execution Time: The execution time is higher due to the function calls and filtering for every row.
  • Planner Estimation: The planner's row estimate (rows=5) is a guess, because it cannot know the values of the session variables at planning time. This can sometimes lead to suboptimal plans (e.g., choosing a sequential scan over an index scan).

    Optimization: Ensure you have indexes on the columns used in your RLS policies (tenant_id, creator_id). If the planner chooses a sequential scan, it's because it believes the cost of the index scan plus the RLS filter is higher. For highly selective filters (where a tenant has few projects in a large table), an index on (tenant_id) is crucial and will be used effectively.
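Another common mitigation is to wrap each helper call in a scalar subquery. The planner can then evaluate the claim lookup once per statement (as an InitPlan) instead of once per candidate row. This is a sketch of the same policy rewritten that way; measure with EXPLAIN ANALYZE on your own data before adopting it:

```sql
-- Replace the earlier policy with a once-per-statement variant
DROP POLICY IF EXISTS role_based_project_access ON projects;

CREATE POLICY role_based_project_access ON projects
FOR ALL
TO tenant_user
USING (
    -- The (SELECT ...) wrappers invite the planner to compute each claim once
    tenant_id = (SELECT get_jwt_claim_uuid('tenant_id')) AND
    (
        (SELECT get_jwt_claim('role')) = 'admin' OR
        creator_id = (SELECT get_jwt_claim_uuid('user_id'))
    )
)
WITH CHECK (tenant_id = (SELECT get_jwt_claim_uuid('tenant_id')));
```

On large tables this can turn a per-row function-call cost into a constant comparison against an index.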

    Connection Pooling in Serverless: The Deep Dive

    As mentioned, connection pooling is the most significant operational challenge. The SET LOCAL in a transaction is the correct pattern, but let's explore why others fail:

  • No Pooling: Creating a new TCP connection and performing an SSL handshake for every Lambda invocation adds 50-200ms of latency. This is unacceptable for user-facing APIs.
  • SET vs SET LOCAL: Using SET (without LOCAL) makes the setting persist for the entire session. If you return a connection to the pool with these settings active, the next user of that connection will impersonate the previous one. Never use SET for request-specific data in a pooled environment.
  • Manual Reset: You could try to manually RESET the variables at the end of your function. This is brittle. If your function crashes after the COMMIT but before the RESET, the connection is returned to the pool in a poisoned state.

    The transactional SET LOCAL approach is the only one that provides an atomic guarantee of setup and teardown, making it resilient to errors and safe for pooling.
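The teardown guarantee is easy to demonstrate in psql (a sketch; the value 'demo-value' is arbitrary):

```sql
BEGIN;
SELECT set_config('app.jwt.claims.tenant_id', 'demo-value', true);  -- is_local = true
SELECT current_setting('app.jwt.claims.tenant_id', true);           -- 'demo-value'
COMMIT;

-- A subsequent use of the same session/connection no longer sees it:
SELECT current_setting('app.jwt.claims.tenant_id', true);
-- NULL (or '' on some versions): the value did not leak past the transaction
```

No application cleanup code is involved; the database itself enforces the boundary.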

    Part 5: Advanced Scenarios and Production Edge Cases

    Real-world systems have more complex requirements than simple CRUD operations.

    The Super-Admin Problem

    How does an internal support tool or a super-admin view data across all tenants? They need a way to bypass RLS.

    Solution: Create a dedicated super_admin role and grant it the BYPASSRLS attribute.

    sql
    CREATE ROLE super_admin WITH LOGIN PASSWORD 'super_secret_password';
    ALTER ROLE super_admin BYPASSRLS;

    Your internal tools would connect as this super_admin user. Any query they run will completely ignore all RLS policies. This privilege should be guarded with extreme care, with access limited to specific IP addresses and protected by multi-factor authentication.

    Defensive Policies for Missing JWT Claims

    What if a misconfigured JWT is issued without a tenant_id claim? Our get_jwt_claim_uuid function would return NULL. The policy tenant_id = NULL would evaluate to NULL (not true), correctly denying access. However, it's better to be explicit.

    sql
    -- A stricter helper function that throws an error if the claim is missing
    CREATE OR REPLACE FUNCTION require_jwt_claim_uuid(claim_name TEXT) RETURNS UUID AS $$
    DECLARE
        claim_val TEXT := current_setting('app.jwt.claims.' || claim_name, true);
    BEGIN
        -- current_setting() can return '' rather than NULL for a parameter
        -- that was defined earlier in the session, so check both.
        IF claim_val IS NULL OR claim_val = '' THEN
            RAISE EXCEPTION 'Missing required JWT claim: %', claim_name;
        END IF;
        RETURN claim_val::UUID;
    END;
    $$ LANGUAGE plpgsql STABLE;

    Using this function in policies ensures that any request with an incomplete token fails immediately and loudly at the database level, rather than silently returning no rows.
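For example, the tenant check can use the strict helper so that a malformed or tokenless request surfaces as an explicit error rather than an empty result set (a sketch, replacing the earlier policy):

```sql
DROP POLICY IF EXISTS role_based_project_access ON projects;

-- Any query without a valid tenant_id claim now raises an exception
CREATE POLICY strict_tenant_isolation ON projects
FOR ALL
TO tenant_user
USING (tenant_id = require_jwt_claim_uuid('tenant_id'))
WITH CHECK (tenant_id = require_jwt_claim_uuid('tenant_id'));
```

Loud failures like this are far easier to catch in monitoring than tenants quietly seeing zero rows.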

    RLS and `SECURITY DEFINER` Functions

    Sometimes you need a function to run with the privileges of the user who defined it, not the user who called it. This is done with SECURITY DEFINER.

    Warning: Table owners are exempt from RLS on their own tables unless the table is marked FORCE ROW LEVEL SECURITY, and roles with BYPASSRLS skip policies everywhere. A SECURITY DEFINER function owned by such a role therefore runs its queries with RLS effectively disabled. This can be a security hole.

    Imagine a function to calculate aggregate stats, owned by the migration_runner (who bypasses RLS):

    sql
    CREATE FUNCTION get_total_project_count() RETURNS BIGINT AS $$
        SELECT count(*) FROM projects;
    $$ LANGUAGE SQL SECURITY DEFINER;

    If a tenant_user calls this function, it will return the total count of projects across all tenants, because the owner's RLS exemption applies inside the function's context. There are two reliable fixes.

    Solution 1: Drop SECURITY DEFINER where it isn't needed. SECURITY INVOKER is the default, so the function runs with the caller's privileges and the caller's RLS policies:

    sql
    -- Secure version: runs as the caller, so RLS applies
    CREATE OR REPLACE FUNCTION get_total_project_count() RETURNS BIGINT AS $$
        SELECT count(*) FROM projects;
    $$ LANGUAGE SQL STABLE;

    Solution 2: If the function truly must be SECURITY DEFINER, give it an owner without BYPASSRLS and force RLS on the table so that even the table owner is subject to its policies (PostgreSQL 9.5+):

    sql
    -- Apply RLS to the table owner as well
    ALTER TABLE projects FORCE ROW LEVEL SECURITY;

    With either approach, a call from tenant_user sees only the rows its policies allow, closing the security loophole.

    Conclusion: A New Baseline for Multi-Tenant Security

    Implementing Row-Level Security with JWTs in a serverless architecture represents a significant step up in maturity for multi-tenant applications. It moves security from a fallible, distributed application-layer concern to a centralized, non-negotiable database-layer guarantee. The application code is simplified, becoming almost entirely unaware of tenancy, allowing developers to focus on business logic.

    While the initial setup of roles, policies, and the transactional connection logic is more complex than a simple ORM filter, the long-term benefits are profound. You gain a system that is more secure by default, easier to audit, and often more performant. For any senior engineer building a scalable, secure SaaS platform, this pattern should not be an afterthought; it should be the default choice for data architecture.
