PostgreSQL RLS with JWTs for Scalable Multi-Tenant Isolation
The Fragility of Application-Layer Multi-Tenancy
In building any multi-tenant SaaS application, the cardinal sin is data leakage between tenants. The standard approach is deceptively simple: add a tenant_id column to every relevant table and meticulously ensure every single database query includes a WHERE tenant_id = ? clause. This works, until it doesn't. A single forgotten WHERE clause in a complex join, a bug in a data access layer abstraction, or a hastily written internal tool can expose one tenant's sensitive data to another.
This application-layer enforcement is fragile because it relies on perfect, consistent developer discipline across the entire codebase, forever. As teams grow and codebases evolve, the risk of a breach increases non-linearly. The database, the ultimate guardian of the data, is treated as a passive store, completely unaware of the multi-tenant boundaries it is meant to protect.
We can, and must, do better. By delegating the enforcement of these boundaries to the database itself using PostgreSQL's Row-Level Security (RLS), we create a robust, default-deny security posture. RLS ensures that no matter how flawed the application query is, the database will silently and automatically filter the results to only what the current session is authorized to see. This article details a production-grade pattern for integrating RLS with a stateless JWT-based authentication system, creating a powerful, centralized, and performant data isolation architecture.
Architecture Overview: JWT Claims Driving RLS Policies
The core principle is to make the database session aware of the authenticated user's context—specifically their tenant ID and role. We achieve this by passing claims from a verified JWT into the database session at the beginning of each transaction. RLS policies then use this session context to make authorization decisions.
Here's the data flow for an incoming API request:
1. The client sends its API request with an Authorization: Bearer <token> header.
2. The application verifies the JWT's signature and decodes its claims (tenant_id, user_id, role).
3. The application checks out a database connection and begins a transaction.
4. Inside the transaction, it calls a small SQL function (set_auth_context) with the decoded claims. This function sets session-local configuration variables.
5. The application then runs its business query, for example SELECT ... FROM projects;. Critically, this query does not need a WHERE tenant_id = ? clause.
6. RLS policies attached to the projects table are applied automatically. These policies read the session variables set in step 4 to filter the rows.
This architecture centralizes tenancy logic within the database schema, making it auditable, consistent, and immune to most application-level bugs.
Section 1: Deep Dive - Implementation Details
Let's build this system from the ground up. We'll use a simple schema for a project management tool.
1.1. Schema and Initial Setup
We need tables for tenants, users, and projects. Note the use of uuid for primary keys, a common practice in distributed systems.
-- Enable necessary extensions
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
-- Schema Definition
CREATE TABLE tenants (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
email TEXT NOT NULL UNIQUE,
role TEXT NOT NULL DEFAULT 'member', -- e.g., 'member', 'admin'
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE projects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Crucial Index for RLS Performance
CREATE INDEX idx_projects_tenant_id ON projects(tenant_id);
CREATE INDEX idx_users_tenant_id ON users(tenant_id);
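To make the later examples concrete, here is a hypothetical seed script; the UUIDs are arbitrary but are reused in the JWT and psql examples further down. Run it as the schema owner.
-- Hypothetical seed data: two tenants, each with one user and one project.
INSERT INTO tenants (id, name) VALUES
('a4e3c23e-3e5b-4ba5-8274-32b4d36a3d61', 'Acme Corp'),
('b5f4d34f-4f6c-5cb6-9385-43c5e47b4e72', 'Globex Inc');
INSERT INTO users (id, tenant_id, email, role) VALUES
('d9b7f5b3-3e1a-4e8c-8b8f-3e8a6c4e2f8c', 'a4e3c23e-3e5b-4ba5-8274-32b4d36a3d61', 'admin@acme.example', 'admin'),
('e1c8a6c4-4f2b-5f9d-9c9a-4f9b7d5f3a9d', 'b5f4d34f-4f6c-5cb6-9385-43c5e47b4e72', 'member@globex.example', 'member');
INSERT INTO projects (tenant_id, name) VALUES
('a4e3c23e-3e5b-4ba5-8274-32b4d36a3d61', 'Q1 Roadmap'),
('b5f4d34f-4f6c-5cb6-9385-43c5e47b4e72', 'Q1 Budget');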
1.2. The Application User Role
We never want our application to connect as a superuser. We'll create a dedicated, low-privilege role for the application. This role will be granted only the specific permissions it needs (SELECT, INSERT, UPDATE, DELETE) on the tables.
-- Create a role for our application
CREATE ROLE app_user WITH LOGIN PASSWORD 'a_very_strong_password';
-- Grant basic connection permissions
GRANT CONNECT ON DATABASE your_database_name TO app_user;
GRANT USAGE ON SCHEMA public TO app_user;
-- Grant table-level permissions
GRANT SELECT, INSERT, UPDATE, DELETE ON tenants, users, projects TO app_user;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO app_user;
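If future migrations will add more tables, you can also set default privileges so each new table does not need a fresh GRANT; a sketch, assuming migrations run as the role that owns the schema:
-- Optional: automatically grant future tables/sequences in 'public' to app_user.
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT USAGE, SELECT ON SEQUENCES TO app_user;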
1.3. The Heart of the System: The Session Context Function
This function is the bridge between our stateless application and the stateful database session. It takes the verified JWT claims as a JSONB object and uses set_config() with is_local = true to set transaction-local variables. When we later read those variables back with current_setting(), passing true as the second argument (missing_ok) makes it return NULL instead of raising an error if the setting has never been defined, so code that runs outside an authenticated context fails closed rather than crashing.
-- This function sets session variables for the current transaction.
-- The 'is_local' parameter in set_config is TRUE, so the settings only last for the current transaction.
-- This is critical for compatibility with connection poolers.
CREATE OR REPLACE FUNCTION set_auth_context(jwt_claims JSONB)
RETURNS VOID AS $$
DECLARE
tenant_id_val UUID;
user_id_val UUID;
role_val TEXT;
BEGIN
-- Extract claims, ensuring they exist and are of the correct type.
-- The '->>' operator extracts a JSON field as text.
tenant_id_val := (jwt_claims ->> 'tenant_id')::UUID;
user_id_val := (jwt_claims ->> 'user_id')::UUID;
role_val := (jwt_claims ->> 'role')::TEXT;
IF tenant_id_val IS NULL OR user_id_val IS NULL OR role_val IS NULL THEN
RAISE EXCEPTION 'Invalid JWT claims: one or more required fields are missing';
END IF;
-- Set the configuration variables for the current transaction.
-- These are namespaced to avoid conflicts.
PERFORM set_config('app.current.tenant_id', tenant_id_val::TEXT, true);
PERFORM set_config('app.current.user_id', user_id_val::TEXT, true);
PERFORM set_config('app.current.role', role_val, true);
EXCEPTION
WHEN OTHERS THEN
-- Log the error and re-raise to ensure the transaction fails.
RAISE WARNING 'Error setting auth context: %', SQLERRM;
RAISE;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
-- Grant execute permission to our application user
GRANT EXECUTE ON FUNCTION set_auth_context(JSONB) TO app_user;
Key Design Choices:
* SECURITY DEFINER: This allows the function to run with the privileges of the role that defined it (typically the schema owner or a superuser), not the calling role (app_user). While not strictly necessary here, it's a useful pattern for functions that need to read restricted tables for validation. If you do use it, also pin the function's search_path (for example, ALTER FUNCTION set_auth_context(JSONB) SET search_path = public) so it cannot be hijacked via objects in a caller-controlled schema.
* set_config(..., true): The third argument, is_local, is true. This makes the setting transaction-scoped: when the transaction commits or rolls back, the setting is gone. This is ABSOLUTELY ESSENTIAL for working with transaction-level connection poolers like PgBouncer, and the short psql demonstration below shows it in action.
* Error Handling: The function validates the presence of claims, preventing NULL values from poisoning the session context.
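A quick sanity check from psql, connected as app_user and using hypothetical claim values from the seed sketch above (any valid UUIDs will do): the context is visible for the rest of the transaction and is discarded at COMMIT.
BEGIN;
SELECT set_auth_context('{"tenant_id": "a4e3c23e-3e5b-4ba5-8274-32b4d36a3d61", "user_id": "d9b7f5b3-3e1a-4e8c-8b8f-3e8a6c4e2f8c", "role": "admin"}'::jsonb);
-- Visible for the remainder of this transaction:
SELECT current_setting('app.current.tenant_id') AS tenant, current_setting('app.current.role') AS role;
COMMIT; -- the transaction-local settings are discarded here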
1.4. Defining the RLS Policies
Now we enable RLS and define the rules. The policies are expressions that return a boolean. If the expression is true for a given row, the row is visible/accessible.
-- Enable RLS on the tables we want to protect.
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
-- Create a helper function to get the current tenant_id. This is cleaner than repeating the cast.
-- Mark it STABLE: its result does not change within a single statement, so the planner can evaluate it once and use it in index conditions.
CREATE OR REPLACE FUNCTION current_tenant_id()
RETURNS UUID AS $$
BEGIN
RETURN current_setting('app.current.tenant_id')::UUID;
EXCEPTION
WHEN OTHERS THEN
-- Return the 'nil' UUID if the setting is missing or malformed.
-- This keeps queries from failing when run outside an authenticated session,
-- and the nil UUID never matches a real tenant, so the policies below simply deny access.
RETURN '00000000-0000-0000-0000-000000000000';
END;
$$ LANGUAGE plpgsql STABLE;
-- POLICY 1: Simple tenant isolation for the 'projects' table
CREATE POLICY tenant_isolation_policy ON projects
AS PERMISSIVE -- PERMISSIVE means policies are combined with OR
FOR ALL -- Applies to SELECT, INSERT, UPDATE, DELETE
TO app_user
USING (tenant_id = current_tenant_id())
WITH CHECK (tenant_id = current_tenant_id());
-- POLICY 2: A more complex policy for the 'users' table
-- Users should only see other users in their own tenant.
-- Additionally, only 'admin' users can see other users. 'member' users can only see themselves.
CREATE POLICY user_visibility_policy ON users
AS PERMISSIVE
FOR SELECT
TO app_user
USING (
tenant_id = current_tenant_id() AND
(
-- Admins can see all users in their tenant
current_setting('app.current.role') = 'admin' OR
-- Members can only see their own record
id = current_setting('app.current.user_id')::UUID
)
);
-- Policies for INSERT/UPDATE/DELETE on 'users' table.
-- CREATE POLICY accepts only one command per policy (ALL, SELECT, INSERT, UPDATE, or DELETE),
-- so modifications need one policy per command. Admins may modify any user in their tenant;
-- members may only modify their own record.
CREATE POLICY user_insert_policy ON users
AS PERMISSIVE
FOR INSERT
TO app_user
WITH CHECK (
tenant_id = current_tenant_id() AND
(current_setting('app.current.role') = 'admin' OR id = current_setting('app.current.user_id')::UUID)
);
CREATE POLICY user_update_policy ON users
AS PERMISSIVE
FOR UPDATE
TO app_user
USING (
tenant_id = current_tenant_id() AND
(current_setting('app.current.role') = 'admin' OR id = current_setting('app.current.user_id')::UUID)
)
WITH CHECK (
tenant_id = current_tenant_id() AND
(current_setting('app.current.role') = 'admin' OR id = current_setting('app.current.user_id')::UUID)
);
CREATE POLICY user_delete_policy ON users
AS PERMISSIVE
FOR DELETE
TO app_user
USING (
tenant_id = current_tenant_id() AND
(current_setting('app.current.role') = 'admin' OR id = current_setting('app.current.user_id')::UUID)
);
Key Concepts:
* USING Clause: This applies to existing rows for SELECT, UPDATE, and DELETE queries. It acts as an implicit WHERE clause.
* WITH CHECK Clause: This applies to new or updated rows for INSERT and UPDATE queries. It ensures a user cannot write data that they would subsequently be unable to read. For instance, it prevents a user from Tenant A from inserting a project with tenant_id of Tenant B.
* PERMISSIVE vs. RESTRICTIVE: PERMISSIVE policies (the default when neither keyword is specified) are combined with a logical OR: if any permissive policy evaluates to true, the row is accessible. RESTRICTIVE policies are combined with AND: every restrictive policy must pass, no matter what the permissive policies allow. For most tenancy models, a set of PERMISSIVE policies is easiest to reason about, though a RESTRICTIVE policy can act as a defense-in-depth backstop, as sketched below.
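For illustration, here is a minimal sketch of such a backstop (the policy name is arbitrary). Because it is RESTRICTIVE, it is ANDed with every permissive policy on projects, so even a future, overly generous permissive policy cannot cross the tenant boundary. If you later adopt the bypass flag or hierarchical visibility from Section 4, this predicate would need to allow those cases as well.
-- A defense-in-depth backstop: RESTRICTIVE policies are ANDed with all permissive ones.
CREATE POLICY enforce_tenant_boundary ON projects
AS RESTRICTIVE
FOR ALL
TO app_user
USING (tenant_id = current_tenant_id())
WITH CHECK (tenant_id = current_tenant_id());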
Section 2: Application Integration (Node.js Example)
Here’s how the application code would look using the node-postgres (pg) library. The key is the withRLSTransaction helper function that correctly sets the context.
const { Pool } = require('pg');
const pool = new Pool({
user: 'app_user',
host: 'localhost',
database: 'your_database_name',
password: 'a_very_strong_password',
port: 5432,
});
// A mock JWT verification function
function verifyAndDecodeJwt(token) {
// In a real app, use a library like 'jsonwebtoken' to verify the signature
// against your public key or secret.
if (!token) throw new Error('No token provided');
// For this example, we'll just decode a mock base64 payload.
// Real JWT: { "tenant_id": "...", "user_id": "...", "role": "admin" }
try {
const payload = JSON.parse(Buffer.from(token.split('.')[1], 'base64').toString());
if (!payload.tenant_id || !payload.user_id || !payload.role) {
throw new Error('Invalid JWT claims');
}
return payload;
} catch (e) {
throw new Error('Invalid JWT format');
}
}
/**
* Higher-order function to wrap database operations in a transaction
* with RLS context set from a JWT.
* @param {object} jwtClaims - The decoded JWT payload.
* @param {Function} callback - An async function that receives the client and performs db operations.
*/
async function withRLSTransaction(jwtClaims, callback) {
const client = await pool.connect();
try {
await client.query('BEGIN');
// Set the RLS context for this transaction
const claimsJson = JSON.stringify(jwtClaims);
await client.query('SELECT set_auth_context($1::jsonb)', [claimsJson]);
// Execute the business logic
const result = await callback(client);
await client.query('COMMIT');
return result;
} catch (e) {
await client.query('ROLLBACK');
console.error('Transaction failed:', e.message);
throw e;
} finally {
client.release();
}
}
// --- Example Usage ---
async function getProjectsForUser(req) {
try {
// 1. Get token from request header
const token = req.headers.authorization.split(' ')[1];
// 2. Verify and decode JWT
const claims = verifyAndDecodeJwt(token);
// 3. Use the transactional wrapper
const projects = await withRLSTransaction(claims, async (client) => {
// 4. The query is simple and tenancy-unaware.
// RLS handles the security automatically.
console.log(`Executing query for tenant: ${claims.tenant_id}`);
const res = await client.query('SELECT id, name FROM projects');
return res.rows;
});
return { status: 200, body: projects };
} catch (error) {
// Handle auth errors, db errors, etc.
return { status: 401, body: { error: error.message } };
}
}
// --- Mocking a request and running it ---
async function main() {
// Mock JWT for Tenant 1, User A (Admin)
const adminJwtTenant1 = `header.${Buffer.from(JSON.stringify({ tenant_id: 'a4e3c23e-3e5b-4ba5-8274-32b4d36a3d61', user_id: 'd9b7f5b3-3e1a-4e8c-8b8f-3e8a6c4e2f8c', role: 'admin' })).toString('base64')}.signature`;
// Mock JWT for Tenant 2, User B (Member)
const memberJwtTenant2 = `header.${Buffer.from(JSON.stringify({ tenant_id: 'b5f4d34f-4f6c-5cb6-9385-43c5e47b4e72', user_id: 'e1c8a6c4-4f2b-5f9d-9c9a-4f9b7d5f3a9d', role: 'member' })).toString('base64')}.signature`;
console.log('--- Fetching projects as Admin from Tenant 1 ---');
const req1 = { headers: { authorization: `Bearer ${adminJwtTenant1}` } };
const res1 = await getProjectsForUser(req1);
console.log('Result:', res1.body);
console.log('\n--- Fetching projects as Member from Tenant 2 ---');
const req2 = { headers: { authorization: `Bearer ${memberJwtTenant2}` } };
const res2 = await getProjectsForUser(req2);
console.log('Result:', res2.body);
}
// main(); // You would run this after populating some data.
This pattern ensures that every query executed within the withRLSTransaction callback is automatically and correctly scoped to the tenant, without the developer having to think about it.
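The same isolation is easy to verify straight from psql, connected as app_user and assuming the seed data sketched earlier:
-- Transaction with Tenant 1's context: only Tenant 1's projects are visible.
BEGIN;
SELECT set_auth_context('{"tenant_id": "a4e3c23e-3e5b-4ba5-8274-32b4d36a3d61", "user_id": "d9b7f5b3-3e1a-4e8c-8b8f-3e8a6c4e2f8c", "role": "admin"}'::jsonb);
SELECT name FROM projects; -- returns only 'Q1 Roadmap'
COMMIT;
-- Outside an authenticated context, current_tenant_id() falls back to the nil UUID,
-- which matches no rows:
SELECT count(*) FROM projects; -- returns 0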
Section 3: Production Patterns & Performance
This is where theory meets reality. A naive RLS implementation can introduce subtle bugs and performance bottlenecks in a high-load production environment.
3.1. The Connection Pooling Conundrum (PgBouncer)
The Problem: Connection poolers like PgBouncer are essential for managing high connection counts. In its most performant transaction pooling mode, PgBouncer assigns a physical database connection to a client only for the duration of a single transaction. The next transaction from the same client might get a different physical connection. If you were to use SET app.current.tenant_id = '...' (which is session-scoped), that setting would persist on the physical connection after it's returned to the pool, potentially leaking to the next client that receives that connection. This is a catastrophic context-bleed security vulnerability.
The Solution: As implemented in our set_auth_context function, we used set_config(key, value, true). The true for is_local makes the setting transaction-scoped. It is automatically discarded when the transaction ends (COMMIT or ROLLBACK). This makes our RLS pattern completely safe to use with PgBouncer in transaction pooling mode, which is the recommended setup for scalability.
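The difference is easy to see from psql: a plain SET survives the transaction (and would therefore travel with the pooled connection to the next client), while the transaction-local form does not. The app.demo_value name below is just a throwaway example.
BEGIN;
SET app.current.tenant_id = 'leaked-across-clients'; -- session-scoped: dangerous behind PgBouncer
SELECT set_config('app.demo_value', 'scoped', true); -- transaction-scoped (is_local = true)
COMMIT;
SHOW app.current.tenant_id; -- still set: this would leak to the next client on this connection
SELECT current_setting('app.demo_value', true); -- the transaction-local value is gone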
3.2. RLS Performance Overhead and Query Planning
RLS is not free. Each policy adds a predicate to your query, which the query planner must account for. The overhead is typically minimal if policies are simple and well-indexed, but complex policies can be costly.
The LEAKPROOF Attribute:
Consider a query against an RLS-protected table, or a policy that calls a custom function such as USING (tenant_id = get_tenant_from_some_other_table(user_id)). PostgreSQL treats the policy predicates as a security barrier and is extremely conservative about anything that is not marked LEAKPROOF: it assumes such a function or operator might leak data from rows that should be invisible (for example, through an error message that echoes its input), so it refuses to evaluate it ahead of the policy check or use it as an index condition. This can push the planner toward a much less efficient plan, such as a sequential scan instead of an index scan.
A function can be marked LEAKPROOF only if it has no side effects and reveals no information about its arguments other than through its return value; note that only a superuser can apply the attribute. Our current_tenant_id() function is a prime candidate: it takes no arguments and raises no data-dependent errors.
-- Re-creating our helper function with LEAKPROOF and other optimizations
CREATE OR REPLACE FUNCTION current_tenant_id()
RETURNS UUID AS $$
BEGIN
-- current_setting(..., true) returns NULL when the setting has never been defined; on a pooled
-- connection it may instead return an empty string left over from an earlier transaction, so
-- nullif() folds both cases into the nil-UUID fallback instead of failing the UUID cast.
RETURN coalesce(nullif(current_setting('app.current.tenant_id', true), ''), '00000000-0000-0000-0000-000000000000')::UUID;
END;
$$ LANGUAGE plpgsql STABLE LEAKPROOF;
Marking this function as LEAKPROOF gives the query planner more information, allowing it to reorder operations and potentially choose much better execution plans, especially in complex queries with joins.
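You can confirm how the function is flagged by inspecting the system catalog:
-- provolatile: 'i' = immutable, 's' = stable, 'v' = volatile; proleakproof can only be set by a superuser.
SELECT proname, provolatile, proleakproof
FROM pg_proc
WHERE proname = 'current_tenant_id';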
Benchmarking with EXPLAIN ANALYZE:
Always validate the performance of your RLS-protected queries.
-- In a psql session, simulate an application request
BEGIN;
-- Set the context
SELECT set_auth_context('{"tenant_id": "a4e3c23e-3e5b-4ba5-8274-32b4d36a3d61", "user_id": "...", "role": "admin"}'::jsonb);
-- Analyze the query
EXPLAIN ANALYZE SELECT * FROM projects WHERE name LIKE 'Q1%';
COMMIT;
Observe the query plan. The RLS policy shows up as an additional condition on projects, either folded into the Index Cond or applied as a Filter. Ensure that the index on (tenant_id) is actually being used. If your policy is USING (tenant_id = current_tenant_id()), a composite index on (tenant_id, name) can serve the query above even better, with one caveat sketched below.
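A sketch of that indexing advice (the index name is arbitrary). The tenant predicate injected by the policy matches the leading column; whether the LIKE predicate can also become an index condition depends on your collation and on the fact that the LIKE operator is not LEAKPROOF, so verify with EXPLAIN ANALYZE rather than assuming.
-- The RLS predicate (tenant_id = current_tenant_id()) can use the leading column of this index.
CREATE INDEX idx_projects_tenant_id_name ON projects (tenant_id, name);
-- Re-check the plan; the LIKE condition may still appear as a Filter applied after the index scan.
EXPLAIN ANALYZE SELECT * FROM projects WHERE name LIKE 'Q1%';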
Section 4: Advanced Scenarios and Edge Cases
4.1. Superuser / Internal Tool Access
How do your background workers, cron jobs, or internal admin tools operate when RLS is enabled? They often need to bypass these rules.
Solution 1: The BYPASSRLS Attribute (Use with Extreme Caution)
You can create a specific role for internal processes and grant it the BYPASSRLS attribute.
CREATE ROLE internal_worker WITH LOGIN PASSWORD '...';
ALTER ROLE internal_worker BYPASSRLS;
Any connection made by internal_worker will completely ignore all RLS policies. This is powerful but dangerous. Access to this role's credentials must be strictly controlled and audited.
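Also be aware that superusers always skip RLS, and so does the table's owner by default; if an internal tool connects as the role that owns the tables, it silently bypasses every policy. You can opt the owner in explicitly:
-- Apply policies even to the table owner (superusers and BYPASSRLS roles still bypass).
ALTER TABLE projects FORCE ROW LEVEL SECURITY;
ALTER TABLE users FORCE ROW LEVEL SECURITY;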
Solution 2: A Controllable Bypass in Policies
A safer method is to build a bypass mechanism into your policies. We can add another session variable, app.rls.bypass, that the internal tool can set.
-- Replace the original policy with a bypass-aware version
DROP POLICY IF EXISTS tenant_isolation_policy ON projects;
CREATE POLICY tenant_isolation_policy ON projects
FOR ALL TO app_user
USING (
current_setting('app.rls.bypass', true) = 'true' OR
tenant_id = current_tenant_id()
)
WITH CHECK (
current_setting('app.rls.bypass', true) = 'true' OR
tenant_id = current_tenant_id()
);
An internal tool can now set this flag within its transaction to see all data, while regular user sessions will not have it set, and the current_setting function will return NULL, failing the check.
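An internal job would then opt in explicitly, still scoped to a single transaction:
BEGIN;
SELECT set_config('app.rls.bypass', 'true', true); -- transaction-local, just like the auth context
SELECT count(*) FROM projects; -- counts rows across all tenants
COMMIT;
Keep in mind that any session able to call set_config can flip this flag, so treat it as a convenience for trusted code paths rather than a hard security boundary; the BYPASSRLS role attribute remains the stronger separation.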
4.2. Handling Hierarchical Structures
What if a tenant can have sub-tenants, and a parent-tenant admin needs to see data from all child tenants? RLS can handle this.
First, model the hierarchy in your tenants table:
ALTER TABLE tenants ADD COLUMN parent_id UUID REFERENCES tenants(id);
Next, create a function that, given a tenant ID, returns a set of all its descendant tenant IDs, including itself.
CREATE OR REPLACE FUNCTION get_tenant_descendants(root_tenant_id UUID)
RETURNS TABLE(id UUID) AS $$
BEGIN
RETURN QUERY
WITH RECURSIVE tenant_tree AS (
SELECT t.id FROM tenants t WHERE t.id = root_tenant_id
UNION ALL
SELECT t.id FROM tenants t
INNER JOIN tenant_tree tt ON t.parent_id = tt.id
)
SELECT tt.id FROM tenant_tree tt;
END;
$$ LANGUAGE plpgsql STABLE LEAKPROOF;
Finally, update your RLS policy to use this function.
DROP POLICY tenant_isolation_policy ON projects;
CREATE POLICY hierarchical_tenant_policy ON projects
FOR ALL TO app_user
USING (tenant_id IN (SELECT id FROM get_tenant_descendants(current_tenant_id())))
WITH CHECK (tenant_id = current_tenant_id()); -- Still only allow writing to your own tenant
Now, a user from a parent tenant will automatically see data from all their sub-tenants, with the logic cleanly encapsulated in the database.
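A quick sanity check of the recursion, reusing the hypothetical tenants from the earlier seed sketch (Globex becomes a child of Acme):
UPDATE tenants
SET parent_id = 'a4e3c23e-3e5b-4ba5-8274-32b4d36a3d61' -- parent (Acme)
WHERE id = 'b5f4d34f-4f6c-5cb6-9385-43c5e47b4e72'; -- child (Globex)
-- Should return both tenant ids:
SELECT * FROM get_tenant_descendants('a4e3c23e-3e5b-4ba5-8274-32b4d36a3d61');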
Conclusion: Security by Default
Implementing multi-tenancy with PostgreSQL RLS and JWTs is a paradigm shift. It moves security from a repetitive, error-prone application-layer task to a centralized, robust, database-enforced guarantee. While the initial setup is more complex than adding WHERE clauses, the long-term benefits are immense:
* Reduced Risk: Eliminates entire classes of data-leak bugs.
* Simplified Application Code: Business logic becomes cleaner and tenancy-agnostic.
* Centralized Auditing: The security rules are co-located with the data in the schema, making them easier to review and audit.
* Flexibility: Advanced scenarios like hierarchical permissions and role-based access can be modeled with powerful, expressive policies.
By carefully considering performance implications, especially the LEAKPROOF attribute, and by correctly managing state in a connection-pooled environment, you can build a highly secure, scalable, and maintainable SaaS backend where data isolation is the default, not an afterthought.