Performant RLS Policies in PostgreSQL using JWT Claims for Multi-Tenancy
The Inherent Risk of Manual `WHERE` Clauses in Multi-Tenant Architectures
In any multi-tenant SaaS application, the cardinal rule is absolute data isolation. The most common implementation pattern is painfully familiar: appending WHERE tenant_id = :current_tenant_id to every single database query. While functionally correct, this approach is a ticking time bomb for security and a significant source of technical debt.
A single forgotten predicate is all it takes to leak one tenant's data to another, and every query ends up littered with repetitive WHERE clauses, obscuring the core business logic. This is where PostgreSQL's Row-Level Security (RLS) comes in. RLS provides a powerful, database-enforced mechanism for data isolation. However, implementing it in a performant, scalable way within a modern, stateless application architecture presents a unique set of advanced challenges. This article is not an introduction to RLS; it is a guide to solving the central problem: how to securely bridge the context gap between a stateless, JWT-bearing application request and a stateful PostgreSQL session to drive RLS policies without killing performance.
The Core Challenge: Stateless JWTs and Stateful Policies
RLS policies are defined directly on tables and evaluated by PostgreSQL for every row access. A typical policy looks like this:
-- A naive, non-functional example
CREATE POLICY tenant_isolation ON documents
USING (tenant_id = get_current_tenant_id());
The get_current_tenant_id() function is the crux of the problem. How does this function, executing deep within the database, know which tenant is associated with the application request that triggered the query? The application has this information in a JWT, but the database connection from a connection pool is an anonymous, shared resource.
The solution lies in setting session-local configuration parameters for the duration of a single transaction. This pattern allows us to "inject" the JWT claims into the database session just in time for the query to execute.
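At the SQL level, the idea is a small round trip (a minimal sketch; myapp.jwt.claims is simply the custom parameter name used throughout this article, and the claim values are placeholders):
BEGIN;
-- The application injects the verified JWT claims for this transaction only.
SET LOCAL myapp.jwt.claims = '{"sub": "user-id-1", "tenant_id": "tenant-id-a", "role": "member"}';
-- Anything running in this transaction, including RLS policy functions, can read them back.
SELECT current_setting('myapp.jwt.claims', true);
COMMIT;
-- After COMMIT, the transaction-local value is gone.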
Production Pattern: Propagating JWT Context via Middleware and `SET LOCAL`
The most robust pattern involves using application middleware to decode the JWT and configure the database session before executing any business logic.
Here’s a conceptual implementation using Node.js with Express and the pg library. The principles are identical for any language (Go, Python, Java).
// Middleware in a Node.js / Express application
const { Pool } = require('pg');
const jwt = require('jsonwebtoken');
const pool = new Pool(/* connection config */);
const JWT_SECRET = process.env.JWT_SECRET;
// This middleware intercepts requests, verifies the JWT, and sets the DB context.
async function setDatabaseContext(req, res, next) {
const authHeader = req.headers.authorization;
if (!authHeader || !authHeader.startsWith('Bearer ')) {
// Or proceed as an anonymous user
return next();
}
const token = authHeader.split(' ')[1];
let claims;
try {
claims = jwt.verify(token, JWT_SECRET);
} catch (err) {
return res.status(401).send('Invalid token');
}
// Attach a dedicated client for this request to ensure transaction-level context
const client = await pool.connect();
req.dbClient = client;
try {
// CRITICAL: Use SET LOCAL. It scopes the setting to the current transaction only,
// preventing context from leaking to other requests that later reuse this pooled
// connection. SET LOCAL has no effect outside a transaction block, so start one first.
await client.query('BEGIN');
// We serialize the entire claims object as a JSON string.
const claimsJson = JSON.stringify(claims);
// escapeLiteral() safely quotes the value, preventing SQL injection if the claims
// contain single quotes. (SET cannot take bind parameters; an equivalent
// parameterized form is: SELECT set_config('myapp.jwt.claims', $1, true).)
await client.query(`SET LOCAL myapp.jwt.claims = ${client.escapeLiteral(claimsJson)}`);
next(); // Proceed to the route handler
} catch (err) {
// Roll back the opened transaction (if any) and release the client on error.
await client.query('ROLLBACK').catch(() => {});
client.release();
next(err);
}
}
// Example route handler
app.get('/api/documents', setDatabaseContext, async (req, res, next) => {
try {
// req.dbClient is now context-aware: the middleware's SET LOCAL applies to every
// query issued inside the still-open transaction.
const { rows } = await req.dbClient.query('SELECT id, title, content FROM documents');
// End the transaction; this also discards the SET LOCAL context.
await req.dbClient.query('COMMIT');
res.json(rows);
} catch (err) {
if (req.dbClient) {
await req.dbClient.query('ROLLBACK').catch(() => {});
}
next(err);
} finally {
// IMPORTANT: Always release the client back to the pool in a finally block.
if (req.dbClient) {
req.dbClient.release();
}
}
});
Why `SET LOCAL` is Non-Negotiable
* SET (or SET SESSION): The setting persists for the entire lifetime of the database session. In a connection-pooled environment, the next request that receives this connection from the pool will incorrectly inherit the previous user's identity. This is a critical security vulnerability.
* SET LOCAL: The setting is effective only for the current transaction. When the transaction commits or rolls back, the setting is automatically reverted. Note that SET LOCAL is a no-op outside an explicit transaction block, which is why the middleware issues BEGIN before setting it. This is the only safe choice for connection-pooled applications.

This middleware pattern ensures that every query executed within a request's lifecycle operates with the correct user and tenant context, which can now be accessed from within PostgreSQL.
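The difference is easy to demonstrate in a psql session (a small sketch using the same custom parameter):
-- SET SESSION: the value survives the transaction and would leak to the next
-- request that checks out this pooled connection.
SET myapp.jwt.claims = '{"tenant_id": "tenant-id-a"}';
SELECT current_setting('myapp.jwt.claims', true);  -- still visible afterwards

-- SET LOCAL: the value disappears as soon as the transaction ends.
BEGIN;
SET LOCAL myapp.jwt.claims = '{"tenant_id": "tenant-id-b"}';
SELECT current_setting('myapp.jwt.claims', true);  -- tenant-id-b inside the transaction
COMMIT;
SELECT current_setting('myapp.jwt.claims', true);  -- reverts to the session-level value (tenant-id-a)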
Crafting Performant `STABLE` RLS Policy Functions
Now that we can pass the JWT claims into the session, we need to access them within our RLS policies. A naive approach would be to parse the JSON on every row check, which is disastrous for performance.
Anti-Pattern: VOLATILE functions that re-parse JSON
-- DO NOT DO THIS IN PRODUCTION
CREATE OR REPLACE FUNCTION get_current_tenant_from_volatile_json() RETURNS UUID AS $$
BEGIN
-- This function is VOLATILE by default. It will be re-evaluated for every row.
-- The JSON parsing and casting happens repeatedly, creating massive overhead.
RETURN (current_setting('myapp.jwt.claims')::jsonb ->> 'tenant_id')::uuid;
EXCEPTION
WHEN OTHERS THEN
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
-- The resulting policy is incredibly slow on large tables.
CREATE POLICY slow_tenant_policy ON documents
USING (tenant_id = get_current_tenant_from_volatile_json());
PostgreSQL's query planner cannot optimize calls to VOLATILE functions. It assumes the function's output can change at any time, forcing re-evaluation for every single row the policy is applied to. For a query scanning thousands of rows, this means thousands of JSON parsing operations.
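If you are unsure how PostgreSQL classifies a function, you can check pg_proc directly ('v' = VOLATILE, 's' = STABLE, 'i' = IMMUTABLE):
-- plpgsql functions default to VOLATILE unless declared otherwise.
SELECT proname, provolatile
FROM pg_proc
WHERE proname = 'get_current_tenant_from_volatile_json';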
Optimized Pattern: STABLE Helper Functions
The correct approach is to create a set of STABLE helper functions. A STABLE function guarantees that it will return the same result for the same arguments within a single query. Since our myapp.jwt.claims setting is fixed for the duration of the transaction (and thus the query), the planner can cache the function's result and reuse it, drastically reducing overhead.
-- Create a set of robust, performant, STABLE helper functions.
-- Function to get the full claims JSONB object. Caches the result per-query.
CREATE OR REPLACE FUNCTION current_claims() RETURNS jsonb AS $$
BEGIN
-- The 'true' flag tells current_setting to return NULL if the setting is not found,
-- instead of throwing an error.
RETURN current_setting('myapp.jwt.claims', true)::jsonb;
EXCEPTION
-- Handle cases where the setting is not valid JSON.
WHEN invalid_text_representation THEN
RETURN '{}'::jsonb;
END;
$$ LANGUAGE plpgsql STABLE;
-- Function to get the tenant_id. Note the explicit casting and error handling.
CREATE OR REPLACE FUNCTION current_tenant_id() RETURNS UUID AS $$
DECLARE
tenant_id_text TEXT;
BEGIN
tenant_id_text := current_claims() ->> 'tenant_id';
IF tenant_id_text IS NULL THEN
RETURN NULL;
END IF;
-- Defensive check for UUID format before casting (case-insensitive)
IF tenant_id_text !~* '^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$' THEN
RETURN NULL;
END IF;
RETURN tenant_id_text::uuid;
END;
$$ LANGUAGE plpgsql STABLE;
-- Function to get the user_id (subject from JWT).
CREATE OR REPLACE FUNCTION current_user_id() RETURNS UUID AS $$
DECLARE
user_id_text TEXT;
BEGIN
user_id_text := current_claims() ->> 'sub';
IF user_id_text IS NULL THEN
RETURN NULL;
END IF;
IF user_id_text !~* '^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$' THEN
RETURN NULL;
END IF;
RETURN user_id_text::uuid;
END;
$$ LANGUAGE plpgsql STABLE;
-- Function to get the user's role.
CREATE OR REPLACE FUNCTION current_user_role() RETURNS TEXT AS $$
BEGIN
RETURN current_claims() ->> 'role';
END;
$$ LANGUAGE plpgsql STABLE;
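Before wiring these helpers into policies, it is worth sanity-checking them in a transaction (the UUIDs below are arbitrary but validly formatted placeholders):
BEGIN;
SET LOCAL myapp.jwt.claims = '{"sub": "11111111-1111-1111-1111-111111111111", "tenant_id": "22222222-2222-2222-2222-222222222222", "role": "admin"}';
-- Should return the full claims object, both UUIDs, and 'admin'.
SELECT current_claims(), current_tenant_id(), current_user_id(), current_user_role();
ROLLBACK;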
With these helper functions, our RLS policy becomes both clean and performant:
-- First, enable RLS on the table.
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
-- Create a clean, performant policy using our STABLE helper.
CREATE POLICY tenant_isolation ON documents
AS PERMISSIVE FOR ALL
USING (tenant_id = current_tenant_id());
The query planner now knows that current_tenant_id() cannot change during a single query's execution. Instead of re-evaluating it for every row, the executor evaluates it once per scan and treats the result like a constant, which also allows it to appear in an index condition in the policy.
Handling Complex Scenarios: Roles, Ownership, and Shared Data
Real-world applications require more than simple tenant isolation. Our STABLE function pattern extends elegantly to handle these complexities.
Scenario 1: Role-Based Access within a Tenant
Imagine a system where admin users can see all documents in their tenant, but member users can only see documents they created.
-- Assumes the 'documents' table has an 'owner_id' column.
-- Drop the earlier policy before replacing it with a role-aware one.
DROP POLICY IF EXISTS tenant_isolation ON documents;
CREATE POLICY tenant_and_role_isolation ON documents
AS PERMISSIVE FOR ALL
USING (
tenant_id = current_tenant_id() AND (
-- Admins can see everything within their tenant.
current_user_role() = 'admin'
OR
-- Members can only see their own documents.
(current_user_role() = 'member' AND owner_id = current_user_id())
)
);
This single policy enforces both tenant isolation and intra-tenant role-based permissions transparently. The application code remains simple: SELECT * FROM documents;.
Scenario 2: Cross-Tenant Data Sharing
A common feature is sharing a resource (e.g., a document) with users from another tenant. This requires a junction table.
CREATE TABLE document_shares (
document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
shared_with_tenant_id UUID NOT NULL,
PRIMARY KEY (document_id, shared_with_tenant_id)
);
-- The composite primary key already provides the index the policy needs for fast
-- (document_id, shared_with_tenant_id) lookups, so no additional index is required.
We can now craft a more sophisticated SELECT policy that checks for ownership OR a share record.
-- We might have different policies for SELECT vs. UPDATE/DELETE
DROP POLICY IF EXISTS select_documents ON documents;
DROP POLICY IF EXISTS update_documents ON documents;
DROP POLICY IF EXISTS delete_documents ON documents;
-- Policy for viewing documents: either in your tenant or shared with your tenant.
CREATE POLICY select_documents ON documents
AS PERMISSIVE FOR SELECT
USING (
tenant_id = current_tenant_id()
OR
EXISTS (
SELECT 1
FROM document_shares ds
WHERE ds.document_id = documents.id
AND ds.shared_with_tenant_id = current_tenant_id()
)
);
-- Policies for modifying documents: only rows in your own tenant can be changed
-- (shared documents stay read-only for the other tenant). CREATE POLICY accepts
-- only one command per FOR clause, so UPDATE and DELETE need separate policies.
CREATE POLICY update_documents ON documents
AS PERMISSIVE FOR UPDATE
USING (tenant_id = current_tenant_id());
CREATE POLICY delete_documents ON documents
AS PERMISSIVE FOR DELETE
USING (tenant_id = current_tenant_id());
The use of EXISTS is generally performant, provided the junction table (document_shares) is properly indexed. The query planner can use the index to perform a very fast lookup.
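Sharing then becomes a plain insert into the junction table; assuming hypothetical placeholder UUIDs, granting another tenant read access to one document looks like this:
-- Give tenant B (placeholder UUID) read access to one specific document owned by tenant A.
INSERT INTO document_shares (document_id, shared_with_tenant_id)
VALUES ('33333333-3333-3333-3333-333333333333', '44444444-4444-4444-4444-444444444444');
-- Requests carrying tenant B's tenant_id claim now see this document through
-- select_documents, but update_documents/delete_documents still exclude it.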
Edge Cases and Production Hardening
Deploying RLS to production requires careful consideration of several edge cases.
* BYPASSRLS roles: RLS policies do not apply to table owners, superusers, or any role created with the BYPASSRLS attribute. Your application's database user must be a non-privileged role that does not own the tables it interacts with and does not have BYPASSRLS (see the role-setup sketch after the test script below).
* SECURITY DEFINER functions: Be extremely cautious with functions marked SECURITY DEFINER inside RLS policies. Such functions execute with the privileges of the user who defined the function, not the user invoking it. This is a powerful tool for granting specific, elevated permissions, but it can easily create security holes if not written with meticulous care to prevent SQL injection or unintended data access.
* Connection poolers: The pattern relies on transaction-scoped settings (SET LOCAL). It works cleanly with PgBouncer in session pooling mode. Under PgBouncer's transaction pooling mode it is only safe if the SET LOCAL and every query that depends on it run inside a single explicit transaction; a transaction-level pooler can switch the underlying server session between transactions, so a SET LOCAL issued in its own implicit transaction will not apply to the subsequent SELECT. Always verify that your pooling strategy matches how you scope the context.
* Testing policies: Exercise RLS the same way it runs in production: set claims inside a transaction and assert on what each impersonated user can see.

-- Example of a test script using psql or a test runner
BEGIN;
-- NOTE: the ids below are readable placeholders; in a real test, use valid UUIDs from
-- your seed data, since current_tenant_id() returns NULL for values that are not UUIDs.
-- Impersonate user 1 from tenant A
SET LOCAL myapp.jwt.claims = '{"sub": "user-id-1", "tenant_id": "tenant-id-a", "role": "member"}';
-- Run assertions: This SELECT should only return docs for user 1 in tenant A.
SELECT COUNT(*) FROM documents;
-- Impersonate user 2 (admin) from tenant A
SET LOCAL myapp.jwt.claims = '{"sub": "user-id-2", "tenant_id": "tenant-id-a", "role": "admin"}';
-- Run assertions: This SELECT should return ALL docs for tenant A.
SELECT COUNT(*) FROM documents;
-- Impersonate user 3 from tenant B
SET LOCAL myapp.jwt.claims = '{"sub": "user-id-3", "tenant_id": "tenant-id-b", "role": "admin"}';
-- Run assertions: This SELECT should return ALL docs for tenant B.
SELECT COUNT(*) FROM documents;
ROLLBACK; -- End the test transaction
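To make the BYPASSRLS point above concrete, the application's connection role could be provisioned roughly as follows (app_user and the grants are placeholders to adapt to your schema):
-- A dedicated, non-owning login role for the application. NOBYPASSRLS is the default,
-- but stating it makes the intent explicit.
CREATE ROLE app_user LOGIN PASSWORD 'change-me' NOBYPASSRLS;
GRANT SELECT, INSERT, UPDATE, DELETE ON documents, document_shares TO app_user;
-- If a role that owns the tables could ever run application queries,
-- FORCE ROW LEVEL SECURITY makes the policies apply to the owner as well.
ALTER TABLE documents FORCE ROW LEVEL SECURITY;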
Benchmark: The Performance Impact of `STABLE` vs. `VOLATILE`
To quantify the performance difference, we'll run a benchmark using pgbench against a table with 10 million rows distributed across 10,000 tenants.
Schema:
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
owner_id UUID NOT NULL,
title TEXT NOT NULL,
content TEXT,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_documents_tenant_id ON documents(tenant_id);
-- Populate with 10M rows
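One way to generate this test data (a sketch, not the exact script used here; any roughly even spread across 10,000 tenants will do):
-- Create 10,000 tenant UUIDs, then spread 10M documents evenly across them.
WITH tenants AS (
  SELECT gen_random_uuid() AS tenant_id, g AS n
  FROM generate_series(1, 10000) AS g
)
INSERT INTO documents (tenant_id, owner_id, title, content)
SELECT t.tenant_id,
       gen_random_uuid(),
       'Document ' || d,
       'Benchmark content'
FROM generate_series(1, 10000000) AS d
JOIN tenants t ON t.n = (d % 10000) + 1;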
Benchmark Scenarios:
* Baseline: a manual WHERE clause, RLS disabled.
* VOLATILE function: RLS enabled, using the naive JSON parsing function.
* STABLE function: RLS enabled, using our optimized helper functions.

Test Query: SELECT COUNT(*) FROM documents; (The policy implicitly adds the WHERE clause.) A pgbench sketch for driving these scenarios follows below.
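To drive these scenarios, pgbench can replay a custom script in a loop; a minimal sketch for the STABLE scenario is below (the file name, client/duration flags, and claim values are placeholders; pick a tenant_id that actually exists in your generated data):
-- rls_stable.sql: one benchmark transaction per pgbench iteration.
-- Run with something like: pgbench -n -f rls_stable.sql -c 10 -T 60 mydb
BEGIN;
SET LOCAL myapp.jwt.claims = '{"sub": "11111111-1111-1111-1111-111111111111", "tenant_id": "22222222-2222-2222-2222-222222222222", "role": "admin"}';
SELECT COUNT(*) FROM documents;
COMMIT;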
Results (Illustrative):
| Scenario | Average Latency (ms) | Transactions per Second (TPS) | Notes |
|---|---|---|---|
| Baseline (Manual WHERE) | 2.5 ms | 400 | The fastest, but insecure and hard to maintain. |
| RLS with VOLATILE function | 350.0 ms | 2.8 | Unusably slow. Function is called for every row scanned. |
| RLS with STABLE function | 2.8 ms | 357 | Nearly identical to baseline. The overhead is negligible. |
EXPLAIN ANALYZE Analysis:
* VOLATILE Plan:
Aggregate (cost=150000.00..150000.01 rows=1 width=8)
-> Seq Scan on documents (cost=0.00..145000.00 rows=10000000 width=0)
Filter: (tenant_id = get_current_tenant_from_volatile_json()) -- HIGH COST
The planner is forced into a full sequential scan, evaluating the expensive get_current_tenant_from_volatile_json() function for all 10 million rows.
* STABLE Plan:
Aggregate (cost=1500.00..1500.01 rows=1 width=8)
-> Index Only Scan using idx_documents_tenant_id on documents (cost=0.43..1450.00 rows=1000 width=0)
Index Cond: (tenant_id = current_tenant_id()) -- LOW COST
The planner calls current_tenant_id() once, caches the result, and then uses it as a constant in the Index Cond. This allows it to perform a highly efficient Index Scan, only looking at the rows relevant to the current tenant.
Conclusion
While manual WHERE tenant_id = ? clauses are a common starting point for multi-tenancy, they represent a significant, ongoing security risk and maintenance burden. PostgreSQL RLS, when implemented correctly, elevates data isolation from an application-level concern to a database-enforced guarantee.
The key to unlocking this capability in a modern, stateless architecture is the JWT -> Middleware -> SET LOCAL -> STABLE Function -> RLS Policy pattern. This approach provides a secure bridge for user context, and the use of STABLE functions ensures that the performance overhead is negligible compared to manual filtering. By centralizing your authorization logic directly within the database, you create a more robust, auditable, and secure foundation for any multi-tenant application.