Advanced PostgreSQL RLS: Hierarchical Permissions in Multi-Tenant SaaS
The Achilles' Heel of SaaS: Complex Authorization Logic
In any non-trivial multi-tenant SaaS application, authorization is not a simple binary check. It's a complex, hierarchical graph problem. A user's access to a specific resource, like a task, often depends on their role within a project, which belongs to a team, which is part of an organization. Implementing this logic at the application layer is a well-trodden path fraught with peril. It leads to bloated ORM queries, forgotten WHERE clauses that create security vulnerabilities, and N+1 query performance nightmares.
PostgreSQL's Row-Level Security (RLS) offers a powerful alternative: enforcing data access policies directly within the database. However, most RLS tutorials stop at simple policies like WHERE tenant_id = current_tenant_id(). This is insufficient for real-world hierarchical models.
This post dissects two advanced, production-ready patterns for implementing complex hierarchical permissions with RLS, targeting senior engineers who understand the fundamentals. We will compare a dynamic, recursive approach with a pre-computed, high-performance strategy, complete with schema designs, full SQL functions, performance benchmarks, and strategies for handling the inevitable edge cases.
Setting the Stage: A Canonical SaaS Schema
Let's define a realistic schema that represents a project management tool. A tenant has users. Users are granted roles within organizations. Organizations have projects, and projects have tasks.
-- Enable necessary extensions
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "citext";
-- Tenants represent isolated customer accounts
CREATE TABLE tenants (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
name TEXT NOT NULL
);
-- Users can belong to multiple tenants
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
email CITEXT NOT NULL UNIQUE
-- other user details...
);
-- The top-level hierarchical entity, owned by a tenant
CREATE TABLE organizations (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
name TEXT NOT NULL
);
-- Projects belong to organizations
CREATE TABLE projects (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
name TEXT NOT NULL
);
-- Tasks are the leaf nodes of our hierarchy
CREATE TABLE tasks (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
project_id UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
title TEXT NOT NULL,
completed BOOLEAN DEFAULT false
);
-- Membership table to link users to organizations with a role
CREATE TYPE membership_role AS ENUM ('admin', 'member');
CREATE TABLE memberships (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
role membership_role NOT NULL,
UNIQUE(user_id, organization_id) -- A user can only have one role per org
);
-- Enable RLS on all relevant tables
ALTER TABLE organizations ENABLE ROW LEVEL SECURITY;
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
ALTER TABLE memberships ENABLE ROW LEVEL SECURITY;
The core challenge: a policy on the tasks table must efficiently determine whether the current user holds a membership in the organization that owns the task's parent project. A flat tenant_id check is not enough; the policy has to walk the hierarchy.
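To make this concrete, here is the join that every application-layer query would otherwise have to carry, and that our policies will effectively have to evaluate per row. This is a sketch only; $1 stands for the current user's ID supplied by the application:
-- The access check the application would otherwise repeat in every query
SELECT t.*
FROM tasks t
JOIN projects p ON p.id = t.project_id
JOIN memberships m ON m.organization_id = p.organization_id
WHERE m.user_id = $1
AND t.completed = false;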
The Critical Prerequisite: Securely Setting Request Context
RLS policies rely on knowing who is making the request. The most robust way to pass this information from your application to PostgreSQL is via runtime configuration parameters set at the beginning of each transaction.
-- The helpers live in a dedicated schema; create it before the functions
CREATE SCHEMA IF NOT EXISTS auth;
-- Helper function to get the current user ID. It's stable and can be inlined.
CREATE OR REPLACE FUNCTION auth.current_user_id() RETURNS UUID AS $$
SELECT nullif(current_setting('app.current_user_id', true), '')::UUID;
$$ LANGUAGE sql STABLE;
-- Helper to get the current tenant ID
CREATE OR REPLACE FUNCTION auth.current_tenant_id() RETURNS UUID AS $$
SELECT nullif(current_setting('app.current_tenant_id', true), '')::UUID;
$$ LANGUAGE sql STABLE;
In your application's request middleware (e.g., Express.js, Django), for every authenticated request, your database connection pool must execute the following before any other query:
// Example in a Node.js middleware using node-postgres (pg)
async function setUserContext(req, res, next) {
const client = await pool.connect();
try {
// Use a transaction to ensure settings are scoped
await client.query('BEGIN');
// Set the user and tenant IDs for the duration of this transaction.
// set_config(name, value, true) is the parameterizable equivalent of SET LOCAL;
// never interpolate request data into the SQL string.
await client.query('SELECT set_config($1, $2, true)', ['app.current_user_id', req.user.id]);
await client.query('SELECT set_config($1, $2, true)', ['app.current_tenant_id', req.user.tenant_id]);
req.dbClient = client; // Attach the client to the request object
next();
} catch (e) {
await client.query('ROLLBACK');
client.release();
next(e);
}
}
// In your route handler, use req.dbClient and remember to COMMIT and release.
Using SET LOCAL semantics (the third argument of set_config, or an explicit SET LOCAL statement) is crucial: it scopes the setting to the current transaction, so nothing leaks to the next request that reuses the same pooled connection.
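You can observe the scoping directly in psql. A small sketch; the UUID is a placeholder:
BEGIN;
-- set_config(..., ..., true) behaves exactly like SET LOCAL
SELECT set_config('app.current_user_id', '00000000-0000-0000-0000-000000000002', true);
SELECT auth.current_user_id(); -- returns the UUID while the transaction is open
COMMIT;
SELECT auth.current_user_id(); -- NULL again: the setting did not survive the transaction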
Pattern 1: Dynamic Hierarchy Traversal with Recursive CTEs
This pattern uses a SQL function with a recursive query to traverse the hierarchy at read-time. It's conceptually straightforward but has significant performance implications.
First, we define a SECURITY DEFINER function. This is essential because the function needs to see all rows in memberships and projects to determine access, not just the rows the calling user can already see. A SECURITY INVOKER function (the default) would be constrained by any existing RLS policies, leading to a chicken-and-egg problem.
-- A SECURITY DEFINER function to check access to a specific task
CREATE OR REPLACE FUNCTION auth.can_access_task(task_id UUID) RETURNS BOOLEAN AS $$
DECLARE
has_access BOOLEAN;
BEGIN
SELECT EXISTS (
WITH RECURSIVE hierarchy AS (
-- Base case: start from the task and find its project
SELECT p.id as project_id, p.organization_id
FROM tasks t
JOIN projects p ON t.project_id = p.id
WHERE t.id = task_id
-- This recursive part is not strictly needed for this schema
-- but demonstrates how you'd traverse up multiple levels (e.g. project groups)
-- UNION ALL
-- SELECT ... FROM hierarchy ...
)
-- Check if the current user has a membership in the organization found
SELECT 1
FROM hierarchy h
JOIN memberships m ON h.organization_id = m.organization_id
WHERE m.user_id = auth.current_user_id()
) INTO has_access;
RETURN has_access;
END;
$$ LANGUAGE plpgsql STABLE SECURITY DEFINER
-- Pin the search_path, the standard hardening step for SECURITY DEFINER functions
SET search_path = public, pg_temp;
Now, we can create the RLS policies using this function.
-- Policy for tasks
CREATE POLICY select_tasks ON tasks
FOR SELECT
USING (auth.can_access_task(id));
-- Policy for projects: only one level of hierarchy, so a plain EXISTS against memberships suffices
CREATE POLICY select_projects ON projects
FOR SELECT
USING (EXISTS (
SELECT 1 FROM memberships m
WHERE m.organization_id = projects.organization_id
AND m.user_id = auth.current_user_id()
));
-- Policy for organizations
CREATE POLICY select_organizations ON organizations
FOR SELECT
USING (EXISTS (
SELECT 1 FROM memberships m
WHERE m.organization_id = organizations.id
AND m.user_id = auth.current_user_id()
));
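Before benchmarking, it is worth confirming exactly which policies are attached to each table; the built-in pg_policies view makes this a one-line check:
-- Sanity check: list the policies now attached to our tables
SELECT schemaname, tablename, policyname, cmd, qual
FROM pg_policies
WHERE tablename IN ('organizations', 'projects', 'tasks');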
Performance Analysis and Pitfalls
This approach is elegant but hides a performance time bomb. The policy function, auth.can_access_task, is evaluated for every row the executor examines, not once per query. Let's see this in action.
Imagine we have 1,000,000 tasks and a user queries for incomplete tasks: SELECT * FROM tasks WHERE completed = false;
EXPLAIN ANALYZE SELECT * FROM tasks WHERE completed = false;
Expected EXPLAIN Output Snippet:
Seq Scan on tasks (cost=0.00..150000.00 rows=500000 width=...) (actual time=0.100..5000.000 rows=... loops=1)
Filter: (NOT completed AND auth.can_access_task(id))
Rows Removed by Filter: ...
Planning Time: ...
Execution Time: 5010.123 ms
The key issue is the Filter: ... AND auth.can_access_task(id). The database performs a sequential scan and runs our multi-join function once for every row it examines; at hundreds of thousands of rows, the cost is astronomical. Even with an index on completed, the index only narrows the candidate set. The function must still run once per surviving row before anything can be returned.
When to use this pattern:
* For hierarchies that change very frequently, making pre-computation difficult.
* On tables with a very low number of rows.
* In write-heavy systems where the performance cost of maintaining a separate permissions table is too high.
For most read-heavy SaaS applications, this is not a scalable solution.
Pattern 2: Pre-computation with a Denormalized Permissions Cache
This pattern trades write-time complexity for massive read-time performance gains. The core idea is to create a denormalized table that explicitly stores which users can access which resources.
Let's create this user_permissions table.
CREATE TABLE auth.user_permissions (
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
entity_id UUID NOT NULL, -- The ID of the organization, project, or task
entity_type TEXT NOT NULL, -- 'organization', 'project', 'task'
role membership_role, -- Store the role for more granular checks
PRIMARY KEY (user_id, entity_id)
);
-- Critical index for fast RLS lookups
CREATE INDEX idx_user_permissions_lookup ON auth.user_permissions(entity_id, user_id);
Now, how do we populate and maintain this table? PostgreSQL triggers are the perfect tool.
Maintaining the Cache with Triggers
We need a trigger on the memberships table. When a membership is created, we must grant permissions to that organization and all its descendant projects and tasks. When it's deleted, we must revoke them.
-- Function to grant permissions down the hierarchy
CREATE OR REPLACE FUNCTION auth.grant_permissions_on_membership_insert() RETURNS TRIGGER AS $$
BEGIN
-- Grant access to the organization itself
INSERT INTO auth.user_permissions (user_id, entity_id, entity_type, role)
VALUES (NEW.user_id, NEW.organization_id, 'organization', NEW.role)
ON CONFLICT (user_id, entity_id) DO UPDATE SET role = EXCLUDED.role;
-- Grant access to all projects within that organization
INSERT INTO auth.user_permissions (user_id, entity_id, entity_type, role)
SELECT NEW.user_id, p.id, 'project', NEW.role
FROM projects p
WHERE p.organization_id = NEW.organization_id
ON CONFLICT (user_id, entity_id) DO UPDATE SET role = EXCLUDED.role;
-- Grant access to all tasks within those projects
INSERT INTO auth.user_permissions (user_id, entity_id, entity_type, role)
SELECT NEW.user_id, t.id, 'task', NEW.role
FROM tasks t
JOIN projects p ON t.project_id = p.id
WHERE p.organization_id = NEW.organization_id
ON CONFLICT (user_id, entity_id) DO UPDATE SET role = EXCLUDED.role;
RETURN NEW;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
CREATE TRIGGER on_membership_insert
AFTER INSERT OR UPDATE ON memberships
FOR EACH ROW EXECUTE FUNCTION auth.grant_permissions_on_membership_insert();
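A quick way to watch the fan-out happen (the UUIDs are placeholders for an existing user and organization):
-- One membership grant should produce one cached row per organization, project, and task
INSERT INTO memberships (user_id, organization_id, role)
VALUES ('00000000-0000-0000-0000-000000000002', '00000000-0000-0000-0000-000000000003', 'member');
SELECT entity_type, count(*)
FROM auth.user_permissions
WHERE user_id = '00000000-0000-0000-0000-000000000002'
GROUP BY entity_type;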
And for deletion:
-- Function to revoke permissions on membership deletion
CREATE OR REPLACE FUNCTION auth.revoke_permissions_on_membership_delete() RETURNS TRIGGER AS $$
BEGIN
-- This is a simplified revocation. A more robust solution would check
-- if the user has other memberships that grant access to the same items.
-- For this example, we assume one membership per org is the only source of truth.
DELETE FROM auth.user_permissions up
WHERE up.user_id = OLD.user_id
AND up.entity_id IN (
-- the organization itself
SELECT OLD.organization_id
UNION ALL
-- every project in that organization
SELECT p.id FROM projects p WHERE p.organization_id = OLD.organization_id
UNION ALL
-- every task in those projects
SELECT t.id FROM tasks t
JOIN projects p ON t.project_id = p.id
WHERE p.organization_id = OLD.organization_id
);
RETURN OLD;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
CREATE TRIGGER on_membership_delete
AFTER DELETE ON memberships
FOR EACH ROW EXECUTE FUNCTION auth.revoke_permissions_on_membership_delete();
Edge Case: What happens when a new project or task is created? It also needs to have permissions granted to all existing members of its parent organization. This requires additional triggers on projects and tasks.
-- Trigger function for new projects
CREATE OR REPLACE FUNCTION auth.grant_permissions_on_project_insert() RETURNS TRIGGER AS $$
BEGIN
INSERT INTO auth.user_permissions (user_id, entity_id, entity_type, role)
SELECT m.user_id, NEW.id, 'project', m.role
FROM memberships m
WHERE m.organization_id = NEW.organization_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
CREATE TRIGGER on_project_insert
AFTER INSERT ON projects
FOR EACH ROW EXECUTE FUNCTION auth.grant_permissions_on_project_insert();
-- A similar trigger would be needed for the `tasks` table.
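As the comment notes, tasks needs the same treatment. A minimal sketch, mirroring the project trigger; the function and trigger names are illustrative:
-- Trigger function for new tasks: copy every permission row that already
-- exists on the parent project down to the new task
CREATE OR REPLACE FUNCTION auth.grant_permissions_on_task_insert() RETURNS TRIGGER AS $$
BEGIN
INSERT INTO auth.user_permissions (user_id, entity_id, entity_type, role)
SELECT up.user_id, NEW.id, 'task', up.role
FROM auth.user_permissions up
WHERE up.entity_id = NEW.project_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
CREATE TRIGGER on_task_insert
AFTER INSERT ON tasks
FOR EACH ROW EXECUTE FUNCTION auth.grant_permissions_on_task_insert();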
Rewriting RLS Policies for Performance
With the user_permissions cache in place, our RLS policies become trivial and fast: a single indexed EXISTS check against the pre-computed table.
-- A single, fast, reusable function
CREATE OR REPLACE FUNCTION auth.is_allowed(entity_id UUID) RETURNS BOOLEAN AS $$
SELECT EXISTS (
SELECT 1
FROM auth.user_permissions
WHERE user_permissions.entity_id = is_allowed.entity_id
AND user_permissions.user_id = auth.current_user_id()
);
$$ LANGUAGE sql STABLE SECURITY INVOKER; -- callers therefore need SELECT on auth.user_permissions
-- Drop the old, slow policies
DROP POLICY IF EXISTS select_tasks ON tasks;
DROP POLICY IF EXISTS select_projects ON projects;
-- ...and so on
-- Create new, high-performance policies
CREATE POLICY select_tasks ON tasks FOR SELECT USING (auth.is_allowed(id));
CREATE POLICY select_projects ON projects FOR SELECT USING (auth.is_allowed(id));
CREATE POLICY select_organizations ON organizations FOR SELECT USING (auth.is_allowed(id));
Performance Analysis: The Payoff
Let's revisit our query: SELECT * FROM tasks WHERE completed = false;
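The plan below assumes an index on the completed flag; it is not part of the schema above, so here is the assumed definition (a partial index WHERE completed = false would serve the same purpose):
-- Assumed index; the name matches tasks_completed_idx in the plan below
CREATE INDEX tasks_completed_idx ON tasks (completed);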
Expected EXPLAIN Output Snippet:
Nested Loop (cost=...)
-> Index Scan using tasks_completed_idx on tasks (cost=...)
Index Cond: (completed = false)
-> Index Scan using idx_user_permissions_lookup on user_permissions (cost=...)
Index Cond: ((entity_id = tasks.id) AND (user_id = auth.current_user_id()))
Planning Time: ...
Execution Time: 45.123 ms
The difference is staggering. The query planner can now use the index on tasks(completed) to find candidate rows, and for each row, it performs a highly efficient indexed lookup on user_permissions. The execution time drops from seconds to milliseconds. This is the power of moving the computational complexity from read-time to write-time.
Handling Advanced Scenarios and Edge Cases
Real-world systems are never this clean. Here's how to handle common complexities.
1. Superuser/Admin Bypass
Internal support tools or system administrators often need to bypass RLS. You can achieve this with a separate policy and a custom setting.
-- In your session setup for an admin user:
SET LOCAL app.is_admin = 'true';
-- Helper function
CREATE OR REPLACE FUNCTION auth.is_admin_user() RETURNS BOOLEAN AS $$
SELECT coalesce(current_setting('app.is_admin', true), 'false') = 'true';
$$ LANGUAGE sql STABLE;
-- An additional permissive policy for admins
CREATE POLICY admin_bypass_tasks ON tasks
FOR ALL
USING (auth.is_admin_user())
WITH CHECK (auth.is_admin_user());
-- Your existing policy remains for regular users
CREATE POLICY select_tasks ON tasks FOR SELECT USING (auth.is_allowed(id));
PostgreSQL combines multiple permissive policies on a table with OR, so their order is irrelevant: if admin_bypass_tasks evaluates to true, the user gets access regardless of what select_tasks returns.
2. WITH CHECK Policies for Writes (INSERT, UPDATE)
A USING clause filters which existing rows a user can see or target. A WITH CHECK clause validates the rows a user is about to write, preventing them from inserting or updating a row into a state the policy would no longer allow.
For example, preventing a user from moving a task to a project they don't have access to:
CREATE POLICY update_tasks ON tasks
FOR UPDATE
USING (auth.is_allowed(id)) -- Can you see the task now?
WITH CHECK (auth.is_allowed(project_id)); -- After update, will you be able to see its new project?
This ensures data integrity and prevents users from orphaning data into a part of the hierarchy they cannot access.
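In practice, a violation surfaces as a hard error rather than a silent filter. A sketch with placeholder UUIDs, where the user can see the task but has no permission row for the target project:
-- Attempt to move a visible task into a project the current user cannot access
UPDATE tasks
SET project_id = '00000000-0000-0000-0000-0000000000ff' -- hypothetical forbidden project
WHERE id = '00000000-0000-0000-0000-000000000005'; -- hypothetical visible task
-- ERROR:  new row violates row-level security policy for table "tasks"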
3. Testing and Debugging RLS
Testing RLS is notoriously easy to get wrong, because table owners and superusers bypass RLS by default. The key is to run tests as a specific, non-privileged role.
-- Example test setup in SQL (e.g., using a framework like pgTAP)
-- 1. Create test data
-- IDs must be valid UUIDs; fixed values keep the assertions readable
INSERT INTO tenants (id, name) VALUES ('00000000-0000-0000-0000-000000000001', 'Test Tenant');
INSERT INTO users (id, email) VALUES ('00000000-0000-0000-0000-000000000002', '[email protected]');
INSERT INTO organizations (id, tenant_id, name) VALUES ('00000000-0000-0000-0000-000000000003', '00000000-0000-0000-0000-000000000001', 'Test Org');
INSERT INTO projects (id, organization_id, name) VALUES ('00000000-0000-0000-0000-000000000004', '00000000-0000-0000-0000-000000000003', 'Project A');
INSERT INTO tasks (id, project_id, title) VALUES ('00000000-0000-0000-0000-000000000005', '00000000-0000-0000-0000-000000000004', 'My Task');
-- 2. Create a test role
CREATE ROLE test_runner LOGIN;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO test_runner;
-- The SECURITY INVOKER policy function reads auth.user_permissions,
-- so the test role also needs access to the auth schema
GRANT USAGE ON SCHEMA auth TO test_runner;
GRANT SELECT ON auth.user_permissions TO test_runner;
-- 3. Run the test inside a transaction
BEGIN;
-- 4. Impersonate the test role
SET ROLE test_runner;
-- 5. Set the RLS context
SET LOCAL app.current_user_id = '00000000-0000-0000-0000-000000000002';
-- 6. Run assertions
-- The count should be 0 because no membership exists yet
SELECT count(*) FROM tasks WHERE id = '00000000-0000-0000-0000-000000000005'; -- Assert equals 0
-- 7. Add membership (this will trigger the permission grants)
-- memberships has RLS enabled but no policy, so only the table owner
-- (or a superuser) can insert; switch back before doing so
RESET ROLE;
INSERT INTO memberships (user_id, organization_id, role) VALUES ('00000000-0000-0000-0000-000000000002', '00000000-0000-0000-0000-000000000003', 'member');
SET ROLE test_runner;
SET LOCAL app.current_user_id = '00000000-0000-0000-0000-000000000002'; -- Re-set context
-- 8. Re-run assertion
-- Now the count should be 1
SELECT count(*) FROM tasks WHERE id = '00000000-0000-0000-0000-000000000005'; -- Assert equals 1
-- 9. Cleanup
ROLLBACK;
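The same flow translates naturally to pgTAP, which the setup comment above alludes to. A minimal sketch, assuming the pgTAP extension is installed and the role and context switching from the script above has already happened at each step:
-- Inside the same kind of transaction, after SET ROLE / SET LOCAL as above
SELECT plan(2);
SELECT is(
(SELECT count(*) FROM tasks WHERE title = 'My Task'),
0::bigint,
'task is invisible without a membership'
);
-- ...grant the membership as the privileged role, then re-assume test_runner...
SELECT is(
(SELECT count(*) FROM tasks WHERE title = 'My Task'),
1::bigint,
'task becomes visible once membership is granted'
);
SELECT * FROM finish();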
Conclusion: A Conscious Architectural Trade-off
We've explored two powerful but fundamentally different patterns for implementing hierarchical RLS in PostgreSQL.
* Recursive CTEs (Dynamic Check): Best for write-heavy systems, rapidly changing hierarchies, or tables with few rows. The implementation is simpler, but it does not scale for read-heavy workloads.
* Permissions Cache (Pre-computation): The superior choice for most read-heavy SaaS applications. It offers phenomenal read performance at the cost of increased database complexity (trigger-maintained cache tables) and extra write amplification when a single membership change fans out across a large hierarchy.
Choosing the right pattern is a critical architectural decision. By moving authorization logic from the application to the database, you create a single source of truth that is consistent, secure, and, when implemented with care using patterns like the permissions cache, exceptionally performant. This is not a premature optimization; it is a foundational design choice for building scalable, secure multi-tenant systems.