Advanced RLS Policies in PostgreSQL for Hierarchical Multi-Tenancy
Beyond Flat Multi-Tenancy: The Hierarchical Security Challenge
In modern SaaS applications, multi-tenancy is a foundational architectural requirement. The most common implementation involves a simple tenant_id column on every relevant table, with Row-Level Security (RLS) policies enforcing isolation: USING (tenant_id = current_setting('app.tenant_id')). This pattern is effective for flat tenancy models where tenants are isolated peers.
However, this approach shatters when confronted with hierarchical tenancy structures. Consider a system where an organization has multiple teams, and each team has multiple projects. A user, who is a member of a specific team, might need read access to all projects within their parent organization, but write access only to projects within their immediate team. A simple tenant_id check is insufficient to model these cascading, granular permissions.
This article dissects a robust pattern for implementing RLS in such a hierarchical environment. We will not cover the basics of CREATE POLICY. Instead, we will focus on the advanced techniques required to make this model secure, performant, and maintainable in a production system. We'll build a solution from the ground up, starting with a secure context, implementing the core recursive logic, and then diving deep into performance optimization and edge case management.
Defining the Hierarchical Schema
Let's establish our data model. We'll have organizations, teams (which belong to an organization), and projects (which belong to a team). Users are linked directly to teams.
-- Organizations are the top-level tenants.
CREATE TABLE organizations (
id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
name TEXT NOT NULL
);
-- Teams belong to an organization.
CREATE TABLE teams (
id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
organization_id BIGINT NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
name TEXT NOT NULL
);
-- Projects belong to a team, and therefore transitively to an organization.
CREATE TABLE projects (
id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
team_id BIGINT NOT NULL REFERENCES teams(id) ON DELETE CASCADE,
name TEXT NOT NULL,
-- We'll denormalize organization_id for easier policy creation.
organization_id BIGINT NOT NULL REFERENCES organizations(id) ON DELETE CASCADE
);
-- Users are members of a team.
CREATE TABLE users (
id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
team_id BIGINT NOT NULL REFERENCES teams(id) ON DELETE CASCADE,
email TEXT NOT NULL UNIQUE
);
-- A simple table to demonstrate RLS on project-related data.
CREATE TABLE tasks (
id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
project_id BIGINT NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
description TEXT NOT NULL,
completed BOOLEAN NOT NULL DEFAULT false
);
-- Create indexes on foreign keys for performance.
CREATE INDEX ON teams (organization_id);
CREATE INDEX ON projects (team_id);
CREATE INDEX ON projects (organization_id);
CREATE INDEX ON users (team_id);
CREATE INDEX ON tasks (project_id);
Our goal is to create RLS policies that allow:
projects and tasks within their entire organization.projects and tasks belonging to their specific team.Section 1: Securely Setting User Context
Before we can write policies, we need a secure and reliable way to inform PostgreSQL about the identity of the user executing a query. The standard mechanism for this is current_setting(), which reads transaction-scoped configuration variables. The key is to set these variables in a controlled manner.
Never trust the client to set these values directly. A compromised API layer could allow a user to set app.user_id to another value and impersonate them. The context must be set by a trusted role, typically from a backend API middleware immediately after authentication.
Let's create a dedicated schema and a SECURITY DEFINER function to encapsulate this logic. SECURITY DEFINER ensures the function executes with the privileges of the user who defined it (the owner), not the user who calls it.
-- A private schema to hold our security helper functions.
CREATE SCHEMA app_private;
CREATE OR REPLACE FUNCTION app_private.set_current_user_context(user_id BIGINT)
RETURNS VOID AS $$
DECLARE
user_team_id BIGINT;
user_org_id BIGINT;
BEGIN
-- Find the user's team and organization.
SELECT u.team_id, t.organization_id
INTO user_team_id, user_org_id
FROM users u
JOIN teams t ON u.team_id = t.id
WHERE u.id = user_id;
IF NOT FOUND THEN
RAISE EXCEPTION 'User not found: %', user_id;
END IF;
-- Set transaction-local settings. These are secure and isolated.
-- The 'true' flag indicates that an error should not be thrown if the setting doesn't exist.
PERFORM set_config('app.current_user_id', user_id::TEXT, true);
PERFORM set_config('app.current_team_id', user_team_id::TEXT, true);
PERFORM set_config('app.current_org_id', user_org_id::TEXT, true);
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
-- Revoke execute from public to ensure only trusted roles can call this.
REVOKE EXECUTE ON FUNCTION app_private.set_current_user_context(BIGINT) FROM PUBLIC;
In your application's middleware (e.g., in a Node.js Express server), after validating a JWT, you would execute this function at the beginning of each request's database transaction:
// Example middleware in a Node.js application
async function setUserContext(req, res, next) {
if (req.user) { // Assuming req.user is populated from a validated JWT
const client = await pool.connect();
try {
// BEGIN transaction implicitly handled by some libraries
await client.query('SELECT app_private.set_current_user_context($1)', [req.user.id]);
res.locals.dbClient = client; // Attach client to request for use in handlers
next();
} catch (err) {
client.release();
next(err);
}
} else {
next();
}
}
This pattern ensures that for the duration of the database transaction serving a single API request, PostgreSQL is aware of the authenticated user's full hierarchical context (user_id, team_id, org_id).
Section 2: The Core Pattern: Helper Functions in Policies
Embedding complex logic directly into every RLS policy is a maintenance nightmare. A far better approach is to encapsulate the permission logic within helper functions and call these functions from the policies. This keeps the policies clean and the logic centralized.
Our primary challenge is determining if a given resource (like a project) is accessible to the current user. For SELECT grants, this means checking if the project's organization matches the user's organization. For INSERT/UPDATE/DELETE, it means checking if the project's team matches the user's team.
Let's create these helper functions.
-- A public schema for functions that can be safely used in RLS policies.
CREATE SCHEMA app_public;
-- Helper to get the current user ID, with casting.
CREATE OR REPLACE FUNCTION app_public.current_user_id() RETURNS BIGINT AS $$
SELECT nullif(current_setting('app.current_user_id', true), '')::BIGINT;
$$ LANGUAGE sql STABLE;
-- Helper to get the current team ID.
CREATE OR REPLACE FUNCTION app_public.current_team_id() RETURNS BIGINT AS $$
SELECT nullif(current_setting('app.current_team_id', true), '')::BIGINT;
$$ LANGUAGE sql STABLE;
-- Helper to get the current organization ID.
CREATE OR REPLACE FUNCTION app_public.current_org_id() RETURNS BIGINT AS $$
SELECT nullif(current_setting('app.current_org_id', true), '')::BIGINT;
$$ LANGUAGE sql STABLE;
-- The core logic for read access: Is the target entity within the user's organization?
CREATE OR REPLACE FUNCTION app_public.is_in_my_organization(target_organization_id BIGINT) RETURNS BOOLEAN AS $$
SELECT target_organization_id = app_public.current_org_id();
$$ LANGUAGE sql STABLE;
-- The core logic for write access: Is the target entity within the user's immediate team?
CREATE OR REPLACE FUNCTION app_public.is_in_my_team(target_team_id BIGINT) RETURNS BOOLEAN AS $$
SELECT target_team_id = app_public.current_team_id();
$$ LANGUAGE sql STABLE;
Why STABLE? We mark these functions as STABLE to inform the query planner that they return the same result for all rows within a single statement. This allows for potential optimizations, as the planner knows it doesn't need to re-evaluate the function for every single row being checked.
Section 3: Implementing Granular RLS Policies
With our context-setting mechanism and helper functions in place, we can now define the RLS policies. The key is to use different rules for different commands (SELECT vs. INSERT/UPDATE/DELETE).
First, enable RLS on the tables.
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
Now, create the policies for the projects table.
-- Policy for SELECT access: Users can see all projects in their organization.
CREATE POLICY select_projects ON projects
FOR SELECT
USING (app_public.is_in_my_organization(organization_id));
-- Policy for INSERT access: Users can only create projects for their own team.
CREATE POLICY insert_projects ON projects
FOR INSERT
WITH CHECK (app_public.is_in_my_team(team_id));
-- Policy for UPDATE access: Users can only update projects within their own team.
CREATE POLICY update_projects ON projects
FOR UPDATE
USING (app_public.is_in_my_team(team_id));
-- Policy for DELETE access: Users can only delete projects from their own team.
CREATE POLICY delete_projects ON projects
FOR DELETE
USING (app_public.is_in_my_team(team_id));
Notice the critical difference between USING and WITH CHECK:
* USING (expression): This applies to existing rows. For SELECT, UPDATE, and DELETE, it determines which rows are visible or targetable by the command.
* WITH CHECK (expression): This applies to new or updated rows. For INSERT and UPDATE, it ensures that the data being written conforms to the policy. An attempt to insert a row that fails the WITH CHECK will result in an error.
Let's do the same for the tasks table. Since tasks don't have a direct organization_id, we must perform a JOIN within the policy. This is a common pattern, but it has performance implications we'll address later.
CREATE POLICY select_tasks ON tasks
FOR SELECT
USING (
EXISTS (
SELECT 1
FROM projects p
WHERE p.id = tasks.project_id
AND app_public.is_in_my_organization(p.organization_id)
)
);
CREATE POLICY insert_tasks ON tasks
FOR INSERT
WITH CHECK (
EXISTS (
SELECT 1
FROM projects p
WHERE p.id = tasks.project_id
AND app_public.is_in_my_team(p.team_id)
)
);
-- For UPDATE and DELETE, we combine the logic.
CREATE POLICY modify_tasks ON tasks
FOR ALL -- Applies to UPDATE and DELETE, as well as SELECT/INSERT if others not defined
USING (
EXISTS (
SELECT 1
FROM projects p
WHERE p.id = tasks.project_id
AND app_public.is_in_my_team(p.team_id)
)
)
WITH CHECK (
EXISTS (
SELECT 1
FROM projects p
WHERE p.id = tasks.project_id
AND app_public.is_in_my_team(p.team_id)
)
);
This setup works, but the EXISTS subquery in the tasks policy can be a performance drag. Every query on tasks now requires an additional index scan or seek on the projects table.
Section 4: Performance Optimization for Deep Hierarchies
The simple model above works well for shallow hierarchies. But what if our model was Organizations -> Business Units -> Departments -> Teams -> Projects? A simple JOIN becomes a chain of JOINs, and performance degrades rapidly. The real challenge in production RLS is not defining the logic, but ensuring it doesn't kill your database performance.
Let's explore two advanced optimization patterns.
Optimization 1: Recursive CTEs for Dynamic Hierarchy Traversal
What if a manager at the organization level should have access, but also a specific team lead from a different branch of the hierarchy needs one-off access to a project? Our model is too rigid. A more flexible approach uses a proper hierarchy table and a recursive query to find all accessible nodes for a user.
Let's adjust our schema to be more generic, using a parent-child relationship.
-- A more generic hierarchy model
CREATE TABLE tenants (
id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
parent_id BIGINT REFERENCES tenants(id),
name TEXT NOT NULL
);
CREATE INDEX ON tenants (parent_id);
CREATE TABLE user_tenant_memberships (
user_id BIGINT NOT NULL REFERENCES users(id),
tenant_id BIGINT NOT NULL REFERENCES tenants(id),
PRIMARY KEY (user_id, tenant_id)
);
-- Our core helper function, now using recursion
CREATE OR REPLACE FUNCTION app_public.is_descendant_or_self(ancestor_id BIGINT, descendant_id BIGINT)
RETURNS BOOLEAN AS $$
BEGIN
-- Fast path for direct equality
IF ancestor_id = descendant_id THEN
RETURN TRUE;
END IF;
RETURN EXISTS (
WITH RECURSIVE hierarchy AS (
SELECT id FROM tenants WHERE id = descendant_id
UNION ALL
SELECT t.parent_id FROM tenants t JOIN hierarchy h ON t.id = h.id WHERE t.parent_id IS NOT NULL
)
SELECT 1 FROM hierarchy WHERE id = ancestor_id
);
END;
$$ LANGUAGE plpgsql STABLE;
Our RLS policy on a resource table would then look like this:
-- Assuming 'resources' table has a 'tenant_id'
CREATE POLICY select_resources ON resources
FOR SELECT USING (
EXISTS (
SELECT 1
FROM user_tenant_memberships utm
WHERE utm.user_id = app_public.current_user_id()
AND app_public.is_descendant_or_self(utm.tenant_id, resources.tenant_id)
)
);
The Problem: Recursive CTEs are powerful but can be slow. For each row being checked by the RLS policy, PostgreSQL may have to execute this recursive query. On a large table, this is a performance disaster. The planner is often unable to optimize this pattern effectively.
Optimization 2: Materialized Paths and Pre-calculated Permissions (The Production Pattern)
For high-throughput systems, re-calculating the hierarchy on every read is unacceptable. The solution is to denormalize the hierarchy information in a way that allows for extremely fast lookups.
Method A: Materialized Paths (ltree)
PostgreSQL's ltree extension is purpose-built for this. We store the entire ancestral path for each node in the hierarchy.
CREATE EXTENSION IF NOT EXISTS ltree;
ALTER TABLE tenants ADD COLUMN path LTREE;
-- We'd need a trigger to maintain this path on insert/update
CREATE OR REPLACE FUNCTION update_tenant_path() RETURNS TRIGGER AS $$
BEGIN
IF NEW.parent_id IS NULL THEN
NEW.path = NEW.id::text::ltree;
ELSE
SELECT path || NEW.id::text FROM tenants WHERE id = NEW.parent_id INTO NEW.path;
IF NOT FOUND THEN
RAISE EXCEPTION 'Invalid parent_id %', NEW.parent_id;
END IF;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER tenant_path_trigger
BEFORE INSERT OR UPDATE ON tenants
FOR EACH ROW EXECUTE FUNCTION update_tenant_path();
-- Create a GiST index for fast path lookups
CREATE INDEX tenant_path_idx ON tenants USING gist (path);
Now, our permission check becomes a simple, indexable ltree operation:
CREATE OR REPLACE FUNCTION app_public.is_descendant_or_self_ltree(ancestor_id BIGINT, descendant_id BIGINT)
RETURNS BOOLEAN AS $$
DECLARE
ancestor_path LTREE;
descendant_path LTREE;
BEGIN
SELECT path INTO ancestor_path FROM tenants WHERE id = ancestor_id;
SELECT path INTO descendant_path FROM tenants WHERE id = descendant_id;
-- The '<@' operator means 'is ancestor of or equal to'
RETURN ancestor_path <@ descendant_path;
END;
$$ LANGUAGE plpgsql STABLE;
This is dramatically faster than the recursive CTE.
Method B: Pre-calculating Permission Sets
For the absolute highest read performance, we can pre-calculate the entire set of accessible tenants for each user and store it in a dedicated table.
CREATE TABLE user_accessible_tenants (
user_id BIGINT NOT NULL,
tenant_id BIGINT NOT NULL,
PRIMARY KEY (user_id, tenant_id)
);
-- This table would be populated by a trigger on 'tenants' and 'user_tenant_memberships'
-- The trigger would calculate all descendants of a user's assigned tenants and insert them.
Our RLS policy then becomes blindingly fast:
CREATE POLICY select_resources_fast ON resources
FOR SELECT USING (
EXISTS (
SELECT 1
FROM user_accessible_tenants uat
WHERE uat.user_id = app_public.current_user_id()
AND uat.tenant_id = resources.tenant_id
)
);
This is just a simple JOIN against a pre-computed lookup table. The trade-off is increased write complexity and storage overhead. For read-heavy workloads, this is often the winning pattern.
Section 5: Advanced Edge Cases and Production Considerations
Implementing the core logic is only half the battle. Production systems present numerous edge cases.
Super-Admin / System Access
How do you bypass RLS for administrative tasks, migrations, or background jobs? There are two primary approaches:
BYPASSRLS Attribute (The Big Hammer): You can grant a role the BYPASSRLS attribute (ALTER ROLE my_admin_role BYPASSRLS;). Any user connected with this role will ignore all RLS policies. This is simple but dangerous. A single compromised connection with this role exposes all data.session_replication_role (The Surgical Tool): A safer method for migrations or bulk updates is to set session_replication_role = 'replica'. This disables not only RLS but also triggers and foreign key checks. It's powerful but must be used with extreme care and reset to 'origin' immediately after the operation.app.is_admin session variable. -- In set_current_user_context, check if the user is an admin and set the flag
-- PERFORM set_config('app.is_admin', 'true', true);
-- Modify the policy
CREATE POLICY select_projects ON projects FOR SELECT
USING (
(current_setting('app.is_admin', true) = 'true')
OR
(app_public.is_in_my_organization(organization_id))
);
This makes bypass an explicit and auditable part of the security model.
Testing and Debugging RLS
Debugging RLS can be maddening because it fails silently—queries simply return no rows. Here are some strategies:
* Impersonation: Use SET ROLE to run queries as a less-privileged application user to test your policies. Wrap it in a transaction to isolate the test.
BEGIN;
-- Set the context for the user you want to test
SELECT app_private.set_current_user_context(123);
-- Impersonate the role your application uses
SET ROLE my_application_role;
-- Run your test query
EXPLAIN ANALYZE SELECT * FROM tasks WHERE id = 456;
-- This will show you the query plan with the RLS policy applied.
ROLLBACK;
* pgTAP: Use a testing framework like pgTAP to write automated tests for your security functions and policies. You can create specific users and tenants, impersonate them, and assert that SELECT queries return the expected number of rows.
Performance Impact of `SECURITY DEFINER`
Functions marked SECURITY DEFINER have a performance cost. PostgreSQL must perform additional security checks and cannot inline these functions into the calling query. This is why our pattern uses a single SECURITY DEFINER function to set the context at the start of the transaction, while the per-row helper functions (is_in_my_organization, etc.) are simple sql functions (or plpgsql without SECURITY DEFINER). This allows the query planner to inline the RLS logic directly into the query plan for better performance, a critical micro-optimization that has a large impact at scale.
Conclusion: RLS as a Centralized Security Pillar
Implementing Row-Level Security for hierarchical multi-tenancy is a significant step up from simple tenant isolation. It shifts complex authorization logic from the application layer directly into the database, creating a single, authoritative source of truth for data access.
While the initial implementation is more complex, the benefits are substantial:
* Security: Authorization rules are consistently enforced, regardless of whether data is accessed via a primary API, a reporting tool, or a direct database connection.
* Maintainability: The permission logic is co-located with the data it protects, simplifying application code.
* Performance: By leveraging advanced patterns like materialized paths or pre-calculated permission sets, RLS can be made to perform exceptionally well even on large, complex hierarchies.
The key takeaway is to move beyond naive RLS implementations. A production-grade system requires a secure context-setting mechanism, centralized helper functions, and a proactive approach to performance optimization. By treating your RLS policies as a core part of your application's architecture, you can build a highly secure and scalable foundation for any multi-tenant application.