Postgres RLS for Hierarchical Data: A Session-Based Caching Pattern
The Challenge: Performant RLS in Deeply Nested Hierarchies
Row-Level Security (RLS) is a cornerstone of modern data security in PostgreSQL, allowing for fine-grained control over which rows a user can access. For simple tenancy models—where you can filter by a tenant_id or user_id—the implementation is straightforward. However, for applications with hierarchical data structures, such as organizational charts, project management tools with nested tasks, or content management systems with parent-child relationships, the challenge becomes significantly more complex.
The core requirement is often this: a user should be able to access their own data and the data of all their direct and indirect subordinates. A naive implementation of this logic inside an RLS policy function, typically involving a recursive Common Table Expression (CTE), leads to a catastrophic performance bottleneck. The database is forced to re-calculate the entire user hierarchy for every single row it evaluates against the policy. This pattern does not scale and will quickly bring a production system to its knees.
This article presents a production-proven architectural pattern to overcome this limitation. We will decouple the expensive, once-per-session hierarchy calculation from the cheap, per-row security check by leveraging PostgreSQL's session configuration parameters. This approach maintains robust security while delivering performance that is orders of magnitude better than naive recursive policies.
Target Scenario: A Multi-Level Corporate Reporting System
To ground our discussion, let's model a corporate structure. We have employees, and each employee (except the CEO) reports to a manager. We also have a table of sensitive financial reports, where each report is authored by a specific employee. The security rules are:
- Users can only see reports they authored.
- Managers can see reports authored by themselves and any employee in their reporting chain (direct or indirect subordinates).
auditor role that must have access to all reports, bypassing the hierarchy.Let's begin by defining the schema and populating it with a representative organizational structure.
-- Clean up previous structures if they exist
DROP TABLE IF EXISTS sensitive_reports, employees CASCADE;
DROP ROLE IF EXISTS app_user, auditor;
-- Create roles for our application
CREATE ROLE app_user;
CREATE ROLE auditor BYPASSRLS;
-- The employees table with a self-referencing key for the hierarchy
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
email TEXT NOT NULL UNIQUE,
manager_id INTEGER REFERENCES employees(id)
);
-- Grant basic permissions to the application role
GRANT SELECT, INSERT, UPDATE, DELETE ON employees TO app_user;
GRANT USAGE, SELECT ON SEQUENCE employees_id_seq TO app_user;
-- The table with sensitive data that needs to be protected by RLS
CREATE TABLE sensitive_reports (
id SERIAL PRIMARY KEY,
author_id INTEGER NOT NULL REFERENCES employees(id),
title TEXT NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Grant basic permissions to the application role
GRANT SELECT, INSERT, UPDATE, DELETE ON sensitive_reports TO app_user;
GRANT USAGE, SELECT ON SEQUENCE sensitive_reports_id_seq TO app_user;
-- Populate the tables with a sample org chart
-- Level 1: CEO
INSERT INTO employees (id, name, email, manager_id) VALUES (1, 'Alice (CEO)', '[email protected]', NULL);
-- Level 2: VPs reporting to CEO
INSERT INTO employees (id, name, email, manager_id) VALUES (2, 'Bob (VP Eng)', '[email protected]', 1);
INSERT INTO employees (id, name, email, manager_id) VALUES (3, 'Charlie (VP Sales)', '[email protected]', 1);
-- Level 3: Directors reporting to VPs
INSERT INTO employees (id, name, email, manager_id) VALUES (4, 'David (Eng Director)', '[email protected]', 2);
INSERT INTO employees (id, name, email, manager_id) VALUES (5, 'Eve (Sales Director)', '[email protected]', 3);
-- Level 4: Managers reporting to Directors
INSERT INTO employees (id, name, email, manager_id) VALUES (6, 'Frank (Eng Manager)', '[email protected]', 4);
INSERT INTO employees (id, name, email, manager_id) VALUES (7, 'Grace (Sales Manager)', '[email protected]', 5);
-- Level 5: Individual Contributors
INSERT INTO employees (id, name, email, manager_id) VALUES (8, 'Heidi (Engineer)', '[email protected]', 6);
INSERT INTO employees (id, name, email, manager_id) VALUES (9, 'Ivan (Engineer)', '[email protected]', 6);
INSERT INTO employees (id, name, email, manager_id) VALUES (10, 'Judy (Account Exec)', '[email protected]', 7);
-- Populate some sensitive reports
INSERT INTO sensitive_reports (author_id, title, content) VALUES
(1, 'Q4 Global Strategy', '...'),
(2, 'Engineering Roadmap 2025', '...'),
(4, 'Platform Infrastructure Costs', '...'),
(6, 'Microservice A Performance Review', '...'),
(8, 'Bugfix Report #1234', '...'),
(9, 'Feature Spike: New Caching Layer', '...'),
(3, 'Q4 Sales Projections', '...'),
(5, 'Regional Sales Performance', '...'),
(7, 'Key Account Renewals', '...'),
(10, 'New Lead Funnel Analysis', '...');
ALTER TABLE sensitive_reports ENABLE ROW LEVEL SECURITY;
-- For testing, create users corresponding to employees
-- In a real app, you'd map your application users to these roles or JWT claims
CREATE USER "[email protected]" WITH PASSWORD 'password';
GRANT app_user TO "[email protected]";
CREATE USER "[email protected]" WITH PASSWORD 'password';
GRANT app_user TO "[email protected]";
CREATE USER "[email protected]" WITH PASSWORD 'password';
GRANT app_user TO "[email protected]";
CREATE USER auditor_user WITH PASSWORD 'password';
GRANT auditor TO auditor_user;
The Anti-Pattern: Recursive Functions in RLS Policies
A common first attempt is to define a SQL function that recursively finds all subordinates for the current user and then use this function directly in the RLS policy. Let's build this anti-pattern to analyze its flaws.
First, the function to get all subordinates for a given employee ID:
CREATE OR REPLACE FUNCTION get_subordinates(p_manager_id INTEGER)
RETURNS TABLE(employee_id INTEGER) AS $$
BEGIN
RETURN QUERY
WITH RECURSIVE subordinates_cte AS (
SELECT id FROM employees WHERE id = p_manager_id
UNION ALL
SELECT e.id
FROM employees e
JOIN subordinates_cte s ON e.manager_id = s.id
)
SELECT id FROM subordinates_cte;
END;
$$ LANGUAGE plpgsql STABLE SECURITY DEFINER;
Now, we create the RLS policy on sensitive_reports using this function. We need a way to get the current user's employee ID. We'll use current_user to look up their ID in the employees table.
CREATE OR REPLACE FUNCTION get_current_employee_id() RETURNS INTEGER AS $$
DECLARE
v_employee_id INTEGER;
BEGIN
SELECT id INTO v_employee_id FROM employees WHERE email = current_user LIMIT 1;
RETURN v_employee_id;
END;
$$ LANGUAGE plpgsql STABLE;
CREATE POLICY select_reports_hierarchical_bad
ON sensitive_reports
FOR SELECT
USING (
author_id IN (SELECT employee_id FROM get_subordinates(get_current_employee_id()))
);
This seems correct, and for a single query, it will work. Let's test it:
-- Connect as Bob (VP Eng, ID 2)
-- He should see his own reports, plus those of David(4), Frank(6), Heidi(8), and Ivan(9)
SET ROLE "[email protected]";
SELECT id, author_id, title FROM sensitive_reports ORDER BY author_id;
-- Expected output: Reports from authors 2, 4, 6, 8, 9
-- Connect as Frank (Eng Manager, ID 6)
-- He should see his own reports, plus those of Heidi(8) and Ivan(9)
SET ROLE "[email protected]";
SELECT id, author_id, title FROM sensitive_reports ORDER BY author_id;
-- Expected output: Reports from authors 6, 8, 9
-- Connect as Heidi (Engineer, ID 8)
-- She should only see her own reports
SET ROLE "[email protected]";
SELECT id, author_id, title FROM sensitive_reports ORDER BY author_id;
-- Expected output: Report from author 8
RESET ROLE;
The logic is sound. The problem is performance. Let's analyze the query plan for Bob's query:
SET ROLE "[email protected]";
EXPLAIN ANALYZE SELECT id, author_id, title FROM sensitive_reports;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on sensitive_reports (cost=0.00..152.05 rows=3 width=44) (actual time=0.291..0.301 rows=5 loops=1)
Filter: (author_id = ANY ($0))
InitPlan 1 (returns $0)
-> Function Scan on get_subordinates (cost=0.25..12.75 rows=1000 width=4) (actual time=0.284..0.286 rows=5 loops=1)
Function Call: get_subordinates(get_current_employee_id())
Planning Time: 0.165 ms
Execution Time: 0.334 ms
The plan shows a Seq Scan on sensitive_reports with a Filter. For each row, it checks if author_id is in the array returned by our function. The critical issue, not immediately obvious from this simple plan, is that the RLS policy function can be invoked many times during more complex queries involving joins or subqueries. The planner may not always be ableto de-correlate the subquery. On a large table, this becomes a disaster. Imagine a table with 10 million reports. The get_subordinates function would be executed repeatedly, each time performing a recursive query against the employees table. This is the definition of a non-scalable architecture.
The Solution: Session-Level Hierarchy Caching
The performant pattern involves a fundamental shift in thinking: calculate the hierarchy once per session, cache the result, and make the RLS policy a simple, cheap check against this cache.
PostgreSQL provides a mechanism for storing arbitrary key-value pairs within a session's scope using current_setting(), set_config(). We can use this to store an array of accessible employee IDs for the duration of a user's connection.
Step 1: Create a Context-Setting Function
First, we need a function that the application will call immediately after a user authenticates and establishes a connection. This function will perform the expensive recursive query and store the results in a custom session variable. We'll namespace our variable with app. (e.g., app.accessible_employee_ids) to avoid collisions.
This function must run with elevated privileges (SECURITY DEFINER) because the connected application user (app_user) should not have direct read access to the entire employees table, only the parts exposed via their own RLS policies (which don't exist yet). This is a critical security consideration.
CREATE OR REPLACE FUNCTION set_user_context(p_employee_id INTEGER)
RETURNS void AS $$
DECLARE
v_accessible_ids INTEGER[];
BEGIN
-- Use a recursive CTE to find all subordinates, including the user themselves.
WITH RECURSIVE subordinates_cte AS (
SELECT id FROM employees WHERE id = p_employee_id
UNION ALL
SELECT e.id
FROM employees e
JOIN subordinates_cte s ON e.manager_id = s.id
)
SELECT array_agg(id) INTO v_accessible_ids FROM subordinates_cte;
-- Store the array of IDs in a custom session variable.
-- The second argument 'false' means this setting is not local to the current transaction.
-- It will persist for the entire session.
PERFORM set_config('app.accessible_employee_ids', array_to_string(v_accessible_ids, ','), false);
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
-- IMPORTANT: Revoke default execute permission from PUBLIC
REVOKE EXECUTE ON FUNCTION set_user_context(INTEGER) FROM PUBLIC;
-- Grant permission only to our application role. This is a crucial security step.
GRANT EXECUTE ON FUNCTION set_user_context(INTEGER) TO app_user;
Security Note on SECURITY DEFINER: Functions with SECURITY DEFINER run as the user who defined them (typically a superuser). This is powerful but dangerous. Always REVOKE default EXECUTE permissions from PUBLIC and grant them only to trusted roles. Additionally, set a secure search_path for the function (ALTER FUNCTION ... SET search_path = ...) to prevent hijacking by malicious objects in other schemas.
Step 2: Create a High-Performance RLS Policy
Now we can drop our old, slow policy and create a new one. This new policy will be incredibly simple. It retrieves the cached IDs from the session variable and performs a fast array lookup.
-- First, remove the old, inefficient policy
DROP POLICY IF EXISTS select_reports_hierarchical_bad ON sensitive_reports;
-- A helper function to make the policy cleaner
CREATE OR REPLACE FUNCTION get_accessible_employee_ids()
RETURNS INTEGER[] AS $$
DECLARE
v_ids_string TEXT;
v_ids_array INTEGER[];
BEGIN
-- Retrieve the setting. The 'true' argument means it's okay if the setting doesn't exist (returns NULL).
v_ids_string := current_setting('app.accessible_employee_ids', true);
IF v_ids_string IS NULL OR v_ids_string = '' THEN
-- If no context is set, the user can see nothing.
-- This is a secure default.
RETURN ARRAY[]::INTEGER[];
ELSE
v_ids_array := string_to_array(v_ids_string, ',')::INTEGER[];
RETURN v_ids_array;
END IF;
END;
$$ LANGUAGE plpgsql STABLE;
-- The new, high-performance policy
CREATE POLICY select_reports_hierarchical_good
ON sensitive_reports
FOR SELECT
USING (
author_id = ANY(get_accessible_employee_ids())
);
This policy is a game-changer. The get_accessible_employee_ids() function does no recursion. It's a trivial lookup of a session variable and a string-to-array conversion. This operation is extremely fast and has a constant time complexity regardless of the size of the hierarchy or the sensitive_reports table.
Step 3: Application Integration and Testing
In your application's connection pool logic, after a user authenticates, you must make two calls:
users table or by their email).SELECT set_user_context(:employee_id); to prime the session.Let's simulate this flow in psql:
-- Connect as Bob (ID 2)
SET ROLE "[email protected]";
-- **APPLICATION STEP**: The app backend calls this function after authentication.
-- Bob's ID is 2.
SELECT set_user_context(2);
-- Now, run the query. The RLS policy will use the cached context.
SELECT id, author_id, title FROM sensitive_reports ORDER BY author_id;
-- Correctly returns reports from authors 2, 4, 6, 8, 9
-- Analyze the performance
EXPLAIN ANALYZE SELECT id, author_id, title FROM sensitive_reports;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on sensitive_reports (cost=0.00..1.12 rows=3 width=44) (actual time=0.035..0.039 rows=5 loops=1)
Filter: (author_id = ANY ('{2,4,6,8,9}'::integer[]))
Planning Time: 0.221 ms
Execution Time: 0.054 ms
The difference is subtle but profound. The query plan now shows a simple filter against a constant array. There is no Function Scan and no hidden recursive query. The planner sees author_id = ANY ('{2,4,6,8,9}'::integer[]). This is incredibly efficient and allows the planner to make much better decisions, such as using an index on author_id if one were available and deemed beneficial.
Advanced Scenarios and Production Considerations
This pattern is robust, but real-world systems require handling more complex cases.
Handling Write Operations (INSERT, UPDATE, DELETE)
Our policy was only for SELECT. We need policies for write operations to maintain data integrity.
* INSERT: An employee should only be able to create reports for themselves.
* UPDATE/DELETE: A manager should only be able to modify reports within their hierarchy.
-- Policy for creating new reports
CREATE POLICY insert_reports_self_only
ON sensitive_reports
FOR INSERT
WITH CHECK (
-- An employee can only create a report with their own ID as the author.
author_id = (SELECT id FROM employees WHERE email = current_user LIMIT 1)
);
-- Policy for modifying existing reports
-- This can reuse our high-performance function
CREATE POLICY update_delete_reports_hierarchical
ON sensitive_reports
FOR UPDATE, DELETE
USING (
author_id = ANY(get_accessible_employee_ids())
);
Notice the use of WITH CHECK for INSERT. This clause is evaluated for new rows being inserted, ensuring they conform to the policy. The USING clause applies to existing rows for UPDATE and DELETE operations.
The Super-User / Auditor Case
How do we handle users who need to see everything? There are two primary approaches:
BYPASSRLS Role Attribute: This is the simplest and most secure method. Create a specific role (e.g., auditor) and grant it the BYPASSRLS attribute. Any user in this role will completely ignore all RLS policies. -- We already created this role at the beginning
-- CREATE ROLE auditor BYPASSRLS;
-- CREATE USER auditor_user WITH PASSWORD '...';
-- GRANT auditor TO auditor_user;
-- Now, when connecting as auditor_user:
SET ROLE auditor_user;
SELECT count(*) FROM sensitive_reports; -- Returns 10, all reports are visible.
RESET ROLE;
set_user_context function. For example, you could have a special employee_id (e.g., 0) that signifies a super-user, and the function would then set a special flag in the session. -- Modified context function (conceptual)
-- ...
-- IF p_employee_id = 0 THEN
-- PERFORM set_config('app.is_superuser', 'true', false);
-- ELSE
-- -- ... normal hierarchy calculation
-- END IF;
-- ...
-- Modified RLS policy (conceptual)
-- ... USING (
-- current_setting('app.is_superuser', true) = 'true'
-- OR author_id = ANY(get_accessible_employee_ids())
-- )
The BYPASSRLS attribute is strongly preferred for its clarity, security, and performance, as it avoids adding complexity to every RLS check.
Context Invalidation and Stale Hierarchies
Our session cache is a snapshot of the hierarchy at the time set_user_context was called. What happens if an employee's manager changes mid-session? The cache becomes stale. The user's permissions will not reflect the organizational change until they start a new session and the context is re-calculated.
For most applications, this is an acceptable trade-off. Organizational changes are infrequent, and session-level consistency is often sufficient. However, if real-time updates are critical, you have a few (complex) options:
employees table, your application logic could actively terminate all sessions for affected users, forcing them to log in again.employees table that, upon an UPDATE to manager_id, attempts to notify connected backends. This is extremely complex to implement correctly, often involving LISTEN/NOTIFY and custom logic in your connection pooler or application to reset the context for specific connections. This approach is brittle and generally not recommended unless absolutely necessary.For 99% of use cases, accepting that permissions are session-consistent is the correct and pragmatic engineering decision.
Performance Benchmark: Naive vs. Cached
To quantify the performance difference, let's create a larger dataset and run a simple benchmark using pg_bench. We'll create a test function that mimics a user reading reports.
Benchmark Setup:
employees and sensitive_reports tables to thousands of rows.- Create two test functions: one that uses the naive RLS policy and one that uses the cached policy (after setting the context).
pg_bench to run these functions under load.While a full pg_bench script is beyond this article's scope, the conceptual difference can be seen with EXPLAIN ANALYZE on a larger dataset. If sensitive_reports had 1 million rows and employees had 10,000, the analysis would look like this:
Naive Recursive Policy: The execution time would be dominated by the repeated execution of the get_subordinates function. The total time would be (time_for_one_recursion) (number_of_rows_scanned), resulting in potentially seconds or even minutes per query.
* Session-Cached Policy: The set_user_context call would take a few milliseconds to run once. Every subsequent query would have a near-constant, sub-millisecond overhead for the RLS check. The query time would scale only with the size of the sensitive_reports table and the effectiveness of its indexes, which is the ideal scenario.
Conclusion
Implementing Row-Level Security for hierarchical data in PostgreSQL requires moving beyond simplistic, row-by-row calculations. The anti-pattern of using recursive functions directly within an RLS policy creates an unscalable system that will fail under production load.
The robust, performant, and secure pattern is to decouple hierarchy calculation from permission checking. By using a SECURITY DEFINER function to compute a user's data access scope once per session and caching it with set_config, we transform the RLS policy into a trivial, constant-time lookup.
This session-based caching pattern provides:
* High Performance: The expensive recursive query runs only once, not for every row.
* Strong Security: Policies default to denying access if the context is not set, and the use of SECURITY DEFINER is properly sandboxed.
* Scalability: The performance of RLS checks remains constant as the data tables grow.
While it requires more setup than a naive policy and careful application integration to set the context upon login, this architecture is a foundational pattern for building sophisticated, secure, and scalable multi-tenant applications on PostgreSQL.