PostgreSQL Partial Indexes for Multi-Tenant SaaS Query Optimization

Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Illusion of the 'Optimized' Multi-Tenant Index

In any mature multi-tenant SaaS application, the WHERE tenant_id = $1 clause is the bedrock of data isolation and query performance. The first performance optimization any competent engineer reaches for is a compound B-tree index, typically (tenant_id, other_column). For a while, this works flawlessly. But as a system scales and data distributions become increasingly skewed, this seemingly optimal index becomes a source of insidious performance degradation.

Consider a common invoices table in a billing system:

sql
CREATE TABLE invoices (
    id BIGSERIAL PRIMARY KEY,
    tenant_id UUID NOT NULL,
    customer_id BIGINT NOT NULL,
    status TEXT NOT NULL, -- e.g., 'draft', 'due', 'paid', 'void'
    amount_cents INTEGER NOT NULL,
    due_date DATE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- The standard, 'good enough' index
CREATE INDEX idx_invoices_tenant_id_status ON invoices (tenant_id, status);

This index serves queries filtering by tenant and status well. But what happens when 98% of your invoices are in the paid state, 1.5% are draft, and a mere 0.5% are actively due? Your dashboard's critical "Overdue Invoices" widget executes a query like this:
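You can verify this kind of skew before reaching for a specialized index. A quick distribution check against the invoices table above might look like this (percentages computed with a window function over the aggregate):

sql
SELECT status,
       count(*) AS n_rows,
       round(100.0 * count(*) / sum(count(*)) OVER (), 2) AS pct
FROM invoices
GROUP BY status
ORDER BY pct DESC;

If one status dominates while your hot queries target a rare one, you have a candidate for a partial index.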

sql
SELECT id, customer_id, amount_cents, due_date
FROM invoices
WHERE tenant_id = 'a1b2c3d4-e5f6-a7b8-c9d0-e1f2a3b4c5d6'
  AND status = 'due';

PostgreSQL will dutifully use idx_invoices_tenant_id_status. Because both columns appear in the index condition, the planner can seek directly to the (tenant_id, 'due') range rather than scanning every entry for the tenant. The problem is subtler: the index contains an entry for every invoice of every tenant, so its B-tree is far larger and deeper than this query needs, its pages compete for cache with the 98% of paid entries the query never touches, and its entries are ordered by status rather than due_date, so it cannot feed a sort on due date. The index, while technically correct, is bloated with data irrelevant to this specific, high-priority query.

This is where senior-level database optimization diverges from standard practice. We must stop indexing for the general case and start creating highly specialized indexes for our most critical query patterns. Partial indexes are the precision tool for this job.


From Blunt Instrument to Surgical Tool: The Partial Index

A partial index is exactly what it sounds like: an index on a subset of a table's rows, defined by a WHERE clause. This allows us to create small, dense, and incredibly fast indexes for specific data slices.

Let's solve our due invoices problem. First, we drop the monolithic index and create a specialized one:

sql
-- We might keep the general index for other queries, but for this example, we'll replace it.
-- DROP INDEX idx_invoices_tenant_id_status;

-- The surgical approach
CREATE INDEX idx_invoices_tenant_due_status ON invoices (tenant_id, due_date)
WHERE status = 'due';

Notice two things:

  • The WHERE status = 'due' clause. This is the magic. Only rows matching this predicate will be included in the index.
  • We've included due_date in the index, as this is likely the next column we'd sort or filter by for overdue invoices.

Now, let's analyze the performance difference. We'll use EXPLAIN (ANALYZE, BUFFERS) to see what the database is actually doing. Assume a table with 50 million invoices for 10,000 tenants, where a single tenant has 5,000 invoices, but only 25 are due.
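Once both index variants exist, it's worth confirming the size difference yourself. A lookup against the pg_stat_user_indexes view shows each index's on-disk footprint:

sql
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'invoices';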

    Before: Using the Standard Compound Index

    sql
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT id, due_date FROM invoices
    WHERE tenant_id = '...' AND status = 'due'
    ORDER BY due_date ASC LIMIT 10;
    text
    Limit  (cost=0.56..5.45 rows=10 width=12) (actual time=0.850..0.855 rows=10 loops=1)
      Buffers: shared hit=25
      ->  Index Scan using idx_invoices_tenant_id_status on invoices  (cost=0.56..122.50 rows=250 width=12) (actual time=0.848..0.852 rows=10 loops=1)
            Index Cond: ((tenant_id = '...') AND (status = 'due'))
            Buffers: shared hit=25
    Planning Time: 0.150 ms
    Execution Time: 0.900 ms

    The key metric here is Buffers: shared hit=25. The planner had to read 25 blocks from memory to find our 10 rows. This is because the due entries are scattered among the thousands of other entries for this tenant within the index.

    After: Using the Partial Index

    Now, PostgreSQL will automatically pick our new, specialized index for the same query.

    text
    Limit  (cost=0.42..1.85 rows=10 width=12) (actual time=0.035..0.040 rows=10 loops=1)
      Buffers: shared hit=4
      ->  Index Scan using idx_invoices_tenant_due_status on invoices  (cost=0.42..3.50 rows=25 width=12) (actual time=0.032..0.036 rows=10 loops=1)
            Index Cond: (tenant_id = '...')
            Buffers: shared hit=4
    Planning Time: 0.120 ms
    Execution Time: 0.055 ms

    The results are dramatic:

    * Execution Time: 0.900 ms -> 0.055 ms (~16x faster).

    * Buffer Hits: 25 -> 4 (~6x fewer blocks read).

    Why? The idx_invoices_tenant_due_status index is tiny. It contains only the 0.5% of invoices that are due. When the planner needs to find them, it navigates a much smaller, denser B-tree. The data it needs is packed together, leading to phenomenal cache efficiency.

    The benefits extend beyond query speed:

    * Storage: The partial index is a fraction of the size of the full index, saving disk space.

    * Write Performance: INSERT, UPDATE, and DELETE operations only need to modify this index if the row's status is or becomes due. For the 99.5% of other invoices, this index is untouched, reducing write overhead.


    Advanced Production Patterns for Partial Indexes

    This technique goes far beyond simple status filtering. Here are three production-grade patterns we use to optimize our multi-tenant systems.

    Pattern 1: Conditional Uniqueness for Active Records

    A common requirement is that a user's email must be unique within their tenant, but only for active users. An inactive or archived user shouldn't prevent a new user from signing up with the same email. Implementing this in application logic is prone to race conditions. A partial UNIQUE index enforces this at the highest level of integrity.

    sql
    CREATE TABLE users (
        id BIGSERIAL PRIMARY KEY,
        tenant_id UUID NOT NULL,
        email TEXT NOT NULL,
        is_active BOOLEAN NOT NULL DEFAULT true
        -- other columns...
    );
    
    -- This ensures email is unique ONLY for active users within a tenant.
    CREATE UNIQUE INDEX idx_users_unique_active_email_per_tenant
    ON users (tenant_id, lower(email))
    WHERE is_active IS TRUE;

    Now, observe the behavior:

    sql
    -- Succeeds
    INSERT INTO users (tenant_id, email, is_active) VALUES ('...', '[email protected]', true);
    
    -- Fails with unique constraint violation (as expected)
    INSERT INTO users (tenant_id, email, is_active) VALUES ('...', '[email protected]', true);
    
    -- Deactivate the first user
    UPDATE users SET is_active = false WHERE email = '[email protected]';
    
    -- Now this succeeds! The unique constraint no longer applies to the first user.
    INSERT INTO users (tenant_id, email, is_active) VALUES ('...', '[email protected]', true);

    This pattern is incredibly powerful. It offloads complex business logic into the database, guaranteeing data integrity while simultaneously creating a hyper-efficient index for the most common query: looking up an active user by email.
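    The same partial unique index can also serve as an ON CONFLICT arbiter; the predicate must be repeated so PostgreSQL can infer the right index. A sketch, reusing the placeholder tenant ID style from above:

    sql
    INSERT INTO users (tenant_id, email, is_active)
    VALUES ('...', '[email protected]', true)
    ON CONFLICT (tenant_id, lower(email)) WHERE is_active IS TRUE
    DO NOTHING;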

    Pattern 2: The Ubiquitous Soft-Delete

    Virtually every SaaS application implements soft deletes: a deleted_at timestamp that is NULL for live rows. This means that 99.9% of your queries should be filtering with deleted_at IS NULL. Indexing the deleted_at column on its own buys little: the IS NULL condition matches the bulk of the table, so it is not selective, and every full index on the table still carries entries for the ever-growing graveyard of deleted rows. A partial index is the canonical solution.

    sql
    CREATE TABLE projects (
        id BIGSERIAL PRIMARY KEY,
        tenant_id UUID NOT NULL,
        name TEXT NOT NULL,
        deleted_at TIMESTAMPTZ
    );
    
    -- An index on all non-deleted projects.
    CREATE INDEX idx_projects_active_per_tenant
    ON projects (tenant_id, name)
    WHERE deleted_at IS NULL;

    Any query that includes WHERE tenant_id = $1 AND deleted_at IS NULL will now use this compact and efficient index. The vast graveyard of soft-deleted records is completely ignored by the index, keeping it small and fast for day-to-day operations.
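    If a retention job periodically purges old soft-deleted rows, the mirror-image partial index keeps that scan cheap as well (the index name here is illustrative):

    sql
    -- Only soft-deleted rows; used by purge/retention jobs
    CREATE INDEX idx_projects_deleted_at
    ON projects (deleted_at)
    WHERE deleted_at IS NOT NULL;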

    Pattern 3: Tier-Based and Feature-Flag Indexing (The Dangerous Edge Case)

    Imagine a scenario where you have a feature—say, advanced analytics—that's only available to tenants on a premium plan. A background job runs frequently to process data for these tenants.

    An initial, tempting idea is to create a partial index referencing the tenants table:

    sql
    -- DO NOT DO THIS IN PRODUCTION
    CREATE INDEX idx_analytics_events_premium_tenants
    ON analytics_events (tenant_id, created_at)
    WHERE tenant_id IN (SELECT id FROM tenants WHERE plan = 'premium');

    This does not work, and PostgreSQL refuses it outright: the statement fails with ERROR: cannot use subquery in index predicate. The restriction exists for good reason. Even if such an index could be built, the list of premium tenant IDs would be fixed at creation time; tenants who later upgraded to premium would silently be missing from the index.

    This highlights a critical rule: the WHERE clause of a partial index cannot contain subqueries and can only reference columns from the table being indexed. Functions used must be marked as IMMUTABLE.
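    Concretely, that rules out time-relative predicates, while constant comparisons are fine (both index names here are illustrative):

    sql
    -- Rejected: now() is STABLE, not IMMUTABLE
    -- CREATE INDEX idx_events_recent ON analytics_events (tenant_id)
    -- WHERE created_at > now() - interval '30 days';

    -- Allowed: a plain comparison against a constant
    CREATE INDEX idx_events_2024 ON analytics_events (tenant_id)
    WHERE created_at >= DATE '2024-01-01';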

    The correct way to implement this pattern is via denormalization. While often frowned upon, strategic denormalization for performance is a hallmark of advanced data modeling.

    sql
    -- Add plan_type directly to the high-volume table
    ALTER TABLE analytics_events ADD COLUMN tenant_plan TEXT;
    
    -- Keep it in sync with triggers or application logic
    
    -- Now, create the correct partial index
    CREATE INDEX idx_analytics_events_premium_plan
    ON analytics_events (tenant_id, created_at)
    WHERE tenant_plan = 'premium';

    This index is now correct, dynamic, and highly effective. It creates a dedicated index just for the tenants that this specific background job cares about, drastically reducing the amount of data the job needs to scan.
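    One hedged sketch of the trigger-based sync mentioned above (the function and trigger names are invented; plan changes on existing rows would still need a separate backfill UPDATE):

    sql
    CREATE OR REPLACE FUNCTION stamp_tenant_plan() RETURNS trigger AS $$
    BEGIN
        -- Copy the tenant's current plan onto the incoming event row
        SELECT plan INTO NEW.tenant_plan
        FROM tenants
        WHERE id = NEW.tenant_id;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER trg_stamp_tenant_plan
    BEFORE INSERT ON analytics_events
    FOR EACH ROW EXECUTE FUNCTION stamp_tenant_plan();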


    The Planner's Mind: Caveats and Production Considerations

    Partial indexes are not a silver bullet. You must understand how the PostgreSQL query planner thinks to use them effectively.

    1. The Predicate Must Match

    The planner will only consider a partial index if the query's WHERE clause logically implies the index's WHERE clause.

    Given our index ... WHERE status = 'due', the following happens:

    * ... WHERE status = 'due' - MATCH: The query predicate is identical.

    * ... WHERE status = 'due' AND amount_cents > 5000 - MATCH: The query predicate is a more specific subset.

    * ... WHERE status = 'paid' - NO MATCH: The predicates are contradictory.

    * ... WHERE status IN ('due', 'overdue') - NO MATCH: The query predicate is broader than the index's. The planner cannot guarantee all required rows are in the index.

    This means you may need multiple partial indexes for different statuses if they are all part of hot query paths. For example:

    sql
    CREATE INDEX idx_invoices_due_or_overdue ON invoices (tenant_id, due_date)
    WHERE status IN ('due', 'overdue');

    2. Prepared Statements and Parameterization

    This is a subtle but critical edge case. When you use a prepared statement, PostgreSQL tries to create a generic query plan that can work for any parameter value.

    Consider this query with a parameterized status:

    sql
    PREPARE my_query (uuid, text) AS
    SELECT id FROM invoices WHERE tenant_id = $1 AND status = $2;

    When you execute EXECUTE my_query('...', 'due'), the partial index is not guaranteed to be used. For the first five executions (by default), PostgreSQL builds a custom plan for the actual parameter values, and the planner can choose the partial index. After that, it compares the estimated cost of a generic plan against the average cost of the custom plans and may switch to the generic plan for all subsequent executions, and a generic plan can never use the partial index, because it must also work for EXECUTE my_query('...', 'paid'). You cannot rely on the planner staying with a custom plan.

    If a query path is critical and needs to leverage a partial index, avoid parameterizing the column in the index's predicate. Construct the query string with the literal value. While this goes against the general advice to parameterize everything, it's a necessary trade-off for this specific optimization. Be absolutely sure to sanitize inputs to prevent SQL injection if you take this route.
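    On PostgreSQL 12 and later there is a less drastic option than hand-building SQL strings: plan_cache_mode can forbid generic plans for a session, so every execution is planned against its actual parameter values (at the cost of re-planning each time):

    sql
    SET plan_cache_mode = force_custom_plan;
    -- Each EXECUTE is now planned with its literal parameters,
    -- so the partial index remains eligible.
    EXECUTE my_query('a1b2c3d4-e5f6-a7b8-c9d0-e1f2a3b4c5d6', 'due');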

    3. Indexing NULLs

    Standard B-tree indexes in PostgreSQL do store NULL values. This can be a source of bloat. If you have a column that is frequently NULL and you only ever query for non-NULL values, a partial index is a perfect fit.

    sql
    -- e.g., A task can be assigned to a user, or be unassigned (user_id IS NULL)
    CREATE INDEX idx_tasks_assigned_user ON tasks (assigned_user_id)
    WHERE assigned_user_id IS NOT NULL;

    This creates an index only for assigned tasks, which might be a small fraction of the total, making lookups by assigned_user_id much faster.


    Quantifying the Impact: Benchmarking Read and Write Performance

    Talk is cheap. Let's benchmark the real-world impact. We'll set up a table with 10 million tasks, where 98% have a completed status, 1.5% are pending, and 0.5% are in_progress.

    Test Schema:

    sql
    CREATE TABLE tasks (
        id BIGSERIAL PRIMARY KEY,
        tenant_id INT NOT NULL, -- Simplified for pgbench
        status TEXT NOT NULL,
        priority INT NOT NULL,
        created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
    );
    -- Populate with 10M rows, skewed distribution

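    The skewed population itself can be sketched with generate_series (scaled down here to 100k rows; the same shape works at 10M):

    sql
    INSERT INTO tasks (tenant_id, status, priority)
    SELECT (random() * 999)::int + 1,          -- spread rows across tenants
           CASE WHEN r < 0.98  THEN 'completed'
                WHEN r < 0.995 THEN 'pending'
                ELSE 'in_progress' END,        -- 98% / 1.5% / 0.5% split
           (random() * 10)::int
    FROM (SELECT random() AS r FROM generate_series(1, 100000)) gen;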
    Scenario 1: The Standard Compound Index

    sql
    CREATE INDEX idx_tasks_tenant_status ON tasks (tenant_id, status);

    Scenario 2: The Partial Index

    sql
    CREATE INDEX idx_tasks_tenant_in_progress ON tasks (tenant_id, priority) WHERE status = 'in_progress';

    Read Performance Benchmark

    We'll query for the highest priority in_progress tasks for a given tenant. This is a classic dashboard query.

    sql
    SELECT id FROM tasks WHERE tenant_id = $1 AND status = 'in_progress' ORDER BY priority DESC LIMIT 20;

    Using pgbench to simulate 16 clients for 60 seconds:

    | Index Strategy | Transactions per Second (tps) | Avg. Latency (ms) | Index Size (pg_relation_size) |
    | --- | --- | --- | --- |
    | Standard Compound Index | 2,150 | 7.44 | 450 MB |
    | Partial Index | 19,800 | 0.81 | 4 MB |

    The results are unambiguous. The partial index delivers a ~9.2x increase in throughput and a ~9.2x reduction in latency. The index itself is over 100x smaller.

    Write Performance Benchmark

    Optimization is about trade-offs. Let's measure the impact on INSERT performance. We'll INSERT rows with different statuses.

    | Index Strategy | INSERT tps (non-matching status) | INSERT tps (matching status) |
    | --- | --- | --- |
    | Standard Compound Index | 65,000 | 65,000 |
    | Partial Index | 78,000 (+20%) | 63,500 (-2.3%) |

    This reveals the nuance. When inserting a task that is not in_progress, the write is 20% faster with the partial index because the database doesn't have to update it. When inserting a task that is in_progress, the write is a negligible 2.3% slower because it has to update the new, smaller index (in addition to any other indexes on the table, which we assume exist in both scenarios). For a workload where only 0.5% of inserts match the partial index's predicate, the net effect on write performance is a significant win.

    Conclusion: Indexing with Intent

    Partial indexes force a shift in mindset from generic, catch-all indexing to deliberate, query-aware indexing. They are one of the most powerful yet underutilized features in PostgreSQL for building high-performance, scalable multi-tenant applications.

    Before you reach for another compound index to solve a performance problem, ask yourself:

  • What is the exact query pattern I am optimizing for?
  • What is the data distribution of the columns in my WHERE clause? Is it highly skewed?
  • Can I define a narrow, stable predicate that covers a small, frequently accessed subset of my data?

    By analyzing your application's hot spots with tools like pg_stat_statements and applying these surgical indexing patterns, you can achieve order-of-magnitude performance improvements, reduce storage costs, and build more resilient systems. It's the difference between a database that merely works and one that performs with precision at scale.
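    As a starting point for that analysis, pg_stat_statements can surface the queries consuming the most total time (column names as of PostgreSQL 13+; older versions use total_time and mean_time):

    sql
    SELECT query, calls,
           round(mean_exec_time::numeric, 2) AS mean_ms,
           round(total_exec_time::numeric, 2) AS total_ms
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 10;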
