Optimizing PGVector HNSW for High-Cardinality Metadata Filtering
The Performance Cliff of Metadata Filtering in Vector Search
As engineers building AI-powered applications, we've rapidly adopted vector databases and extensions like pgvector to power semantic search, RAG pipelines, and recommendation engines. The core operation—finding the k nearest neighbors (KNN) to a query vector—is remarkably efficient with modern index structures like HNSW (Hierarchical Navigable Small World). However, production systems are rarely this simple. Our queries almost always involve filtering by metadata: finding similar products within a specific category, retrieving relevant documents for a single tenant, or recommending articles published after a certain date.
This is where many high-performance vector search implementations hit a performance cliff. The naive approach of applying a standard WHERE clause to a KNN query in PostgreSQL often leads to disastrous query plans, forcing a choice between an efficient vector index scan and an efficient metadata index scan, but rarely both. This problem is particularly acute with high-cardinality metadata, such as a tenant_id in a multi-tenant SaaS application or a user_id in a personalization system.
This article bypasses introductory concepts. We assume you understand what pgvector is and have a running HNSW index. We will dissect the performance degradation of post-filtering and explore two production-grade architectural patterns to achieve efficient, scalable pre-filtering: declarative partitioning for multi-tenancy and a composite B-Tree strategy for other high-cardinality scenarios.
The Anatomy of a Slow Query: Post-Filtering Examined
Let's model a common scenario: an e-commerce platform with millions of product listings. Each product has a vector embedding generated from its image and description, and it belongs to a specific tenant_id representing the seller.
Our table structure might look like this:
-- Ensure the pgvector extension is installed
CREATE EXTENSION IF NOT EXISTS vector;
-- A table to store product information and their embeddings
CREATE TABLE products (
    id BIGSERIAL PRIMARY KEY,
    tenant_id INT NOT NULL,
    category_id INT NOT NULL,
    product_name VARCHAR(255),
    embedding VECTOR(768) -- Assuming a 768-dimensional embedding
);
-- Populate with sample data (in a real scenario, this would be millions of rows)
-- For demonstration, let's assume we have 10 million products across 10,000 tenants.
-- INSERT INTO products (tenant_id, category_id, product_name, embedding) ...

To accelerate vector search, we create an HNSW index on the embedding column. Since we also need to filter by tenant_id, we add a standard B-Tree index on that column.
-- Vector index for KNN search
CREATE INDEX ON products USING hnsw (embedding vector_l2_ops);
-- Standard index for metadata filtering
CREATE INDEX ON products (tenant_id);

Now, a user from tenant 123 searches for products similar to a query vector. The intuitive SQL query looks like this:
-- The "intuitive" but problematic query
EXPLAIN ANALYZE
SELECT id, product_name
FROM products
WHERE tenant_id = 123
ORDER BY embedding <=> '[...query_vector...]'::vector
LIMIT 10;

The query planner is now in a difficult position. It has two primary paths:
1.  Use the HNSW index: scan the vector index for the nearest neighbors across the entire table, then discard rows that fail tenant_id = 123. This is post-filtering. The problem? If tenant 123's products are not among the absolute top nearest neighbors in the entire dataset, this query will return few or no results. To get 10 results for tenant 123, we might need to fetch LIMIT 1000 or more from the vector search, then filter, which is inefficient and still not guaranteed to work.
2.  Use the tenant_id B-Tree index: scan the tenant_id index to find all products belonging to tenant 123. If this tenant has 1,000 products, the database will then perform a brute-force distance calculation between the query vector and all 1,000 of that tenant's product embeddings, sort them, and return the top 10. This avoids the HNSW index entirely, leading to a Seq Scan or Bitmap Heap Scan on the subset, which is computationally expensive and does not scale.

Let's examine a likely EXPLAIN ANALYZE output for the second, more common scenario on a large table:
Limit  (cost=1350.50..1350.52 rows=10 width=37) (actual time=153.461..153.463 rows=10 loops=1)
  ->  Sort  (cost=1350.50..1353.00 rows=1000 width=37) (actual time=153.459..153.460 rows=10 loops=1)
        Sort Key: ((embedding <=> '[...]'::vector))
        Sort Method: top-N heapsort  Memory: 29kB
        ->  Bitmap Heap Scan on products  (cost=25.04..1300.00 rows=1000 width=37) (actual time=0.460..152.123 rows=1000 loops=1)
              Recheck Cond: (tenant_id = 123)
              Heap Blocks: exact=950
              ->  Bitmap Index Scan on products_tenant_id_idx  (cost=0.00..24.79 rows=1000 width=0) (actual time=0.045..0.045 rows=1000 loops=1)
                    Index Cond: (tenant_id = 123)
Planning Time: 0.215 ms
Execution Time: 153.524 ms

Here, PostgreSQL correctly chose the products_tenant_id_idx to quickly find the 1,000 products for our tenant. But then it performed a brute-force sort (top-N heapsort) on those 1,000 vectors, taking ~153ms. While this might be acceptable for a tenant with 1,000 products, the cost scales linearly with the tenant's row count: for a tenant with 100,000 products, this query will become unacceptably slow, likely taking seconds.
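The first path fares no better. Forcing post-filtering means over-fetching from the HNSW index and discarding non-matching rows afterwards, roughly as in the sketch below; the over-fetch limit of 1000 is an arbitrary assumption, not a tuned value.

-- A sketch of path 1 (post-filtering): over-fetch candidates from the HNSW
-- index, then filter by tenant. The LIMIT of 1000 is an arbitrary guess.
BEGIN;
-- The HNSW scan yields at most hnsw.ef_search candidates, so raise it to
-- cover the over-fetch.
SET LOCAL hnsw.ef_search = 1000;
SELECT id, product_name
FROM (
    SELECT id, product_name, tenant_id
    FROM products
    ORDER BY embedding <=> '[...query_vector...]'::vector
    LIMIT 1000
) AS candidates
WHERE tenant_id = 123
LIMIT 10;
COMMIT;

Even with the inflated limit, nothing guarantees that 10 of tenant 123's products appear among the candidates, so recall becomes unpredictable.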
This is the core problem: HNSW indexes in pgvector (as of version 0.5.x) do not natively support pre-filtering. The index scan operates on the entire graph of vectors, unaware of the WHERE clause. We need an architectural solution.
Pattern 1: Declarative Partitioning for True Pre-Filtering
For multi-tenancy, which is one of the most common high-cardinality filtering requirements, PostgreSQL's native partitioning is a powerful and elegant solution. The strategy is to physically separate data for each tenant into its own table-like structure (a partition), allowing us to build an HNSW index on each partition individually.
When we query with a WHERE tenant_id = ... clause, the query planner performs partition pruning, meaning it doesn't even consider the other partitions. The vector search is then executed on the much smaller, tenant-specific HNSW index, achieving true pre-filtering.
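Partition pruning is enabled by default in modern PostgreSQL, but it costs nothing to confirm the setting before relying on it:

-- Pruning is on by default; verify the setting:
SHOW enable_partition_pruning;  -- expected: on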
Implementation Steps
First, we redefine our products table as a partitioned table.
-- Step 1: Create the partitioned table (the "template")
CREATE TABLE products_partitioned (
    id BIGSERIAL,
    tenant_id INT NOT NULL,
    category_id INT NOT NULL,
    product_name VARCHAR(255),
    embedding VECTOR(768)
) PARTITION BY LIST (tenant_id);
-- Make sure the primary key includes the partition key
ALTER TABLE products_partitioned ADD PRIMARY KEY (id, tenant_id);

Notice two critical details:
1.  PARTITION BY LIST (tenant_id). This tells PostgreSQL that we will create explicit partitions for specific tenant_id values.
2.  The primary key is composite: (id, tenant_id). PostgreSQL requires that any primary key or unique constraint on a partitioned table include the partition key columns.

Next, we create partitions for our tenants. This can be done manually or, more realistically, automated via a provisioning script or trigger when a new tenant signs up.
-- Step 2: Create individual partitions for tenants
CREATE TABLE products_tenant_123 PARTITION OF products_partitioned
    FOR VALUES IN (123);
CREATE TABLE products_tenant_456 PARTITION OF products_partitioned
    FOR VALUES IN (456);
-- ... and so on for all other tenants.

Now for the crucial step: creating the indexes. We create the HNSW index on the main partitioned table. PostgreSQL will automatically propagate the index creation to all existing and future partitions.
-- Step 3: Create indexes on the main partitioned table
-- This will create an HNSW index on each partition (e.g., products_tenant_123, products_tenant_456)
CREATE INDEX ON products_partitioned USING hnsw (embedding vector_l2_ops);
-- Partition pruning handles the tenant_id filter itself, and the composite
-- primary key already includes tenant_id, so an explicit B-Tree index on the
-- partition key is optional:
-- CREATE INDEX ON products_partitioned (tenant_id);

Querying and Performance Analysis
With this structure in place, our original query now behaves dramatically differently.
EXPLAIN ANALYZE
SELECT id, product_name
FROM products_partitioned
WHERE tenant_id = 123 -- This clause now enables partition pruning
ORDER BY embedding <=> '[...query_vector...]'::vector
LIMIT 10;

The query plan will be a revelation:
Limit  (cost=5.30..5.32 rows=10 width=37) (actual time=0.981..0.983 rows=10 loops=1)
  ->  Index Scan using products_tenant_123_embedding_idx on products_tenant_123  (cost=5.30..15.00 rows=1000 width=37) (actual time=0.979..0.980 rows=10 loops=1)
        Order By: (embedding <=> '[...]'::vector)
        Limit: 10
Planning Time: 0.850 ms
Execution Time: 1.021 ms

Let's break down this beautiful result:
*   Targeted Scan: The planner immediately identified that it only needs to scan the products_tenant_123 partition. All other tenant data is completely ignored.
*   Index Scan on HNSW: It's using products_tenant_123_embedding_idx, the HNSW index specific to that partition.
*   Blazing Speed: The execution time has dropped from ~153ms to just ~1ms. This performance scales with the size of the individual tenant's partition rather than the total dataset, so it holds up even at billions of products across tens of thousands of tenants, because each query only ever operates on a single tenant's isolated data and index.
Edge Cases and Production Considerations
*   Partition Management: Automating partition creation and removal is essential. Use a trigger on your tenants table or an application-level process to create a new partition when a tenant is created (see the provisioning sketch after this list).
*   Schema Migrations: Tools like pg_partman can help, but schema changes (e.g., adding a column) must be applied to the parent table, which then propagates. This can involve locking and require careful deployment strategies.
*   Cross-Tenant Queries: If you ever need to perform a vector search across *all* tenants, this model is inefficient, as it will have to perform separate scans on every partition. This architecture is optimized for isolated data access.
*   High Number of Partitions: PostgreSQL handles thousands of partitions well, but performance can degrade into the tens of thousands. If you have millions of tenants, consider a hybrid approach or a different partitioning strategy (e.g., PARTITION BY HASH(tenant_id) to group tenants into a fixed number of partitions).
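As referenced in the partition-management note above, here is a minimal provisioning sketch. It assumes the products_tenant_<id> naming convention used in this article and integer tenant IDs; adapt it to your own onboarding workflow or trigger.

-- A minimal provisioning sketch (assumes the products_partitioned table and
-- the products_tenant_<id> naming convention from this article).
CREATE OR REPLACE FUNCTION create_tenant_partition(p_tenant_id INT)
RETURNS void AS $$
BEGIN
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS products_tenant_%s
             PARTITION OF products_partitioned FOR VALUES IN (%s)',
        p_tenant_id, p_tenant_id
    );
END;
$$ LANGUAGE plpgsql;

-- Example call from an onboarding script:
-- SELECT create_tenant_partition(789);

Because the HNSW index was declared on the parent table, the new partition automatically gets its own partition-local HNSW index.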
Pattern 2: Composite B-Tree Indexes for High-Selectivity Filtering
Partitioning is perfect for tenant_id, but what if you need to filter on a different high-cardinality column where partitioning is impractical? For example, filtering by category_id where there are 50,000 categories, or by author_id where there are millions of authors. Creating a partition for each is not feasible.
In this scenario, we can leverage a clever two-step query pattern that combines a standard B-Tree index with a targeted, brute-force KNN search on a small, pre-filtered subset of data.
This pattern is effective when the filter is highly selective—that is, the WHERE clause returns a small percentage of the total dataset (e.g., a few thousand rows out of millions).
Implementation Steps
Let's revert to our original, non-partitioned products table. This time, our goal is to efficiently query for products in category_id = 5555.
-- Assuming the non-partitioned products table from the first section
-- with 10M rows and an HNSW index on `embedding`.
-- Step 1: Create a B-Tree index on the filter column
-- (for multi-column filters, use a composite index with the filtering column first)
CREATE INDEX products_category_id_idx ON products (category_id);

The strategy is to explicitly tell PostgreSQL how to execute the query:
1.  Use products_category_id_idx to very quickly identify the primary keys of all products that match our filter.
2.  Perform the KNN distance calculation and sort only on that small, pre-filtered subset of rows.

We can achieve this using a Common Table Expression (CTE) or a subquery.
-- The Two-Step Query Pattern
EXPLAIN ANALYZE
WITH filtered_products AS (
    -- Step 1: Fast B-Tree index scan to get IDs
    SELECT id
    FROM products
    WHERE category_id = 5555
)
-- Step 2: Join back and perform KNN on the small subset
SELECT p.id, p.product_name
FROM products p
JOIN filtered_products fp ON p.id = fp.id
ORDER BY p.embedding <=> '[...query_vector...]'::vector
LIMIT 10;

Performance Analysis and the "Tipping Point"
Let's analyze the query plan for this approach, assuming category_id = 5555 contains 2,000 products.
Limit  (cost=185.33..185.35 rows=10 width=29) (actual time=25.811..25.813 rows=10 loops=1)
  ->  Sort  (cost=185.33..190.33 rows=2000 width=29) (actual time=25.809..25.810 rows=10 loops=1)
        Sort Key: ((p.embedding <=> '[...]'::vector))
        Sort Method: top-N heapsort  Memory: 29kB
        ->  Nested Loop  (cost=0.87..115.50 rows=2000 width=29) (actual time=0.085..24.321 rows=2000 loops=1)
              ->  CTE Scan on filtered_products fp  (cost=0.00..40.00 rows=2000 width=8) (actual time=0.035..1.250 rows=2000 loops=1)
              ->  Index Scan using products_pkey on products p  (cost=0.87..0.95 rows=1 width=37) (actual time=0.005..0.005 rows=1 loops=1)
                    Index Cond: (id = fp.id)
  CTE filtered_products
    ->  Bitmap Heap Scan on products  (cost=50.08..55.00 rows=2000 width=8) (actual time=0.032..1.101 rows=2000 loops=1)
          Recheck Cond: (category_id = 5555)
          ->  Bitmap Index Scan on products_category_id_idx  (cost=0.00..49.58 rows=2000 width=0) (actual time=0.021..0.021 rows=2000 loops=1)
                Index Cond: (category_id = 5555)
Planning Time: 0.345 ms
Execution Time: 25.950 ms

Let's dissect this plan:
1.  The filtered_products CTE is executed first. It uses the products_category_id_idx to find the 2,000 matching product IDs in just over 1ms. This is extremely efficient.
2.  The outer query then joins those IDs back to the products table using its primary key index, which is very fast, and the top-N heapsort computes distances only for those 2,000 rows.

The total execution time of ~26ms is a massive improvement over the ~153ms from the original naive query, and it avoids the recall problems of post-filtering entirely.
However, this pattern has a tipping point. As the number of rows returned by the filter (category_id = 5555) increases, the cost of the brute-force sort will eventually exceed the cost of a full HNSW index scan. Where is this point? It depends heavily on your hardware, vector dimensionality, and HNSW tuning parameters, but a general rule of thumb is that this pattern is effective for subsets up to a few tens of thousands of rows. Beyond that, a full HNSW scan might be faster, even with its post-filtering inefficiency.
It is your responsibility as a senior engineer to benchmark this for your specific workload to determine the threshold at which you might switch query patterns.
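One way to apply that threshold at runtime, sketched below, is to use the planner's row estimate rather than a live count; the 50,000 cutoff is a placeholder for whatever your benchmarks produce.

-- Sketch: ask the planner for its row estimate (no table scan required)
EXPLAIN (FORMAT JSON) SELECT 1 FROM products WHERE category_id = 5555;
-- Application-side pseudocode (50,000 is a placeholder threshold):
--   rows = plan[0]["Plan"]["Plan Rows"]
--   if rows <= 50000: run the two-step CTE query
--   else:             run a plain HNSW query and post-filter the results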
Advanced HNSW Tuning for Filtered Subsets
Once you have a strategy to reduce the search space (either via partitioning or CTEs), you can further optimize performance by tuning the HNSW index parameters. These parameters control the trade-off between search speed, recall (accuracy), and index build time.
*   m: The maximum number of connections per node in the graph. (Default: 16)
    -   Impact: Higher m creates a denser, more accurate graph, leading to better recall. However, it increases index size and build time.
    -   Recommendation: For production systems where recall is critical, consider increasing m to 24 or 32 during index creation. This is a one-time cost for better query accuracy.
*   ef_construction: The size of the dynamic candidate list during the index build. (Default: 64)
    -   Impact: A larger ef_construction allows the build algorithm to explore more potential neighbors for each node, resulting in a higher-quality graph and better recall. This significantly slows down index creation.
    -   Recommendation: If you can afford a longer indexing time (e.g., overnight batch jobs), increasing ef_construction to 128 or 256 can yield noticeable improvements in search quality (see the index-creation example after this list).
*   ef_search: The size of the dynamic candidate list at query time. (Default: 40)
    -   Impact: This is the most important run-time tuning parameter. It directly controls the speed vs. recall trade-off for every search. A higher ef_search explores more of the graph, increasing the probability of finding the true nearest neighbors (higher recall) at the cost of higher latency.
    -   Recommendation: This parameter can be set per-transaction, allowing for dynamic adjustment. You can offer a "fast" search with a low ef_search (e.g., 50) and a "high accuracy" search with a higher value (e.g., 200).
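For reference, here is how the build-time parameters above are supplied at index creation; the specific values are illustrative, not recommendations for your workload.

-- Building the HNSW index with non-default build-time parameters
-- (illustrative values; benchmark m and ef_construction on your own data)
CREATE INDEX ON products_partitioned USING hnsw (embedding vector_l2_ops)
    WITH (m = 24, ef_construction = 128);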
-- Example of setting ef_search for a single query
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT id, product_name
FROM products_partitioned
WHERE tenant_id = 123
ORDER BY embedding <=> '[...query_vector...]'::vector
LIMIT 10;
COMMIT;

The optimal ef_search value must be determined empirically. Create a ground truth dataset, and then run queries with increasing ef_search values, plotting the recall@10 against the p95 latency. This allows you to choose a value that meets your application's specific SLOs for both accuracy and speed.
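A minimal sketch of one such measurement follows. It assumes a hypothetical ground_truth(query_id, product_id) table holding the exact top-10 neighbors for each benchmark query, computed once with an exact (sequential-scan) search; the query id 42 and ef_search value are placeholders.

-- Sketch: recall@10 for one benchmark query (id 42), assuming a hypothetical
-- ground_truth(query_id, product_id) table of exact top-10 neighbors.
BEGIN;
SET LOCAL hnsw.ef_search = 100;
WITH approx AS (
    SELECT id
    FROM products_partitioned
    WHERE tenant_id = 123
    ORDER BY embedding <=> '[...query_vector...]'::vector
    LIMIT 10
)
SELECT count(*) / 10.0 AS recall_at_10
FROM approx
JOIN ground_truth gt
  ON gt.product_id = approx.id
 AND gt.query_id = 42;
COMMIT;

Repeat this across your benchmark queries and ef_search values, measuring latency separately (for example with psql's \timing or application-side instrumentation), to build the recall-versus-latency curve.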
Conclusion: From Naive Queries to Scalable Architecture
Efficiently filtering metadata is a non-trivial challenge that separates proof-of-concept vector search projects from scalable, production-ready AI systems. Relying on the database planner with naive queries will inevitably lead to performance bottlenecks as your data grows.
We've explored two robust, production-proven patterns to overcome this:
1.  Declarative partitioning by tenant_id, which physically isolates each tenant's data and HNSW index so that partition pruning delivers true pre-filtering with consistent, low latency.
2.  A two-step query pattern backed by a B-Tree index, which uses a fast metadata index scan to shrink the candidate set before performing an exact KNN sort on that small subset.
Mastering these patterns requires a deep understanding of both vector index mechanics and PostgreSQL's query execution planner. The optimal solution is not one-size-fits-all but depends on your data distribution, query patterns, and cardinality. By moving beyond simple KNN queries and architecting for your filtering needs, you can build semantic search applications on pgvector that are not only powerful but also performant and scalable.