Hybrid Search in Postgres: Fusing pgvector and FTS with RRF

Goh Ling Yong

The Senior Engineer's Search Dilemma: Beyond Pure Vector Search

Your team has successfully integrated a vector database or an extension like pgvector into your stack. Your semantic search capability is impressive—users can find "summer clothes" when they search for "beach attire." But then the production bug reports start trickling in. A search for a specific product SKU like XJ-48-B1 yields nothing, even though it's in the database. A query for the brand "Acme©" returns conceptually similar but incorrect brands.

This is the critical limitation of pure dense vector search: it's a blunt instrument for a nuanced problem. It generalizes meaning at the cost of specificity. Conversely, traditional keyword search systems like PostgreSQL's Full-Text Search (FTS) or Elasticsearch are masters of precision but are lexically rigid and fail to grasp user intent.

The naive solution is to run two separate queries and clumsily interleave the results. This is not a production strategy. The real solution lies in a sophisticated fusion of both search paradigms. This article provides a comprehensive, production-ready guide to implementing a state-of-the-art hybrid search system entirely within PostgreSQL. We will leverage pgvector for dense semantic search, FTS for sparse keyword search, and—most critically—Reciprocal Rank Fusion (RRF) to intelligently merge the results into a single, highly relevant list.

We will not cover the basics of what embeddings are or how to install pgvector. This is for engineers who have hit the limits of a single-paradigm search and need to build a truly robust retrieval system.


Pillar 1: Dense Vectors with `pgvector` - Advanced Indexing Deep Dive

At the heart of our semantic capability is pgvector. While a simple ORDER BY embedding <=> :query_embedding works on small datasets, it's a sequential scan that is unviable in production. Effective hybrid search requires a high-performance dense search component, which means mastering pgvector's Approximate Nearest Neighbor (ANN) indexes.

The two primary index types are ivfflat and hnsw. Choosing the right one is a critical architectural decision.

The `ivfflat` Index: For Static Data and High Recall

The Inverted File with Flat Compression (ivfflat) index works by partitioning your vector space into lists. During a search, the query vector is compared to the centroids of these lists, and only the vectors within the probes most promising lists are actually searched.

* When to use it: Ideal for datasets that are indexed once and rarely updated. The index build is relatively fast, but adding new items requires a full re-index for optimal performance. It often provides slightly higher recall than hnsw if tuned correctly.

* Key Parameters:

* lists: The number of partitions. A good starting point is N / 1000 for up to 1M rows and sqrt(N) for larger datasets, where N is the number of rows.

* probes: The number of lists to search at query time. This is a runtime parameter, allowing you to trade speed for accuracy on a per-query basis. Higher probes means higher recall but slower queries.
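As a quick sanity check, the sizing heuristic above can be sketched in Python (the function name is illustrative, not part of pgvector):

```python
import math

def recommended_lists(n_rows: int) -> int:
    """Heuristic from the pgvector docs: rows / 1000 for up to ~1M rows,
    sqrt(rows) for larger tables."""
    if n_rows <= 1_000_000:
        return max(1, n_rows // 1000)
    return int(math.sqrt(n_rows))

print(recommended_lists(1_000_000))    # 1000
print(recommended_lists(100_000_000))  # 10000
```

This matches the `lists = 1000` choice used for the 1,000,000-row example below.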

Implementation Example:

Let's assume a table products with an embedding column of 384 dimensions (from a model like all-MiniLM-L6-v2).

sql
-- products table schema
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    embedding VECTOR(384)
);

-- Assuming 1,000,000 products, we choose lists = 1000
CREATE INDEX ON products USING ivfflat (embedding vector_cosine_ops) WITH (lists = 1000);

To query, we must dynamically set the probes parameter.

sql
-- Set a low number of probes for a fast, less accurate query
SET ivfflat.probes = 10;
SELECT id, name FROM products ORDER BY embedding <=> '[...query_vector...]' LIMIT 100;

-- Increase probes for higher recall, at the cost of latency
SET ivfflat.probes = 50;
SELECT id, name FROM products ORDER BY embedding <=> '[...query_vector...]' LIMIT 100;

Edge Case: The effectiveness of ivfflat degrades significantly as you add new data without re-indexing. The new vectors are not integrated into the existing partitions, leading to poor recall for recent entries. For write-heavy systems, ivfflat is often a poor choice.

The `hnsw` Index: For Dynamic Data and Low Latency

Hierarchical Navigable Small World (hnsw) is a graph-based index. It builds a multi-layered graph where long-distance connections exist on the top layers and shorter-distance connections on the bottom layers, allowing for efficient greedy traversal to find the nearest neighbors.

* When to use it: The gold standard for dynamic datasets where vectors are frequently added or updated. It has a higher build time and memory footprint than ivfflat but offers excellent query performance and supports concurrent indexing.

* Key Parameters:

* m: The maximum number of connections per node. Higher values (e.g., 16-48) lead to better recall and higher indexing time/memory usage.

* ef_construction: The size of the dynamic candidate list during index construction. A higher value (e.g., 64-256) results in a more accurate index, but slows down the build process.

* ef_search: The size of the dynamic candidate list during a search. This is the query-time parameter analogous to probes. Higher values increase recall and latency.

Implementation Example:

sql
-- HNSW is generally the preferred choice for production systems
CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops) WITH (m = 32, ef_construction = 128);

Querying with hnsw is similar to ivfflat but uses a different setting.

sql
-- A balanced speed/recall setting
SET hnsw.ef_search = 64;
SELECT id, name FROM products ORDER BY embedding <=> '[...query_vector...]' LIMIT 100;

Performance Consideration: hnsw indexes can consume significant memory. A rough estimate is 1.1 * (4 * d + 8 * M) * num_elements bytes, where d is the vector dimensionality. For 1 million 384-dimensional vectors with M=16, this can be over 1.5 GB of RAM. Monitor your memory usage closely.
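As a back-of-the-envelope check, a rough HNSW memory estimate of about 1.1 * (4 * d + 8 * M) bytes per vector can be computed directly (this is an approximation of vector data plus graph links with ~10% overhead, not an exact accounting of pgvector's allocation):

```python
def hnsw_memory_bytes(num_elements: int, dims: int, m: int) -> float:
    """Rough HNSW memory estimate: ~1.1 * (4 * dims + 8 * m) bytes per vector
    (4 bytes per float dimension plus graph link storage, with ~10% overhead)."""
    return 1.1 * (4 * dims + 8 * m) * num_elements

# 1M vectors, 384 dimensions, m = 16
print(hnsw_memory_bytes(1_000_000, 384, 16) / 1e9)  # ~1.83 GB
```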


Pillar 2: Sparse Vectors with FTS - Beyond Basic Keyword Matching

PostgreSQL's Full-Text Search is a mature and powerful tool for keyword matching. We frame it as our "sparse vector" engine. A tsvector is effectively a high-dimensional sparse vector where each dimension corresponds to a lexeme (a normalized word), with that lexeme's positions in the document (and optional weights) stored as the value.

To build a production-grade FTS component, we must move beyond the default english configuration.

Custom FTS Configuration for Domain Specificity

Let's consider an e-commerce platform selling electronics. Default stemming might incorrectly conflate "USB-C" with "USB." We need a custom configuration.

sql
-- 1. Create a custom dictionary for synonyms. 'imac' and 'i-mac' should be the same.
CREATE TEXT SEARCH DICTIONARY electronics_syn (
    TEMPLATE = synonym,
    SYNONYMS = electronics_synonyms
);
-- You would populate a file named 'electronics_synonyms.syn' in your $SHAREDIR/tsearch_data/
-- Example content:
-- i-mac  imac
-- usb-c  usbc

-- 2. Create a custom configuration that uses our dictionary and the standard english stemmer.
CREATE TEXT SEARCH CONFIGURATION public.electronics (COPY = pg_catalog.english);

-- 3. Alter the configuration to use our synonym dictionary for specific word types.
ALTER TEXT SEARCH CONFIGURATION public.electronics
    ALTER MAPPING FOR asciiword, hword, hword_part, word
    WITH electronics_syn, english_stem;

Now we can build our schema to use this.

Schema and Automated Indexing:

We'll add a tsvector_col to our products table and use a trigger to keep it synchronized with our text fields (name and description).

sql
ALTER TABLE products ADD COLUMN tsvector_col TSVECTOR;

-- Create the GIN index for fast FTS queries
CREATE INDEX products_tsvector_idx ON products USING GIN(tsvector_col);

-- Create a function to update the tsvector column
CREATE OR REPLACE FUNCTION update_product_tsvector() RETURNS TRIGGER AS $$
BEGIN
    NEW.tsvector_col := 
        setweight(to_tsvector('public.electronics', COALESCE(NEW.name, '')), 'A') ||
        setweight(to_tsvector('public.electronics', COALESCE(NEW.description, '')), 'B');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Create the trigger
CREATE TRIGGER tsvector_update_trigger
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION update_product_tsvector();

Notice the use of setweight. This is crucial. We are weighting matches in the product name higher than matches in the description, which will be reflected in our ranking score.

Querying with Ranking:

The ts_rank_cd function calculates a relevance score based on word frequency and proximity. It's our primary tool for scoring sparse search results.

sql
-- The plainto_tsquery function is a good choice for user-provided search terms
-- as it handles parsing gracefully.
SELECT 
    id, 
    name, 
    ts_rank_cd(tsvector_col, plainto_tsquery('public.electronics', 'apple macbook pro')) as score
FROM products
WHERE tsvector_col @@ plainto_tsquery('public.electronics', 'apple macbook pro')
ORDER BY score DESC
LIMIT 100;

The Fusion Algorithm: Reciprocal Rank Fusion (RRF)

We now have two ordered lists of results: one from pgvector based on semantic similarity, and one from FTS based on keyword relevance. Their scores are on completely different scales. Vector distance is a measure of dissimilarity (lower is better), while ts_rank_cd is a relevance score (higher is better). A simple weighted average is mathematically unsound and will produce erratic results.

Enter Reciprocal Rank Fusion (RRF). RRF is a beautiful, simple, and powerful technique that completely ignores the raw scores. It only considers the rank of each document in the result lists.

The formula is:

RRF_Score(d) = Σ (1 / (k + rank_i(d)))

Where:

* d is a specific document.

* The sum is over all result lists i.

* rank_i(d) is the rank of document d in result list i (e.g., 1, 2, 3...).

* k is a constant, typically set to 60. Its purpose is to reduce the influence of documents with very high ranks (i.e., lower relevance), preventing a single outlier from dominating the final score.

Why RRF is effective:

  • Score Agnostic: It normalizes results from any search system without needing to understand their internal scoring logic.
  • Rewards Intersection: Documents that appear in the top results of both lists receive a significantly higher score, pushing them to the top of the fused list.
  • Handles Disjoint Sets: A document that appears in only one list still gets a score, ensuring we don't discard potentially relevant results.
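To make the formula concrete, here is a minimal reference implementation in Python (ranked lists are represented as best-first lists of document IDs; all names are illustrative):

```python
from collections import defaultdict

def rrf_fuse(result_lists, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.
    Each input list is ordered best-first; ranks are 1-based."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Return document IDs ordered by descending fused score.
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d3", "d1", "d7"]   # semantic ranking
sparse = ["d1", "d9", "d3"]   # keyword ranking
print(rrf_fuse([dense, sparse]))  # d1 and d3, present in both lists, rise to the top
```

Note how d1 (ranks 2 and 1) edges out d3 (ranks 1 and 3): appearing high in both lists beats being first in only one.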

Production Implementation: The Complete Hybrid Search Query

Now we will assemble these components into a single, powerful SQL query. We'll use Common Table Expressions (CTEs) to keep the logic clean and maintainable.

Let's assume we have a user query "fast laptop for programming" and have generated a query vector for it. We also have the raw query string.

Here is the full, production-grade query:

sql
-- Full Hybrid Search Query with Reciprocal Rank Fusion (RRF)
-- User Inputs:
-- :query_embedding -> The 384-dimensional vector for 'fast laptop for programming'
-- :query_text -> The raw string 'fast laptop for programming'
-- :k_rrf -> The RRF constant, e.g., 60
-- :limit -> The final number of results to return

WITH dense_results AS (
    -- CTE 1: Perform the dense vector search using the HNSW index.
    -- We fetch far more results than needed (e.g., 10x the final limit) to provide a good pool for fusion.
    SELECT 
        id,
        -- The rank is crucial for RRF. We use row_number() to get a 1-based rank.
        ROW_NUMBER() OVER (ORDER BY embedding <=> :query_embedding) as rank
    FROM products
    ORDER BY embedding <=> :query_embedding
    LIMIT 200 -- Fetch more candidates for fusion
), 
sparse_results AS (
    -- CTE 2: Perform the sparse FTS search.
    -- We use our custom 'electronics' configuration and rank results by relevance.
    SELECT 
        id,
        ROW_NUMBER() OVER (ORDER BY ts_rank_cd(tsvector_col, query) DESC) as rank
    FROM products, plainto_tsquery('public.electronics', :query_text) query
    WHERE tsvector_col @@ query
    ORDER BY ts_rank_cd(tsvector_col, query) DESC
    LIMIT 200 -- Fetch more candidates for fusion
),
fused_results AS (
    -- CTE 3: Fuse the results using a FULL OUTER JOIN.
    -- This ensures we keep documents that appear in only one of the result sets.
    SELECT
        COALESCE(dense.id, sparse.id) as id,
        -- Calculate the RRF score for each result set. If a document is not in a set,
        -- its contribution must be exactly 0; a CASE expression avoids the trap of
        -- coalescing a missing rank to 0, which would yield the highest possible score.
        CASE WHEN dense.rank IS NOT NULL THEN 1.0 / (:k_rrf + dense.rank) ELSE 0.0 END as dense_rrf_score,
        CASE WHEN sparse.rank IS NOT NULL THEN 1.0 / (:k_rrf + sparse.rank) ELSE 0.0 END as sparse_rrf_score
    FROM dense_results dense
    FULL OUTER JOIN sparse_results sparse ON dense.id = sparse.id
)
-- Final SELECT: Sum the scores, order, and join back to get product details.
SELECT
    p.id,
    p.name,
    p.description,
    -- The final RRF score is the sum of the individual scores.
    (fr.dense_rrf_score + fr.sparse_rrf_score) as rrf_score
FROM fused_results fr
JOIN products p ON fr.id = p.id
ORDER BY rrf_score DESC
LIMIT :limit; -- Apply the final limit

Analysis of the Query Structure:

  • dense_results & sparse_results: These CTEs independently perform the two searches. Crucially, they use ROW_NUMBER() to generate the rank, which is the only piece of information RRF needs. We fetch more results than our final limit (LIMIT 200 for a final :limit of, say, 20) to ensure we have a rich set of candidates for the fusion step.
  • fused_results: This is the core of the RRF logic. The FULL OUTER JOIN is non-negotiable; it correctly handles documents present in one list but not the other. The CASE expressions matter just as much: naively writing COALESCE(rank, 0) inside the denominator would give a missing document a score of 1 / (k + 0), which is higher than the score of a rank-1 result. A missing document must contribute exactly 0.
  • Final Aggregation: Because each document appears at most once per result set, the FULL OUTER JOIN yields one row per document, so summing the two partial scores gives the final RRF score directly. We then join back to the products table to retrieve the full data, and the final ORDER BY rrf_score DESC gives us our master, hybrid-relevance-ranked list.

Advanced Tuning and Edge Cases

A production system requires more than just the base query. Here are critical considerations.

Tuning the `k` Constant in RRF

The k constant (defaulting to 60) prevents high-rank items from having an outsized impact. A document ranked #1 and a document ranked #2 have scores of 1/61 and 1/62 respectively—a small difference. A document ranked #100 has a score of 1/160. This flattens the curve.

* Lower k (e.g., k=10): Makes the fusion more sensitive to top-ranked items. If you have high confidence in your individual searchers, a lower k can be beneficial. It will more aggressively promote items that are #1 in either list.

* Higher k (e.g., k=100): Makes the ranking more democratic and less sensitive to the top positions. It gives more weight to the "long tail" of the results.

Experiment with k on a validation set of queries to find the optimal value for your specific domain.
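The effect of k is easy to quantify: the relative gap between adjacent ranks shrinks as k grows. A quick sketch (not a tuning recipe):

```python
def rrf_score(rank: int, k: int) -> float:
    """Single-list RRF contribution for a 1-based rank."""
    return 1.0 / (k + rank)

for k in (10, 60, 100):
    # Ratio of the rank-1 score to the rank-2 score: closer to 1.0 means flatter.
    gap = rrf_score(1, k) / rrf_score(2, k)
    print(f"k={k}: rank-1 score is {gap:.3f}x the rank-2 score")
```

At k=10 the top position carries roughly a 9% premium over second place; at k=100 it is about 1%, so top ranks dominate far less.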

Introducing Relative Weighting

RRF's base formula treats all search systems as equals. But what if you want to give more weight to keyword matches for queries that look like SKUs, and more weight to semantic matches for conceptual queries? We can introduce a weighting factor into the RRF formula:

Weighted_RRF_Score(d) = Σ (weight_i * (1 / (k + rank_i(d))))

This can be implemented by adding weight parameters to the query.

sql
-- Modified final SELECT for weighted RRF
-- :dense_weight and :sparse_weight are new parameters (e.g., 1.0 and 1.5)
SELECT
    p.id,
    p.name,
    -- Apply weights to the RRF scores
    (:dense_weight * fr.dense_rrf_score + :sparse_weight * fr.sparse_rrf_score) as rrf_score
FROM fused_results fr
JOIN products p ON fr.id = p.id
ORDER BY rrf_score DESC
LIMIT :limit;

Your application layer can now implement logic to adjust these weights. For example, if query_text matches a regex for product SKUs, you could set :sparse_weight = 2.0 and :dense_weight = 0.5.
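A minimal sketch of that application-layer logic, assuming a hypothetical SKU pattern shaped like the XJ-48-B1 example from the introduction (the regex and function name are illustrative; adapt them to your catalog's actual format):

```python
import re

# Hypothetical SKU pattern: two letters, two digits, letter+digit, dash-separated.
SKU_PATTERN = re.compile(r"^[A-Z]{2}-\d{2}-[A-Z]\d$")

def choose_weights(query_text: str) -> dict:
    """Boost keyword (sparse) matching for SKU-like queries,
    treat both searchers equally otherwise."""
    if SKU_PATTERN.match(query_text.strip().upper()):
        return {"dense_weight": 0.5, "sparse_weight": 2.0}
    return {"dense_weight": 1.0, "sparse_weight": 1.0}

print(choose_weights("XJ-48-B1"))                    # SKU-like: boost sparse
print(choose_weights("fast laptop for programming"))  # conceptual: equal weights
```

The returned dictionary maps directly onto the :dense_weight and :sparse_weight query parameters.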

Performance Analysis and Optimization

Run EXPLAIN ANALYZE on your full query. You should see:

  • An Index Scan on your hnsw or ivfflat index for the dense_results CTE.
  • An Index Scan on your GIN index for the sparse_results CTE.
  • A Hash Full Outer Join to merge the results.
  • A final Sort operation.

The most expensive parts will be the initial index scans. Ensure your hnsw.ef_search or ivfflat.probes values are set appropriately for your latency requirements. The number of candidates you fetch in the initial CTEs (LIMIT 200 in our example) is a direct trade-off: more candidates can lead to better fusion results but will increase the cost of the join and sort. There is no magic number; this must be tuned based on your data distribution and performance targets.

For extremely high-throughput systems, consider creating a PL/pgSQL function to encapsulate this logic, which can sometimes give the query planner better optimization opportunities.

Conclusion: An Integrated Search Powerhouse

By moving beyond the limitations of single-paradigm search, we have constructed a sophisticated, state-of-the-art hybrid search system. This approach provides the best of both worlds: the semantic understanding of dense vectors and the keyword precision of sparse vectors. The Reciprocal Rank Fusion algorithm provides an elegant, mathematically sound method for merging these disparate signals into a single, superior relevance score.

The most compelling aspect of this architecture is that it can be implemented entirely within a standard PostgreSQL database, augmented with the pgvector extension. You can achieve world-class search performance and relevance without the operational overhead of managing and synchronizing a separate search cluster like Elasticsearch or a dedicated vector database. This integrated approach simplifies your stack, reduces latency, and ensures transactional consistency between your primary data and your search indexes, a feat that is notoriously difficult with external search systems.
