Hybrid Search in Postgres: Fusing pgvector & FTS with RRF

Goh Ling Yong

The Problem with Unfused Search in Production

As engineers building AI-native applications, we're often faced with a critical choice for search functionality: traditional keyword-based Full-Text Search (FTS) or modern semantic vector search. FTS, powered by tsvector and GIN indexes in PostgreSQL, is incredibly fast and precise for matching specific terms, acronyms, and product codes. Vector search, enabled by pgvector and HNSW/IVFFlat indexes, excels at understanding user intent, synonyms, and conceptual relationships.

The obvious conclusion is to use both. This is hybrid search. However, the immediate challenge that separates a prototype from a production system is how to combine the results from these two fundamentally different search paradigms. A naive approach might involve fetching top-K results from each, normalizing their scores to a common scale (e.g., 0-1), and applying a weighted average.

This approach is fraught with peril in a real-world system:

  • Incompatible Score Distributions: FTS scores (like ts_rank_cd) and vector similarity scores (like cosine similarity or L2 distance) are not directly comparable. Their distributions are different, and their magnitudes can vary wildly depending on the query and corpus. A ts_rank of 0.1 might be a great match, while a cosine similarity of 0.8 might be a poor one.
  • Brittle Weight Tuning: The weights in a formula like 0.7 * fts_score + 0.3 * vector_score are magic numbers. They are highly sensitive to the dataset and query patterns. A set of weights that works well for one type of query may perform terribly for another. This leads to constant, manual re-tuning, which is not scalable.
  • The Normalization Trap: Normalizing scores requires knowledge of the global min/max scores, which is computationally expensive to get for every query. Normalizing based on the scores within a single result set is unstable and can lead to inconsistent ranking.

We need a more robust, principled method for fusing these result sets. This is where Reciprocal Rank Fusion (RRF) comes in. RRF is a rank-based fusion technique that ignores the raw scores entirely, making it immune to the problems above. It combines result lists based solely on the position of a document in each list, providing a simple yet powerful way to merge lexical and semantic search results.

    In this article, we'll build a production-grade hybrid search system within PostgreSQL using pgvector, FTS, and RRF. We will:

    • Set up a realistic schema with appropriate FTS and HNSW indexes.
    • Implement a complex RRF fusion query using Common Table Expressions (CTEs).
    • Analyze the query plan to understand its performance characteristics.
    • Encapsulate the logic into a reusable PL/pgSQL function for clean application-layer integration.
    • Discuss advanced considerations like tuning and handling edge cases.

    1. Schema and Indexing Strategy

    Let's model a multi-tenant system for storing and searching technical documents. Each document has text content, a pre-computed embedding vector, and belongs to a specific tenant.

    Table Definition

    First, ensure the vector extension is enabled:

    sql
    CREATE EXTENSION IF NOT EXISTS vector;

    Now, let's define our documents table.

    sql
    CREATE TABLE documents (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        tenant_id UUID NOT NULL,
        content TEXT NOT NULL,
        embedding VECTOR(768), -- Assuming a 768-dimensional model like sentence-transformers/all-mpnet-base-v2
        content_tsv TSVECTOR,
        created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
    );
    
    -- For multi-tenancy performance
    CREATE INDEX idx_documents_tenant_id ON documents (tenant_id);

    Automated `tsvector` Generation

    We don't want our application layer to be responsible for creating the tsvector. We can automate this with a trigger, ensuring data consistency and simplifying application logic.

    sql
    CREATE OR REPLACE FUNCTION update_content_tsv()
    RETURNS TRIGGER AS $$
    BEGIN
        NEW.content_tsv := to_tsvector('english', NEW.content);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    
    CREATE TRIGGER trg_update_content_tsv
    BEFORE INSERT OR UPDATE ON documents
    FOR EACH ROW
    EXECUTE FUNCTION update_content_tsv();
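
    On PostgreSQL 12 and later, a stored generated column is a trigger-free way to get the same consistency guarantee. A minimal sketch, assuming you drop the trigger and the plain content_tsv column if you adopt it (content_tsv_gen is a hypothetical name chosen to avoid clashing with the column defined above):

    sql
    -- Alternative (PostgreSQL 12+): a stored generated column keeps the tsvector in sync
    -- without a trigger. The two-argument to_tsvector(regconfig, text) is IMMUTABLE,
    -- so it is allowed in a generated column expression.
    ALTER TABLE documents
        ADD COLUMN content_tsv_gen TSVECTOR
        GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;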

    The Dual Indexing Approach

    For our hybrid search to be performant, we need two distinct types of indexes: one for FTS and one for vector search.

    1. GIN Index for Full-Text Search

    A GIN (Generalized Inverted Index) is the standard choice for tsvector columns. It's highly efficient for finding rows that contain specific lexemes.

    sql
    -- Index for FTS queries
    CREATE INDEX idx_documents_content_tsv ON documents USING GIN (content_tsv);

    2. HNSW Index for Vector Search

    For Approximate Nearest Neighbor (ANN) search, HNSW (Hierarchical Navigable Small World) is a state-of-the-art algorithm that provides an excellent balance of speed and accuracy. pgvector supports HNSW indexes as of version 0.5.0.

    sql
    -- Index for ANN vector search. Adjust m and ef_construction based on your data/needs.
    -- m: max number of connections per layer (higher = better recall, slower build).
    -- ef_construction: size of the dynamic candidate list during build (higher = better recall, slower build).
    CREATE INDEX idx_documents_embedding ON documents USING HNSW (embedding vector_l2_ops)
    WITH (m = 16, ef_construction = 64);

    A Note on Index Choice: While IVFFlat is another option, HNSW generally offers better query-time performance and recall for many datasets, without the need to tune how many lists are probed at query time. The trade-off is higher build time and memory usage.
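
    If build time or memory is the tighter constraint, IVFFlat remains a reasonable choice. A minimal sketch, assuming the table is already populated (IVFFlat learns its list centroids from existing rows) and using idx_documents_embedding_ivf as an illustrative index name:

    sql
    -- IVFFlat alternative: cheaper to build; recall is controlled at query time via probes.
    -- A common starting point for lists is roughly sqrt(row count).
    CREATE INDEX idx_documents_embedding_ivf ON documents USING ivfflat (embedding vector_l2_ops)
    WITH (lists = 100);

    -- More probes = better recall, slower queries (the default is 1).
    SET ivfflat.probes = 10;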


    2. Implementing Reciprocal Rank Fusion (RRF) in SQL

    Now for the core of our system. RRF works by calculating a new score for each document based on its rank in the result lists. The formula is:

    RRF_Score(d) = Σ (1 / (k + rank_i(d)))

    Where:

  • d is a document.
  • rank_i(d) is the rank of document d in result set i.
  • k is a constant (commonly set to 60) that dampens the influence of high-ranking results, preventing a single #1 result from dominating the final list.

    Let's build this step-by-step using CTEs.
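
    To make the formula concrete before we write any SQL: a document that ranks 2nd in the FTS list and 5th in the vector list, with the conventional k = 60, scores as follows (purely illustrative arithmetic):

    sql
    -- RRF contributions for one hypothetical document (k = 60, FTS rank 2, vector rank 5).
    SELECT
        1.0 / (60 + 2) AS fts_contribution,            -- ~0.0161
        1.0 / (60 + 5) AS vector_contribution,         -- ~0.0154
        1.0 / (60 + 2) + 1.0 / (60 + 5) AS rrf_score;  -- ~0.0315

    A document that appears near the top of both lists accumulates two such contributions, which is exactly what the aggregation in the query below reproduces.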

    Step 2.1: Sample Data

    First, let's insert some sample data. In a real application, the embeddings would be generated by a model. Here, we'll use random vectors for demonstration.

    sql
    -- Insert some sample data for a specific tenant
    DO $$
    DECLARE
        v_tenant_id UUID := 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11';
    BEGIN
        INSERT INTO documents (tenant_id, content, embedding) VALUES
        (v_tenant_id, 'PostgreSQL offers robust full-text search capabilities.', random_vector(768)),
        (v_tenant_id, 'Reciprocal Rank Fusion is a powerful method for merging search results.', random_vector(768)),
        (v_tenant_id, 'The pgvector extension enables efficient similarity search in Postgres.', random_vector(768)),
        (v_tenant_id, 'Optimizing HNSW index parameters like m and ef_construction is crucial for performance.', random_vector(768)),
        (v_tenant_id, 'Combining lexical and semantic search provides superior relevance.', random_vector(768)),
        (v_tenant_id, 'A deep dive into advanced SQL patterns for data fusion.', random_vector(768));
    END $$;

    (Note: random_vector is a demo helper you'd need to create, e.g., CREATE OR REPLACE FUNCTION random_vector(dim integer) RETURNS vector AS $$ SELECT ('[' || array_to_string(ARRAY(SELECT random() FROM generate_series(1, dim)), ',') || ']')::vector; $$ LANGUAGE sql; the surrounding brackets are required by pgvector's text input format.)

    Step 2.2: The RRF Query

    Imagine a user searches for "advanced postgres search fusion". This query has both specific keywords ("postgres", "search") and a semantic concept ("advanced fusion").

    Here is the complete SQL query to perform RRF.

    sql
    -- Define search parameters
    WITH params AS (
        SELECT
            'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'::UUID AS target_tenant_id,
            websearch_to_tsquery('english', 'advanced postgres search fusion') AS fts_query,
            -- In a real app, this vector comes from your embedding model
            random_vector(768) AS query_vector,
            60 AS rrf_k, -- RRF constant
            10 AS result_limit
    ),
    -- 1. Full-Text Search Results, ranked
    fts_results AS (
        SELECT
            id,
            ROW_NUMBER() OVER (ORDER BY ts_rank_cd(content_tsv, p.fts_query) DESC) as rank
        FROM
            documents,
            params p
        WHERE
            tenant_id = p.target_tenant_id
            AND content_tsv @@ p.fts_query
        ORDER BY
            ts_rank_cd(content_tsv, p.fts_query) DESC
        LIMIT 100 -- Fetch more results than needed to allow for fusion
    ),
    -- 2. Vector Search Results, ranked
    vector_results AS (
        SELECT
            id,
            ROW_NUMBER() OVER (ORDER BY embedding <-> p.query_vector) as rank
        FROM
            documents,
            params p
        WHERE
            tenant_id = p.target_tenant_id
        ORDER BY
            embedding <-> p.query_vector
        LIMIT 100 -- Fetch more results than needed to allow for fusion
    ),
    -- 3. Union the results and calculate RRF score
    ranked_union AS (
        SELECT id, rank FROM fts_results
        UNION ALL
        SELECT id, rank FROM vector_results
    )
    -- 4. Final Aggregation and Ranking
    SELECT
        ru.id,
        SUM(1.0 / (p.rrf_k + ru.rank)) AS rrf_score,
        d.content -- Fetch content for display
    FROM
        ranked_union ru
    JOIN
        documents d ON ru.id = d.id,
        params p
    GROUP BY
        ru.id, d.content
    ORDER BY
        rrf_score DESC
    LIMIT (SELECT result_limit FROM params); -- Apply the final limit defined in params
    

    Breakdown of the Query:

  • params CTE: This is a crucial pattern for readability and maintainability. All input parameters are defined in one place. Your application code only needs to substitute values here.
  • fts_results CTE: Performs the standard FTS query. Critically, we use ROW_NUMBER() to assign a rank to each result based on the FTS score. We do not use the score itself in the final calculation.
  • vector_results CTE: Performs the vector similarity search. We use the L2 distance operator (<->) for ordering. Again, ROW_NUMBER() is used to assign a rank, and the actual distance value is discarded.
  • ranked_union CTE: This is where the magic starts. We UNION ALL the two ranked lists. A document that appears in both searches will now appear as two rows in this CTE, each with its respective rank.
  • Final Aggregation: We group by the document id. The SUM(1.0 / (p.rrf_k + ru.rank)) calculation is the direct implementation of the RRF formula. Documents that ranked highly in both lists will accumulate a much higher rrf_score. Finally, we order by this new score to get our fused, relevance-ranked list.

    This approach elegantly solves our initial problems. There's no score normalization, and the only tunable parameter is k, which is far less sensitive than arbitrary weights.


    3. Performance Analysis with `EXPLAIN ANALYZE`

    A complex query is only useful if it's fast. Let's analyze its execution plan.

    sql
    EXPLAIN ANALYZE
    -- ... (paste the entire query from above)

    You would see an output similar to this (simplified for clarity):

    text
    Limit  (cost=... actual time=...) 
      ->  Sort  (cost=... actual time=...) 
            Sort Key: (sum((1.0 / (60 + ru.rank)))) DESC
            ->  HashAggregate  (cost=... actual time=...)
                  Group Key: ru.id, d.content
                  ->  Hash Join  (cost=... actual time=...)
                        Hash Cond: (ru.id = d.id)
                        ->  Append  (cost=... actual time=...)
                              ->  Subquery Scan on fts_results  (cost=... actual time=...)
                                    ->  Limit  (cost=... actual time=...)
                                          ->  WindowAgg  (cost=... actual time=...)
                                                ->  Sort  (cost=... actual time=...)
                                                      ->  Bitmap Heap Scan on documents  (cost=... actual time=...)
                                                            Recheck Cond: (content_tsv @@ ...)
                                                            ->  Bitmap Index Scan on idx_documents_content_tsv  (cost=... actual time=...)
                              ->  Subquery Scan on vector_results  (cost=... actual time=...)
                                    ->  Limit  (cost=... actual time=...)
                                          ->  WindowAgg  (cost=... actual time=...)
                                                ->  Sort  (cost=... actual time=...)
                                                      ->  Index Scan using idx_documents_embedding on documents  (cost=... actual time=...)
                                                            Order By: (embedding <-> ...)
                        ->  Hash  (cost=... actual time=...)
                              ->  Seq Scan on documents d  (cost=... actual time=...)

    Key Takeaways from the Plan:

    * Dual Index Usage: The planner correctly identifies and uses both of our indexes: Bitmap Index Scan on idx_documents_content_tsv for the FTS part and Index Scan using idx_documents_embedding for the vector search part. This is the most critical confirmation that our setup is working efficiently.

    * Parallel Execution: The two CTEs (fts_results and vector_results) are executed independently, and the Append node combines their results. Depending on your PostgreSQL configuration and data size, the planner may even choose a Parallel Append and run the two branches concurrently.

    * Window Functions and Sorts: The ROW_NUMBER() function requires a WindowAgg operation, which in turn requires sorting the results from each search. This is an unavoidable cost but is performed on the limited result sets (100 rows in our example), not the entire table.

    * Final Join and Aggregation: The final Hash Join and HashAggregate are efficient operations for combining the ranked lists and calculating the final RRF score.

    The performance of this query is primarily dependent on the performance of the two underlying index scans. As long as your FTS and HNSW indexes are well-maintained and selective, this query will be highly performant, even on millions of rows.
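
    One query-time knob worth knowing on the vector side is hnsw.ef_search, pgvector's candidate-list size for HNSW scans (default 40). A small sketch of raising it for a single transaction (the value is illustrative):

    sql
    -- A larger ef_search improves recall at the cost of a slower HNSW index scan.
    BEGIN;
    SET LOCAL hnsw.ef_search = 100;
    -- ... run the hybrid search query here ...
    COMMIT;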


    4. Encapsulating Logic in a PL/pgSQL Function

    Exposing such a complex query to your application layer is a bad practice. It's hard to maintain, test, and secure. The correct approach is to encapsulate it in a PL/pgSQL function. This creates a clean, reusable interface.

    sql
    CREATE TYPE hybrid_search_result AS (
        id UUID,
        content TEXT,
        rrf_score NUMERIC
    );
    
    CREATE OR REPLACE FUNCTION hybrid_search(
        p_tenant_id UUID,
        p_search_term TEXT,
        p_query_vector VECTOR(768),
        p_rrf_k INT DEFAULT 60,
        p_limit INT DEFAULT 10,
        p_fts_weight NUMERIC DEFAULT 1.0, -- Optional: Advanced weighting
        p_vector_weight NUMERIC DEFAULT 1.0 -- Optional: Advanced weighting
    )
    RETURNS SETOF hybrid_search_result AS $$
    BEGIN
        RETURN QUERY
        WITH fts_results AS (
            SELECT
                id,
                ROW_NUMBER() OVER (ORDER BY ts_rank_cd(content_tsv, websearch_to_tsquery('english', p_search_term)) DESC) as rank
            FROM documents
            WHERE tenant_id = p_tenant_id AND content_tsv @@ websearch_to_tsquery('english', p_search_term)
            ORDER BY ts_rank_cd(content_tsv, websearch_to_tsquery('english', p_search_term)) DESC
            LIMIT 100 -- Order before limiting so we keep the top-ranked rows
        ),
        vector_results AS (
            SELECT
                id,
                ROW_NUMBER() OVER (ORDER BY embedding <-> p_query_vector) as rank
            FROM documents
            WHERE tenant_id = p_tenant_id
            ORDER BY embedding <-> p_query_vector
            LIMIT 100
        ),
        ranked_union AS (
            SELECT id, rank, p_fts_weight AS weight FROM fts_results
            UNION ALL
            SELECT id, rank, p_vector_weight AS weight FROM vector_results
        )
        SELECT
            ru.id,
            d.content,
            SUM(ru.weight * (1.0 / (p_rrf_k + ru.rank)))::NUMERIC AS rrf_score
        FROM ranked_union ru
        JOIN documents d ON ru.id = d.id
        GROUP BY ru.id, d.content
        ORDER BY rrf_score DESC
        LIMIT p_limit;
    END;
    $$ LANGUAGE plpgsql;
    

    Enhancements in the Function:

    * Custom Return Type: We define a hybrid_search_result type for a clean, strongly-typed function signature.

    * Parameterization: All inputs are now function arguments with sensible defaults.

    * Advanced Weighting (Optional): While RRF's main benefit is avoiding score-based weights, you can introduce a rank-based weighting scheme; here, we've added p_fts_weight and p_vector_weight. This allows you to give more importance to one search modality if needed (e.g., if a user's query contains quotes, you might increase the FTS weight; see the example call further below). This is a more stable form of weighting than operating on raw scores.

    Now, your application's code becomes beautifully simple:

    sql
    -- From your application, you just call this:
    SELECT * FROM hybrid_search(
        p_tenant_id    := 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11',
        p_search_term  := 'advanced postgres search fusion',
        p_query_vector := '...' -- Your 768-dim vector
    );
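
    If you want to lean on the optional weights, for example when the user's query contains a quoted phrase, a hypothetical call might look like this (the weight values are illustrative, not tuned):

    sql
    -- Favor the lexical side for phrase-style queries by doubling its rank-based contribution.
    SELECT * FROM hybrid_search(
        p_tenant_id     := 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11',
        p_search_term   := '"reciprocal rank fusion"',
        p_query_vector  := '...', -- Your 768-dim vector
        p_fts_weight    := 2.0,
        p_vector_weight := 1.0
    );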

    5. Advanced Considerations and Edge Cases

    Tuning k in RRF: The k parameter controls how much a few high-ranked documents can influence the final score. A smaller k gives more weight to the top results. A larger k considers a wider range of ranks more equally. The default of k=60 is a widely used heuristic from search engine research and is a good starting point. You can tune it based on your specific relevance testing, but it's far less volatile than tuning score weights.
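
    A quick way to see the damping effect is to compare the score gap between adjacent ranks at a small versus a large k (illustrative arithmetic only):

    sql
    -- Gap between rank 1 and rank 2 for two values of k.
    SELECT
        1.0 / (1 + 1)  - 1.0 / (1 + 2)  AS gap_at_k_1,   -- ~0.1667: the top result dominates
        1.0 / (60 + 1) - 1.0 / (60 + 2) AS gap_at_k_60;  -- ~0.0003: ranks contribute more evenly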

    Edge Case: No Results from One Searcher: What if the FTS query returns zero results? Our UNION ALL approach handles this gracefully. The fts_results CTE will be empty, and the ranked_union will only contain results from vector_results. The RRF calculation proceeds correctly with only one set of ranks. The same applies if vector search finds nothing.

    Handling Pagination: Implementing cursor-based or offset-based pagination on this function requires care. The final ORDER BY rrf_score DESC is stable only if the scores are unique. To ensure stable ordering for pagination, you should add a tie-breaker column, like the document's id:

    sql
    -- Inside the function's final query
    ORDER BY rrf_score DESC, ru.id ASC
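
    With that tie-breaker in place, keyset (cursor-based) pagination becomes straightforward. A sketch of the predicate shape, where :last_score and :last_id are hypothetical bind parameters holding the values from the final row of the previous page (the mixed sort directions rule out a simple row-value comparison):

    sql
    -- Keyset pagination over (rrf_score DESC, id ASC), applied to the fused result set.
    WHERE rrf_score < :last_score
       OR (rrf_score = :last_score AND id > :last_id)
    ORDER BY rrf_score DESC, id ASC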

    Pre-filtering vs. Post-filtering: In our query, the tenant_id filter is applied before each search. This is crucial for performance: it allows PostgreSQL to use a multi-column index (e.g., a GIN index on (tenant_id, content_tsv), sketched below) if one is available, and it dramatically reduces the search space for both the FTS and vector queries. Always apply filters before the expensive search and ranking operations.
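
    A sketch of such a composite index, assuming the btree_gin extension (which provides GIN operator classes for scalar types such as UUID); whether it beats the separate B-tree and GIN indexes depends on your data, so verify with EXPLAIN ANALYZE:

    sql
    -- btree_gin lets tenant_id participate in the same GIN index as the tsvector column.
    CREATE EXTENSION IF NOT EXISTS btree_gin;

    CREATE INDEX idx_documents_tenant_tsv ON documents USING GIN (tenant_id, content_tsv);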

    When to Eject from Postgres? This all-in-Postgres approach is incredibly powerful for its simplicity, data consistency, and transactional integrity. It can scale to tens of millions of documents. However, if your scale reaches hundreds of millions or billions of documents, or if you require complex, multi-stage ranking pipelines and faceting that are the bread and butter of dedicated search engines, it might be time to consider a specialized system like Elasticsearch (with vector plugins) or Vespa. But for a vast number of applications, keeping search within the primary database is a massive architectural simplification and a winning strategy.

    Conclusion

    By moving from naive score-based merging to Reciprocal Rank Fusion, we have engineered a robust, production-ready hybrid search system entirely within PostgreSQL. This pattern leverages the strengths of both tsvector and pgvector while avoiding the common pitfalls of score normalization and brittle weighting. By using CTEs for clarity, analyzing the execution plan for performance, and encapsulating the final logic in a PL/pgSQL function, we end up with an advanced search component that is performant and maintainable, and that delivers highly relevant results by truly fusing the lexical and semantic worlds.
