Hybrid Search in Postgres: Fusing pgvector and BM25 with RRF

17 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Dichotomy of Modern Search: Semantic vs. Lexical

In production search systems, the central conflict is between understanding user intent and matching user terms. Senior engineers recognize this isn't an either/or problem. A system that relies solely on semantic search via vector embeddings excels at conceptual queries like "summer clothes for a beach vacation" but will often fail to retrieve a specific product when a user types in a precise SKU like XJ-48-B2. Conversely, a classic lexical search system, like Elasticsearch's BM25 or PostgreSQL's full-text search (FTS), will nail the SKU query but fall flat on the conceptual one, returning only documents that literally contain the words "beach" and "vacation."

The architectural mandate is clear: we need a system that does both. A hybrid search system leverages two distinct retrieval paths—one semantic, one lexical—and intelligently fuses the results. The challenge, however, lies in the implementation details. How do you merge two result sets ranked by fundamentally incompatible scoring mechanisms? A cosine similarity score from pgvector (typically [0, 1]) and a ts_rank score from FTS (an arbitrary positive float) cannot be naively added or compared.

This is where Reciprocal Rank Fusion (RRF) comes in. It's a simple, yet remarkably effective, rank-based fusion algorithm that obviates the need for complex score normalization. This article provides a deep, implementation-focused guide to building such a system entirely within PostgreSQL, leveraging the pgvector extension and native FTS capabilities.

We will cover:

  • Schema and Indexing: Designing a table to support both search types and creating highly-performant HNSW and GIN indexes.
  • The Fusion Algorithm: A detailed look at Reciprocal Rank Fusion and why it's superior to score-based methods for this use case.
  • The Hybrid Query: Constructing a complex but performant SQL query using Common Table Expressions (CTEs) to fetch, rank, and fuse results.
  • Production-Ready Abstraction: Encapsulating the logic in a PL/pgSQL function for reusability and maintainability.
  • Performance Tuning & Edge Cases: Analyzing query plans and addressing real-world complexities like zero-result sets and result weighting.

  • Section 1: System Architecture & Schema Setup

    Our foundation is a single PostgreSQL table designed to store product data, its vector embedding, and its pre-computed tsvector for full-text search. This co-location is a key advantage of using Postgres, simplifying data management and ensuring consistency.

    Let's define a realistic products table. We'll use a 384-dimension vector, which is standard for high-performance embedding models like sentence-transformers/all-MiniLM-L6-v2.

    sql
    -- Ensure the pgvector extension is installed
    CREATE EXTENSION IF NOT EXISTS vector;
    
    -- Create the products table
    CREATE TABLE products (
        id BIGSERIAL PRIMARY KEY,
        sku TEXT UNIQUE NOT NULL,
        name TEXT NOT NULL,
        description TEXT,
        created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    
        -- Column for vector embeddings
        embedding VECTOR(384),
    
        -- Column for full-text search document
        fts_document TSVECTOR
    );
    
    -- Add comments for clarity
    COMMENT ON COLUMN products.embedding IS '384-dimension embedding from all-MiniLM-L6-v2';
    COMMENT ON COLUMN products.fts_document IS 'Pre-computed tsvector for full-text search';

    Key Architectural Decisions:

    * embedding VECTOR(384): We explicitly define the vector dimension. This is a hard requirement for pgvector and ensures data integrity.

    * fts_document TSVECTOR: We pre-materialize the tsvector. Calculating it on-the-fly with to_tsvector() in the WHERE clause of a query would be catastrophically slow as it would prevent index usage. This column will hold the processed text, stripped of stop words and stemmed.


    Section 2: Populating and Indexing for Dual-Query Paths

    A robust hybrid search system depends on two highly optimized indexes: one for lexical search and one for semantic search. Let's populate our table and build them.

    Populating the Data

    First, we'll insert some sample data. In a real application, the embedding and fts_document would be generated upon record creation or update.

    sql
    INSERT INTO products (sku, name, description)
    VALUES
        ('HW-USBC-C-1M', 'USB-C to USB-C Cable (1m)', 'A durable, 1-meter long USB-C cable for fast charging and data transfer.'),
        ('HW-USBC-L-2M', 'USB-C to Lightning Cable (2m)', 'A 2-meter MFi-certified cable to connect your iPhone to modern laptops.'),
        ('SW-PHOTO-PRO', 'PhotoMaster Pro', 'Professional photo editing software with advanced AI features and layer support.'),
        ('SW-VIDEO-SUITE', 'VideoSuite Ultimate', 'An all-in-one video editing package for creators, including effects and color grading tools.'),
        ('ACC-BTN-BLUE', 'Bluetooth Noise-Cancelling Headphones', 'Over-ear wireless headphones with active noise cancellation and 30-hour battery life.'),
        ('ACC-MOUSE-ERG', 'Ergonomic Vertical Mouse', 'A wireless mouse designed to reduce wrist strain during long work sessions.');

    The Lexical Path: BM25 Approximation with FTS

    PostgreSQL's FTS ranking functions, like ts_rank_cd, are based on the Okapi BM25 algorithm, making it a powerful tool for lexical search. Our strategy is to pre-calculate the tsvector and build a GIN index on it.

    1. Generating the tsvector:

    We'll combine the name and description fields, giving more weight to the name using setweight.

    sql
    UPDATE products
    SET fts_document = 
        setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(description, '')), 'B');

    For production systems, this logic should be encapsulated in a trigger to automatically update fts_document whenever name or description changes.

    sql
    CREATE OR REPLACE FUNCTION update_fts_document() RETURNS TRIGGER AS $$
    BEGIN
        NEW.fts_document := 
            setweight(to_tsvector('english', coalesce(NEW.name, '')), 'A') ||
            setweight(to_tsvector('english', coalesce(NEW.description, '')), 'B');
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    
    CREATE TRIGGER products_fts_update
    BEFORE INSERT OR UPDATE ON products
    FOR EACH ROW EXECUTE FUNCTION update_fts_document();

    2. Creating the GIN Index:

    A GIN (Generalized Inverted Index) is the standard choice for FTS in Postgres. It's highly efficient for finding rows that contain specific lexemes.

    sql
    CREATE INDEX idx_products_fts_document ON products USING GIN(fts_document);

    The Semantic Path: Vector Search with `pgvector`

    1. Generating Embeddings:

    This step happens outside the database. A Python service would use a model to convert product text into vectors and then update the database.

    python
    # Example Python script (conceptual)
    from sentence_transformers import SentenceTransformer
    import psycopg2
    
    # Assume db connection is set up
    cursor = conn.cursor()
    
    model = SentenceTransformer('all-MiniLM-L6-v2')
    
    cursor.execute("SELECT id, name, description FROM products WHERE embedding IS NULL")
    products_to_embed = cursor.fetchall()
    
    for product_id, name, description in products_to_embed:
        text_to_embed = f"{name}: {description}"
        embedding = model.encode(text_to_embed).tolist()
        
        # psycopg2 can handle lists of floats for vector types
        cursor.execute(
            "UPDATE products SET embedding = %s WHERE id = %s",
            (embedding, product_id)
        )
    
    conn.commit()
    cursor.close()

    2. Creating the HNSW Index:

    For vector search, we need an Approximate Nearest Neighbor (ANN) index. pgvector supports IVFFlat and HNSW. HNSW (Hierarchical Navigable Small World) is generally the superior choice for production workloads, offering better query performance at the cost of longer build times and higher memory usage.

    We'll index for cosine distance (vector_cosine_ops), as it's a common choice for sentence transformer models.

    sql
    -- HNSW index creation
    -- m: max number of connections per layer (default 16)
    -- ef_construction: size of the dynamic candidate list for constructing the graph (default 64)
    CREATE INDEX idx_products_embedding_hnsw 
    ON products 
    USING HNSW (embedding vector_cosine_ops);

    HNSW vs. IVFFlat - A Senior-Level Consideration:

    * IVFFlat: Partitions the vector space into lists. Search speed depends on how many lists (probes) you check. Good for static datasets, but recall can suffer if a query vector is near a partition boundary. Rebuilding is expensive.

    * HNSW: Builds a multi-layered graph of vectors. Search involves traversing the graph. Generally provides better speed/recall trade-offs than IVFFlat, especially on complex data distributions. It's the go-to for most modern ANN applications.


    Section 3: The Core Challenge: Fusing Disparate Ranks with RRF

    We now have two independent, indexed search systems. A query for "editing program" might return (SW-PHOTO-PRO, SW-VIDEO-SUITE) from the semantic search, while "VideoSuite" returns (SW-VIDEO-SUITE) from the lexical search. How do we combine them?

    As mentioned, their scores are incompatible. The solution is to ignore the scores and use only the rank. Reciprocal Rank Fusion (RRF) provides a simple, parameter-free formula to do this.

    For a given document d, its RRF score is calculated as:

    RRF_Score(d) = Σ (1 / (k + rank_i(d)))

    Where:

    * rank_i(d) is the rank of document d in result set i.

    * k is a constant that mitigates the influence of highly-ranked results. A standard value is k=60.

    Example:

    Assume k=60. Document A is rank 1 in semantic search and rank 10 in lexical search. Document B is rank 5 in semantic search and rank 2 in lexical search.

    * RRF(A) = (1 / (60 + 1)) + (1 / (60 + 10)) = 0.01639 + 0.01428 = 0.03067

    * RRF(B) = (1 / (60 + 5)) + (1 / (60 + 2)) = 0.01538 + 0.01612 = 0.03150

    Despite Document A having a #1 rank, Document B's consistently high ranks give it a better final RRF score. This demonstrates RRF's power in balancing results from multiple systems.


    Section 4: Implementation: Crafting the Hybrid Search Query

    We will use Common Table Expressions (CTEs) to construct our hybrid query. This approach keeps the logic clean and readable.

    The query takes two inputs: a query_text for FTS and a query_embedding for vector search.

    sql
    -- Define query parameters for the example
    -- In a real app, these would be passed as parameters
    WITH query AS (
        SELECT 
            'fast charging cable' AS text,
            -- This would be the embedding for 'fast charging cable'
            '[...]'::vector(384) AS embedding 
    ),
    
    -- 1. Perform Lexical (FTS) Search
    lexical_search AS (
        SELECT 
            id,
            -- Calculate the rank score. We use ts_rank_cd for its normalization.
            ts_rank_cd(fts_document, websearch_to_tsquery('english', (SELECT text FROM query))) as rank_score
        FROM products
        -- The @@ operator uses the GIN index
        WHERE fts_document @@ websearch_to_tsquery('english', (SELECT text FROM query))
        ORDER BY rank_score DESC
        -- Critical: Limit results to prevent joining large sets
        LIMIT 100
    ),
    
    -- 2. Perform Semantic (Vector) Search
    semantic_search AS (
        SELECT 
            id,
            -- The <=> operator is cosine distance, so 1 - distance = similarity
            1 - (embedding <=> (SELECT embedding FROM query)) as similarity_score
        FROM products
        -- Order by cosine distance (ascending). The HNSW index is used here.
        ORDER BY embedding <=> (SELECT embedding FROM query)
        -- Critical: Limit results
        LIMIT 100
    ),
    
    -- 3. Rank the results from each search path independently
    lexical_ranked AS (
        SELECT id, ROW_NUMBER() OVER (ORDER BY rank_score DESC) as rank
        FROM lexical_search
    ),
    
    semantic_ranked AS (
        SELECT id, ROW_NUMBER() OVER (ORDER BY similarity_score DESC) as rank
        FROM semantic_search
    ),
    
    -- 4. Fuse the results using Reciprocal Rank Fusion (RRF)
    fused_results AS (
        SELECT
            -- Coalesce to handle documents present in only one result set
            COALESCE(lr.id, sr.id) as id,
            -- Calculate RRF score. If a doc is not in a result set, its rank is effectively infinity,
            -- so it contributes 0 to the score. We use a k of 60.
            (COALESCE(1.0 / (60 + lr.rank), 0.0)) + 
            (COALESCE(1.0 / (60 + sr.rank), 0.0)) as rrf_score
        FROM lexical_ranked lr
        -- FULL OUTER JOIN is essential to include documents found by only one search method
        FULL OUTER JOIN semantic_ranked sr ON lr.id = sr.id
    )
    
    -- 5. Final Selection and Ordering
    SELECT 
        p.id,
        p.sku,
        p.name,
        p.description,
        fr.rrf_score
    FROM fused_results fr
    JOIN products p ON fr.id = p.id
    ORDER BY fr.rrf_score DESC
    LIMIT 20;

    Dissecting the Query:

    * lexical_search & semantic_search CTEs: These perform the two searches independently. Both make use of their respective indexes. The LIMIT 100 is a crucial performance optimization. Without it, the FULL OUTER JOIN could be joining millions of rows.

    * lexical_ranked & semantic_ranked CTEs: We discard the original scores (ts_rank_cd, similarity_score) and replace them with a simple integer rank. This is the core principle of RRF.

    fused_results CTE: This is where the magic happens. A FULL OUTER JOIN ensures that a document appearing in either* result set is included. COALESCE is used to calculate the RRF score, with a rank from a missing set contributing 0.0.

    * Final SELECT: We join back to the products table to retrieve the full document data and order by our calculated rrf_score.


    Section 5: Production-Ready Abstraction with PL/pgSQL

    Exposing that complex CTE query to your application layer is a recipe for disaster. It's brittle and hard to maintain. The standard production pattern is to encapsulate this logic in a PL/pgSQL function.

    This function provides a clean interface, can have its execution plan cached by Postgres, and allows for more complex logic, like adding weights.

    sql
    CREATE OR REPLACE FUNCTION hybrid_search(
        p_query_text TEXT,
        p_query_embedding VECTOR(384),
        p_match_limit INT DEFAULT 20,
        p_lexical_weight FLOAT DEFAULT 1.0,
        p_semantic_weight FLOAT DEFAULT 1.0,
        p_rrf_k INT DEFAULT 60
    ) 
    RETURNS TABLE ( 
        id BIGINT,
        sku TEXT,
        name TEXT,
        description TEXT,
        rrf_score FLOAT
    )
    AS $$
    BEGIN
        RETURN QUERY
        WITH lexical_search AS (
            SELECT p.id, ts_rank_cd(p.fts_document, websearch_to_tsquery('english', p_query_text)) as rank_score
            FROM products p
            WHERE p.fts_document @@ websearch_to_tsquery('english', p_query_text)
            ORDER BY rank_score DESC
            LIMIT 100
        ),
        semantic_search AS (
            SELECT p.id, 1 - (p.embedding <=> p_query_embedding) as similarity_score
            FROM products p
            ORDER BY p.embedding <=> p_query_embedding
            LIMIT 100
        ),
        lexical_ranked AS (
            SELECT ls.id, ROW_NUMBER() OVER (ORDER BY ls.rank_score DESC) as rank
            FROM lexical_search ls
        ),
        semantic_ranked AS (
            SELECT ss.id, ROW_NUMBER() OVER (ORDER BY ss.similarity_score DESC) as rank
            FROM semantic_search ss
        ),
        fused_results AS (
            SELECT
                COALESCE(lr.id, sr.id) as product_id,
                (
                    (p_lexical_weight * COALESCE(1.0 / (p_rrf_k + lr.rank), 0.0)) + 
                    (p_semantic_weight * COALESCE(1.0 / (p_rrf_k + sr.rank), 0.0))
                ) as calculated_rrf_score
            FROM lexical_ranked lr
            FULL OUTER JOIN semantic_ranked sr ON lr.id = sr.id
        )
        SELECT 
            p.id,
            p.sku,
            p.name,
            p.description,
            fr.calculated_rrf_score::FLOAT
        FROM fused_results fr
        JOIN products p ON fr.product_id = p.id
        ORDER BY fr.calculated_rrf_score DESC
        LIMIT p_match_limit;
    END;
    $$ LANGUAGE plpgsql;

    Now, your application can simply call:

    sql
    SELECT * FROM hybrid_search('fast charging cable', '[...]'::vector, p_semantic_weight := 1.2);

    This is a much more robust and maintainable pattern. We've also added weighting parameters (p_lexical_weight, p_semantic_weight) to allow for tuning the search behavior at runtime.


    Section 6: Performance Analysis and Optimization

    Performance is paramount. Let's analyze the query plan for our function.

    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM hybrid_search('cable', '[...]'::vector);

    You would expect to see a plan with these key features:

    text
    -> Limit (cost=...)
      -> Sort (cost=...)
        -> Nested Loop
          -> Hash Full Join
            -> CTE Scan on lexical_ranked
              -> CTE Scan on lexical_search
                -> Limit
                  -> Sort
                    -> Bitmap Heap Scan on products
                      -> Bitmap Index Scan on idx_products_fts_document
            -> CTE Scan on semantic_ranked
              -> CTE Scan on semantic_search
                -> Limit
                  -> Index Scan using idx_products_embedding_hnsw on products
          -> Index Scan using products_pkey on products

    Key Takeaways from the Plan:

  • Index Usage: The plan confirms that idx_products_fts_document (a GIN index scan) and idx_products_embedding_hnsw (an HNSW index scan) are being used independently within their respective CTEs. This is exactly what we want.
  • Pre-Join Limits: The Limit operations occur before the Hash Full Join. This is the single most important factor for performance. We are joining two sets of at most 100 rows each, not the entire table.
  • The Join: A Hash Full Join is efficient for this operation.
  • Tuning `pgvector` Performance

    The primary knob for tuning pgvector's HNSW index at query time is hnsw.ef_search.

    * ef_search (default 100): The size of the dynamic candidate list during search. A higher value increases accuracy (recall) at the cost of speed. A lower value is faster but may miss some nearest neighbors.

    You can set this for a specific transaction:

    sql
    BEGIN;
    SET LOCAL hnsw.ef_search = 200;
    SELECT * FROM hybrid_search('query', '[...]'::vector);
    COMMIT;

    Benchmarking is essential. Run queries with varying ef_search values and measure both latency and recall against a ground-truth dataset to find the optimal balance for your application's specific needs.


    Section 7: Edge Cases and Advanced Considerations

    * The "Zero Results" Problem: What happens if the lexical search for "asdfghjkl" returns nothing? The FULL OUTER JOIN handles this perfectly. The lexical_ranked CTE will be empty, but the semantic_ranked CTE will still have results. In the fusion CTE, lr.rank will be NULL for all rows, contributing 0 to the RRF score, and the final ranking will be based purely on the semantic results. This resilience is a major strength of this architecture.

    * Multi-tenancy: In a multi-tenant application, you would add a tenant_id column to the products table. To maintain performance, this tenant_id must be part of your indexes.

    * For the FTS index: CREATE INDEX ... ON products USING GIN(tenant_id, fts_document);

    * For the HNSW index, you must use partitioning or create a composite index if your access patterns allow. A common pattern is to add tenant_id to every WHERE clause: WHERE tenant_id = 'some-tenant' AND fts_document @@ ....

    * Real-time Indexing Latency: The trigger-based approach for updating fts_document is synchronous and adds a small amount of latency to INSERT and UPDATE operations. For extremely high-throughput systems, you might offload the embedding generation and tsvector calculation to a background job queue (e.g., RabbitMQ, Kafka, PGQ). The application would write the base data, publish a message, and a worker would later update the embedding and fts_document columns. This creates an eventual consistency model, which is a common trade-off in complex search systems.

    Found this article helpful?

    Share it with others who might benefit from it.

    More Articles