Hybrid Search in Postgres: Combining pgvector & pg_bm25 with RRF

18 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Dichotomy of Modern Search: Semantic vs. Keyword

In modern search systems, engineers face a fundamental trade-off. On one hand, dense vector search, powered by embeddings from models like BERT or sentence-transformers, delivers remarkable semantic understanding. It can find documents that are conceptually similar, even if they don't share any keywords. A search for "summer clothes" might correctly surface "sun dresses" and "linen shorts." However, this approach can falter on precision, especially with rare keywords, SKUs, or specific identifiers. The vector representation might average out the importance of a critical term like RTX-4090-OC, leading to less relevant results.

On the other hand, traditional lexical search, perfected by algorithms like BM25, excels at keyword matching. It's fast, precise, and gives users exactly what they ask for when they know the specific terms. But it's brittle. It has no conceptual grasp of the world; "summer clothes" and "warm weather apparel" are entirely different queries to a BM25-powered system unless extensive synonym lists are manually maintained.

The state-of-the-art solution is not to choose one, but to combine both. Hybrid search leverages the strengths of each paradigm, delivering results that are both semantically relevant and keyword-aware. Historically, this required complex architectures involving a primary database (like PostgreSQL), a dedicated search engine (like Elasticsearch), and a vector database (like Pinecone or Weaviate), leading to significant data synchronization challenges and operational overhead.

This article details a powerful, modern alternative: implementing a complete, production-grade hybrid search system entirely within PostgreSQL. By leveraging the pgvector extension for dense vector search and the pg_bm25 extension for advanced sparse vector (keyword) search, we can build a unified, high-performance system. The crux of the challenge, and the focus of this article, is how to intelligently fuse the results from these two disparate systems. We will implement Reciprocal Rank Fusion (RRF), a simple yet profoundly effective technique for combining ranked lists without needing to normalize incomparable scores.

Prerequisites

This guide assumes you are a senior engineer with a strong understanding of:

  • PostgreSQL administration and advanced SQL (CTEs, window functions).
  • The basic concepts of vector embeddings and cosine similarity.
  • The fundamentals of full-text search (e.g., tsvector in Postgres).

We will not cover the basics of what a vector is or how to install PostgreSQL; we dive directly into the architecture and implementation.


    System Architecture & Schema Setup

    Our goal is to create a single table that can efficiently serve both dense and sparse search queries. Let's model a product catalog for an e-commerce site. Each product has a title, description, and other metadata.

    First, ensure the necessary extensions are enabled in your database:

    sql
    -- Enable vector support for embeddings
    CREATE EXTENSION IF NOT EXISTS vector;
    
    -- Enable Okapi BM25 ranking functions for FTS
    CREATE EXTENSION IF NOT EXISTS pg_bm25;

    Now, let's define our products table. The design is critical: it must contain columns for primary data, a vector for the dense embedding, and a tsvector for the sparse representation.

    sql
    CREATE TABLE products (
        id SERIAL PRIMARY KEY,
        sku VARCHAR(20) UNIQUE NOT NULL,
        title TEXT NOT NULL,
        description TEXT,
        category VARCHAR(100),
        created_at TIMESTAMPTZ DEFAULT NOW(),
    
        -- Column for dense vector embeddings from a model
        -- The dimension (e.g., 384) depends on the embedding model used.
        embedding VECTOR(384),
    
        -- Column for sparse vector representation for keyword search
        -- This is the standard Postgres full-text search type.
        fts_document TSVECTOR
    );

    Data Ingestion Strategy

    Populating the embedding and fts_document columns is a critical ETL (Extract, Transform, Load) step. While a real-time system might use message queues and workers, we'll focus on the core logic. A common pattern is to use a trigger to automatically update the fts_document whenever the text content changes.

    sql
    -- Create a function to update the fts_document
    CREATE OR REPLACE FUNCTION update_fts_document() RETURNS TRIGGER AS $$
    BEGIN
        -- Concatenate relevant text fields and convert to a tsvector
        -- 'A', 'B', 'C', 'D' are weights for title, description, sku, category respectively
        NEW.fts_document := 
            setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
            setweight(to_tsvector('english', coalesce(NEW.description, '')), 'B') ||
        setweight(to_tsvector('simple', coalesce(NEW.sku, '')), 'C') ||  -- 'simple' config keeps SKUs verbatim (no stemming or stop words)
            setweight(to_tsvector('english', coalesce(NEW.category, '')), 'D');
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    
    -- Create a trigger to call the function on insert or update
    CREATE TRIGGER products_fts_update_trigger
    BEFORE INSERT OR UPDATE ON products
    FOR EACH ROW EXECUTE FUNCTION update_fts_document();

    The embedding column must be populated externally, as it requires a machine learning model. Here's a conceptual Python script using the sentence-transformers library to generate and insert the data.

    python
    import psycopg2
    from sentence_transformers import SentenceTransformer
    
    # --- Configuration ---
    DB_CONN_STRING = "postgresql://user:password@host:port/dbname"
    MODEL_NAME = 'all-MiniLM-L6-v2' # 384 dimensions
    
    # --- Sample Data ---
    products_data = [
        {"sku": "SNK-001", "title": "Men's Running Shoes", "description": "Lightweight sneakers for daily runs and marathon training.", "category": "Footwear"},
        {"sku": "TSH-004", "title": "Organic Cotton T-Shirt", "description": "A soft, breathable crewneck shirt made from 100% organic cotton.", "category": "Apparel"},
        {"sku": "JNS-002", "title": "Slim-Fit Denim Jeans", "description": "Classic five-pocket styling in a modern slim fit. Stretch denim for comfort.", "category": "Apparel"},
        {"sku": "WTR-010", "title": "Insulated Water Bottle", "description": "Stainless steel bottle that keeps drinks cold for 24 hours or hot for 12.", "category": "Accessories"}
    ]
    
    # --- Main Logic ---
    def main():
        print(f"Loading sentence transformer model: {MODEL_NAME}")
        model = SentenceTransformer(MODEL_NAME)
    
        conn = psycopg2.connect(DB_CONN_STRING)
        cur = conn.cursor()
    
        print("Generating embeddings and inserting data...")
        for product in products_data:
            # The trigger will handle fts_document automatically
            # We only need to generate the dense embedding
            text_to_embed = f"{product['title']} {product['description']}"
            embedding = model.encode(text_to_embed).tolist()
    
            cur.execute(
                """INSERT INTO products (sku, title, description, category, embedding) 
                   VALUES (%s, %s, %s, %s, %s)""",
                (product['sku'], product['title'], product['description'], product['category'], embedding)
            )
        
        conn.commit()
        cur.close()
        conn.close()
        print("Data insertion complete.")
    
    if __name__ == "__main__":
        main()

    This setup provides a solid foundation. Every row now contains the necessary data structures for both search paradigms.


    Indexing Strategies for Production Performance

    Unindexed queries on our embedding and fts_document columns would result in sequential scans, which are unacceptably slow for any real-world dataset. Correct indexing is non-negotiable.

    `pg_bm25` Indexing (Sparse)

    This is the more straightforward of the two. pg_bm25 leverages PostgreSQL's built-in Full-Text Search capabilities, which are best served by a GIN (Generalized Inverted Index).

    sql
    -- Create a GIN index on the tsvector column for fast keyword search
    CREATE INDEX idx_products_fts ON products USING GIN(fts_document);

    A GIN index creates an entry for each unique lexeme (word) and points to all the documents containing it. This makes finding documents that match a set of keywords extremely fast.
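To build intuition for what the GIN index is doing, here is a toy inverted index in Python. It only illustrates the lookup principle; a real GIN index uses posting trees and many other optimizations, and the sample documents below are hypothetical:

```python
from collections import defaultdict

# Hypothetical documents, keyed by ID
docs = {
    1: "lightweight running shoes",
    2: "organic cotton shirt",
    3: "running shorts lightweight",
}

# Inverted index: each lexeme maps to the set of document IDs containing it
index = defaultdict(set)
for doc_id, text in docs.items():
    for lexeme in text.split():
        index[lexeme].add(doc_id)

def match_all(*lexemes):
    """IDs of documents containing every query lexeme (like fts_document @@ to_tsquery with &)."""
    sets = [index.get(lexeme, set()) for lexeme in lexemes]
    return set.intersection(*sets) if sets else set()

print(match_all("lightweight", "running"))  # {1, 3}
```

Each query lexeme is a single dictionary lookup, which is why AND-style keyword matching stays fast regardless of how long the documents are.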

    `pgvector` Indexing (Dense)

    Indexing for Approximate Nearest Neighbor (ANN) search is more complex. pgvector supports two main index types: IVFFlat and HNSW. Choosing between them involves a trade-off between build time, index size, and query performance.

    1. IVFFlat (Inverted File with Flat Compression)

  • How it works: IVFFlat clusters your vectors into lists. At query time, it searches only the lists closest to your query vector (how many is controlled by the probes parameter) rather than the entire dataset.
  • Pros: Faster to build than HNSW.
  • Cons: Query performance can be lower. Finding the optimal lists and probes parameters requires tuning.
  • sql
    -- Create an IVFFlat index
    -- pgvector's guidance: lists ≈ rows / 1000 for up to ~1M rows,
    -- and lists ≈ sqrt(rows) for larger datasets.
    -- Assuming we have 100,000 products, lists = 100:
    CREATE INDEX idx_products_embedding_ivfflat 
    ON products 
    USING ivfflat (embedding vector_cosine_ops) 
    WITH (lists = 100);

    At query time, you must set the probes parameter. A higher value increases accuracy (recall) at the cost of latency.

    sql
    -- Set before running your query
    SET ivfflat.probes = 10;
    SELECT id, title FROM products ORDER BY embedding <=> '[...]' LIMIT 10;
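As a quick sketch of this tuning guidance, here is a small (hypothetical) helper encoding the rule-of-thumb formulas from the pgvector documentation: lists ≈ rows/1000 up to ~1M rows, sqrt(rows) beyond that, and a starting probes of ≈ sqrt(lists):

```python
import math

def recommended_lists(row_count: int) -> int:
    """Rule of thumb from the pgvector docs for the IVFFlat 'lists' parameter:
    rows / 1000 for up to ~1M rows, sqrt(rows) beyond that."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return int(math.sqrt(row_count))

def starting_probes(lists: int) -> int:
    """pgvector suggests starting with probes ~ sqrt(lists), then tuning for recall."""
    return max(1, int(math.sqrt(lists)))

print(recommended_lists(100_000))     # 100
print(recommended_lists(10_000_000))  # 3162
print(starting_probes(100))           # 10
```

Treat these as starting points, not answers: the right values depend on your recall target and latency budget, which only measurement against your own data can settle.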

    2. HNSW (Hierarchical Navigable Small World)

  • How it works: HNSW builds a multi-layered graph of vectors. Searches start at a coarse top layer and navigate down to finer-grained layers, efficiently homing in on the nearest neighbors.
  • Pros: Generally provides better query performance (speed-recall trade-off) than IVFFlat, especially on large, high-dimensional datasets.
  • Cons: Slower to build and consumes more memory.
  • sql
    -- Create an HNSW index
    -- m: max number of connections per layer (default 16)
    -- ef_construction: size of the dynamic list for neighbors during build (default 64)
    CREATE INDEX idx_products_embedding_hnsw
    ON products
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

    Similar to probes in IVFFlat, HNSW has a query-time parameter ef_search that controls the trade-off between speed and accuracy.

    sql
    -- Set before running your query
    SET hnsw.ef_search = 40;
    SELECT id, title FROM products ORDER BY embedding <=> '[...]' LIMIT 10;

    Production Recommendation: For most modern, latency-sensitive applications, HNSW is the preferred choice. While the build time is higher, the superior query performance is usually worth the trade-off. Start with the default m and ef_construction values and tune ef_search at query time to meet your application's specific latency and recall requirements.


    The Fusion Problem: Combining Two Worlds

    With our data ingested and indexed, we can now execute two separate, efficient queries.

    Dense Vector Search Query:

    sql
    -- Assume query_embedding is the 384-dimensional vector for "warm weather apparel"
    SELECT id, title, 1 - (embedding <=> query_embedding) AS similarity_score
    FROM products
    ORDER BY embedding <=> query_embedding
    LIMIT 20;

    Note: The <=> operator calculates cosine distance. We use 1 - distance to get cosine similarity.
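The relationship is easy to verify in plain Python. This quick sketch mirrors what the <=> operator computes (pgvector implements this natively in C; the vectors here are toy two-dimensional examples):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """What pgvector's <=> operator returns."""
    return 1.0 - cosine_similarity(a, b)

a = [1.0, 0.0]
b = [1.0, 1.0]
sim = cosine_similarity(a, b)   # 1/sqrt(2), about 0.7071
dist = cosine_distance(a, b)    # about 0.2929
assert abs((1.0 - dist) - sim) < 1e-12
```

Note that for ORDER BY purposes the two are interchangeable: ordering by distance ascending is identical to ordering by similarity descending, so the subtraction is only needed when you want to display a similarity score.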

    Sparse BM25 Search Query:

    sql
    -- Keyword search, ranked with ts_rank_cd as a stand-in for BM25
    SELECT p.id, p.title,
           ts_rank_cd(p.fts_document, query_tsquery) AS relevance_score
    FROM products p,
         to_tsquery('english', 'warm & weather & apparel') AS query_tsquery
    WHERE p.fts_document @@ query_tsquery
    ORDER BY relevance_score DESC
    LIMIT 20;

    Note: At the time of writing, pg_bm25 does not expose a drop-in bm25(tsvector, tsquery) ranking function usable in this style of query. Common workarounds are to use ts_rank or ts_rank_cd, or to implement BM25 scoring manually in a PL/pgSQL function. For clarity, the rest of this article uses ts_rank_cd as the sparse ranking function.

    We now have two ranked lists of product IDs. The core problem is that their scores are on different, incomparable scales. A similarity_score of 0.85 from pgvector has no direct relationship to a relevance_score of 12.3 from BM25. A simple weighted sum, (w1 × score1) + (w2 × score2), is extremely difficult to tune and maintain because the score distributions change as the data changes.
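A toy illustration of the problem, with hypothetical scores: whichever scale happens to be numerically larger dominates a naive sum, regardless of what the weights were meant to express.

```python
# Hypothetical scores for the same two documents on incomparable scales
dense_scores  = {"doc_a": 0.85, "doc_b": 0.80}   # cosine similarity, roughly [0, 1]
sparse_scores = {"doc_a": 3.2,  "doc_b": 12.3}   # BM25-style, unbounded

# A naive equal-weight sum is dominated by the sparse scale:
naive = {d: dense_scores[d] + sparse_scores[d] for d in dense_scores}
print(naive)  # doc_b wins almost entirely on sparse magnitude,
              # even though the dense model preferred doc_a
```

Rescaling the scores (min-max, z-score) only helps until the distributions drift with new data, which is exactly why a rank-based method is more robust.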

    This is where Reciprocal Rank Fusion (RRF) provides an elegant solution.


    The Solution: Reciprocal Rank Fusion (RRF)

    RRF is a rank-based fusion method, meaning it completely ignores the original scores and only considers the position of each document in the ranked lists. This makes it robust and eliminates the need for score normalization.

    The formula is simple: for each document d, its RRF score is the sum of the reciprocal of its rank in each list.

    RRF_Score(d) = Σ (1 / (k + rank_i(d)))

  • rank_i(d) is the rank of document d in result set i.
  • k is a constant that mitigates the effect of high ranks (e.g., positions 1 and 2) having an outsized impact. A common value for k is 60.

    We can implement this directly in SQL using Common Table Expressions (CTEs).
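Before expressing RRF in SQL, it helps to see the formula in a few lines of Python. The document IDs below are hypothetical ranked lists standing in for the dense and sparse result sets:

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical ranked ID lists from the two searches
dense  = ["doc_a", "doc_b", "doc_c"]
sparse = ["doc_b", "doc_d", "doc_a"]

for doc_id, score in rrf_fuse([dense, sparse]):
    print(doc_id, round(score, 4))
# doc_b comes out on top: it ranks near the top of BOTH lists,
# which RRF rewards more than a single first place.
```

Note how the original scores never appear: only positions matter, so the dense and sparse systems never need to agree on a scale.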

    The Master Hybrid Search Query

    Here is a complete, production-ready query that performs dense search, sparse search, and fuses the results using RRF.

    sql
    -- Hybrid Search Query with RRF
    -- :query_text = 'warm weather apparel'
    -- :query_embedding = [0.1, 0.2, ...]
    -- :k = 60 (RRF constant)
    
    WITH dense_results AS (
        -- 1. Perform dense vector search
        SELECT 
            id,
            -- Use row_number() to get the rank, starting from 1
            row_number() OVER (ORDER BY embedding <=> :query_embedding) AS rank
        FROM products
        -- Add a WHERE clause for pre-filtering if needed, e.g., WHERE category = 'Apparel'
        ORDER BY embedding <=> :query_embedding
        LIMIT 100 -- Fetch more results than needed to allow for fusion
    ), 
    sparse_results AS (
        -- 2. Perform sparse BM25 search
        SELECT 
            id,
            row_number() OVER (ORDER BY ts_rank_cd(fts_document, to_tsquery('english', :query_text)) DESC) AS rank
        FROM products
        WHERE fts_document @@ to_tsquery('english', :query_text)
        ORDER BY ts_rank_cd(fts_document, to_tsquery('english', :query_text)) DESC
        LIMIT 100
    ), 
    -- We use ts_rank_cd here as a stand-in for BM25 ranking for simplicity.
    -- A custom pg_bm25 function would be used in a real implementation.
    
    -- 3. Fuse the results using Reciprocal Rank Fusion (RRF)
    fused_results AS (
        SELECT id, 1.0 / (:k + rank) as rrf_score FROM dense_results
        UNION ALL
        SELECT id, 1.0 / (:k + rank) as rrf_score FROM sparse_results
    )
    
    -- 4. Aggregate the scores and rank the final results
    SELECT 
        p.id,
        p.title,
        p.sku,
        SUM(fr.rrf_score) AS final_score
    FROM fused_results fr
    JOIN products p ON fr.id = p.id
    GROUP BY p.id, p.title, p.sku
    ORDER BY final_score DESC
    LIMIT 20;

    Breakdown of the Query:

  • dense_results CTE: Executes the vector search. We use the row_number() window function to assign a rank to each result based on its cosine distance. We fetch a larger limit (e.g., 100) to provide a rich set of candidates for fusion.
  • sparse_results CTE: Executes the keyword search using ts_rank_cd for ranking and assigns a rank similarly.
  • fused_results CTE: This is the heart of RRF. We UNION ALL the two result sets. For each entry, we calculate its RRF contribution 1 / (k + rank). A document appearing in both lists will have two rows in this CTE.
  • Final Aggregation: We GROUP BY the product ID, SUM the rrf_score for each document, and order by this final aggregated score to get our hybrid-ranked list.

    This single SQL query is declarative, optimizable by the PostgreSQL query planner, and contains the entire hybrid search logic, eliminating the need for application-layer merging.


    Advanced Considerations and Edge Cases

    While the above query is powerful, a production system requires attention to detail.

    Performance Tuning with `EXPLAIN ANALYZE`

    Always run EXPLAIN ANALYZE on your final query. You should see:

  • An Index Scan or Bitmap Index Scan on your HNSW/IVFFlat index in the dense_results CTE.
  • An Index Scan or Bitmap Index Scan on your GIN index in the sparse_results CTE.

    If you see a Sequential Scan, your indexes are not being used and performance will be poor. Common causes are a type mismatch, an incorrect operator, or stale table statistics (run ANALYZE products; to refresh them).

    Weighting Dense vs. Sparse Results

    RRF treats both result sets equally. What if you want to prioritize keyword matches? You can introduce a weighting factor into the fusion step.

    sql
    -- Modified fused_results CTE for weighted RRF
    -- :dense_weight = 0.5
    -- :sparse_weight = 1.5
    
    fused_results AS (
        SELECT id, :dense_weight * (1.0 / (:k + rank)) as rrf_score FROM dense_results
        UNION ALL
        SELECT id, :sparse_weight * (1.0 / (:k + rank)) as rrf_score FROM sparse_results
    )

    By multiplying the RRF contribution, you can bias the final ranking. A sparse_weight of 1.5 would give more influence to documents that rank highly in the keyword search.
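To see the effect of the weights in isolation, here is a small Python sketch with hypothetical ranked lists, where the two searches disagree symmetrically:

```python
def weighted_rrf(lists_with_weights, k=60):
    """RRF where each result list's contribution is scaled by a weight."""
    scores = {}
    for ranking, weight in lists_with_weights:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight * (1.0 / (k + rank))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical ranked ID lists: each search prefers the opposite document
dense  = ["doc_a", "doc_b"]
sparse = ["doc_b", "doc_a"]

equal   = weighted_rrf([(dense, 1.0), (sparse, 1.0)])  # symmetric: an exact tie
boosted = weighted_rrf([(dense, 0.5), (sparse, 1.5)])  # sparse-favored
print(boosted[0][0])  # doc_b -- the top keyword hit now wins the tie
```

With equal weights the two documents tie exactly; boosting the sparse list breaks the tie in favor of the top keyword hit, which is precisely the bias the weighted CTE above introduces.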

    Handling No Results

    The UNION ALL approach is robust. If one of the CTEs returns zero rows (e.g., no keyword match), the fusion logic still works correctly—it will simply consist of the results from the other search.

    Pre-filtering vs. Post-filtering

    Consider a query like "running shoes for men". You can implement the "for men" filter in two ways:

  • Pre-filtering (more performant): Add a WHERE clause inside the dense_results and sparse_results CTEs (e.g., WHERE gender = 'male'). This reduces the search space before the expensive ranking and fusion steps.
  • Post-filtering: Apply the WHERE clause after the final aggregation. This is less efficient as you perform the search on the entire dataset first.

    Always use pre-filtering whenever possible. It is particularly effective when the filter column (e.g., category or gender) is indexed.


    Full, Runnable Implementation Example

    To make this concrete, here is a docker-compose.yml to spin up a PostgreSQL instance with the necessary extensions, using the pgvector/pgvector image.

    docker-compose.yml

    yaml
    version: '3.8'
    services:
      postgres:
        image: pgvector/pgvector:pg16
        container_name: postgres_hybrid_search
        environment:
          - POSTGRES_USER=testuser
          - POSTGRES_PASSWORD=testpass
          - POSTGRES_DB=testdb
        ports:
          - "5432:5432"
        volumes:
          - postgres_data:/var/lib/postgresql/data
    
    volumes:
      postgres_data:

    Note: The pgvector/pgvector image includes pgvector. You would need to add pg_bm25 manually or find an image that includes both. For this example, we will use ts_rank_cd as a proxy for BM25, which is available by default.

    A Python script can then orchestrate the entire process: setup, data loading, indexing, and querying.

    python
    # [Include the Python script from Section 2 for setup]
    import psycopg2.extras  # needed below for DictCursor
    
    def hybrid_search(query_text: str, k: int = 60, dense_weight: float = 1.0, sparse_weight: float = 1.0, limit: int = 20):
        model = SentenceTransformer('all-MiniLM-L6-v2')
        query_embedding = model.encode(query_text).tolist()
    
        # Prepare the query text for to_tsquery by replacing spaces with '&'
        tsquery_text = ' & '.join(query_text.strip().split())
    
        conn = psycopg2.connect(DB_CONN_STRING)
        cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    
        sql_query = """
        WITH dense_results AS (
            SELECT id, row_number() OVER (ORDER BY embedding <=> %(embedding)s::vector) AS rank
            FROM products
            ORDER BY embedding <=> %(embedding)s::vector
            LIMIT 100
        ), 
        sparse_results AS (
            SELECT id, row_number() OVER (ORDER BY ts_rank_cd(fts_document, to_tsquery('english', %(tsquery)s)) DESC) AS rank
            FROM products
            WHERE fts_document @@ to_tsquery('english', %(tsquery)s)
            ORDER BY ts_rank_cd(fts_document, to_tsquery('english', %(tsquery)s)) DESC
            LIMIT 100
        ),
        fused_results AS (
            SELECT id, %(dense_weight)s * (1.0 / (%(k)s + rank)) as rrf_score FROM dense_results
            UNION ALL
            SELECT id, %(sparse_weight)s * (1.0 / (%(k)s + rank)) as rrf_score FROM sparse_results
        )
        SELECT 
            p.id,
            p.title,
            p.sku,
            SUM(fr.rrf_score) AS final_score
        FROM fused_results fr
        JOIN products p ON fr.id = p.id
        GROUP BY p.id, p.title, p.sku
        ORDER BY final_score DESC
        LIMIT %(limit)s;
        """
    
        params = {
            'embedding': str(query_embedding),
            'tsquery': tsquery_text,
            'k': k,
            'dense_weight': dense_weight,
            'sparse_weight': sparse_weight,
            'limit': limit
        }
    
        cur.execute(sql_query, params)
        results = cur.fetchall()
    
        cur.close()
        conn.close()
    
        return results
    
    # --- Example Usage ---
    if __name__ == "__main__":
        # Run the setup and data insertion first...
        # main()
    
        search_query = "soft cotton shirt"
        print(f"Executing hybrid search for: '{search_query}'")
        search_results = hybrid_search(search_query)
    
        for row in search_results:
            print(f"ID: {row['id']}, SKU: {row['sku']}, Title: {row['title']}, Score: {row['final_score']:.4f}")

    This runnable example provides a complete, end-to-end blueprint for implementing this advanced search pattern.

    Conclusion

    By moving beyond the false choice between semantic and keyword search, you can build a vastly superior user experience. This guide has demonstrated that a sophisticated hybrid search system is not only possible but practical to implement directly within PostgreSQL, a database many teams already use and understand. This approach consolidates your architecture, simplifies data management, and leverages the power of mature, well-supported extensions.

    The key takeaways for production implementation are:

  • Dual-Column Schema: Use dedicated vector and tsvector columns to represent the dense and sparse aspects of your documents.
  • Robust Indexing: HNSW is the modern standard for vector indexing in pgvector, and GIN is the proven choice for tsvector columns. Proper indexing is the most critical factor for performance.
  • Rank-Based Fusion: Reciprocal Rank Fusion (RRF) is a powerful, normalization-free method for merging result sets. Implement it with CTEs for clean, optimizable SQL.
  • Tune and Measure: Use EXPLAIN ANALYZE to validate your index usage and tune query-time parameters like hnsw.ef_search and RRF weights to meet your specific relevance and latency goals.
  • By adopting this pattern, you can unlock a new level of search relevance without the operational burden of managing a complex, distributed search stack.
