Hybrid Search in Postgres: Fusing BM25 and pgvector with RRF

19 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Dichotomy of Modern Search: Lexical vs. Semantic

In contemporary application development, search is no longer a solved problem. Senior engineers are frequently tasked with evolving simple keyword-based search into something that understands user intent. This leads to a fundamental architectural choice: do we rely on traditional lexical search, which excels at matching specific terms, or do we embrace semantic search, which understands conceptual relationships?

  • Lexical Search (Sparse Retrieval): Exemplified by algorithms like BM25, this method is fast, efficient, and precise. It's powered by inverted indexes and shines when users know exactly what terms to search for. In PostgreSQL, this is the domain of Full-Text Search (FTS).
  • Semantic Search (Dense Retrieval): This approach uses vector embeddings from deep learning models to represent the meaning of text. It finds documents that are conceptually similar, even if they don't share any keywords. The pgvector extension brings this capability to PostgreSQL.
  • The optimal solution is rarely one or the other. A user searching for "Postgres HNSW index parameters" expects documents containing those exact keywords, a task for lexical search. A user searching for "tuning vector search performance in postgres" should also find that document, a task for semantic search. The goal is to build a single system that delivers the best of both worlds. This is hybrid search.

    This article bypasses introductory concepts. We assume you understand what vector embeddings are and have a working knowledge of PostgreSQL. We will focus exclusively on a production-ready pattern for implementing hybrid search by fusing the results of FTS and pgvector using a superior, parameter-free technique: Reciprocal Rank Fusion (RRF).

    Part 1: Advanced Data Modeling for Hybrid Search

    Our foundation is a well-designed table that accommodates both search paradigms efficiently. We'll use a documents table as our example, representing a corpus of technical articles.

    The Core Schema

    Let's define a schema that not only stores our data but also pre-processes it for search using generated columns. This offloads computation from the application layer and ensures data integrity.

    sql
    -- Ensure required extensions are enabled
    CREATE EXTENSION IF NOT EXISTS vector;
    CREATE EXTENSION IF NOT EXISTS btree_gin; -- Often useful for FTS
    
    -- Our main table for technical articles
    CREATE TABLE documents (
        id bigserial PRIMARY KEY,
        title text NOT NULL,
        content text NOT NULL,
        published_at timestamptz DEFAULT NOW(),
        
        -- Column for Full-Text Search (FTS)
        -- Automatically generated from title and content
        -- We weight the title higher than the content (A > B)
        content_tsv tsvector GENERATED ALWAYS AS (
            setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
            setweight(to_tsvector('english', coalesce(content, '')), 'B')
        ) STORED,
        
        -- Column for Semantic Search (Vector Embeddings)
        -- Dimension 384 is for the 'all-MiniLM-L6-v2' model
        embedding vector(384)
    );

    Key Architectural Decisions:

  • GENERATED ALWAYS AS ... STORED: This is a critical choice for content_tsv. By making it a stored generated column, the tsvector is computed only once upon INSERT or UPDATE of title or content. This avoids re-computation on every query, a significant performance gain. The alternative, VIRTUAL, would re-calculate the vector on every read, which is unacceptable for a search workload.
  • setweight: We are already building sophistication into our lexical search. By assigning a higher weight ('A') to the title than the content ('B'), we are telling the ts_rank_cd function that matches in the title are more relevant. This is a simple but powerful way to improve ranking out of the box.
  • Vector Dimension: The dimension 384 is not arbitrary. It must exactly match the output dimension of your chosen embedding model. Using a model like SentenceTransformers' all-MiniLM-L6-v2 is a common starting point for its balance of performance and quality.
  • Generating and Populating Embeddings

    PostgreSQL won't generate embeddings for you. This step must be handled by your application layer before INSERT. Here is a Python example using the sentence-transformers and psycopg libraries.

    python
    # File: embedder.py
    import psycopg
    from sentence_transformers import SentenceTransformer
    
    # It's crucial to load the model only once and reuse it.
    # In a real application, this would be part of a singleton or service class.
    model = SentenceTransformer('all-MiniLM-L6-v2')
    
    def get_embedding(text: str) -> list[float]:
        """Generates a 384-dimension embedding for a given text."""
        return model.encode(text).tolist()
    
    def insert_document(conn, title: str, content: str):
        """Generates embedding and inserts the document into the database."""
        # Combine title and content for a more holistic embedding
        text_to_embed = f"{title}: {content}"
        embedding = get_embedding(text_to_embed)
        
        with conn.cursor() as cur:
            cur.execute(
                """INSERT INTO documents (title, content, embedding)
                   VALUES (%s, %s, %s)""",
                (title, content, embedding)
            )
        conn.commit()
    
    # --- Example Usage ---
    if __name__ == '__main__':
        # Replace with your actual connection string
        DB_CONN = "postgresql://user:pass@host:port/dbname"
        
        sample_docs = [
            {
                "title": "Optimizing PostgreSQL for Read-Heavy Workloads",
                "content": "Partial indexes and materialized views are key strategies for improving query performance in read-heavy scenarios..."
            },
            {
                "title": "An Introduction to Vector Databases",
                "content": "Vector databases use HNSW or IVFFlat indexes to perform nearest neighbor searches efficiently..."
            },
            {
                "title": "Advanced pgvector Indexing",
                "content": "When using pgvector, choosing between IVFFlat and HNSW depends on your specific needs for build time versus query recall. HNSW is generally preferred for its superior accuracy."
            }
        ]
        
        with psycopg.connect(DB_CONN) as conn:
            for doc in sample_docs:
                print(f"Inserting: {doc['title']}")
                insert_document(conn, doc['title'], doc['content'])
        print("Sample documents inserted.")
    

    Part 2: Optimizing the Dual Search Pillars

    With our data model in place, we must now create highly optimized indexes for each search type. A hybrid search query will only be as fast as its slowest component.

    Pillar A: Tuning Full-Text Search (FTS)

    The performance of lexical search hinges on the GIN index.

    sql
    -- Create a GIN index on our pre-computed tsvector column
    CREATE INDEX idx_documents_tsv ON documents USING GIN(content_tsv);

    Why GIN over GiST?

  • GIN (Generalized Inverted Index): Is optimized for queries that search for specific values within a composite type. For a tsvector, it creates an index entry for each lexeme (word). When you search for 'postgres' & 'hnsw', it can very quickly find the intersection of documents containing both lexemes. This makes it ideal for FTS.
  • GiST (Generalized Search Tree): Is a height-balanced tree structure that's more suited for indexing continuous data or finding the nearest items. While it can be used for FTS, it's generally slower than GIN for the typical conjunction/disjunction queries used in search.
  • A sample FTS query returning a relevance score might look like this:

    sql
    -- The query string to search for
    -- plainto_tsquery is a good choice for user-provided search strings
    -- as it parses the text and creates a tsquery with AND operators.
    WITH search_query AS (
        SELECT plainto_tsquery('english', 'postgres vector performance') AS query
    )
    SELECT 
        d.id,
        d.title,
        -- ts_rank_cd calculates a relevance score (BM25-like) based on word frequency and proximity.
        ts_rank_cd(d.content_tsv, s.query) AS fts_score
    FROM 
        documents d, search_query s
    WHERE 
        d.content_tsv @@ s.query
    ORDER BY 
        fts_score DESC
    LIMIT 10;

    Pillar B: Mastering `pgvector` Indexing

    This is where many implementations fall short. A sequential scan on a vector column is a performance disaster. Choosing the right index type is paramount.

    pgvector offers two main ANN (Approximate Nearest Neighbor) index types: IVFFlat and HNSW.

    HNSW (Hierarchical Navigable Small World)

    This is the modern default for most vector search use cases. It builds a multi-layered graph structure that allows for extremely fast and accurate nearest-neighbor searches.

  • m: The maximum number of connections per node in the graph. Higher m leads to better recall but a larger, slower-to-build index. Default is 16.
  • ef_construction: The size of the dynamic candidate list during index construction. Higher values lead to a more accurate index, but construction takes longer. Default is 64.
  • sql
    -- Create an HNSW index on the embedding column
    -- This can take a significant amount of time on large tables
    CREATE INDEX idx_documents_embedding_hnsw ON documents USING HNSW (embedding vector_l2_ops);
    
    -- For cosine distance, use: vector_cosine_ops
    -- For inner product, use: vector_ip_ops

    IVFFlat (Inverted File with Flat Compression)

    This index works by partitioning the vector space into lists (clusters). At query time, it searches only in the probes number of clusters closest to the query vector.

    • It is significantly faster to build than HNSW.
  • Its performance is highly dependent on tuning lists and probes.
    • It can suffer from lower recall if the optimal clusters are not probed.
    sql
    -- Example of an IVFFlat index
    -- The number of lists should be sqrt(N) for up to 1M rows, and N/1000 for larger datasets.
    -- Assuming 100,000 rows, sqrt(100000) is ~316.
    CREATE INDEX idx_documents_embedding_ivfflat ON documents USING IVFFlat (embedding vector_l2_ops) 
    WITH (lists = 316);

    Recommendation: Use HNSW

    For new projects with dynamic data, HNSW is almost always the superior choice. Its build time is a one-time cost (per-row), but its query performance and high recall without manual probe-tuning are huge operational advantages. We will proceed using the HNSW index.

    Tuning HNSW at Query Time

    Query performance vs. accuracy can be tuned on-the-fly using the ef_search parameter. This sets the size of the candidate list during a search.

    sql
    -- Set a higher ef_search for better recall, at the cost of latency.
    -- This MUST be set before the query that uses the index.
    SET hnsw.ef_search = 100;
    
    SELECT 
        id,
        title,
        embedding <-> '[...your_query_vector...]' AS distance
    FROM 
        documents
    ORDER BY 
        distance ASC
    LIMIT 10;
    
    -- Reset to default if needed
    RESET hnsw.ef_search;

    Part 3: The Fusion - Reciprocal Rank Fusion (RRF)

    The most challenging part of hybrid search is merging the two result sets. A naive approach is to normalize the FTS score and vector distance to a 0-1 range and use a weighted sum. This is a fragile and problematic pattern.

  • Different Scales: FTS scores and vector distances are not on comparable scales. ts_rank_cd is non-linear, while L2 distance is.
  • Corpus Dependency: The distribution of FTS scores changes as your dataset grows, making any hand-tuned normalization formula obsolete over time.
  • Tuning Nightmare: The alpha weight (e.g., alpha norm_fts + (1-alpha) norm_vector) becomes a magic number that is impossible to set correctly for all possible queries.
  • Enter Reciprocal Rank Fusion (RRF)

    RRF provides an elegant, parameter-free solution. It disregards the magnitude of the scores and focuses only on the rank of each document in the result lists.

    The formula is simple: for each document, its RRF score is the sum of the reciprocal of its rank in each result list.

    RRF_Score = Σ (1 / (k + rank_i))

  • rank_i: The rank of the document in result list i.
  • k: A constant used to mitigate the impact of high ranks (i.e., results far down the list). A value of k=60 is standard in the literature and works well in practice.
  • Implementing RRF in SQL

    We can implement RRF using Common Table Expressions (CTEs) to fetch and rank each result set, followed by a FULL OUTER JOIN to merge them.

    Here is the complete, production-ready query. Assume we have a :query_text placeholder for the user's search string and a :query_embedding placeholder for its corresponding vector.

    sql
    -- Final Hybrid Search Query with RRF
    -- :query_text = 'tuning vector search in postgres'
    -- :query_embedding = '[...vector for the text above...]' (as a string)
    
    WITH fts_results AS (
        -- First, get the results from Full-Text Search
        SELECT 
            id,
            -- Rank the results based on their FTS score
            ROW_NUMBER() OVER (ORDER BY ts_rank_cd(content_tsv, plainto_tsquery('english', :query_text)) DESC) as fts_rank
        FROM documents
        WHERE content_tsv @@ plainto_tsquery('english', :query_text)
        LIMIT 200 -- Limit the number of results from each search to keep it performant
    ), 
    vector_results AS (
        -- Second, get the results from Vector Search
        SELECT 
            id,
            -- Rank the results based on their distance
            ROW_NUMBER() OVER (ORDER BY embedding <-> :query_embedding::vector) as vector_rank
        FROM documents
        ORDER BY embedding <-> :query_embedding::vector
        LIMIT 200
    ), 
    rrf_fusion AS (
        -- Fuse the results using Reciprocal Rank Fusion (RRF)
        SELECT
            -- Use COALESCE to handle documents that appear in only one result set
            COALESCE(fts.id, vec.id) as id,
            -- Calculate the RRF score. If a doc is not in a result set, its rank is effectively infinity,
            -- so its contribution to the score is 0.
            (1.0 / (60 + fts.fts_rank)) + (1.0 / (60 + vec.vector_rank)) as rrf_score
        FROM fts_results fts
        FULL OUTER JOIN vector_results vec ON fts.id = vec.id
    )
    SELECT 
        d.id,
        d.title,
        rf.rrf_score
    FROM rrf_fusion rf
    JOIN documents d ON rf.id = d.id
    ORDER BY rf.rrf_score DESC
    LIMIT 20;

    Analysis of the RRF Query:

  • CTEs fts_results and vector_results: We run each query independently. This is crucial as it allows PostgreSQL's query planner to use the respective GIN and HNSW indexes for each subquery.
  • LIMIT 200: This is a critical performance optimization. We don't need to rank the entire table. By fetching the top N (e.g., 200) results from each search, we provide more than enough candidates for a high-quality fusion for the final top 10-20 results shown to the user.
  • ROW_NUMBER() OVER(...): This window function assigns the rank to each document, which is the key input for RRF.
  • FULL OUTER JOIN: This is the correct join type. An INNER JOIN would only return documents present in both sets. A LEFT JOIN would miss documents found only by the vector search. FULL OUTER JOIN ensures we consider every unique document from both searches.
  • RRF Score Calculation: The COALESCE in the score calculation is implicit. If fts.fts_rank is NULL (because the document was only in vector_results), then 1.0 / (60 + NULL) evaluates to NULL. We need to handle this properly.
  • A More Robust RRF Score Calculation:

    The previous calculation has a slight flaw with NULLs. A better way is to use COALESCE on the rank itself.

    sql
    -- ... (CTEs are the same)
    
    -- Improved Fusion Logic
    SELECT
        COALESCE(fts.id, vec.id) as id,
        -- Use COALESCE to treat a missing rank as 0 contribution to the score
        COALESCE(1.0 / (60 + fts.fts_rank), 0.0) + 
        COALESCE(1.0 / (60 + vec.vector_rank), 0.0) as rrf_score
    FROM fts_results fts
    FULL OUTER JOIN vector_results vec ON fts.id = vec.id
    
    -- ... (Final join and order by are the same)

    This version is more explicit and robust, ensuring that if a document is missing from one list, its score contribution from that list is correctly treated as zero.

    Part 4: Production-Grade Implementation and Edge Cases

    A complex query like this should not live in your application code. It's hard to maintain, prone to SQL injection, and couples your application to the database schema. We'll encapsulate it in a PL/pgSQL function.

    Encapsulation with a PL/pgSQL Function

    This function will provide a clean API for our application. It takes the search text and vector, and returns a table of results.

    sql
    CREATE TYPE search_result AS (id bigint, title text, rrf_score float);
    
    CREATE OR REPLACE FUNCTION hybrid_search(
        query_text text,
        query_embedding vector(384),
        match_limit int DEFAULT 20,
        candidate_limit int DEFAULT 200
    ) RETURNS SETOF search_result AS $$
    BEGIN
        -- Set a reasonable ef_search for this transaction
        -- This allows per-query tuning without affecting the global setting
        SET LOCAL hnsw.ef_search = 100;
    
        RETURN QUERY
        WITH fts_results AS (
            SELECT 
                d.id,
                ROW_NUMBER() OVER (ORDER BY ts_rank_cd(d.content_tsv, plainto_tsquery('english', query_text)) DESC) as fts_rank
            FROM documents d
            WHERE d.content_tsv @@ plainto_tsquery('english', query_text)
            LIMIT candidate_limit
        ), 
        vector_results AS (
            SELECT 
                d.id,
                ROW_NUMBER() OVER (ORDER BY d.embedding <-> query_embedding) as vector_rank
            FROM documents d
            ORDER BY d.embedding <-> query_embedding
            LIMIT candidate_limit
        ), 
        rrf_fusion AS (
            SELECT
                COALESCE(fts.id, vec.id) as id,
                COALESCE(1.0 / (60 + fts.fts_rank), 0.0) + 
                COALESCE(1.0 / (60 + vec.vector_rank), 0.0) as rrf_score
            FROM fts_results fts
            FULL OUTER JOIN vector_results vec ON fts.id = vec.id
        )
        SELECT 
            d.id,
            d.title,
            rf.rrf_score
        FROM rrf_fusion rf
        JOIN documents d ON rf.id = d.id
        ORDER BY rf.rrf_score DESC
        LIMIT match_limit;
    END;
    $$ LANGUAGE plpgsql;

    Now, the application code becomes beautifully simple and secure:

    python
    # In your Python application
    search_term = "tuning vector search in postgres"
    search_vector = get_embedding(search_term)
    
    with conn.cursor() as cur:
        results = cur.execute(
            "SELECT * FROM hybrid_search(%s, %s, match_limit := 10)",
            (search_term, search_vector)
        ).fetchall()
    
    for row in results:
        print(row)

    Performance Analysis with `EXPLAIN ANALYZE`

    Let's analyze the query plan for our function. You would run:

    EXPLAIN ANALYZE SELECT * FROM hybrid_search('postgres vector', '[...]', 20, 200);

    The output will be complex, but you must verify these key points:

  • In the fts_results CTE: The planner must use the idx_documents_tsv GIN index. You should see a Bitmap Heap Scan on documents using a Bitmap Index Scan on the GIN index.
  • In the vector_results CTE: The planner must use the idx_documents_embedding_hnsw HNSW index. You should see an Index Scan on the HNSW index. It should not be a Sequential Scan.
  • The Join: The FULL OUTER JOIN will likely be a Hash Full Join. Its cost should be reasonable since it's only joining on candidate_limit * 2 rows at most.
  • Final Sort: The final Sort operation on the rrf_score will operate on the same small number of rows. Its cost should be minimal.
  • If either of the index scans is missing, your query will be unacceptably slow on a large dataset. The most common culprit for a missed vector index scan is a data type mismatch (e.g., passing a text representation of a vector that isn't properly cast to vector).

    Edge Case: Re-indexing Strategy

  • FTS: The STORED generated column handles this automatically. When a document's title or content is updated, the content_tsv is recomputed and the GIN index is updated transactionally. This is a major advantage of the integrated approach.
  • Vectors: This is the primary operational burden. When a document is updated, you must trigger a job in your application to re-generate the embedding and UPDATE the embedding column for that row. More critically, if you decide to upgrade your embedding model (e.g., from all-MiniLM-L6-v2 to a new, better model), you must perform a full backfill: iterate over every row in the documents table, generate the new embedding, and update the row. This can be a long-running, resource-intensive migration that requires careful planning to avoid downtime.
  • Conclusion: PostgreSQL as a Unified Search Platform

    By leveraging the native power of PostgreSQL's Full-Text Search and the extensibility of pgvector, we have constructed a state-of-the-art hybrid search system without introducing external dependencies like Elasticsearch or a dedicated vector database. The key was moving beyond naive score normalization and implementing the robust, rank-based Reciprocal Rank Fusion algorithm.

    This pattern provides significant architectural benefits:

  • Simplicity: Your data and its primary indexes live together. There are no complex data synchronization pipelines between your primary database and a search engine.
  • Transactional Integrity: Search indexes are updated transactionally with your data, eliminating eventual consistency issues.
  • Performance: With proper indexing (GIN for FTS, HNSW for vectors), performance is competitive with specialized solutions for many common use cases.
  • This approach transforms PostgreSQL from a mere system of record into a powerful, unified platform for data storage and intelligent retrieval, simplifying the stack and empowering developers to build sophisticated search experiences with tools they already know and trust.

    Found this article helpful?

    Share it with others who might benefit from it.

    More Articles