Hybrid Search in Postgres: Combining pgvector & pg_bm25 with RRF
The Dichotomy of Modern Search: Semantic vs. Keyword
In modern search systems, engineers face a fundamental trade-off. On one hand, dense vector search, powered by embeddings from models like BERT or sentence-transformers, delivers remarkable semantic understanding. It can find documents that are conceptually similar, even if they don't share any keywords. A search for "summer clothes" might correctly surface "sun dresses" and "linen shorts." However, this approach can falter on precision, especially with rare keywords, SKUs, or specific identifiers. The vector representation might average out the importance of a critical term like RTX-4090-OC, leading to less relevant results.
On the other hand, traditional lexical search, perfected by algorithms like BM25, excels at keyword matching. It's fast, precise, and gives users exactly what they ask for when they know the specific terms. But it's brittle. It has no conceptual grasp of the world; "summer clothes" and "warm weather apparel" are entirely different queries to a BM25-powered system unless extensive synonym lists are manually maintained.
The state-of-the-art solution is not to choose one, but to combine both. Hybrid search leverages the strengths of each paradigm, delivering results that are both semantically relevant and keyword-aware. Historically, this required complex architectures involving a primary database (like PostgreSQL), a dedicated search engine (like Elasticsearch), and a vector database (like Pinecone or Weaviate), leading to significant data synchronization challenges and operational overhead.
This article details a powerful, modern alternative: implementing a complete, production-grade hybrid search system entirely within PostgreSQL. By leveraging the pgvector extension for dense vector search and the pg_bm25 extension for advanced sparse vector (keyword) search, we can build a unified, high-performance system. The crux of the challenge, and the focus of this article, is how to intelligently fuse the results from these two disparate systems. We will implement Reciprocal Rank Fusion (RRF), a simple yet profoundly effective technique for combining ranked lists without needing to normalize incomparable scores.
Prerequisites
This guide assumes you are a senior engineer with a strong understanding of:
- PostgreSQL administration and advanced SQL (CTEs, window functions).
- The basic concepts of vector embeddings and cosine similarity.
- Full-text search concepts (tsquery and tsvector in Postgres).

We will not cover the basics of what a vector is or how to install PostgreSQL. We dive directly into the architecture and implementation.
System Architecture & Schema Setup
Our goal is to create a single table that can efficiently serve both dense and sparse search queries. Let's model a product catalog for an e-commerce site. Each product has a title, description, and other metadata.
First, ensure the necessary extensions are enabled in your database:
-- Enable vector support for embeddings
CREATE EXTENSION IF NOT EXISTS vector;
-- Enable Okapi BM25 ranking functions for FTS
CREATE EXTENSION IF NOT EXISTS pg_bm25;

Now, let's define our products table. The design is critical: it must contain columns for the primary data, a vector for the dense embedding, and a tsvector for the sparse representation.
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    sku VARCHAR(20) UNIQUE NOT NULL,
    title TEXT NOT NULL,
    description TEXT,
    category VARCHAR(100),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    -- Column for dense vector embeddings from a model
    -- The dimension (e.g., 384) depends on the embedding model used.
    embedding VECTOR(384),
    -- Column for sparse vector representation for keyword search
    -- This is the standard Postgres full-text search type.
    fts_document TSVECTOR
);

Data Ingestion Strategy
Populating the embedding and fts_document columns is a critical ETL (Extract, Transform, Load) step. While a real-time system might use message queues and workers, we'll focus on the core logic. A common pattern is to use a trigger to automatically update the fts_document whenever the text content changes.
-- Create a function to update the fts_document
CREATE OR REPLACE FUNCTION update_fts_document() RETURNS TRIGGER AS $$
BEGIN
    -- Concatenate relevant text fields and convert to a tsvector
    -- 'A', 'B', 'C', 'D' are weights for title, description, sku, category respectively
    NEW.fts_document := 
        setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(NEW.description, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(NEW.sku, '')), 'C') ||
        setweight(to_tsvector('english', coalesce(NEW.category, '')), 'D');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Create a trigger to call the function on insert or update
CREATE TRIGGER products_fts_update_trigger
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION update_fts_document();

The embedding column must be populated externally, as it requires a machine learning model. Here's a conceptual Python script using the sentence-transformers library to generate and insert the data.
import psycopg2
from sentence_transformers import SentenceTransformer
# --- Configuration ---
DB_CONN_STRING = "postgresql://user:password@host:port/dbname"
MODEL_NAME = 'all-MiniLM-L6-v2' # 384 dimensions
# --- Sample Data ---
products_data = [
    {"sku": "SNK-001", "title": "Men's Running Shoes", "description": "Lightweight sneakers for daily runs and marathon training.", "category": "Footwear"},
    {"sku": "TSH-004", "title": "Organic Cotton T-Shirt", "description": "A soft, breathable crewneck shirt made from 100% organic cotton.", "category": "Apparel"},
    {"sku": "JNS-002", "title": "Slim-Fit Denim Jeans", "description": "Classic five-pocket styling in a modern slim fit. Stretch denim for comfort.", "category": "Apparel"},
    {"sku": "WTR-010", "title": "Insulated Water Bottle", "description": "Stainless steel bottle that keeps drinks cold for 24 hours or hot for 12.", "category": "Accessories"}
]
# --- Main Logic ---
def main():
    print(f"Loading sentence transformer model: {MODEL_NAME}")
    model = SentenceTransformer(MODEL_NAME)
    conn = psycopg2.connect(DB_CONN_STRING)
    cur = conn.cursor()
    print("Generating embeddings and inserting data...")
    for product in products_data:
        # The trigger will handle fts_document automatically
        # We only need to generate the dense embedding
        text_to_embed = f"{product['title']} {product['description']}"
        embedding = model.encode(text_to_embed).tolist()
        cur.execute(
            """INSERT INTO products (sku, title, description, category, embedding) 
               VALUES (%s, %s, %s, %s, %s)""",
            (product['sku'], product['title'], product['description'], product['category'], embedding)
        )
    
    conn.commit()
    cur.close()
    conn.close()
    print("Data insertion complete.")
if __name__ == "__main__":
    main()

This setup provides a solid foundation. Every row now contains the necessary data structures for both search paradigms.
Indexing Strategies for Production Performance
Unindexed queries on our embedding and fts_document columns would result in sequential scans, which are unacceptably slow for any real-world dataset. Correct indexing is non-negotiable.
`pg_bm25` Indexing (Sparse)
This is the more straightforward of the two. pg_bm25 leverages PostgreSQL's built-in Full-Text Search capabilities, which are best served by a GIN (Generalized Inverted Index).
-- Create a GIN index on the tsvector column for fast keyword search
CREATE INDEX idx_products_fts ON products USING GIN(fts_document);

A GIN index creates an entry for each unique lexeme (word) and points to all the documents containing it. This makes finding documents that match a set of keywords extremely fast.
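To make the inverted-index idea concrete, here is a toy sketch in Python. This models the concept only (lexemes mapping to postings sets); it is not how Postgres implements GIN internally.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lexeme to the set of doc ids containing it,
    conceptually what a GIN index does for tsvector columns."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index

docs = {1: ["running", "shoes"], 2: ["cotton", "shirt"], 3: ["running", "shirt"]}
idx = build_inverted_index(docs)

# An AND query ('running & shirt') is the intersection of two postings sets
print(idx["running"] & idx["shirt"])  # {3}
```

The intersection step is why keyword AND-queries stay fast even on large tables: the engine never scans documents that contain neither term.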
`pgvector` Indexing (Dense)
Indexing for Approximate Nearest Neighbor (ANN) search is more complex. pgvector supports two main index types: IVFFlat and HNSW. Choosing between them involves a trade-off between build time, index size, and query performance.
1. IVFFlat (Inverted File with Flat Compression)
IVFFlat partitions the vectors into clusters called lists. At query time, it searches only the most promising probes (a subset of lists) closest to your query vector, rather than the entire dataset. It builds quickly and uses less memory than HNSW, but choosing the right lists and probes parameters requires tuning.

-- Create an IVFFlat index
-- pgvector's guidance is lists ≈ rows / 1000 for up to 1M rows, and sqrt(rows) for larger datasets.
-- Assuming we have 100,000 products:
CREATE INDEX idx_products_embedding_ivfflat 
ON products 
USING ivfflat (embedding vector_cosine_ops) 
WITH (lists = 100);

At query time, you must set the probes parameter. A higher value increases accuracy (recall) at the cost of latency.
-- Set before running your query
SET ivfflat.probes = 10;
SELECT id, title FROM products ORDER BY embedding <=> '[...]' LIMIT 10;

2. HNSW (Hierarchical Navigable Small World)

HNSW builds a multi-layered graph of vectors. It typically offers better query speed and recall than IVFFlat, at the cost of slower index builds and higher memory usage.
-- Create an HNSW index
-- m: max number of connections per layer (default 16)
-- ef_construction: size of the dynamic list for neighbors during build (default 64)
CREATE INDEX idx_products_embedding_hnsw
ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Similar to probes in IVFFlat, HNSW has a query-time parameter ef_search that controls the trade-off between speed and accuracy.
-- Set before running your query
SET hnsw.ef_search = 40;
SELECT id, title FROM products ORDER BY embedding <=> '[...]' LIMIT 10;

Production Recommendation: For most modern, latency-sensitive applications, HNSW is the preferred choice. While the build time is higher, the superior query performance is usually worth the trade-off. Start with the default m and ef_construction values and tune ef_search at query time to meet your application's specific latency and recall requirements.
The Fusion Problem: Combining Two Worlds
With our data ingested and indexed, we can now execute two separate, efficient queries.
Dense Vector Search Query:
-- Assume query_embedding is the 384-dimensional vector for "warm weather apparel"
SELECT id, title, 1 - (embedding <=> query_embedding) AS similarity_score
FROM products
ORDER BY embedding <=> query_embedding
LIMIT 20;

Note: The <=> operator computes cosine distance, so we use 1 - distance to recover cosine similarity.
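As a sanity check on that relationship, here is a minimal Python sketch of cosine distance as 1 minus cosine similarity. This is for illustration only; pgvector computes the distance natively, and the function name here is our own.

```python
import math

def cosine_distance(a, b):
    """Cosine distance, as pgvector's <=> operator defines it:
    1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Vectors pointing the same way have distance 0 (similarity 1);
# orthogonal vectors have distance 1 (similarity 0).
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Because the distance lives in [0, 2], ordering by `embedding <=> query` ascending is equivalent to ordering by similarity descending, which is what the query above relies on.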
Sparse BM25 Search Query:
-- Assume query_tsquery is to_tsquery('english', 'warm & weather & apparel')
-- Assume a bm25_rank(tsvector, tsquery) function encapsulating the BM25 logic (see the note below)
SELECT p.id, p.title, bm25_rank(p.fts_document, query_tsquery) AS relevance_score
FROM products p,
     to_tsquery('english', 'warm & weather & apparel') query_tsquery
WHERE p.fts_document @@ query_tsquery
ORDER BY relevance_score DESC
LIMIT 20;

Note: The pg_bm25 extension does not yet expose a built-in bm25 ranking function; bm25_rank above is a placeholder for logic you would implement yourself or for a future version of the extension. A common workaround is to use ts_rank or to compute BM25 manually in a PL/pgSQL function. For simplicity in this article, we'll assume a bm25_rank function exists that encapsulates this logic.
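For reference, the Okapi BM25 formula that such a function would implement fits in a few lines of Python. This is an illustrative sketch with the conventional k1 and b defaults, not the extension's code.

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document for a bag-of-words query.
    corpus is the full list of tokenized documents (used for IDF and avg length)."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)              # document frequency
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)  # smoothed IDF
        tf = doc_terms.count(term)                            # term frequency in this doc
        # Saturating TF component, normalized by document length
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score

corpus = [
    ["soft", "cotton", "shirt"],
    ["denim", "jeans", "slim", "fit"],
    ["cotton", "crewneck", "shirt", "breathable"],
]
# Documents containing the query terms score higher than those that don't
print(bm25_score(["cotton", "shirt"], corpus[0], corpus))
print(bm25_score(["cotton", "shirt"], corpus[1], corpus))  # 0.0, no terms match
```

The length normalization (the b term) and TF saturation (the k1 term) are what distinguish BM25 from a plain TF-IDF ranking such as ts_rank.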
We now have two ranked lists of product IDs. The core problem is that their scores are on different, incomparable scales. A similarity_score of 0.85 from pgvector has no direct relationship to a relevance_score of 12.3 from BM25. A simple weighted sum (w1 × score1) + (w2 × score2) is extremely difficult to tune and maintain because the score distributions can change with the data.
This is where Reciprocal Rank Fusion (RRF) provides an elegant solution.
The Solution: Reciprocal Rank Fusion (RRF)
RRF is a rank-based fusion method, meaning it completely ignores the original scores and only considers the position of each document in the ranked lists. This makes it robust and eliminates the need for score normalization.
The formula is simple: for each document d, its RRF score is the sum of the reciprocal of its rank in each list.
RRF_Score(d) = Σ (1 / (k + rank_i(d)))
where:
- rank_i(d) is the rank of document d in result set i.
- k is a constant that keeps the top positions (e.g., ranks 1 and 2) from having an outsized impact. A common value for k is 60.

We can implement this directly in SQL using Common Table Expressions (CTEs).
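Before embedding RRF in SQL, it helps to see that the algorithm itself is only a few lines. Here is a plain-Python sketch (the function name is our own):

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: each document's score is the sum of
    1 / (k + rank) over every ranked list it appears in."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense  = ["doc_a", "doc_b", "doc_c"]   # ranked output of the vector search
sparse = ["doc_b", "doc_d", "doc_a"]   # ranked output of the BM25 search
print(rrf_fuse([dense, sparse]))
# doc_b (ranks 2 and 1) and doc_a (ranks 1 and 3) appear in both lists,
# so they outrank doc_c and doc_d, which each appear in only one.
```

Note that the original scores never enter the computation; only positions do, which is why no normalization is needed.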
The Master Hybrid Search Query
Here is a complete, production-ready query that performs dense search, sparse search, and fuses the results using RRF.
-- Hybrid Search Query with RRF
-- :query_text = 'warm weather apparel'
-- :query_embedding = [0.1, 0.2, ...]
-- :k = 60 (RRF constant)
WITH dense_results AS (
    -- 1. Perform dense vector search
    SELECT 
        id,
        -- Use row_number() to get the rank, starting from 1
        row_number() OVER (ORDER BY embedding <=> :query_embedding) AS rank
    FROM products
    -- Add a WHERE clause for pre-filtering if needed, e.g., WHERE category = 'Apparel'
    ORDER BY embedding <=> :query_embedding
    LIMIT 100 -- Fetch more results than needed to allow for fusion
), 
sparse_results AS (
    -- 2. Perform sparse BM25 search
    SELECT 
        id,
        row_number() OVER (ORDER BY ts_rank_cd(fts_document, to_tsquery('english', :query_text)) DESC) AS rank
    FROM products
    WHERE fts_document @@ to_tsquery('english', :query_text)
    ORDER BY ts_rank_cd(fts_document, to_tsquery('english', :query_text)) DESC
    LIMIT 100
), 
-- We use ts_rank_cd here as a stand-in for BM25 ranking for simplicity.
-- A custom pg_bm25 function would be used in a real implementation.
-- 3. Fuse the results using Reciprocal Rank Fusion (RRF)
fused_results AS (
    SELECT id, 1.0 / (:k + rank) as rrf_score FROM dense_results
    UNION ALL
    SELECT id, 1.0 / (:k + rank) as rrf_score FROM sparse_results
)
-- 4. Aggregate the scores and rank the final results
SELECT 
    p.id,
    p.title,
    p.sku,
    SUM(fr.rrf_score) AS final_score
FROM fused_results fr
JOIN products p ON fr.id = p.id
GROUP BY p.id, p.title, p.sku
ORDER BY final_score DESC
LIMIT 20;

Breakdown of the Query:
- dense_results CTE: Executes the vector search. We use the row_number() window function to assign a rank to each result based on its cosine distance, and fetch a larger limit (e.g., 100) to provide a rich set of candidates for fusion.
- sparse_results CTE: Executes the keyword search using ts_rank_cd for ranking and assigns a rank similarly.
- fused_results CTE: This is the heart of RRF. We UNION ALL the two result sets and, for each entry, calculate its RRF contribution 1 / (k + rank). A document appearing in both lists will have two rows in this CTE.
- Final aggregation: We GROUP BY the product ID, SUM the rrf_score for each document, and order by this final aggregated score to get our hybrid-ranked list.

This single SQL query is declarative, optimizable by the PostgreSQL query planner, and contains the entire hybrid search logic, eliminating the need for application-layer merging.
Advanced Considerations and Edge Cases
While the above query is powerful, a production system requires attention to detail.
Performance Tuning with `EXPLAIN ANALYZE`
Always run EXPLAIN ANALYZE on your final query. You should see:
- An index scan using your HNSW or IVFFlat index in the dense_results CTE.
- A bitmap index scan on the GIN index in the sparse_results CTE.

If you see a Sequential Scan instead, your indexes are not being used, and performance will be poor. This could be due to a type mismatch, an incorrect operator, or stale table statistics (run ANALYZE products;).
Weighting Dense vs. Sparse Results
RRF treats both result sets equally. What if you want to prioritize keyword matches? You can introduce a weighting factor into the fusion step.
-- Modified fused_results CTE for weighted RRF
-- :dense_weight = 0.5
-- :sparse_weight = 1.5
fused_results AS (
    SELECT id, :dense_weight * (1.0 / (:k + rank)) as rrf_score FROM dense_results
    UNION ALL
    SELECT id, :sparse_weight * (1.0 / (:k + rank)) as rrf_score FROM sparse_results
)

By multiplying the RRF contribution, you can bias the final ranking. A sparse_weight of 1.5 would give more influence to documents that rank highly in the keyword search.
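A quick Python sketch (a hypothetical helper mirroring the weighted SQL above) shows how the weights break ties between sources:

```python
def weighted_rrf(weighted_rankings, k=60):
    """RRF where each source list carries a weight that multiplies
    its 1 / (k + rank) contribution to the fused score."""
    scores = {}
    for weight, ranking in weighted_rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense  = ["doc_a", "doc_b"]
sparse = ["doc_b", "doc_a"]

# Equal weights: doc_a and doc_b each collect rank 1 + rank 2, an exact tie
equal = weighted_rrf([(1.0, dense), (1.0, sparse)])
# Boosting the sparse list breaks the tie in favour of its top hit, doc_b
boosted = weighted_rrf([(0.5, dense), (1.5, sparse)])
print(equal, boosted)
```

Keep the weights close to 1.0 at first; because RRF scores are small and flat, even modest multipliers shift the final ordering noticeably.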
Handling No Results
The UNION ALL approach is robust. If one of the CTEs returns zero rows (e.g., no keyword match), the fusion logic still works correctly—it will simply consist of the results from the other search.
Pre-filtering vs. Post-filtering
Consider a query like "running shoes for men". You can implement the "for men" filter in two ways:
- Pre-filtering: Add a WHERE clause inside the dense_results and sparse_results CTEs (e.g., WHERE gender = 'male'). This reduces the search space before the expensive ranking and fusion steps.
- Post-filtering: Add a WHERE clause after the final aggregation. This is less efficient, as you perform the search on the entire dataset first.

Always use pre-filtering whenever possible. This is particularly effective if you have an index on the filter column (e.g., category, gender).
Full, Runnable Implementation Example
To make this concrete, here is a docker-compose.yml to spin up a PostgreSQL instance with the necessary extensions, using the pgvector/pgvector image.
docker-compose.yml
version: '3.8'
services:
  postgres:
    image: pgvector/pgvector:pg16
    container_name: postgres_hybrid_search
    environment:
      - POSTGRES_USER=testuser
      - POSTGRES_PASSWORD=testpass
      - POSTGRES_DB=testdb
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
volumes:
  postgres_data:

Note: The pgvector/pgvector image includes pgvector. You would need to add pg_bm25 manually or find an image that includes both. For this example, we will use ts_rank_cd as a proxy for BM25, which is available by default.
A Python script can then orchestrate the entire process: setup, data loading, indexing, and querying.
# [Include the Python script from the data ingestion section for setup]
import psycopg2
import psycopg2.extras  # needed for DictCursor below
from sentence_transformers import SentenceTransformer
def hybrid_search(query_text: str, k: int = 60, dense_weight: float = 1.0, sparse_weight: float = 1.0, limit: int = 20):
    model = SentenceTransformer('all-MiniLM-L6-v2')
    query_embedding = model.encode(query_text).tolist()
    # Prepare the query text for to_tsquery by joining terms with ' & '
    # (plainto_tsquery or websearch_to_tsquery are more robust for raw user input)
    tsquery_text = ' & '.join(query_text.strip().split())
    conn = psycopg2.connect(DB_CONN_STRING)
    cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    sql_query = """
    WITH dense_results AS (
        SELECT id, row_number() OVER (ORDER BY embedding <=> %(embedding)s) AS rank
        FROM products
        ORDER BY embedding <=> %(embedding)s
        LIMIT 100
    ), 
    sparse_results AS (
        SELECT id, row_number() OVER (ORDER BY ts_rank_cd(fts_document, to_tsquery('english', %(tsquery)s)) DESC) AS rank
        FROM products
        WHERE fts_document @@ to_tsquery('english', %(tsquery)s)
        ORDER BY ts_rank_cd(fts_document, to_tsquery('english', %(tsquery)s)) DESC
        LIMIT 100
    ),
    fused_results AS (
        SELECT id, %(dense_weight)s * (1.0 / (%(k)s + rank)) as rrf_score FROM dense_results
        UNION ALL
        SELECT id, %(sparse_weight)s * (1.0 / (%(k)s + rank)) as rrf_score FROM sparse_results
    )
    SELECT 
        p.id,
        p.title,
        p.sku,
        SUM(fr.rrf_score) AS final_score
    FROM fused_results fr
    JOIN products p ON fr.id = p.id
    GROUP BY p.id, p.title, p.sku
    ORDER BY final_score DESC
    LIMIT %(limit)s;
    """
    params = {
        'embedding': str(query_embedding),
        'tsquery': tsquery_text,
        'k': k,
        'dense_weight': dense_weight,
        'sparse_weight': sparse_weight,
        'limit': limit
    }
    cur.execute(sql_query, params)
    results = cur.fetchall()
    cur.close()
    conn.close()
    return results
# --- Example Usage ---
if __name__ == "__main__":
    # Run the setup and data insertion first...
    # main()
    search_query = "soft cotton shirt"
    print(f"Executing hybrid search for: '{search_query}'")
    search_results = hybrid_search(search_query)
    for row in search_results:
        print(f"ID: {row['id']}, SKU: {row['sku']}, Title: {row['title']}, Score: {row['final_score']:.4f}")

This runnable example provides a complete, end-to-end blueprint for implementing this advanced search pattern.
Conclusion
By moving beyond the false choice between semantic and keyword search, you can build a vastly superior user experience. This guide has demonstrated that a sophisticated hybrid search system is not only possible but practical to implement directly within PostgreSQL, a database many teams already use and understand. This approach consolidates your architecture, simplifies data management, and leverages the power of mature, well-supported extensions.
The key takeaways for production implementation are:
- Model both representations: use vector and tsvector columns to represent the dense and sparse aspects of your documents.
- Index correctly: HNSW is generally the best choice for pgvector, and GIN is the proven choice for tsvector columns. Proper indexing is the most critical factor for performance.
- Measure and tune: use EXPLAIN ANALYZE to validate your index usage, and tune query-time parameters like hnsw.ef_search and RRF weights to meet your specific relevance and latency goals.

By adopting this pattern, you can unlock a new level of search relevance without the operational burden of managing a complex, distributed search stack.