Hybrid Search in Postgres: Combining pgvector HNSW with BM25 FTS

September 27, 2025

21 min read

Goh Ling Yong

Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Relevance Problem: Semantic vs. Lexical Search

In modern search applications, the chasm between user intent and query results is a constant battle. We have two powerful, yet fundamentally flawed, paradigms at our disposal:

Semantic Search (Vector Search): Powered by dense vector embeddings from models like BERT or Sentence-Transformers, this approach is exceptional at understanding conceptual similarity. A search for "summer clothing" might correctly return "sun dresses" and "linen shorts," even if the exact keywords are missing. However, it often fails on specifics. A query for a product SKU like XG-500-PRO might get lost in the semantic noise, returning conceptually similar but incorrect items.

Lexical Search (Full-Text Search): This is the traditional workhorse, exemplified by algorithms like TF-IDF and its modern successor, BM25 (Best Matching 25). It excels at matching specific keywords, phrases, and identifiers. It will find XG-500-PRO with perfect precision. Its weakness is the inverse of semantic search; it has no understanding that "summer clothing" and "sun dresses" are related concepts.

For senior engineers, the goal is clear: we need a system that leverages the strengths of both. This is the domain of hybrid search. While many solutions involve complex, multi-system architectures (e.g., PostgreSQL for metadata, Elasticsearch for FTS, and a dedicated vector database like Pinecone or Weaviate), this post demonstrates a powerful, single-database solution using modern PostgreSQL capabilities.

We will implement a hybrid search system that combines pgvector's high-performance HNSW index for semantic search with a robust, custom-built BM25 ranking function for lexical search. The final, crucial step will be to fuse these two result sets using Reciprocal Rank Fusion (RRF) to produce a unified, superior ranking.

This is not a beginner's guide. We assume you understand what vector embeddings are, the basics of SQL, and the fundamental concepts of full-text search.

Our Technical Stack

* Database: PostgreSQL 15+ (for MERGE command and other improvements)

* Vector Extension: pgvector 0.5.0+ (for HNSW index support)

* Language for Data Prep: Python 3.9+ with psycopg2, sentence-transformers, numpy.

1. Schema Design and Data Population

Our foundation is a well-designed table that accommodates both lexical and semantic data. Let's model a product catalog for an e-commerce site.

sql

-- Ensure the pgvector extension is installed
CREATE EXTENSION IF NOT EXISTS vector;

-- A table to store global statistics for our custom BM25 implementation
CREATE TABLE IF NOT EXISTS product_search_stats (
    stat_key TEXT PRIMARY KEY,
    stat_value NUMERIC NOT NULL
);

-- Our main products table
CREATE TABLE IF NOT EXISTS products (
    id SERIAL PRIMARY KEY,
    sku TEXT UNIQUE NOT NULL,
    name TEXT NOT NULL,
    description TEXT,
    -- For pgvector semantic search
    embedding VECTOR(384) NOT NULL, 
    -- For FTS lexical search
    fts_document TSVECTOR
);

-- Create a trigger to update the tsvector automatically
CREATE OR REPLACE FUNCTION update_fts_document() RETURNS TRIGGER AS $$
BEGIN
    NEW.fts_document := 
        setweight(to_tsvector('english', coalesce(NEW.name, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(NEW.description, '')), 'B');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_fts_update_trigger
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION update_fts_document();

Key Design Decisions:

embedding VECTOR(384): We're using the all-MiniLM-L6-v2 model from sentence-transformers, which produces 384-dimension vectors. The dimension must* match your embedding model.

* fts_document TSVECTOR: Instead of creating a tsvector on the fly during queries, we pre-materialize it in a dedicated column. This is a critical performance optimization.

* Trigger-based tsvector Update: A trigger ensures the fts_document is always synchronized with the name and description fields. We use setweight to give matches in the product name (Weight 'A') higher importance than matches in the description (Weight 'B').

* product_search_stats: This small table will hold pre-calculated statistics like the average document length, which is required for our BM25 implementation. This avoids costly table scans during search queries.

Populating with Embeddings

Next, we'll write a Python script to populate the products table. This involves generating embeddings for our product data.

python

# populate_products.py
import psycopg2
import numpy as np
from sentence_transformers import SentenceTransformer

def get_db_connection():
    return psycopg2.connect(
        dbname="your_db", 
        user="your_user", 
        password="your_password", 
        host="localhost"
    )

def main():
    products_data = [
        {'sku': 'TS-001', 'name': 'Quantum Weave T-Shirt', 'description': 'A comfortable cotton t-shirt with a futuristic design.'},
        {'sku': 'JN-001', 'name': 'Starlight Denim Jeans', 'description': 'Durable and stylish jeans, perfect for any occasion.'},
        {'sku': 'DR-001', 'name': 'Summer Breeze Sundress', 'description': 'A light and airy dress for warm sunny days.'},
        {'sku': 'SH-001', 'name': 'Linen Holiday Shorts', 'description': 'Breathable shorts ideal for a summer vacation.'},
        {'sku': 'XG-500', 'name': 'Pro-Grade Graphics Card', 'description': 'High-performance GPU for gaming and professional workloads.'},
        {'sku': 'XG-500-PRO', 'name': 'Pro-Grade Graphics Card - Pro Edition', 'description': 'The ultimate GPU for 4K gaming and demanding creative tasks.'},
    ]

    print("Loading sentence transformer model...")
    model = SentenceTransformer('all-MiniLM-L6-v2')

    print("Generating embeddings...")
    texts_to_embed = [f"{p['name']}: {p['description']}" for p in products_data]
    embeddings = model.encode(texts_to_embed, show_progress_bar=True)

    conn = get_db_connection()
    try:
        with conn.cursor() as cur:
            print("Inserting data into database...")
            for i, p in enumerate(products_data):
                embedding_list = embeddings[i].tolist()
                cur.execute(
                    """INSERT INTO products (sku, name, description, embedding) 
                       VALUES (%s, %s, %s, %s) 
                       ON CONFLICT (sku) DO NOTHING""",
                    (p['sku'], p['name'], p['description'], embedding_list)
                )
            
            print("Updating search statistics...")
            cur.execute("""
                INSERT INTO product_search_stats (stat_key, stat_value)
                SELECT 'total_docs', COUNT(*) FROM products
                ON CONFLICT (stat_key) DO UPDATE SET stat_value = EXCLUDED.stat_value;
            """)
            cur.execute("""
                INSERT INTO product_search_stats (stat_key, stat_value)
                SELECT 'avg_doc_length', AVG(pg_column_size(fts_document)) FROM products
                ON CONFLICT (stat_key) DO UPDATE SET stat_value = EXCLUDED.stat_value;
            """)

        conn.commit()
        print("Data population complete.")
    finally:
        conn.close()

if __name__ == '__main__':
    main()

After running this script, our products table will be populated, and the fts_document for each row will be automatically generated by our trigger.

2. Component 1: Semantic Search with `pgvector` HNSW

The pgvector extension provides several index types. For production workloads where query latency is critical, HNSW (Hierarchical Navigable Small World) is the superior choice over IVFFlat. While HNSW has a longer build time and higher memory usage, its query performance is significantly better, especially on large datasets.

sql

-- Create the HNSW index on our embedding column
-- The cosine operator is recommended for sentence-transformer models
CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops);

Tuning HNSW Index Creation:

For larger datasets, you can tune the index creation parameters:

sql

CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops) 
WITH (m = 16, ef_construction = 64);

* m: The maximum number of connections per layer (default 16). Higher m improves recall but increases index size and build time.

* ef_construction: The size of the dynamic candidate list for constructing the graph (default 64). Higher values lead to a better-quality graph (and better recall) at the cost of a longer build time.

Executing a Semantic Search Query:

A pure semantic search query uses a vector similarity operator. For all-MiniLM-L6-v2, cosine similarity is the standard. pgvector provides the <=> operator for this.

Let's find products semantically similar to "clothes for warm weather".

python

# search_semantic.py
import psycopg2
import numpy as np
from sentence_transformers import SentenceTransformer

def search(query_text, limit=5):
    model = SentenceTransformer('all-MiniLM-L6-v2')
    query_embedding = model.encode(query_text).tolist()

    conn = psycopg2.connect(...) # connection details
    try:
        with conn.cursor() as cur:
            cur.execute(
                """SELECT id, name, description, 1 - (embedding <=> %s) AS similarity
                   FROM products
                   ORDER BY embedding <=> %s
                   LIMIT %s""",
                (query_embedding, query_embedding, limit)
            )
            results = cur.fetchall()
            for row in results:
                print(f"ID: {row[0]}, Name: {row[1]}, Similarity: {row[3]:.4f}")
    finally:
        conn.close()

if __name__ == '__main__':
    search("clothes for warm weather")

Expected Output:

text

ID: 3, Name: Summer Breeze Sundress, Similarity: 0.7581
ID: 4, Name: Linen Holiday Shorts, Similarity: 0.7345
ID: 1, Name: Quantum Weave T-Shirt, Similarity: 0.5123
...

This works perfectly for the conceptual query. But if we search for XG-500-PRO, the results will be poor because the semantic meaning is weak.

Query Performance Analysis (EXPLAIN ANALYZE):

sql

EXPLAIN ANALYZE SELECT id FROM products ORDER BY embedding <=> '[...]' LIMIT 5;

text

Limit  (cost=...) (actual time=0.852..0.853 rows=5 loops=1)
  ->  Index Scan using products_embedding_idx on products ...
        Order By: (embedding <=> '[...]')
Planning Time: ...
Execution Time: 0.912 ms

The key is Index Scan. This confirms PostgreSQL is using our HNSW index, resulting in sub-millisecond query times even on millions of rows, rather than performing a slow, full table scan.

3. Component 2: Lexical Search with a Custom BM25 Function

PostgreSQL's built-in FTS provides ranking functions like ts_rank and ts_rank_cd. These are based on TF-IDF and are decent, but BM25 is the de-facto standard for modern lexical search, offering better relevance tuning.

Unfortunately, PostgreSQL does not have a native BM25 implementation. We must build our own using PL/pgSQL.

The BM25 formula is:

$$ ext{score}(q, d) = \sum_{i=1}^{n} \text{IDF}(q_i) \cdot \frac{f(q_i, d) \cdot (k_1 + 1)}{f(q_i, d) + k_1 \cdot (1 - b + b \cdot \frac{|d|}{\text{avgdl}})} $$

Where:

* IDF(q_i): Inverse Document Frequency of term q_i.

* f(q_i, d): Term frequency of q_i in document d.

* |d|: Length of document d.

* avgdl: Average document length across the entire collection.

* k1 and b: Hyperparameters (typically k1 is 1.2-2.0, b is 0.75).

We can implement this in PL/pgSQL.

sql

CREATE OR REPLACE FUNCTION bm25(
    query_tsquery TSQUERY,
    document_tsvector TSVECTOR,
    k1 REAL DEFAULT 1.2,
    b REAL DEFAULT 0.75
) RETURNS REAL AS $$
DECLARE
    query_lexemes TEXT[];
    lexeme TEXT;
    
    -- These should be pre-calculated and stored for performance
    total_docs NUMERIC;
    avg_doc_length NUMERIC;

    -- Document specific
    doc_length INT;
    term_freq INT;

    -- Calculation variables
    idf REAL;
    score REAL := 0.0;
BEGIN
    -- Retrieve pre-calculated stats
    SELECT stat_value INTO total_docs FROM product_search_stats WHERE stat_key = 'total_docs';
    SELECT stat_value INTO avg_doc_length FROM product_search_stats WHERE stat_key = 'avg_doc_length';

    IF total_docs IS NULL OR avg_doc_length IS NULL THEN
        -- Fallback or raise an error if stats are not available
        RETURN 0.0;
    END IF;

    -- Extract individual lexemes from the query
    query_lexemes := string_to_array(regexp_replace(tsquery_to_string(query_tsquery), '[()|&!]', '', 'g'), ' ');

    -- Get the length of the current document's tsvector
    doc_length := array_length(string_to_array(tsvector_to_string(document_tsvector), ' '), 1);

    -- Loop through each term in the query
    FOREACH lexeme IN ARRAY query_lexemes
    LOOP
        -- Skip if lexeme is empty
        CONTINUE WHEN lexeme = '';

        -- Calculate IDF for the current lexeme
        -- ts_stat provides the nentry (number of documents the term appears in)
        DECLARE
            n_docs_with_term INT;
        BEGIN
            SELECT nentry INTO n_docs_with_term 
            FROM ts_stat('SELECT fts_document FROM products') 
            WHERE word = lexeme;

            IF n_docs_with_term IS NULL OR n_docs_with_term = 0 THEN
                idf := 0;
            ELSE
                idf := ln((total_docs - n_docs_with_term + 0.5) / (n_docs_with_term + 0.5) + 1.0);
            END IF;
        END;

        -- Calculate term frequency in the current document
        -- A trick is to convert tsvector to string and count occurrences
        term_freq := array_length(ts_lexize('english', lexeme), 1) > 0 AND 
                     array_length(string_to_array(tsvector_to_string(document_tsvector), lexeme), 1) - 1;

        -- Calculate the BM25 term score
        IF term_freq > 0 THEN
            score := score + idf * (term_freq * (k1 + 1)) / (term_freq + k1 * (1 - b + b * (doc_length / avg_doc_length)));
        END IF;

    END LOOP;

    RETURN score;
END;
$$ LANGUAGE plpgsql STABLE;

Production Caveat: The ts_stat call inside the function is a performance bottleneck as it scans the index/table. In a high-throughput production environment, you would pre-calculate IDF scores for your entire vocabulary and store them in a lookup table, which this function would query instead.

Now, let's create the necessary GIN index for FTS.

sql

CREATE INDEX ON products USING GIN (fts_document);

Executing a Lexical Search Query:

Let's search for the specific SKU XG-500-PRO.

sql

-- The plainto_tsquery function is good for user-provided search terms
SELECT 
    id, 
    name, 
    bm25(plainto_tsquery('english', 'XG-500-PRO'), fts_document) AS score
FROM products
WHERE fts_document @@ plainto_tsquery('english', 'XG-500-PRO')
ORDER BY score DESC
LIMIT 5;

Expected Output:

text

 id |                   name                   |      score
----+------------------------------------------+------------------
  6 | Pro-Grade Graphics Card - Pro Edition    | 2.02732563018799
  5 | Pro-Grade Graphics Card                  | 1.01366281509399
...

This correctly and precisely identifies the product with the exact SKU, ranking it highest. This is something pure semantic search would struggle with.

4. Fusing the Results: Reciprocal Rank Fusion (RRF)

We now have two ordered lists of results, each with its own scoring system that is not directly comparable. How do we merge them?

A naive approach might be to normalize scores (e.g., min-max scaling) and add them. This is fragile and highly sensitive to the distribution of scores in each system. A much more robust and parameter-free method is Reciprocal Rank Fusion (RRF).

The RRF formula is simple. For each document, its final score is the sum of the reciprocal of its ranks in each result set:

$$ RRF_{score}(d) = \sum_{k \in \text{result_sets}} \frac{1}{c + \text{rank}_k(d)} $$

* rank_k(d) is the rank of document d in result set k.

* c is a constant to de-emphasize lower-ranked results (a common value is 60).

We can implement this elegantly in SQL using Common Table Expressions (CTEs).

The Complete Hybrid Search Query:

Here is the full query that performs vector search, FTS search, and fuses them with RRF.

sql

WITH vector_results AS (
    -- Get top 100 results from semantic search
    SELECT 
        id,
        -- Rank is needed for RRF
        ROW_NUMBER() OVER (ORDER BY embedding <=> '[...your_query_vector...]' ASC) as rank
    FROM products
    ORDER BY embedding <=> '[...your_query_vector...]'
    LIMIT 100
), 
fts_results AS (
    -- Get top 100 results from lexical search
    SELECT 
        id,
        -- Rank is needed for RRF
        ROW_NUMBER() OVER (ORDER BY bm25(plainto_tsquery('english', 'your search query'), fts_document) DESC) as rank
    FROM products
    WHERE fts_document @@ plainto_tsquery('english', 'your search query')
    ORDER BY bm25(plainto_tsquery('english', 'your search query'), fts_document) DESC
    LIMIT 100
)
SELECT 
    p.id,
    p.name,
    p.description,
    -- Calculate the RRF score. The COALESCE handles documents present in only one result set.
    COALESCE(1.0 / (60 + vr.rank), 0.0) + COALESCE(1.0 / (60 + fr.rank), 0.0) AS rrf_score
FROM products p
LEFT JOIN vector_results vr ON p.id = vr.id
LEFT JOIN fts_results fr ON p.id = fr.id
-- We only want to see results that appeared in at least one of the searches
WHERE vr.id IS NOT NULL OR fr.id IS NOT NULL
ORDER BY rrf_score DESC
LIMIT 10;

Let's test this with a query that has both semantic and lexical components: stylish summer clothes.

* Vector Search will find Summer Breeze Sundress and Linen Holiday Shorts.

* FTS will also find these because of the word summer.

If we search for pro-grade gpu, a query where both systems can contribute:

* Vector Search will find both graphics cards due to semantic similarity to gpu.

* FTS will strongly prefer the results with the exact words pro-grade.

RRF will combine these signals, pushing the Pro Edition to the top because it gets a high rank from both systems, demonstrating the power of the hybrid approach.

5. Performance Tuning and Production Considerations

This query is complex. To make it production-ready, we must optimize it.

HNSW Query-Time Tuning: `ef_search`

The most important parameter for HNSW query performance is ef_search. It controls the size of the candidate list during the search. A higher ef_search increases recall (accuracy) at the cost of latency.

sql

-- Can be set per-transaction
BEGIN;
SET LOCAL hnsw.ef_search = 100;
-- Your hybrid query here...
COMMIT;

You must benchmark this value for your specific dataset and latency requirements. For a 1 million vector dataset, you might test values from 40 to 200.

Example Benchmark:

`ef_search`	p95 Latency (ms)	Recall@10
40	5.2	0.96
100	11.8	0.985
200	25.1	0.991

This table (illustrative) shows the classic trade-off. A value of 100 might be a good balance.

CTE Result Set Size

In our query, we used LIMIT 100 in both CTEs. This is a critical tuning parameter.

* Too small (e.g., LIMIT 10): You might miss out on documents that rank low in one search but high in the other, which could have a high final RRF score.

* Too large (e.g., LIMIT 1000): You increase the amount of work the database has to do for ranking and joining, increasing latency.

A value between 50 and 200 is typically a good starting point. This depends heavily on how "spiky" your relevance scores are.

Managing BM25 Statistics

As noted, our BM25 function has a performance issue with ts_stat. The solution is to pre-calculate and cache these statistics.

Total Docs & Avg Doc Length: Our script already does this. This should be updated periodically via a cron job or after large data ingestion batches.

Inverse Document Frequency (IDF): Create a dedicated table for IDF scores.

sql

CREATE TABLE term_idf_stats (
    term TEXT PRIMARY KEY,
    idf REAL NOT NULL
);

Populate this table with a periodic script:

sql

-- This can be a slow query, run it offline.
INSERT INTO term_idf_stats (term, idf)
SELECT 
    word,
    ln(((SELECT stat_value FROM product_search_stats WHERE stat_key = 'total_docs') - nentry + 0.5) / (nentry + 0.5) + 1.0)
FROM ts_stat('SELECT fts_document FROM products')
ON CONFLICT (term) DO UPDATE SET idf = EXCLUDED.idf;

Then, modify the bm25 function to read from this term_idf_stats table instead of calling ts_stat in a loop. This will provide a massive performance boost.

Edge Case: Updating Embeddings and Re-indexing

Updating a row's embedding is costly because it requires updating the HNSW index. HNSW is not designed for frequent updates.

* Strategy: For systems with frequent updates, consider partitioning your table (e.g., by date). New data goes into a new partition with its own index. Queries can UNION ALL across recent, active partitions. Older, static partitions remain untouched. Alternatively, batch updates and perform a full REINDEX during off-peak hours.

Conclusion

We have successfully designed and implemented a sophisticated hybrid search system entirely within PostgreSQL. By combining the semantic power of pgvector with the lexical precision of a custom BM25 FTS implementation, and fusing the results with Reciprocal Rank Fusion, we've built a system that provides far superior relevance than either method alone.

This approach offers significant architectural simplification by avoiding the need for multiple disparate data stores. While it requires a deep understanding of SQL, indexing, and the underlying search algorithms, the result is a powerful, maintainable, and highly performant search backend. For senior engineers looking to push the boundaries of in-database capabilities, this pattern represents a compelling and production-ready solution.

The Relevance Problem: Semantic vs. Lexical Search

Our Technical Stack

1. Schema Design and Data Population

Populating with Embeddings

2. Component 1: Semantic Search with `pgvector` HNSW

3. Component 2: Lexical Search with a Custom BM25 Function

4. Fusing the Results: Reciprocal Rank Fusion (RRF)

5. Performance Tuning and Production Considerations

HNSW Query-Time Tuning: `ef_search`

CTE Result Set Size

Managing BM25 Statistics

Edge Case: Updating Embeddings and Re-indexing

Conclusion

Found this article helpful?