Hybrid Search in Postgres: Fusing pgvector and FTS for Superior Relevance

Goh Ling Yong

The Relevance Gap: Why Unimodal Search Is No Longer Sufficient

In modern application development, search functionality has evolved beyond simple keyword matching. The advent of dense vector embeddings, powered by transformer models, has enabled a leap into semantic search, where the meaning behind a query is understood. However, senior engineers operating at scale quickly discover that neither semantic search nor traditional keyword-based full-text search (FTS) is a panacea. Each has distinct, production-impacting failure modes.

Failure Mode 1: Dense Vector Search and Lexical Specificity

Dense vector search, typically implemented with extensions like pgvector, excels at capturing semantic relationships. A query for "gear for cold weather hiking" can correctly retrieve documents containing "winter trekking equipment." However, it often falters on queries requiring precise, literal matches. Consider these scenarios:

* Product SKUs & Identifiers: A search for "MacBook Pro M2X-2023" might not retrieve the exact product if the embedding model generalizes "M2X-2023" into a generic vector for high-end laptops.

* Acronyms & Jargon: A query for "GDPR compliance guide" might fail if the model hasn't been specifically trained on that acronym, instead retrieving documents about general data privacy.

* Proper Nouns & Names: Searching for a specific person, Johnathan Smith, might incorrectly surface results for Jon Smith if their semantic embeddings are close.

Failure Mode 2: Full-Text Search and Semantic Understanding

Conversely, traditional FTS, built on sparse vector models like TF-IDF or BM25 (and implemented in Postgres via tsvector and tsquery), is masterful at exact token matching. It will find "M2X-2023" flawlessly. Its weakness is the inverse of dense search: it has no genuine understanding of intent or synonyms.

* Synonym Gap: A query for "lightweight notebook for travel" will fail to match a document titled "ultra-portable laptop for commuters" if the keywords don't overlap, despite the identical semantic meaning.

* Conceptual Queries: A search for "how to improve database performance" will miss a critical article on "optimizing query execution plans" because the lexical tokens are entirely different.

The goal is not to choose between them but to leverage the strengths of both. This is the core principle of hybrid search: combine the semantic power of dense vectors with the lexical precision of sparse vectors (FTS) to deliver a superior, more robust search experience. This article details a production-ready pattern for implementing this directly in PostgreSQL.


Section 1: Architecting the Hybrid Data Model

To support a hybrid search system, our database schema must be designed to store and index both representations of our text data. We need the original text, a pre-processed tsvector for FTS, and a vector for the dense embedding.

Let's define a products table as our working example:

sql
-- Ensure you have the pgvector extension installed
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    -- Column for Full-Text Search
    -- Generated column to automatically update the tsvector
    tsv tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(description, '')), 'B')
    ) STORED,
    -- Column for Semantic Vector Search
    -- The dimension (e.g., 384) depends on your chosen embedding model
    embedding vector(384)
);

Key Architectural Decisions:

  • tsvector as a Generated Column: We use a GENERATED ALWAYS AS (...) STORED column for the tsv. This is a critical production pattern. It ensures that the tsvector is always in sync with the source name and description fields without requiring application-level logic or database triggers. The database handles the transformation automatically on every INSERT or UPDATE.
  • Weighted FTS: Notice the use of setweight. We've assigned a higher weight ('A') to the product name than the description ('B'). This allows us to bias our FTS ranking, considering matches in the title to be more relevant than matches in the body text. This is a simple but powerful tuning lever; a short ranking sketch follows this list.
  • Fixed Vector Dimension: The vector(384) dimension is not arbitrary. It must precisely match the output dimension of your chosen sentence-transformer model. For instance, all-MiniLM-L6-v2 produces 384-dimension vectors, while OpenAI's text-embedding-ada-002 produces 1536-dimension vectors. This cannot be changed after table creation without a migration.
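
    To see the weighting in action, here is a minimal ranking sketch. ts_rank's default weight array {D, C, B, A} = {0.1, 0.2, 0.4, 1.0} means a match in the 'A'-weighted name counts considerably more than the same match in the 'B'-weighted description; the exact scores will vary with your data.

    sql
    -- A term appearing in a product name ('A' weight) ranks above the same
    -- term appearing only in a description ('B' weight).
    SELECT id, name, ts_rank(tsv, to_tsquery('english', 'backpack')) AS score
    FROM products
    WHERE tsv @@ to_tsquery('english', 'backpack')
    ORDER BY score DESC;
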
    The Data Ingestion Pipeline

    With the schema in place, the ingestion process involves generating the dense vector embedding. This is an application-level concern, as it requires a machine learning model.

    Here's a Python example using the sentence-transformers library to generate embeddings and psycopg2 to insert the data. The tsv column is omitted from the INSERT statement because the database generates it automatically.

    python
    import psycopg2
    from sentence_transformers import SentenceTransformer
    
    # It's recommended to initialize the model once and reuse it.
    # This model outputs 384-dimensional vectors.
    model = SentenceTransformer('all-MiniLM-L6-v2')
    
    products_to_ingest = [
        {
            "name": "ProGear X-Treme Waterproof Backpack",
            "description": "A durable, all-weather 40L backpack designed for serious hikers and climbers. Features reinforced seams and a built-in rain cover."
        },
        {
            "name": "CityScape Commuter Laptop Bag",
            "description": "Sleek and lightweight notebook carrier for the urban professional. Padded compartment fits up to 15-inch laptops. Style meets function."
        },
        {
            "name": "Quantum-Charge Power Bank QC-5000",
            "description": "High-capacity 20,000mAh portable charger with fast-charging capabilities. Never run out of battery on the go again. SKU: QC-5000-BLK."
        }
    ]
    
    conn = psycopg2.connect("dbname=searchdb user=user password=pass host=localhost")
    cur = conn.cursor()
    
    for product in products_to_ingest:
        # Concatenate fields for a more holistic embedding
        text_to_embed = f"{product['name']}: {product['description']}"
        
        # Generate the dense vector embedding
        embedding = model.encode(text_to_embed).tolist()
        
        cur.execute(
            "INSERT INTO products (name, description, embedding) VALUES (%s, %s, %s)",
            (product['name'], product['description'], embedding)
        )
    
    conn.commit()
    cur.close()
    conn.close()
    
    print(f"Successfully ingested {len(products_to_ingest)} products.")

    Section 2: High-Performance Indexing for a Dual-Query System

    A query is only as fast as its underlying indexes. For our hybrid system, we need two distinct types of indexes, one for FTS and one for vector similarity search.

    Sparse Indexing: GIN for Full-Text Search

    For the tsvector column, the canonical choice is a Generalized Inverted Index (GIN). A GIN index creates an entry for each unique lexeme (word) and points to all the rows containing that lexeme. This makes it extremely efficient for finding rows that match the terms in a tsquery.

    sql
    -- Create a GIN index on the generated tsvector column
    CREATE INDEX idx_products_tsv ON products USING gin(tsv);

    Running EXPLAIN ANALYZE on an FTS query before and after creating this index will show a dramatic shift from a full table scan to a highly efficient bitmap index scan.
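
    As a concrete illustration, the plan shape below is what to look for; exact node names, costs, and timings will vary with your data, Postgres version, and settings.

    sql
    EXPLAIN ANALYZE
    SELECT id, name
    FROM products
    WHERE tsv @@ to_tsquery('english', 'portable & charger');
    
    -- With idx_products_tsv in place, expect something like:
    --   Bitmap Heap Scan on products
    --     Recheck Cond: (tsv @@ ...)
    --     ->  Bitmap Index Scan on idx_products_tsv
    -- A plain "Seq Scan on products" instead means the index is not being used.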

    Dense Indexing: HNSW for Approximate Nearest Neighbor (ANN) Search

    For vector similarity search, a sequential scan is computationally infeasible for any non-trivial number of rows. We need an Approximate Nearest Neighbor (ANN) index. pgvector supports two main types: IVFFlat and HNSW.

    While IVFFlat was an earlier standard, HNSW (Hierarchical Navigable Small World) is now the recommended choice for most production workloads. It offers a better speed/recall trade-off at query time and has no training step, so it can be built on an empty table and maintains its recall as rows are inserted and updated; the trade-off is slower index builds and higher memory use.
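
    For reference, the IVFFlat alternative on the same column would be created as follows. IVFFlat learns its list centroids from existing rows, so it should only be built after the table is populated; the lists value shown is a common starting point, not a universal recommendation.

    sql
    -- IVFFlat alternative (for comparison; the rest of this article uses HNSW).
    -- Build only after the table contains representative data.
    CREATE INDEX idx_products_embedding_ivf ON products
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);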

    Creating an HNSW index involves tuning two key parameters:

    * m: The maximum number of connections per layer. Higher values create a denser graph, improving recall at the cost of index size and build time. The pgvector default is 16, and typical values range from 16 to 64.

    * ef_construction: The size of the dynamic candidate list during index construction. A higher value leads to a better quality index and higher recall, but a slower build time. A typical value is between 64 and 256.

    sql
    -- Create an HNSW index on the embedding column
    -- The choice of m and ef_construction is a trade-off between
    -- index build time/size and search performance/recall.
    CREATE INDEX idx_products_embedding ON products USING hnsw(embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

    Note on vector_cosine_ops: We specify the operator class because we plan to use cosine similarity for our search. If you were using L2 distance, you would use vector_l2_ops. Aligning the index operator with your query operator is crucial for performance.
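
    A quick sketch of what this alignment means in practice; the vector literal is a placeholder for a real query embedding.

    sql
    -- <=> is cosine distance and matches vector_cosine_ops, so this ORDER BY
    -- can be served by idx_products_embedding:
    SELECT id FROM products ORDER BY embedding <=> '[...]'::vector LIMIT 10;
    
    -- <-> is L2 distance and does NOT match vector_cosine_ops; the same index
    -- cannot serve this query, which would need an index built with vector_l2_ops:
    -- SELECT id FROM products ORDER BY embedding <-> '[...]'::vector LIMIT 10;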


    Section 3: The Dual-Query Execution Strategy

    With our data model and indexes in place, we can now execute the search. The strategy is to run two independent queries—one for FTS and one for vector search—and then merge the results in the application layer. This approach prevents the PostgreSQL query planner from getting confused by two fundamentally different search paradigms in a single, complex query.

    Let's assume our user query is "portable charging solution".

    Step 1: Generate Query Vectors

    First, the application must transform the user's raw query into the two required formats: a tsquery and a dense vector embedding.

    python
    import psycopg2
    from sentence_transformers import SentenceTransformer
    
    # Re-use the same model from ingestion
    model = SentenceTransformer('all-MiniLM-L6-v2')
    
    user_query = "portable charging solution"
    
    # 1. Generate the dense vector for the query
    query_embedding = model.encode(user_query).tolist()
    
    # 2. Format the query for FTS (e.g., join words with '&')
    # A more robust implementation would use phraseto_tsquery or websearch_to_tsquery
    fts_query_string = " & ".join(user_query.split())
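
    As noted in the comment above, hand-joining tokens with '&' is brittle: stop words, punctuation, and stray operators can all break the query. If you are on Postgres 11 or later, websearch_to_tsquery parses raw user input directly; a minimal sketch of the FTS query using it:

    sql
    -- Let Postgres parse the raw query string instead of joining tokens manually.
    -- websearch_to_tsquery tolerates quoted phrases, OR, and '-' negation.
    SELECT id, ts_rank(tsv, websearch_to_tsquery('english', 'portable charging solution')) AS score
    FROM products
    WHERE tsv @@ websearch_to_tsquery('english', 'portable charging solution')
    ORDER BY score DESC
    LIMIT 100;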

    Step 2: Execute Parallel Queries

    Now, execute the two queries against the database. We retrieve the id and a relevance score from each. It's important to request more results than you plan to display (e.g., LIMIT 100) to provide a rich set of candidates for the fusion step.

    python
    # This should be wrapped in a single function
    def execute_hybrid_search(query_embedding, fts_query_string, k=100):
        conn = psycopg2.connect("...")
        cur = conn.cursor()
        
        # FTS Query
        # We use ts_rank to score the results based on relevance
        cur.execute(
            """SELECT id, ts_rank(tsv, to_tsquery('english', %s)) as score 
               FROM products 
               WHERE tsv @@ to_tsquery('english', %s) 
               ORDER BY score DESC 
               LIMIT %s;""",
            (fts_query_string, fts_query_string, k)
        )
        fts_results = cur.fetchall() # Returns [(id, score), ...]
        
        # Vector Search Query
        # We use the cosine distance operator (<=>) and convert it to a similarity score (1 - distance)
        cur.execute(
            """SELECT id, 1 - (embedding <=> %s::vector) as score 
               FROM products 
               ORDER BY embedding <=> %s::vector 
               LIMIT %s;""",
            (str(query_embedding), str(query_embedding), k)
        )
        vector_results = cur.fetchall() # Returns [(id, score), ...]
        
        cur.close()
        conn.close()
        
        return fts_results, vector_results
    
    fts_results, vector_results = execute_hybrid_search(query_embedding, fts_query_string)

    Critical Note on Scoring:

    * The FTS ts_rank score is unbounded and depends on term frequency and weighting.

    * The vector similarity score (derived as 1 - cosine_distance) is bounded: cosine distance lies in [0, 2], so the similarity lies in [-1, 1], and in practice sits roughly in [0, 1] for normalized sentence-transformer embeddings.

    These two scoring systems are completely incompatible. A ts_rank of 0.8 is not necessarily more or less relevant than a vector similarity of 0.8. Directly comparing or naively summing these scores is a major anti-pattern. This leads us to the most crucial step: intelligent result fusion.


    Section 4: Advanced Fusion with Reciprocal Rank Fusion (RRF)

    Instead of struggling with score normalization, we can use a rank-based fusion algorithm. Reciprocal Rank Fusion (RRF) is a simple yet remarkably effective method that disregards the raw scores and focuses only on the rank of each document in the result lists.

    The formula for the RRF score of a document d is:

    RRF_Score(d) = Σ (1 / (k + rank_i(d)))

    Where:

    * i iterates over each result list (in our case, FTS and vector search).

    * rank_i(d) is the 1-based rank of document d in result list i.

    * k is a constant that mitigates the influence of highly-ranked items. A common value for k is 60.
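
    As a quick worked example with k = 60: a document ranked #2 by FTS and #5 by vector search scores 1/(60+2) + 1/(60+5) ≈ 0.0161 + 0.0154 ≈ 0.0315, while a document ranked #1 by FTS but absent from the vector results scores only 1/(60+1) ≈ 0.0164. The document that appears in both lists wins, even though it is not #1 in either.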

    Python Implementation of RRF

    Here's a Python function that takes our two result sets and produces a final, re-ranked list.

    python
    from collections import defaultdict
    
    def reciprocal_rank_fusion(list_of_results, k=60):
        """
        Performs Reciprocal Rank Fusion on a list of search result lists.
    
        Args:
            list_of_results: A list where each element is a list of (id, score) tuples.
            k: A constant for the RRF formula.
    
        Returns:
            A list of (id, rrf_score) tuples, sorted by score in descending order.
        """
        rrf_scores = defaultdict(float)
    
        # Process each result list (e.g., from FTS, from vector search)
        for results in list_of_results:
            # `results` is a list of (id, score) tuples
            for rank, (doc_id, _) in enumerate(results, 1):
                rrf_scores[doc_id] += 1 / (k + rank)
                
        # Sort the documents by their final RRF score
        sorted_reranked_results = sorted(
            rrf_scores.items(), 
            key=lambda item: item[1], 
            reverse=True
        )
        
        return sorted_reranked_results
    
    # Assuming fts_results and vector_results from the previous step
    final_results = reciprocal_rank_fusion([fts_results, vector_results])
    
    print("FTS Results:", fts_results[:5])
    print("Vector Results:", vector_results[:5])
    print("Final Reranked Results:", final_results[:5])

    Why is RRF so effective?

  • It Sidesteps Normalization: It completely ignores the original, incompatible scores.
  • It Values Rank: It inherently understands that being #1 in any list is a strong signal, but it also gives credit to documents that appear consistently across multiple lists, even at lower ranks.
  • It's Tunable: The k parameter allows you to control how much you penalize lower-ranked results. A smaller k gives more weight to top-ranked items.
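
    To make that trade-off concrete: with k = 60, rank 1 contributes 1/61 ≈ 0.0164 and rank 10 contributes 1/70 ≈ 0.0143, a fairly flat curve; with k = 5, rank 1 contributes 1/6 ≈ 0.167 while rank 10 contributes only 1/15 ≈ 0.067, so top-ranked hits dominate far more strongly.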

    Section 5: Production Considerations & Edge Cases

    Deploying this system requires attention to performance, error handling, and user experience details like pagination.

    Performance Tuning

    * HNSW Query-Time Tuning: The ef_search parameter for HNSW can be set at query time. It controls the size of the candidate list during search. A higher value increases recall and accuracy but also increases latency. This is a critical knob for balancing speed and relevance.

    sql
        -- Set for the current session before running the vector query
        SET hnsw.ef_search = 100;
        SELECT id, 1 - (embedding <=> '...') FROM products ...;

    * EXPLAIN ANALYZE is Mandatory: Always profile your queries. Ensure the FTS query is using its GIN index (Bitmap Heap Scan on Bitmap Index Scan) and the vector query is using its HNSW index (Index Scan using idx_products_embedding). If you see a sequential scan, your index is not being used, and performance will be catastrophic.

    * Memory (work_mem): FTS queries, especially those with many terms and results, can consume significant memory. Monitor and adjust PostgreSQL's work_mem setting to ensure these operations can be performed efficiently in memory.
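
    A session-level adjustment is sketched below; the right value depends on your row sizes, concurrency, and available RAM, and can also be set per role or per database rather than globally.

    sql
    -- Raise work_mem for this session only, run the search queries, then reset.
    SET work_mem = '64MB';
    -- ... FTS and vector queries ...
    RESET work_mem;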

    Edge Case: Handling Empty Result Sets

    What if a query returns zero results from one of the sources? For example, a search for a SKU might get FTS results but no semantic matches. The RRF implementation shown above handles this gracefully. If a result list is empty, it simply contributes nothing to the final scores, and the results from the non-empty list will dominate, which is the desired behavior.

    The Pagination Challenge

    Standard LIMIT/OFFSET pagination does not work with a fused result set. You cannot simply OFFSET the individual FTS and vector queries, as the top 10 final results might be composed of items ranked #1-5 from FTS and #95-100 from vector search.

    The Correct Approach:

    • Fetch a large number of candidates from each search system, far more than a single page size (e.g., fetch 100-200 results from both FTS and vector search).
    • Perform the RRF fusion on this entire candidate set.
    • The application can then cache this fused and ranked list of IDs and perform pagination on it.

    This introduces state management complexity on the application side but is the only way to guarantee correct pagination. For subsequent pages, you can retrieve the next slice from the cached ID list without re-running the expensive database queries.
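
    A minimal sketch of that application-side pagination, assuming the fused list has already been cached (for example keyed by a hash of the query string); the helper name and caching strategy here are illustrative, not prescriptive.

    python
    def paginate_fused_ids(fused_results, page, page_size=10):
        """Slice a cached RRF result list [(id, rrf_score), ...] for one page (1-based)."""
        start = (page - 1) * page_size
        page_items = fused_results[start:start + page_size]
        # Return only the IDs; hydrate the full rows afterwards with a cheap
        # SELECT ... WHERE id = ANY(%s) lookup, preserving this order.
        return [doc_id for doc_id, _ in page_items]
    
    # Example: second page of the fused results produced earlier
    # ids_page_2 = paginate_fused_ids(final_results, page=2)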


    Section 6: Complete End-to-End Example

    Let's tie everything together into a single, runnable script.

    python
    import psycopg2
    import numpy as np
    from sentence_transformers import SentenceTransformer
    from collections import defaultdict
    
    # --- 0. Configuration ---
    DB_PARAMS = "dbname=searchdb user=user password=pass host=localhost"
    MODEL_NAME = 'all-MiniLM-L6-v2' # 384 dimensions
    VECTOR_DIMENSION = 384
    
    # --- 1. Setup & Ingestion ---
    def setup_database(conn):
        cur = conn.cursor()
        cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
        cur.execute("DROP TABLE IF EXISTS products;")
        cur.execute(f"""
            CREATE TABLE products (
                id SERIAL PRIMARY KEY,
                name VARCHAR(255) NOT NULL,
                description TEXT,
                tsv tsvector GENERATED ALWAYS AS (
                    setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
                    setweight(to_tsvector('english', coalesce(description, '')), 'B')
                ) STORED,
                embedding vector({VECTOR_DIMENSION})
            );
        """)
        cur.execute("CREATE INDEX idx_products_tsv ON products USING gin(tsv);")
        cur.execute("CREATE INDEX idx_products_embedding ON products USING hnsw(embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);");
        conn.commit()
        cur.close()
    
    def ingest_data(conn, model):
        products_to_ingest = [
            {"id": 1, "name": "ProGear X-Treme Waterproof Backpack", "description": "A durable, all-weather 40L backpack designed for serious hikers and climbers. Features reinforced seams and a built-in rain cover."}, 
            {"id": 2, "name": "CityScape Commuter Laptop Bag", "description": "Sleek and lightweight notebook carrier for the urban professional. Padded compartment fits up to 15-inch laptops. Style meets function."}, 
            {"id": 3, "name": "Quantum-Charge Power Bank QC-5000", "description": "High-capacity 20,000mAh portable charger with fast-charging capabilities. SKU: QC-5000-BLK."}, 
            {"id": 4, "name": "Traveler's Lightweight Carry-On", "description": "An ultra-light bag designed for frequent flyers. Fits in overhead compartments with ease."}, 
            {"id": 5, "name": "DataSafe Encrypted USB Drive", "description": "A secure flash drive with hardware-based AES-256 encryption. Keep your data safe."} 
        ]
        cur = conn.cursor()
        for p in products_to_ingest:
            text_to_embed = f"{p['name']}: {p['description']}"
            embedding = model.encode(text_to_embed).tolist()
            cur.execute("INSERT INTO products (id, name, description, embedding) VALUES (%s, %s, %s, %s)", (p['id'], p['name'], p['description'], embedding))
        conn.commit()
        cur.close()
        print(f"Ingested {len(products_to_ingest)} products.")
    
    # --- 2. Search & Fusion Logic ---
    def reciprocal_rank_fusion(list_of_results, k=60):
        rrf_scores = defaultdict(float)
        for results in list_of_results:
            for rank, (doc_id, _) in enumerate(results, 1):
                rrf_scores[doc_id] += 1 / (k + rank)
        return sorted(rrf_scores.items(), key=lambda item: item[1], reverse=True)
    
    def hybrid_search(conn, model, user_query, top_k=100):
        cur = conn.cursor()
        
        # Generate vectors and query strings
        query_embedding = model.encode(user_query).tolist()
        fts_query_string = " & ".join(user_query.split())
    
        # FTS query
        cur.execute(
            "SELECT id, ts_rank(tsv, to_tsquery('english', %s)) as score FROM products WHERE tsv @@ to_tsquery('english', %s) ORDER BY score DESC LIMIT %s;",
            (fts_query_string, fts_query_string, top_k)
        )
        fts_results = cur.fetchall()
    
        # Vector search query
        cur.execute("SET LOCAL hnsw.ef_search = 100;")
        cur.execute(
            "SELECT id, 1 - (embedding <=> %s::vector) as score FROM products ORDER BY embedding <=> %s::vector LIMIT %s;",
            (str(query_embedding), str(query_embedding), top_k)
        )
        vector_results = cur.fetchall()
        
        cur.close()
        
        # Fusion
        fused_results = reciprocal_rank_fusion([fts_results, vector_results])
        
        return {
            "fts_results": fts_results,
            "vector_results": vector_results,
            "fused_results": fused_results
        }
    
    # --- 3. Main Execution ---
    if __name__ == "__main__":
        print("Initializing model...")
        transformer_model = SentenceTransformer(MODEL_NAME)
        
        conn = psycopg2.connect(DB_PARAMS)
        print("Setting up database...")
        setup_database(conn)
        print("Ingesting data...")
        ingest_data(conn, transformer_model)
    
        # -- Run a test query that benefits from hybrid search --
        print("\n--- Query 1: 'bag for my computer' (Semantic) ---")
        query1 = "bag for my computer"
        results1 = hybrid_search(conn, transformer_model, query1)
        print("FTS found:", [r[0] for r in results1['fts_results']]) # FTS will likely fail or find little
        print("Vector found:", [r[0] for r in results1['vector_results']]) # Vector will find laptop bag (ID 2)
        print("Fused top result ID:", results1['fused_results'][0][0])
    
        print("\n--- Query 2: 'QC-5000' (Keyword) ---")
        query2 = "QC-5000"
        results2 = hybrid_search(conn, transformer_model, query2)
        print("FTS found:", [r[0] for r in results2['fts_results']]) # FTS will find power bank (ID 3)
        print("Vector found:", [r[0] for r in results2['vector_results']]) # Vector search will likely fail
        print("Fused top result ID:", results2['fused_results'][0][0])
    
        conn.close()

    Conclusion

    By treating dense and sparse search as two expert systems and combining their ranked outputs with a proven fusion algorithm like RRF, we can build a significantly more relevant search engine that is resilient to the failure modes of any single approach. The power of this pattern lies in its implementation directly within PostgreSQL, leveraging the maturity of its FTS engine and the cutting-edge capabilities of extensions like pgvector. This avoids the operational overhead and complexity of managing a separate search service like Elasticsearch or OpenSearch, providing a tightly integrated, high-performance, and relevance-tuned solution for modern applications.
