Postgres Hybrid Search: Combining pgvector and FTS for Superior Relevance

22 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Dichotomy of Modern Search: Semantic vs. Lexical

In modern application development, search is no longer a solved problem. The rise of dense vector embeddings has revolutionized semantic search, allowing us to find results based on conceptual meaning rather than just keyword matching. A search for "ways to stay active in winter" can correctly surface documents about "indoor skiing" or "cold weather workouts," a feat impossible for traditional lexical search. We've all seen the power of this with pgvector and its ability to perform nearest-neighbor searches.

However, in the rush to adopt vector search, many teams are discovering its Achilles' heel: a lack of precision for specific, literal terms. Vector search models, by their nature, generalize. They might struggle to differentiate between a product code like XG-200-A and XG-200-B, or fail to retrieve a document when a user searches for an exact, but rare, technical term or proper noun. This is where traditional, battle-tested Full-Text Search (FTS) excels. FTS, with its tokenization, stemming, and inverted indexes, is built for lexical precision.

This leaves senior engineers with a critical dilemma: do you choose semantic recall or lexical precision? The answer, for a production-grade system, must be both. This article is a deep dive into implementing a robust hybrid search system entirely within PostgreSQL. We will combine pgvector with Postgres's native FTS capabilities, fusing their results using a sophisticated ranking algorithm called Reciprocal Rank Fusion (RRF). We will not be building a toy example; we will be designing a production-ready solution, complete with advanced indexing, query optimization, and encapsulation into a reusable database function.

Our Scenario: An E-Commerce Product Catalog

Imagine an e-commerce platform selling technical gear. A user might search for:

  • "lightweight waterproof jacket for hiking" (A semantic query)
  • "Gore-Tex Pro Shell" (A specific keyword/brand query)
  • "Arcteryx Alpha SV jacket" (A mix of brand and product line)
  • Pure vector search excels at #1 but may fail at #2 if "Gore-Tex Pro Shell" wasn't a distinct concept in the embedding model's training data. Pure FTS excels at #2 but struggles with #1, missing a jacket described as a "featherlight rain shell for trekking."

    Our goal is to build a system that handles all three query types gracefully, returning the most relevant results by combining the strengths of both search paradigms.

    Section 1: The Advanced Schema and Indexing Strategy

    A robust hybrid search system begins with a schema designed for performance. We need to store our product data, its vector embedding, and a pre-processed tsvector for FTS. The key is not just adding columns, but indexing them correctly.

    1.1 Table Definition

    Let's define our products table. Notice the inclusion of embedding vector(384) (a common dimension for models like all-MiniLM-L6-v2) and fts_document tsvector.

    sql
    -- Ensure the pgvector extension is installed
    CREATE EXTENSION IF NOT EXISTS vector;
    
    CREATE TABLE products (
        id SERIAL PRIMARY KEY,
        name VARCHAR(255) NOT NULL,
        description TEXT,
        -- Storing the embedding vector directly in the table.
        -- 384 is the dimension for the 'all-MiniLM-L6-v2' model.
        embedding vector(384),
        -- A dedicated column for the pre-calculated tsvector.
        fts_document tsvector
    );

    1.2 Automated `tsvector` Generation with Triggers

    In a production environment, you should never calculate the tsvector on the fly during a query. It's inefficient. Instead, we'll use a trigger to automatically compute and store it whenever a product's name or description changes. This pre-computation is a critical performance optimization.

    We'll combine the name and description into a single document, but we'll assign different weights. A keyword match in the product name is likely more significant than one in the description. We use setweight to achieve this.

    sql
    -- Create a function that will be called by our trigger.
    CREATE OR REPLACE FUNCTION update_fts_document() RETURNS TRIGGER AS $$
    BEGIN
        -- Combine name and description into a single tsvector.
        -- Weight 'A' is the highest, 'B' is lower.
        NEW.fts_document := 
            setweight(to_tsvector('english', coalesce(NEW.name, '')), 'A') ||
            setweight(to_tsvector('english', coalesce(NEW.description, '')), 'B');
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    
    -- Create the trigger to execute the function before any insert or update.
    CREATE TRIGGER products_fts_update
    BEFORE INSERT OR UPDATE ON products
    FOR EACH ROW EXECUTE FUNCTION update_fts_document();

    Now, any INSERT or UPDATE on the products table will automatically keep the fts_document column perfectly in sync.

    1.3 The Dual-Index Strategy: HNSW and GIN

    This is the core of our performance strategy. We need two distinct types of indexes for our two distinct search types.

    1. HNSW Index for Vector Search:

    For vector similarity search, we'll use an HNSW (Hierarchical Navigable Small World) index. HNSW is generally preferred over IVFFlat for high-dimensional data and high-recall scenarios, offering a superior speed-recall tradeoff.

    sql
    -- HNSW index for approximate nearest neighbor search.
    -- m: max number of connections per layer (higher = better recall, slower index build).
    -- ef_construction: size of the dynamic candidate list during build (higher = better quality, slower build).
    CREATE INDEX ON products USING hnsw (embedding vector_l2_ops) WITH (m = 16, ef_construction = 64);

    * m (default 16): Defines the graph's connectivity. Higher values create more dense graphs, leading to better search quality but increased index size and slower build times.

    * ef_construction (default 64): Controls the quality of the index graph. A higher value means a more exhaustive search for neighbors during insertion, yielding a better index at the cost of build time.

    2. GIN Index for Full-Text Search:

    For FTS, a GIN (Generalized Inverted Index) is the optimal choice. It's highly efficient for indexing composite values like tsvector, where each document contains many lexemes.

    sql
    -- GIN index for fast full-text search.
    CREATE INDEX ON products USING gin (fts_document);

    With this schema, we are now prepared to ingest data and execute high-performance queries for both search paradigms independently.

    Section 2: Data Ingestion and Embedding Generation

    Here is a complete, runnable Python script demonstrating a production-style pattern for backfilling embeddings for existing data. It uses sentence-transformers for high-quality embeddings and psycopg2 for efficient batch updates to the database.

    python
    import os
    import psycopg2
    from psycopg2.extras import execute_batch
    from sentence_transformers import SentenceTransformer
    from dotenv import load_dotenv
    
    # Load environment variables (e.g., DB connection string)
    load_dotenv()
    
    # --- Configuration ---
    DB_URL = os.getenv("DATABASE_URL")
    MODEL_NAME = 'all-MiniLM-L6-v2' # 384 dimensions
    BATCH_SIZE = 128
    
    # --- Sample Data ---
    SAMPLE_PRODUCTS = [
        {'name': 'Arcteryx Alpha SV Jacket', 'description': 'A highly durable, waterproof, and breathable jacket made with Gore-Tex Pro for severe alpine conditions.'},
        {'name': 'Patagonia Nano Puff Hoody', 'description': 'A windproof and water-resistant insulated hoody. Uses lightweight PrimaLoft Gold Insulation Eco.'},
        {'name': 'Osprey Aether 65 Backpack', 'description': 'A customizable and comfortable backpack for multi-day treks and heavy loads.'},
        {'name': 'La Sportiva Solution Comp Climbing Shoe', 'description': 'High-performance climbing shoe designed for competition bouldering. Aggressive downturn.'},
        {'name': 'MSR Hubba Hubba NX 2 Tent', 'description': 'A lightweight, freestanding 2-person backpacking tent perfect for three-season use.'},
        {'name': 'Black Diamond Trail Ergo Cork Trekking Poles', 'description': 'Ergonomic and durable poles for hiking and trekking. Features a cork grip and adjustable length.'},
        {'name': 'Featherlight Rain Shell', 'description': 'An ultralight waterproof shell for trail runners and fast hikers. Minimalist design.'}
    ]
    
    def get_db_connection():
        """Establishes a connection to the PostgreSQL database."""
        try:
            conn = psycopg2.connect(DB_URL)
            # Required for pgvector
            from pgvector.psycopg2 import register_vector
            register_vector(conn)
            return conn
        except psycopg2.OperationalError as e:
            print(f"Could not connect to database: {e}")
            raise
    
    def populate_sample_data(conn):
        """Inserts sample data if the table is empty."""
        with conn.cursor() as cur:
            cur.execute("SELECT COUNT(*) FROM products;")
            if cur.fetchone()[0] == 0:
                print("Populating table with sample data...")
                insert_query = "INSERT INTO products (name, description) VALUES (%s, %s);"
                execute_batch(cur, insert_query, [(p['name'], p['description']) for p in SAMPLE_PRODUCTS])
                conn.commit()
                print(f"{len(SAMPLE_PRODUCTS)} products inserted.")
            else:
                print("Table already contains data.")
    
    def generate_and_update_embeddings(conn):
        """Generates embeddings for products that don't have them."""
        print("Loading sentence transformer model...")
        model = SentenceTransformer(MODEL_NAME)
        
        with conn.cursor() as cur:
            # Find products that need embeddings
            cur.execute("SELECT id, name, description FROM products WHERE embedding IS NULL;")
            products_to_update = cur.fetchall()
            
            if not products_to_update:
                print("All products have embeddings. Nothing to do.")
                return
    
            print(f"Found {len(products_to_update)} products to process.")
    
            # Prepare data for the model
            ids = [p[0] for p in products_to_update]
            texts_to_embed = [f"{p[1]}: {p[2]}" for p in products_to_update]
    
            print("Generating embeddings...")
            embeddings = model.encode(texts_to_embed, show_progress_bar=True, batch_size=BATCH_SIZE)
    
            print("Updating database in batches...")
            update_query = "UPDATE products SET embedding = %s WHERE id = %s;"
            # We need to reshape embeddings for execute_batch
            update_data = [(embedding, id) for id, embedding in zip(ids, embeddings)]
    
            execute_batch(cur, update_query, update_data, page_size=BATCH_SIZE)
            conn.commit()
            print(f"Successfully updated {len(products_to_update)} product embeddings.")
    
    if __name__ == "__main__":
        connection = None
        try:
            connection = get_db_connection()
            populate_sample_data(connection)
            generate_and_update_embeddings(connection)
        finally:
            if connection:
                connection.close()
    

    This script is idempotent. You can run it multiple times, and it will only process products that are missing an embedding, a common requirement for production data pipelines.

    Section 3: The Heart of Hybrid Search - Reciprocal Rank Fusion (RRF)

    Now that our data is prepared, how do we combine the results from two different search systems? A naive approach might be to normalize the scores from both FTS (ts_rank) and vector search (distance) and add them together. This is fraught with problems. The score distributions are completely different, and finding a stable normalization function is nearly impossible.

    We need a better, rank-based approach. Reciprocal Rank Fusion (RRF) is a simple yet incredibly effective technique that discards the raw scores and focuses purely on the rank of each result in its respective list.

    The RRF score for a document is calculated by summing the reciprocal of its rank across different result sets:

    RRF_Score(d) = Σ (1 / (k + rank_i(d)))

  • rank_i(d) is the rank of document d in result set i.
  • k is a constant used to mitigate the effect of high ranks (i.e., results at position 1 or 2) having an outsized influence. A common value for k is 60.
  • We will implement this logic directly in SQL using Common Table Expressions (CTEs) for clarity and performance.

    3.1 Building the Hybrid Query Step-by-Step

    Let's construct the query for the search term: "lightweight shell for hiking".

    First, we need the query's embedding. We'll pre-calculate this in our application.

    python
    # In your application code
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer('all-MiniLM-L6-v2')
    query_text = "lightweight shell for hiking"
    query_embedding = model.encode(query_text)
    # This query_embedding is what we pass to our SQL query

    Now for the SQL. We'll use CTEs to fetch ranked lists from each search method.

    sql
    -- Define constants for the query
    WITH constants AS (
        SELECT
            -- The embedding for "lightweight shell for hiking"
            '[...]'::vector(384) AS query_embedding,
            -- The FTS query string
            'lightweight & shell & hiking' AS fts_query,
            -- RRF constant
            60 AS k,
            -- How many results to fetch from each source
            100 AS top_n
    ),
    -- 1. Get ranked results from Full-Text Search
    fts_results AS (
        SELECT 
            id,
            -- Rank the results based on their FTS score
            ROW_NUMBER() OVER (ORDER BY ts_rank_cd(fts_document, to_tsquery('english', (SELECT fts_query FROM constants))) DESC) as rank
        FROM products
        WHERE fts_document @@ to_tsquery('english', (SELECT fts_query FROM constants))
        ORDER BY rank
        LIMIT (SELECT top_n FROM constants)
    ),
    -- 2. Get ranked results from Vector Search
    vector_results AS (
        SELECT 
            id,
            -- Rank the results based on cosine distance (1 - cosine similarity)
            ROW_NUMBER() OVER (ORDER BY embedding <=> (SELECT query_embedding FROM constants) ASC) as rank
        FROM products
        ORDER BY embedding <=> (SELECT query_embedding FROM constants)
        LIMIT (SELECT top_n FROM constants)
    )
    -- 3. Fuse the results using RRF
    SELECT
        p.id,
        p.name,
        -- Calculate the final RRF score
        COALESCE(1.0 / (c.k + fts.rank), 0.0) + COALESCE(1.0 / (c.k + vec.rank), 0.0) AS rrf_score
    FROM products p
    -- Use a FULL OUTER JOIN to include results that are in one list but not the other
    LEFT JOIN fts_results fts ON p.id = fts.id
    LEFT JOIN vector_results vec ON p.id = vec.id
    CROSS JOIN constants c
    WHERE fts.id IS NOT NULL OR vec.id IS NOT NULL
    ORDER BY rrf_score DESC
    LIMIT 20;

    3.2 Deconstructing the Query

  • constants CTE: We define our input parameters here. This makes the query cleaner and allows for easy parameterization. The query_embedding would be passed in from your application.
  • fts_results CTE: This performs a standard FTS query. We use ts_rank_cd for scoring and, crucially, ROW_NUMBER() to assign a rank to each result. This rank is the input to our RRF calculation.
  • vector_results CTE: This performs the vector similarity search using the L2 distance operator <=>. Again, we use ROW_NUMBER() to get the rank.
  • Final Fusion SELECT: This is where the magic happens.
  • * We use LEFT JOIN from products to our two result sets. A FULL OUTER JOIN between fts_results and vector_results is also a valid pattern.

    * COALESCE(..., 0.0) is critical. It ensures that if a product appears in only one result set, its score from the other set is treated as zero, not NULL.

    * The rrf_score is calculated by applying the RRF formula to the ranks from each CTE.

    The WHERE clause filters out any products that didn't appear in either* result set.

    * Finally, we ORDER BY rrf_score DESC to get our final, hybrid-ranked list.

    This single query effectively runs two separate searches, ranks them, and fuses the ranks into a final, more relevant list. For our query "lightweight shell for hiking," this would correctly boost the "Featherlight Rain Shell" (high semantic score) and the "Arcteryx Alpha SV Jacket" (if its description contains the keywords), placing them both near the top.

    Section 4: Productionizing with a PL/pgSQL Function

    Exposing that complex CTE query to your application layer is a bad practice. It's verbose, hard to maintain, and tightly couples your application logic to the database schema. The professional approach is to encapsulate this logic within a PostgreSQL function.

    This provides a clean, stable API for your application to call.

    sql
    CREATE OR REPLACE FUNCTION hybrid_search(
        p_query_text TEXT,
        p_query_embedding vector(384),
        p_match_count INT,
        p_k INT DEFAULT 60
    )
    RETURNS TABLE (id INT, name VARCHAR, description TEXT, rrf_score FLOAT) AS $$
    DECLARE
        v_fts_query TEXT;
    BEGIN
        -- Convert the query text to a tsquery-compatible format (e.g., 'term1 & term2')
        v_fts_query := array_to_string(string_to_array(trim(p_query_text), ' '), ' & ');
    
        RETURN QUERY
        WITH fts_results AS (
            SELECT 
                p.id,
                ROW_NUMBER() OVER (ORDER BY ts_rank_cd(p.fts_document, to_tsquery('english', v_fts_query)) DESC) as rank
            FROM products p
            WHERE p.fts_document @@ to_tsquery('english', v_fts_query)
            -- Fetch more results than needed to allow for fusion
            LIMIT p_match_count * 2
        ),
        vector_results AS (
            SELECT 
                p.id,
                ROW_NUMBER() OVER (ORDER BY p.embedding <=> p_query_embedding ASC) as rank
            FROM products p
            ORDER BY p.embedding <=> p_query_embedding
            LIMIT p_match_count * 2
        )
        SELECT
            p.id,
            p.name,
            p.description,
            (COALESCE(1.0 / (p_k + fts.rank), 0.0) + COALESCE(1.0 / (p_k + vec.rank), 0.0))::FLOAT AS rrf_score
        FROM products p
        LEFT JOIN fts_results fts ON p.id = fts.id
        LEFT JOIN vector_results vec ON p.id = vec.id
        WHERE fts.id IS NOT NULL OR vec.id IS NOT NULL
        ORDER BY rrf_score DESC
        LIMIT p_match_count;
    END;
    $$ LANGUAGE plpgsql;

    Now, your application's query becomes incredibly simple:

    sql
    SELECT * FROM hybrid_search('lightweight shell for hiking', '[...embedding...]', 20);

    This is a vastly superior pattern for maintainability and security.

    Section 5: Performance Tuning and Edge Case Analysis

    Writing the query is only half the battle. In a production system with millions of rows, performance is paramount. Let's analyze and tune our setup.

    5.1 Query Plan Analysis with `EXPLAIN ANALYZE`

    Let's inspect the query plan for our CTE-based search. Run the EXPLAIN ANALYZE command on the raw query (not the function, initially, for clarity).

    sql
    EXPLAIN ANALYZE
    -- ... (paste the full CTE query here)

    You should see output similar to this (abbreviated):

    text
    Limit  (cost=123.45..123.50 rows=20 width=123) (actual time=15.123..15.125 rows=20 loops=1)
      ->  Sort  (cost=123.45..123.95 rows=200 width=123) (actual time=15.122..15.123 rows=20 loops=1)
            Sort Key: ((COALESCE(...) + COALESCE(...))) DESC
            ->  Hash Left Join  (cost=50.12..100.34 ...)
                  ->  CTE Scan on vector_results vec
                        ->  Limit
                              ->  Index Scan using products_embedding_idx on products (embedding <=> ...)
                  ->  Hash
                        ->  CTE Scan on fts_results fts
                              ->  Limit
                                    ->  Bitmap Heap Scan on products
                                          ->  Bitmap Index Scan on products_fts_document_idx (fts_document @@ ...)

    Key things to verify:

  • Index Scans: The planner must be using your indexes. Look for Index Scan using products_embedding_idx for the vector search and Bitmap Index Scan on products_fts_document_idx for the FTS part. If you see a Seq Scan (Sequential Scan), your query or index is misconfigured, and performance will be catastrophic on large tables.
  • CTE Execution: The CTEs are executed first, generating the two small, ranked lists. The final join and sort operation happens on these much smaller, in-memory sets of IDs, which is highly efficient.
  • 5.2 Tuning `pgvector` HNSW Search Parameters

    The HNSW index has a crucial run-time parameter: hnsw.ef_search (default 40). This controls the size of the dynamic candidate list during search. It's a direct trade-off between speed and recall.

  • Low ef_search (e.g., 20): Very fast, but may miss some relevant neighbors (lower recall).
  • High ef_search (e.g., 100): Slower, but more exhaustive search leading to better recall.
  • You can set this parameter per-transaction:

    sql
    BEGIN;
    SET LOCAL hnsw.ef_search = 80;
    -- Your hybrid search query here
    SELECT * FROM hybrid_search(...);
    COMMIT;

    Benchmark this value with your own data. A good strategy is to start with the default and increase it until you see diminishing returns in recall for an acceptable latency cost.

    5.3 Edge Case: Efficient Pagination

    Standard LIMIT/OFFSET pagination is notoriously inefficient for deep pages, as the database still has to fetch and sort all rows up to the offset. With our rrf_score, which can have ties, keyset pagination is the superior solution.

    To get the second page of results, you would pass the rrf_score and id of the last item from the first page.

    sql
    -- Assuming the last item on page 1 had rrf_score=0.033 and id=42
    -- And we are fetching 20 items per page
    
    ... (the CTEs remain the same) ...
    
    SELECT ...
    FROM products p
    ...
    WHERE (fts.id IS NOT NULL OR vec.id IS NOT NULL)
      -- Add this keyset pagination clause
      AND ( (COALESCE(...) + COALESCE(...)), p.id) < (0.033, 42)
    ORDER BY rrf_score DESC, p.id DESC -- Add a tie-breaker
    LIMIT 20;

    This allows the database to use the index to jump directly to the starting point of the next page, resulting in near-constant query times regardless of page depth.

    Section 6: Advanced Strategy - Weighted Fusion

    Pure RRF treats both search methods as equals. Sometimes, however, you may want to give more weight to one over the other. For example, in a search for product codes, you might want to heavily favor FTS matches.

    We can introduce weighting factors into our final score calculation. This deviates from pure RRF but provides more business-logic control.

    sql
    -- In the final SELECT statement
    SELECT
        p.id,
        p.name,
        -- Define weights (e.g., passed as function parameters)
        (0.4 * COALESCE(1.0 / (c.k + fts.rank), 0.0)) + 
        (0.6 * COALESCE(1.0 / (c.k + vec.rank), 0.0)) AS weighted_score
    FROM ...
    ORDER BY weighted_score DESC

    In this example, we've given a 60% weight to the semantic search score and 40% to the lexical score. These weights are highly domain-specific and require careful tuning and A/B testing to optimize for user satisfaction.

    Conclusion: The Power of an Integrated Approach

    By combining pgvector and native Full-Text Search within PostgreSQL, we have built a search system that is more powerful and relevant than either method in isolation. This integrated approach avoids the complexity and operational overhead of maintaining a separate search service like Elasticsearch or a dedicated vector database, while still providing production-grade performance and features.

    The key takeaways for implementing a robust hybrid search system are:

  • Design for Purpose: Use a schema with dedicated, pre-computed columns for search vectors (tsvector, embedding).
  • Index Correctly: Leverage the right index for the job—GIN for FTS and HNSW for vector search. Never neglect to analyze your query plans.
  • Fuse Ranks, Not Scores: Use Reciprocal Rank Fusion (RRF) for a stable and effective way to combine results from disparate systems.
  • Encapsulate Logic: Abstract complex queries into database functions to create a clean API for your application and simplify maintenance.
  • Tune for Performance: Understand and tune runtime parameters like hnsw.ef_search and implement efficient pagination strategies like keyset pagination from the start.
  • This pattern provides a sophisticated, powerful, and maintainable search solution that can be implemented and scaled directly within the database that senior engineers already know and trust.

    Found this article helpful?

    Share it with others who might benefit from it.

    More Articles