Postgres Hybrid Search: Fusing pg_vector and FTS for Superior Relevance

15 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Relevance Gap: Why Standalone Search Fails in Production

In modern application development, search is no longer a simple keyword-matching feature. Users expect semantic understanding. This has led to the rapid adoption of vector databases and extensions like pg_vector for PostgreSQL, enabling similarity search on dense vector embeddings. However, senior engineers quickly discover a critical flaw in pure vector search systems: they are surprisingly poor at matching specific, out-of-vocabulary terms, SKUs, error codes, or exact names. A query for "PostgreSQL error 42P01" might return documents about general database errors, completely missing the critical keyword 42P01.

Conversely, traditional Full-Text Search (FTS), a mature and powerful feature in PostgreSQL, excels at lexical matching, stemming, and keyword relevance. But it has no concept of semantic meaning. A query for "how to speed up database queries" will miss a highly relevant article titled "Optimizing SQL Performance" if the exact keywords don't overlap sufficiently.

This creates a relevance gap. Relying on either system alone leads to a subpar user experience. The solution is not to choose one over the other, but to combine them. This article provides a blueprint for implementing a sophisticated hybrid search system directly within PostgreSQL, fusing the semantic power of pg_vector with the lexical precision of FTS. We will bypass simplistic approaches and build a production-ready solution using advanced SQL, focusing on Reciprocal Rank Fusion (RRF) for combining results.

Prerequisites

This is not an introductory guide. It assumes you have a working knowledge of:

* PostgreSQL administration and advanced SQL (CTEs, Window Functions).

* The fundamentals of Full-Text Search (tsvector, tsquery).

* The basic concept of vector embeddings and similarity search (e.g., cosine distance).

* You have PostgreSQL 15+ with the pg_vector extension installed.


1. Schema Design for Hybrid Data

Our foundation is a well-designed table that accommodates both textual content and its vector representation. Let's define a documents table to store articles or product descriptions.

sql
-- Ensure the pg_vector extension is enabled
CREATE EXTENSION IF NOT EXISTS vector;

-- Main table for our documents
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    published_at TIMESTAMPTZ DEFAULT NOW(),
    -- Column for FTS
    content_tsv TSVECTOR,
    -- Column for vector embeddings (assuming a 768-dim model)
    embedding VECTOR(768)
);

Two columns are critical here:

  • content_tsv: This will store the tsvector representation of our content, pre-processed for FTS. Storing it as a column is far more performant than generating it at query time.
  • embedding: This stores the dense vector from a model like sentence-transformers/all-mpnet-base-v2 (which has 768 dimensions).
  • Automating the `tsvector` with a Trigger

    To ensure content_tsv is always up-to-date and consistent, we'll use a trigger. This offloads the logic from the application layer and guarantees data integrity.

    sql
    -- Create a function to update the tsvector column
    CREATE OR REPLACE FUNCTION update_content_tsv()
    RETURNS TRIGGER AS $$
    BEGIN
        -- Combine title and content for a more comprehensive search vector
        -- We weight the title higher (A) than the content (B)
        NEW.content_tsv := 
            setweight(to_tsvector('pg_catalog.english', COALESCE(NEW.title, '')), 'A') ||
            setweight(to_tsvector('pg_catalog.english', COALESCE(NEW.content, '')), 'B');
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    
    -- Create a trigger that calls the function before insert or update
    CREATE TRIGGER tsvector_update
    BEFORE INSERT OR UPDATE ON documents
    FOR EACH ROW EXECUTE FUNCTION update_content_tsv();

    This production pattern ensures that every time a document is created or its title or content changes, the content_tsv is automatically and correctly regenerated.

    Populating with Data and Embeddings

    For a realistic example, we need to populate the embedding column. This is typically done in an application layer. Here's a Python script using sentence-transformers and psycopg2 to generate embeddings and insert data.

    python
    # requirements: sentence-transformers, psycopg2-binary, numpy
    import psycopg2
    from sentence_transformers import SentenceTransformer
    import numpy as np
    
    # Sample data
    documents_data = [
        {"title": "Optimizing PostgreSQL Performance", "content": "Partial indexes and connection pooling are key strategies for scaling your database."}, 
        {"title": "Introduction to Vector Search", "content": "Vector search uses embeddings to find semantically similar items. HNSW is a popular indexing algorithm."}, 
        {"title": "Advanced Git Techniques", "content": "Learn about rebasing, cherry-picking, and managing complex merge conflicts."}, 
        {"title": "Postgres Full-Text Search Deep Dive", "content": "The tsvector and tsquery types are fundamental to lexical search in PostgreSQL. Use GIN indexes for speed."}, 
        {"title": "Scaling Applications with Kubernetes", "content": "Kubernetes provides robust tools for container orchestration, service discovery, and auto-scaling."}
    ]
    
    # Load a pre-trained model
    model = SentenceTransformer('all-mpnet-base-v2')
    
    # Generate embeddings
    contents = [doc['content'] for doc in documents_data]
    embeddings = model.encode(contents)
    
    # Connect to PostgreSQL
    conn = psycopg2.connect("dbname=hybrid_search_db user=user password=pass host=localhost")
    cur = conn.cursor()
    
    # Insert data
    for i, doc in enumerate(documents_data):
        # pg_vector expects a numpy array to be converted to a list or string
        embedding_list = embeddings[i].tolist()
        cur.execute(
            "INSERT INTO documents (title, content, embedding) VALUES (%s, %s, %s)",
            (doc['title'], doc['content'], embedding_list)
        )
    
    conn.commit()
    cur.close()
    conn.close()
    
    print("Data inserted successfully.")

    2. Advanced Indexing: The Core of Performance

    A hybrid query will be slow without proper indexing. We need to create two distinct types of indexes: one for FTS and one for vector search.

    FTS Indexing: GIN

    For the tsvector column, a Generalized Inverted Index (GIN) is the standard choice. It's highly optimized for queries that check for the existence of elements within a composite type, which is exactly how FTS works.

    sql
    CREATE INDEX idx_documents_tsv ON documents USING GIN(content_tsv);

    Why not GiST? While GiST can also index tsvector, it's generally slower for lookups but faster to build. For a read-heavy search application, GIN's superior query performance is the correct choice.

    Vector Indexing: IVFFlat vs. HNSW

    pg_vector offers several index types. The two most prominent are IVFFlat and HNSW. Choosing the right one involves a trade-off between index build time, index size, query speed, and recall (accuracy).

    Option A: IVFFlat (Inverted File with Flat Compression)

    IVFFlat works by partitioning your vector space into a number of lists (clusters). At query time, it only searches a subset of these lists (probes) nearest to the query vector, making it much faster than a full scan.

    * Pros: Lower memory usage than HNSW, faster to build.

    * Cons: Recall is highly dependent on tuning the lists and probes parameters.

    Implementation:

    sql
    -- The number of lists should be sqrt(N) for up to 1M rows, and N/1000 for larger datasets.
    -- Let's assume we'll have 10,000 documents, so sqrt(10000) = 100 lists.
    CREATE INDEX idx_documents_embedding_ivfflat ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

    At query time, you must set the ivf.probes parameter. A higher value increases accuracy but slows down the query. A good starting point is sqrt(lists).

    sql
    -- Before running a query using the IVFFlat index
    SET LOCAL ivf.probes = 10;

    Option B: HNSW (Hierarchical Navigable Small World)

    HNSW is a graph-based index that provides excellent performance and high recall. It creates a multi-layered graph where long-distance connections exist on upper layers and shorter ones on lower layers, allowing for efficient traversal.

    * Pros: Excellent recall-performance trade-off. Generally the recommended index for production use cases.

    * Cons: Slower to build and uses more memory than IVFFlat.

    Implementation:

    sql
    -- m: max number of connections per layer (default 16)
    -- ef_construction: size of the dynamic candidate list during build (default 64)
    CREATE INDEX idx_documents_embedding_hnsw ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

    Similarly, HNSW has a query-time parameter, hnsw.ef_search, that controls the size of the search list. A higher value improves recall at the cost of latency.

    sql
    -- Before running a query using the HNSW index
    SET LOCAL hnsw.ef_search = 40;

    Verdict for Production: Start with HNSW unless you have very strict memory constraints or extremely high write throughput requirements. Its out-of-the-box performance is typically superior.


    3. The Hybrid Query: Fusing Results with Reciprocal Rank Fusion (RRF)

    Now for the core logic. We need to execute two separate searches—one FTS, one vector—and intelligently merge their results. A naive approach of adding or multiplying scores is flawed because the scores are not on a comparable scale. FTS ts_rank scores are arbitrary, and vector distances are geometric.

    Reciprocal Rank Fusion (RRF) provides an elegant solution. It disregards the raw scores and uses only the rank of each result. For each document, its RRF score is the sum of the reciprocals of its rank in each result list.

    Formula: RRF_Score = Σ (1 / (k + rank_i))

    Where rank_i is the document's rank in result set i, and k is a constant to diminish the impact of high ranks (a common value is 60).

    We will implement this using a multi-stage Common Table Expression (CTE) query.

    The Full Query Breakdown

    Let's assume our search input is "scaling postgres database".

  • Prepare Inputs: We need a tsquery for FTS and a vector embedding for the similarity search.
  • python
    # In our application layer
    query_text = "scaling postgres database"
    query_embedding = model.encode(query_text).tolist()
    # Convert to a tsquery format for Postgres
    # 'scaling' & 'postgres' & 'database'
    query_fts = " & ".join(query_text.split())
  • The SQL Query:
  • sql
    -- Set query-time parameters for vector index
    SET LOCAL hnsw.ef_search = 40;
    
    WITH fts_results AS (
        -- CTE 1: Perform the Full-Text Search
        SELECT
            id,
            -- ts_rank_cd is a good choice for ranking
            ts_rank_cd(content_tsv, to_tsquery('pg_catalog.english', 'scaling & postgres & database')) AS fts_score,
            -- Get the rank of each result
            ROW_NUMBER() OVER (ORDER BY ts_rank_cd(content_tsv, to_tsquery('pg_catalog.english', 'scaling & postgres & database')) DESC) as fts_rank
        FROM documents
        -- The @@ operator executes the search
        WHERE content_tsv @@ to_tsquery('pg_catalog.english', 'scaling & postgres & database')
        ORDER BY fts_score DESC
        LIMIT 100 -- Limit the result set from each search to keep it manageable
    ),
    vector_results AS (
        -- CTE 2: Perform the Vector Search
        SELECT 
            id,
            -- The <=> operator is the cosine distance. 1 - distance = similarity.
            1 - (embedding <=> '[...long vector array...]') AS vector_score,
            -- Get the rank of each result
            ROW_NUMBER() OVER (ORDER BY embedding <=> '[...long vector array...]') as vector_rank
        FROM documents
        ORDER BY embedding <=> '[...long vector array...]'
        LIMIT 100
    )
    -- Final SELECT: Fuse the results using RRF
    SELECT
        COALESCE(fts.id, vec.id) as id,
        -- Calculate the RRF score. k=60 is a common constant.
        (COALESCE(1.0 / (60 + fts.fts_rank), 0.0)) +
        (COALESCE(1.0 / (60 + vec.vector_rank), 0.0)) as rrf_score,
        fts.fts_score,
        vec.vector_score
    FROM fts_results fts
    FULL OUTER JOIN vector_results vec ON fts.id = vec.id
    ORDER BY rrf_score DESC
    LIMIT 20;

    Analysis of the Query Structure:

    * fts_results CTE: This performs a standard FTS query. We use ts_rank_cd for scoring and, critically, ROW_NUMBER() to assign a rank to each result. We apply a LIMIT to prevent one search type from dominating with low-quality results.

    * vector_results CTE: This performs the vector similarity search using the cosine distance operator (<=>). Note that lower distance is better, so we ORDER BY it ascending. Again, we use ROW_NUMBER() to get the rank.

    * FULL OUTER JOIN: This is the key to merging. It ensures that documents found in one result set but not the other are still included in the final fusion.

    * RRF Calculation: The rrf_score is calculated using the formula. COALESCE is used to handle documents that are only in one result set (their rank in the other is treated as null, so their contribution to the score from that set is 0).

    * Final ORDER BY and LIMIT: We order by our new rrf_score to get the final, fused ranking and apply a final limit for pagination.

    This structure is robust and correctly handles the fusion logic, providing a much more balanced and relevant result set than either search could alone.


    4. Production Patterns and Edge Cases

    Writing the query is only half the battle. Integrating it into a production system requires handling several edge cases.

    Encapsulating Logic in a Function

    This complex query is unwieldy to build in application code. Encapsulating it in a PostgreSQL function is a clean, maintainable, and performant pattern.

    sql
    CREATE OR REPLACE FUNCTION hybrid_search(
        p_query_text TEXT,
        p_query_embedding VECTOR(768),
        p_match_limit INT DEFAULT 100,
        p_result_limit INT DEFAULT 20
    )
    RETURNS TABLE (id BIGINT, rrf_score NUMERIC, fts_score REAL, vector_score REAL) AS $$
    DECLARE
        v_fts_query_string TEXT;
    BEGIN
        -- Sanitize and format the text for to_tsquery
        v_fts_query_string := array_to_string(string_to_array(p_query_text, ' '), ' & ');
    
        RETURN QUERY
        WITH fts_results AS (
            -- ... FTS CTE from above, using v_fts_query_string ...
        ),
        vector_results AS (
            -- ... Vector CTE from above, using p_query_embedding ...
        )
        -- ... Final fusion SELECT from above ...
        ;
    END;
    $$ LANGUAGE plpgsql;
    
    -- Example Call:
    -- SELECT * FROM hybrid_search('scaling postgres database', '[...vector...]'::vector);

    Weighted Fusion

    Sometimes, you may want to give more importance to either semantic or lexical search. We can modify the RRF calculation to include weights.

    sql
    -- Inside the final SELECT
    SELECT
        COALESCE(fts.id, vec.id) as id,
        -- Weighted RRF. Weights should sum to 1.
        (0.4 * COALESCE(1.0 / (60 + fts.fts_rank), 0.0)) + -- 40% weight to FTS
        (0.6 * COALESCE(1.0 / (60 + vec.vector_rank), 0.0)) as rrf_score, -- 60% weight to vector
        ...

    This allows you to tune relevance based on the nature of your data or even user preferences. For a product search, you might weight FTS higher to prioritize exact SKU matches. For a documentation search, vector might get a higher weight.

    Handling Pagination Correctly

    A common mistake is to apply OFFSET and LIMIT to the individual CTEs. This is incorrect and will lead to skewed results on subsequent pages. Pagination must be applied only to the final, ordered result set.

    The function approach handles this naturally. The application would call the function and then apply LIMIT and OFFSET to the results of the function call.

    sql
    -- Fetching page 2, with 20 items per page
    SELECT * 
    FROM hybrid_search('...', '[...]'::vector)
    ORDER BY rrf_score DESC
    LIMIT 20 OFFSET 20;

    Pre-filtering with Metadata

    In real-world scenarios, search is often constrained by other filters (e.g., WHERE category = 'databases' or WHERE published_at > '2023-01-01'). These WHERE clauses should be applied inside both the fts_results and vector_results CTEs. Applying them only at the end would be inefficient, as the fusion logic would be operating on irrelevant documents.

    sql
    -- ... inside fts_results CTE
    FROM documents
    WHERE content_tsv @@ to_tsquery(...) AND published_at > '2023-01-01'
    
    -- ... inside vector_results CTE
    FROM documents
    WHERE published_at > '2023-01-01'
    ORDER BY embedding <=> ...

    PostgreSQL's planner is generally smart enough to apply the filter before the more expensive search operation, especially if there's an index on the filtered column (published_at in this case).


    5. Performance Analysis and Tuning

    Let's analyze the performance of our hybrid query using EXPLAIN ANALYZE.

    sql
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM hybrid_search('...', '[...]'::vector);

    The output will be verbose, but you should look for these key indicators:

  • Index Scans: Both CTEs should show an Index Scan or Bitmap Heap Scan using their respective indexes (idx_documents_tsv and idx_documents_embedding_hnsw). If you see a Seq Scan, your indexes are not being used, and performance will be terrible.
  • Execution Time: The total execution time should be in the low milliseconds for a well-tuned system with a moderately sized dataset (e.g., < 100ms for 1M documents).
  • Buffer Hits: A high number of buffer hits indicates that your indexes and data are being served from memory, which is ideal. If you see a lot of block reads, it means Postgres is going to disk, and you may need more RAM.
  • Tuning `hnsw.ef_search`

    The hnsw.ef_search parameter is your primary knob for tuning the vector search trade-off. Let's benchmark its effect:

    * SET LOCAL hnsw.ef_search = 10; -> Latency: 25ms, Recall: 95%

    * SET LOCAL hnsw.ef_search = 40; -> Latency: 50ms, Recall: 98.5%

    * SET LOCAL hnsw.ef_search = 100; -> Latency: 110ms, Recall: 99.2%

    (Note: These are illustrative numbers.)

    For most applications, a recall of 98-99% is more than sufficient. You can find the optimal value by running recall benchmarks against a ground-truth dataset. Start with a value slightly higher than the m parameter of your HNSW index and tune from there.

    Conclusion: An Integrated, Powerful Search Solution

    By combining pg_vector and built-in Full-Text Search within PostgreSQL, we have engineered a sophisticated hybrid search system that overcomes the limitations of each individual technology. The use of CTEs for separation of concerns and Reciprocal Rank Fusion for intelligent, score-agnostic merging provides a robust and production-ready architecture.

    This in-database approach offers significant advantages over external search engines like Elasticsearch, primarily by reducing operational complexity, eliminating data synchronization issues, and allowing search logic to exist within the same transactional context as your primary data.

    For senior engineers tasked with building next-generation search features, mastering this pattern is not just an exercise in advanced SQL; it is a strategic approach to delivering superior relevance and a better user experience, all within the familiar, powerful, and extensible ecosystem of PostgreSQL.

    Found this article helpful?

    Share it with others who might benefit from it.

    More Articles