Hybrid Search in Postgres: RRF for pgvector & FTS

16 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Dichotomy of Modern Search: Semantic vs. Lexical

In contemporary search system architecture, particularly for Retrieval-Augmented Generation (RAG), we face a fundamental conflict between two retrieval paradigms: semantic search and lexical (keyword) search.

Vector Search (e.g., pgvector) excels at capturing semantic similarity. A query for "machine learning frameworks" can correctly identify documents discussing TensorFlow, PyTorch, or scikit-learn, even if those exact keywords aren't present. This is its strength and its weakness. It operates in a high-dimensional latent space where proximity implies meaning, but this abstraction can lead to a loss of precision for specific, literal terms. A search for a specific function name like torch.nn.functional.relu or a product SKU HN-75-B might be misinterpreted semantically, returning results about neural network activation functions or general hardware components instead of the exact match.

Full-Text Search (FTS), a mature technology built into PostgreSQL, is the inverse. It is ruthlessly literal. It operates on lexemes, stems, and stop words, providing exceptional precision for keyword-based queries. A search for HN-75-B using FTS will find documents containing that exact string. However, it possesses no understanding of intent. A query for "AI model tools" would miss a document that only mentions "PyTorch" because the semantic link is absent.

Fusing these two modalities is the objective of hybrid search. However, the naive approach—normalizing scores from both systems and adding them—is fraught with peril. The ts_rank score from FTS and the cosine distance from pgvector operate on entirely different scales and distributions. Normalizing them with Min-Max scaling or Z-score standardization is brittle; a change in the document corpus can shift the distribution, invalidating the normalization constants and biasing the results heavily toward one search type. This is not a production-viable strategy.

Reciprocal Rank Fusion (RRF): A Score-Agnostic Solution

Reciprocal Rank Fusion (RRF) provides an elegant and robust solution to this problem by completely ignoring the raw scores. It operates solely on the rank of a document in each result set. This makes it immune to the scaling and distribution issues of the underlying search algorithms.

The formula is simple yet powerful:

text
RRF_Score(d) = Σ (1 / (k + rank_i(d)))

Where:

  • d is a specific document.
  • rank_i(d) is the rank of document d in result set i (from FTS, vector search, etc.).
  • k is a constant used to tune the influence of lower-ranked documents. A smaller k gives more weight to top-ranked results. A common starting value is k=60.
  • Consider two result sets, one from vector search and one from FTS:

  • Vector Search Results: [doc_A, doc_C, doc_B]
  • FTS Results: [doc_B, doc_A, doc_D]
  • Let's calculate the RRF score for each document with k=60:

  • RRF(doc_A) = (1 / (60 + 1)) [from vector] + (1 / (60 + 2)) [from FTS] ≈ 0.01639 + 0.01613 = 0.03252
  • RRF(doc_B) = (1 / (60 + 3)) [from vector] + (1 / (60 + 1)) [from FTS] ≈ 0.01587 + 0.01639 = 0.03226
  • RRF(doc_C) = (1 / (60 + 2)) [from vector] + (1 / (60 + ∞)) [not in FTS] ≈ 0.01613 + 0 = 0.01613
  • RRF(doc_D) = (1 / (60 + ∞)) [not in vector] + (1 / (60 + 3)) [from FTS] ≈ 0 + 0.01587 = 0.01587
  • Final Fused Rank: [doc_A, doc_B, doc_C, doc_D]

    This demonstrates how RRF elegantly combines the signals. doc_A, which appeared high in both lists, gets the top spot. doc_B, also present in both, is a close second. doc_C and doc_D, which were only present in one list, are ranked lower. The algorithm naturally rewards documents that are considered relevant by multiple retrieval systems.

    Production Implementation Architecture in PostgreSQL

    Our goal is to implement this entire workflow within a Node.js/TypeScript application layer interacting with a PostgreSQL database.

    1. Table Schema and Indexing

    First, we need a table that accommodates both FTS and vector data. We'll also need a trigger to keep the tsvector column synchronized.

    sql
    -- Ensure the necessary extensions are enabled
    CREATE EXTENSION IF NOT EXISTS vector;
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    
    -- The main table for our documents
    CREATE TABLE documents (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        content TEXT NOT NULL,
        embedding VECTOR(1536), -- Example dimension for OpenAI embeddings
        content_tsv TSVECTOR,
        created_at TIMESTAMPTZ DEFAULT NOW()
    );
    
    -- Create an HNSW index for vector search. HNSW is generally preferred for its
    -- balance of speed and recall over IVFFlat for many use cases.
    -- m: max number of connections per layer (higher means better recall, slower build)
    -- ef_construction: size of the dynamic list for neighbors during build (higher means better index quality)
    CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
    
    -- Create a GIN index for FTS. GIN is ideal for tsvector columns.
    CREATE INDEX ON documents USING GIN (content_tsv);
    
    -- A function to update the tsvector column automatically
    CREATE OR REPLACE FUNCTION update_content_tsv()
    RETURNS TRIGGER AS $$
    BEGIN
        NEW.content_tsv := to_tsvector('english', NEW.content);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    
    -- A trigger to execute the function on insert or update
    CREATE TRIGGER tsvector_update
    BEFORE INSERT OR UPDATE ON documents
    FOR EACH ROW EXECUTE FUNCTION update_content_tsv();

    Architectural Choice: HNSW vs. IVFFlat

  • IVFFlat: Partitions vectors into lists (n_lists). At query time, it only searches a subset of these lists (probes). It's faster to build but requires careful tuning of n_lists and probes to balance speed and recall. It's often a good choice for static datasets.
  • HNSW (Hierarchical Navigable Small World): Creates a multi-layered graph structure. It offers superior query performance and recall, especially in high-dimensional spaces, at the cost of a slower, more memory-intensive build process. For dynamic applications where query performance is paramount, HNSW is typically the superior choice. The ef_search parameter, set at query time, allows for dynamic trade-offs between speed and accuracy.
  • 2. The Application Layer: A TypeScript Implementation

    We'll build a service in TypeScript to orchestrate the hybrid search process. This involves executing queries in parallel, fusing the results, and then hydrating the final data.

    We'll use the pg library for database interaction.

    typescript
    // src/database.ts
    import { Pool } from 'pg';
    
    export const pool = new Pool({
      connectionString: process.env.DATABASE_URL,
    });
    
    // src/embedding-client.ts - A mock embedding client
    // In a real application, this would call an API like OpenAI, Cohere, etc.
    export class EmbeddingClient {
      async createEmbedding(text: string): Promise<number[]> {
        console.log(`Generating embedding for: "${text.substring(0, 30)}..."`);
        // In a real implementation, this would be a network call.
        // Here, we generate a normalized random vector for demonstration.
        const vector = Array.from({ length: 1536 }, () => Math.random() * 2 - 1);
        const magnitude = Math.sqrt(vector.reduce((sum, val) => sum + val * val, 0));
        return vector.map(v => v / magnitude);
      }
    }
    
    // src/hybrid-search-service.ts
    import { pool } from './database';
    import { EmbeddingClient } from './embedding-client';
    
    // Type definitions for clarity
    interface SearchResult {
      id: string;
      score: number; // The original score from the search engine
    }
    
    interface FusedResult {
      id: string;
      rrfScore: number;
    }
    
    interface Document {
      id: string;
      content: string;
      createdAt: Date;
    }
    
    export class HybridSearchService {
      private embeddingClient: EmbeddingClient;
      private readonly RRF_K = 60; // The 'k' constant for RRF
    
      constructor() {
        this.embeddingClient = new EmbeddingClient();
      }
    
      private async vectorSearch(queryVector: number[], limit: number): Promise<SearchResult[]> {
        const vectorString = `[${queryVector.join(',')}]`;
        const query = {
          text: `
            SELECT id, 1 - (embedding <=> $1) AS score
            FROM documents
            ORDER BY embedding <=> $1
            LIMIT $2;
          `,
          values: [vectorString, limit],
        };
        
        // pgvector HNSW requires setting ef_search for performance/accuracy tuning
        // This should be done per-transaction or per-session
        const client = await pool.connect();
        try {
          await client.query('SET LOCAL hnsw.ef_search = 100;'); // Tune this value
          const res = await client.query(query);
          return res.rows.map(row => ({ id: row.id, score: parseFloat(row.score) }));
        } finally {
          client.release();
        }
      }
    
      private async fullTextSearch(query: string, limit: number): Promise<SearchResult[]> {
        const queryText = {
          text: `
            SELECT id, ts_rank_cd(content_tsv, websearch_to_tsquery('english', $1)) AS score
            FROM documents
            WHERE content_tsv @@ websearch_to_tsquery('english', $1)
            ORDER BY score DESC
            LIMIT $2;
          `,
          values: [query, limit],
        };
        const res = await pool.query(queryText);
        return res.rows.map(row => ({ id: row.id, score: parseFloat(row.score) }));
      }
    
      private reciprocalRankFusion(results: SearchResult[][]): FusedResult[] {
        const rankedResults: { [docId: string]: number } = {};
    
        for (const resultSet of results) {
          for (let i = 0; i < resultSet.length; i++) {
            const docId = resultSet[i].id;
            const rank = i + 1;
            const score = 1 / (this.RRF_K + rank);
    
            if (!rankedResults[docId]) {
              rankedResults[docId] = 0;
            }
            rankedResults[docId] += score;
          }
        }
    
        const fused = Object.entries(rankedResults).map(([id, rrfScore]) => ({ id, rrfScore }));
        return fused.sort((a, b) => b.rrfScore - a.rrfScore);
      }
    
      private async hydrateDocuments(fusedResults: FusedResult[], limit: number): Promise<Document[]> {
        if (fusedResults.length === 0) {
          return [];
        }
    
        const topResults = fusedResults.slice(0, limit);
        const docIds = topResults.map(r => r.id);
    
        // We need to fetch the documents and preserve the RRF order.
        // A common pattern is using a JOIN with a VALUES list or a CASE statement.
        const query = {
          text: `
            SELECT id, content, created_at
            FROM documents
            JOIN unnest($1::uuid[]) WITH ORDINALITY t(id, ord) USING (id)
            ORDER BY t.ord;
          `,
          values: [docIds],
        };
    
        const res = await pool.query(query);
        return res.rows.map(row => ({
          id: row.id,
          content: row.content,
          createdAt: new Date(row.created_at),
        }));
      }
    
      public async search(query: string, limit: number = 10): Promise<Document[]> {
        // For production, fetch more results than needed for better fusion quality.
        // This is a critical tuning parameter.
        const searchLimit = limit * 5;
    
        // 1. Generate query embedding
        const queryVector = await this.embeddingClient.createEmbedding(query);
    
        // 2. Execute searches in parallel for performance
        const [vectorResults, ftsResults] = await Promise.all([
          this.vectorSearch(queryVector, searchLimit),
          this.fullTextSearch(query, searchLimit),
        ]);
    
        // 3. Fuse the results using RRF
        const fusedResults = this.reciprocalRankFusion([vectorResults, ftsResults]);
    
        // 4. Hydrate the final, ordered list of documents
        const finalDocuments = await this.hydrateDocuments(fusedResults, limit);
    
        return finalDocuments;
      }
    }
    
    // Example usage
    async function main() {
      // Assume you have populated the 'documents' table with some data and embeddings
      const searchService = new HybridSearchService();
      const query = "advanced strategies for database indexing in postgres";
    
      console.log(`Executing hybrid search for: "${query}"`);
      const results = await searchService.search(query, 5);
    
      console.log("\n--- Hybrid Search Results ---");
      if (results.length > 0) {
        results.forEach((doc, index) => {
          console.log(`${index + 1}. [ID: ${doc.id}] ${doc.content.substring(0, 100)}...`);
        });
      } else {
        console.log("No results found.");
      }
    
      // Example of a query where FTS is likely to dominate
      const specificQuery = "PostgreSQL HNSW index with m=16";
      console.log(`\nExecuting hybrid search for: "${specificQuery}"`);
      const specificResults = await searchService.search(specificQuery, 5);
    
      console.log("\n--- Hybrid Search Results (Specific) ---");
      if (specificResults.length > 0) {
        specificResults.forEach((doc, index) => {
          console.log(`${index + 1}. [ID: ${doc.id}] ${doc.content.substring(0, 100)}...`);
        });
      } else {
        console.log("No results found.");
      }
    
      await pool.end();
    }
    
    main().catch(console.error);

    Breakdown of the Implementation

  • Parallel Execution: The vectorSearch and fullTextSearch methods are invoked concurrently using Promise.all. This is a crucial performance optimization, as the total latency will be dictated by the slower of the two queries, not their sum.
  • HNSW Tuning (ef_search): Notice the SET LOCAL hnsw.ef_search = 100; command. This parameter controls the size of the dynamic list used during the HNSW graph traversal. A higher ef_search increases recall (accuracy) at the cost of higher latency. Setting it LOCAL to the transaction ensures it doesn't affect other connections. This is a key tuning knob for production systems.
  • Result Set Oversampling: We fetch limit * 5 results from each underlying search. Why? RRF's effectiveness depends on finding overlaps between result sets. If doc_A is at rank 11 in vector search and rank 12 in FTS, it won't be found if we only fetch the top 10 from each. Oversampling increases the chance of finding these valuable mid-rank agreements, leading to a much higher quality final ranking, at the cost of retrieving more initial data.
  • The Fusion Logic: The reciprocalRankFusion method is a direct implementation of the RRF formula. It iterates through each result set, calculates the reciprocal rank score for each document, and aggregates these scores in a map. A final sort produces the definitive ranking.
  • Efficient Hydration: The hydrateDocuments method avoids the N+1 query problem. Instead of fetching each document individually, it passes the entire sorted list of IDs to a single SELECT query. The JOIN unnest(...) WITH ORDINALITY is a powerful PostgreSQL pattern that allows us to join against an array of values and, critically, preserve the order defined by that array (ORDER BY t.ord). This is far more efficient than trying to re-order the results in the application layer after a simple WHERE id IN (...) clause, which does not guarantee order.
  • Advanced Considerations and Edge Cases

    Tuning the `k` Constant

    The k constant in RRF controls how much to penalize documents that appear lower in the rankings.

  • A low k (e.g., k=1) makes the score drop off very steeply. 1/(1+1) = 0.5, 1/(1+2) = 0.33, 1/(1+10) = 0.09. This heavily favors documents that are at the very top of any list.
  • A high k (e.g., k=100) creates a much flatter curve. 1/(100+1) = 0.0099, 1/(100+2) = 0.0098. This gives more weight to documents that appear consistently across lists, even if they aren't at the absolute top.
  • A standard starting point is k=60. The optimal value is domain-specific and should be determined empirically using a validation set of queries and relevance-judged documents. You can run experiments by varying k and measuring metrics like NDCG (Normalized Discounted Cumulative Gain) to find the best value for your specific use case.

    Performance Under Load and Indexing Strategy

  • Write Amplification: The trigger on the documents table adds overhead to every INSERT and UPDATE. For write-heavy systems, consider moving this logic to a background job processor (e.g., using a message queue) to decouple it from the primary transaction.
  • Vector Indexing Time: Building an HNSW index is computationally expensive. For very large, frequently updated tables, you may need a strategy where new documents are inserted without being immediately indexed. A background process can then add them to the index in batches during off-peak hours. During this interim period, new documents would only be discoverable via FTS or a full table scan for vector search (which is very slow and should be avoided).
  • Query Latency: The end-to-end latency is max(latency_vector, latency_fts) + latency_fusion + latency_hydration. The fusion step is CPU-bound but extremely fast for reasonable result sizes (e.g., a few thousand documents). The database queries are the main bottleneck. Ensure you have enough maintenance_work_mem for index builds and work_mem for query execution. Use EXPLAIN ANALYZE extensively to verify that both the vector and FTS indexes are being used correctly.
  • Handling Pagination

    Simple LIMIT and OFFSET on the final result set is highly inefficient for deep pagination. To get page 10 (e.g., results 91-100), you would still need to:

    • Fetch a large number of results from both vector and FTS (e.g., 500 each).
    • Fuse all 500+ unique documents.
    • Sort them.
    • Skip the first 90 and take the next 10.

    This is wasteful. A better approach for stateful pagination is keyset pagination.

  • For the first request, perform the search as described. Return the first 10 results along with the rrfScore of the 10th item.
  • For the next page, the client sends back the last seen rrfScore. Your application would then fetch the initial large result sets, perform the fusion, and then filter the fused list in-memory for items with an rrfScore less than or equal to the one provided by the client, skipping any IDs it has already seen.
  • This avoids the database-level OFFSET performance penalty but requires more state management in the application.

    When One Search Returns Nothing

    The RRF implementation gracefully handles cases where one of the search methods returns an empty set. If ftsResults is empty, the fusion logic will simply process the vectorResults and the final ranking will be identical to the pure vector search ranking. This is a key part of its robustness.

    Conclusion

    Hybrid search is not a compromise; it is a superior retrieval paradigm that leverages the distinct strengths of semantic and lexical search. By implementing Reciprocal Rank Fusion on top of PostgreSQL's powerful pgvector and Full-Text Search capabilities, we can build sophisticated, production-ready search systems that are more accurate and resilient than single-modality solutions. The key is to move beyond naive score-based merging and adopt a rank-aware algorithm like RRF. The provided architecture—emphasizing parallel execution, result oversampling, and efficient hydration—serves as a robust template for integrating this advanced search strategy into any high-performance application.

    Found this article helpful?

    Share it with others who might benefit from it.

    More Articles