Advanced RAG: Graph Indexing for Complex Document Hierarchies
The Contextual Void: Why Flat Vector Search Fails in Complex RAG
Retrieval-Augmented Generation (RAG) has become a cornerstone of modern LLM applications. The standard pipeline—chunk, embed, store, retrieve, augment—is powerful for question-answering over a corpus of independent documents. However, this paradigm reveals significant weaknesses when applied to real-world, interconnected knowledge bases like corporate wikis, software documentation, legal case files, or research papers. These documents are not islands; they exist in a rich web of explicit and implicit relationships.
A standard vector search RAG system, when asked, "How does the `AuthService`'s rate-limiting policy affect the `BillingAPI`?", might retrieve a chunk from the `AuthService` documentation mentioning rate-limiting and another from the `BillingAPI` documentation. It then forces the LLM to bridge the contextual gap, often leading to hallucinations or incomplete answers because the explicit link between the two services (e.g., an API call dependency defined in an OpenAPI spec) was never retrieved.
This is the contextual void. Flat vector search treats every chunk as an independent peer whose only relationship to other chunks is semantic similarity. It's blind to:
* Hierarchy: A code comment is part of a function, which is in a file, which belongs to a module. This hierarchy is critical context.
* Explicit References: Document A explicitly cites Document B.
* Shared Entities: Two seemingly unrelated documents might both discuss a specific internal project, `Project Titan`, creating a critical thematic link.
* Causality and Dependency: A system design document outlines a dependency that is later implemented in a specific code module.
To build truly intelligent RAG systems, we must move beyond semantic similarity and model the structure of knowledge. This is where Graph-RAG comes in. By representing our corpus as a knowledge graph, we can perform sophisticated, multi-hop retrievals that capture both semantic similarity and explicit relationships, providing the LLM with a far richer, more accurate context.
This article details a production-ready approach to building a Graph-RAG system using Neo4j for graph storage, LLMs for intelligent entity/relationship extraction, and a hybrid retrieval strategy that combines the best of vector search and graph traversal.
The Graph-RAG Paradigm: Modeling Knowledge as a Connected Web
Instead of a flat list of vectors, we model our corpus as a labeled property graph. This structure consists of nodes (entities) and relationships (edges) that connect them.
Our Core Node Types:
* `Document`: Represents a whole document (e.g., a Markdown file, a PDF, a web page). Properties: `id`, `source_url`, `title`.
* `Chunk`: A text segment from a `Document`. This is the node that will hold the vector embedding. Properties: `id`, `text`, `embedding`, `start_char`, `end_char`.
* `Entity`: A named entity extracted from the text (e.g., a person, a software component, a legal term). Properties: `id`, `name`, `type`.
Our Core Relationship Types:
* `HAS_CHUNK`: Connects a `Document` to its constituent `Chunk` nodes.
* `MENTIONS`: Connects a `Chunk` to an `Entity` it discusses.
* `REFERENCES`: Connects one `Document` to another (e.g., via a hyperlink or citation).
* `PART_OF`: Creates hierarchical relationships (e.g., a `Chunk` representing a function is `PART_OF` a `Chunk` representing a file).
This model allows us to answer complex queries. The question about `AuthService` and `BillingAPI` is no longer a simple vector search. It becomes a graph traversal problem: "Find chunks mentioning `AuthService` and rate-limiting, then explore their connected nodes to see if any paths lead to nodes related to `BillingAPI`."
Implementation Deep Dive: Building the Knowledge Graph
Let's build this system. Our stack will be Python, the `neo4j` driver, OpenAI's API for extraction and generation, and a sentence-transformer model for embeddings.
Step 1: Advanced Entity and Relationship Extraction with an LLM
This is the most critical step and where many projects fail. Simply extracting named entities is not enough. We need to extract relationships between them. We'll use an LLM with a carefully engineered prompt to act as a zero-shot information extractor.
The Prompt Engineering:
The key is to constrain the LLM's output to a predictable JSON schema. We provide the schema and instruct the model to populate it based on the text.
import openai
import json
# Ensure you have your OPENAI_API_KEY set in your environment
EXTRACTION_SYSTEM_PROMPT = """
You are an expert data analyst. Your task is to extract entities and their relationships from the provided text.
Extract the following information:
- Documents: The main subjects of the text.
- Entities: Key concepts, components, persons, or technologies mentioned.
- Relationships: Connections between the extracted documents and entities.
Respond ONLY with a valid JSON object in the following schema:
{
"documents": [{"id": "<document_name>", "type": "<document_type>"}],
"entities": [{"id": "<entity_name>", "type": "<entity_type>"}],
"relationships": [{"source": "<source_id>", "target": "<target_id>", "type": "<relationship_type>"}]
}
Possible entity types: 'Service', 'API', 'Policy', 'Project', 'Person'.
Possible relationship types: 'USES', 'AFFECTS', 'DEFINES', 'PART_OF', 'MENTIONS'.
Analyze the text and populate the schema. Ensure all `source` and `target` IDs in relationships match an ID from the documents or entities lists.
"""
def extract_graph_from_document(text: str) -> dict:
    """Uses an LLM to extract a graph structure from a text document."""
    try:
        response = openai.chat.completions.create(
            model="gpt-4-1106-preview",  # Or your preferred model
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content": EXTRACTION_SYSTEM_PROMPT},
                {"role": "user", "content": f"Here is the text to analyze:\n\n{text}"}
            ]
        )
        return json.loads(response.choices[0].message.content)
    except Exception as e:
        print(f"Error during graph extraction: {e}")
        return {"documents": [], "entities": [], "relationships": []}
# Example Usage
document_text = """
Title: AuthService Architecture
The AuthService is a core component of Project Titan. It defines the primary rate-limiting policy for all internal traffic. This policy directly affects the BillingAPI, which uses the AuthService for user authentication. The main logic is implemented in `auth.py`.
"""
graph_data = extract_graph_from_document(document_text)
print(json.dumps(graph_data, indent=2))
This LLM-based approach is powerful but has production implications. It's slower and more expensive than traditional NLP methods. For large-scale ingestion, this should be run as an asynchronous batch process. You might also consider fine-tuning a smaller, open-source model for this specific JSON extraction task to reduce costs and improve latency.
Step 2: Intelligent Chunking and Embedding
Simple fixed-size chunking is suboptimal. A chunk might end mid-sentence, destroying its semantic meaning. We'll use a `Document` -> `Chunk` hierarchy.
First, we'll create a `Document` node. Then, we'll chunk the text. A simple strategy is paragraph-based chunking; for code, it could be function-based. The key is that each `Chunk` node is linked to its parent `Document`.
from sentence_transformers import SentenceTransformer
# Use a high-quality, pre-trained model
embedding_model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
def create_document_chunks(doc_id: str, doc_text: str, chunk_size=512, overlap=50):
    # A more robust implementation would use semantic chunking or paragraph splitting
    chunks = []
    for i in range(0, len(doc_text), chunk_size - overlap):
        chunk_text = doc_text[i:i + chunk_size]
        chunks.append({
            "document_id": doc_id,
            "text": chunk_text,
            "embedding": embedding_model.encode(chunk_text).tolist()
        })
    return chunks
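Since the comment above points to paragraph splitting as a more robust option, here is a minimal sketch of paragraph-based chunking under the same interface; the `max_chars` budget and the helper name are illustrative choices:
def create_paragraph_chunks(doc_id: str, doc_text: str, max_chars: int = 1000):
    """Paragraph-based chunking: split on blank lines, then pack paragraphs
    into chunks of at most roughly max_chars characters."""
    paragraphs = [p.strip() for p in doc_text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return [{
        "document_id": doc_id,
        "text": text,
        "embedding": embedding_model.encode(text).tolist()
    } for text in chunks]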
Step 3: Populating the Neo4j Knowledge Graph
With our extracted graph data and chunks, we can now populate Neo4j. We will use idempotent `MERGE` queries to avoid creating duplicate nodes.
from neo4j import GraphDatabase
class KnowledgeGraph:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def ingest_data(self, graph_data: dict, chunks: list):
        with self.driver.session() as session:
            session.execute_write(self._ingest_graph, graph_data, chunks)

    @staticmethod
    def _ingest_graph(tx, graph_data, chunks):
        # Ingest Documents and Entities
        for doc in graph_data.get('documents', []):
            tx.run("MERGE (d:Document {id: $id}) SET d.type = $type", id=doc['id'], type=doc.get('type'))
        for entity in graph_data.get('entities', []):
            tx.run("MERGE (e:Entity {id: $id}) SET e.type = $type", id=entity['id'], type=entity.get('type'))
        # Ingest Chunks and link to parent Document
        for i, chunk in enumerate(chunks):
            chunk_id = f"{chunk['document_id']}-chunk-{i}"
            tx.run("""
                MATCH (d:Document {id: $doc_id})
                MERGE (c:Chunk {id: $chunk_id})
                SET c.text = $text, c.embedding = $embedding
                MERGE (d)-[:HAS_CHUNK]->(c)
            """, doc_id=chunk['document_id'], chunk_id=chunk_id, text=chunk['text'], embedding=chunk['embedding'])
        # Ingest Relationships. The relationship type cannot be passed as a Cypher parameter,
        # so it is interpolated; validate it against the allowed types from the extraction prompt.
        for rel in graph_data.get('relationships', []):
            tx.run("""
                MATCH (source) WHERE source.id = $source_id
                MATCH (target) WHERE target.id = $target_id
                MERGE (source)-[r:%s]->(target)
            """ % rel['type'], source_id=rel['source'], target_id=rel['target'])
# Usage:
# kg = KnowledgeGraph("neo4j://localhost:7687", "neo4j", "password")
# # Assuming 'graph_data' from LLM and 'chunks' from chunking process
# document_id = graph_data['documents'][0]['id']
# chunks = create_document_chunks(document_id, document_text)
# kg.ingest_data(graph_data, chunks)
# kg.close()
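One gap to be aware of: the retrieval query later in this article expects `(:Chunk)-[:MENTIONS]->(:Entity)` relationships, while the LLM extraction above emits relationships between documents and entities. A simple, assumption-laden way to close that gap is to link each chunk to any extracted entity whose name appears in its text (naive substring matching; a production system might instead run the extractor or an NER pass per chunk). The helper name is illustrative:
def link_chunks_to_entities(driver, chunks: list, entities: list):
    """Create (:Chunk)-[:MENTIONS]->(:Entity) edges by naive name matching."""
    with driver.session() as session:
        for i, chunk in enumerate(chunks):
            # Must match the chunk_id format used during ingestion
            chunk_id = f"{chunk['document_id']}-chunk-{i}"
            for entity in entities:
                if entity['id'].lower() in chunk['text'].lower():
                    session.run("""
                        MATCH (c:Chunk {id: $chunk_id})
                        MATCH (e:Entity {id: $entity_id})
                        MERGE (c)-[:MENTIONS]->(e)
                    """, chunk_id=chunk_id, entity_id=entity['id'])

# Example:
# link_chunks_to_entities(kg.driver, chunks, graph_data.get('entities', []))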
Step 4: Creating a Vector Index in Neo4j
To combine graph traversal with vector search, we need a vector index on our `Chunk` nodes. This is a crucial step for our hybrid retrieval strategy.
Execute this Cypher query directly in Neo4j Browser or via the driver:
CREATE VECTOR INDEX `chunk_embeddings` IF NOT EXISTS
FOR (c:Chunk)
ON (c.embedding)
OPTIONS {indexConfig: {
`vector.dimensions`: 768, // Must match your embedding model's dimensions
`vector.similarity_function`: 'cosine'
}}
This command tells Neo4j to build and maintain an efficient index for performing similarity searches on the `embedding` property of all `Chunk` nodes.
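If you prefer to manage the index from your ingestion code rather than the Neo4j Browser, the same statement can be issued through the driver. Here is a small sketch that reads the dimension from the loaded sentence-transformer so the index and the embeddings stay in sync; the helper name is illustrative:
def ensure_chunk_vector_index(driver, embedding_model):
    """Create the chunk_embeddings vector index if it does not already exist."""
    dimensions = embedding_model.get_sentence_embedding_dimension()  # 768 for all-mpnet-base-v2
    query = f"""
        CREATE VECTOR INDEX chunk_embeddings IF NOT EXISTS
        FOR (c:Chunk)
        ON (c.embedding)
        OPTIONS {{indexConfig: {{
            `vector.dimensions`: {dimensions},
            `vector.similarity_function`: 'cosine'
        }}}}
    """
    with driver.session() as session:
        session.run(query)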
Advanced Retrieval: Multi-Hop Hybrid Queries
This is where the Graph-RAG approach truly shines. Our retrieval process is no longer a single API call but a multi-step workflow.
1. Vector Search: Use the vector index to find the `Chunk` nodes most semantically similar to the user's query.
2. Graph Traversal: Expand from those chunks along the graph's relationships to collect surrounding chunks and entity-linked context.
3. Augment and Generate: Synthesize the expanded context and pass it to the LLM to generate the answer.
Here's how to implement it:
class GraphRAGQueryEngine:
    def __init__(self, driver, embedding_model):
        self.driver = driver
        self.embedding_model = embedding_model

    def query(self, query_text: str):
        query_vector = self.embedding_model.encode(query_text).tolist()
        with self.driver.session() as session:
            # Step 1 & 2: Hybrid vector search and graph traversal in one query
            result = session.run("""
                CALL db.index.vector.queryNodes('chunk_embeddings', 5, $query_vector) YIELD node AS similar_chunk, score
                // Find the parent document of the similar chunk
                MATCH (d:Document)-[:HAS_CHUNK]->(similar_chunk)
                // Expand context: Find other chunks from the same document
                OPTIONAL MATCH (similar_chunk)<-[:HAS_CHUNK]-(d)-[:HAS_CHUNK]->(other_chunk)
                WHERE other_chunk <> similar_chunk
                // Expand context: Find entities mentioned in the similar chunk and documents they appear in elsewhere
                OPTIONAL MATCH (similar_chunk)-[:MENTIONS]->(e:Entity)<-[:MENTIONS]-(related_chunk:Chunk)
                WHERE related_chunk <> similar_chunk
                WITH similar_chunk, score, d,
                     collect(DISTINCT other_chunk.text) AS other_chunks_in_doc,
                     collect(DISTINCT {entity: e.id, related_chunk: related_chunk.text}) AS related_info
                RETURN
                    similar_chunk.text AS text,
                    score,
                    d.id AS document_id,
                    other_chunks_in_doc,
                    related_info
                ORDER BY score DESC
            """, query_vector=query_vector)
            # The result cursor must be consumed while the session is still open
            context = self._synthesize_context(result)
        # Step 3: Augment and Generate
        final_answer = self._generate_response(query_text, context)
        return final_answer

    def _synthesize_context(self, result) -> str:
        context_str = ""
        for record in result:
            context_str += f"\n---\nSource Document: {record['document_id']} (Similarity Score: {record['score']:.4f})\n"
            context_str += f"Retrieved Chunk: {record['text']}\n"
            if record['other_chunks_in_doc']:
                context_str += "\nOther relevant chunks from the same document:\n"
                for text in record['other_chunks_in_doc'][:2]:  # Limit for brevity
                    context_str += f"- {text}\n"
            if record['related_info']:
                context_str += "\nRelated information from other documents via shared entities:\n"
                for info in record['related_info'][:2]:
                    if info['entity'] and info['related_chunk']:
                        context_str += f"- Entity '{info['entity']}' is also mentioned in: '{info['related_chunk']}'\n"
        return context_str

    def _generate_response(self, query, context):
        prompt = f"""
You are a helpful AI assistant. Answer the user's question based on the following context retrieved from a knowledge graph.
Be concise and precise. If the context does not contain the answer, say so.
Context:
{context}
Question: {query}
Answer:
"""
        response = openai.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "You are an expert Q&A system."},
                {"role": "user", "content": prompt}
            ]
        )
        return response.choices[0].message.content
# Example Usage:
# driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
# model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
# engine = GraphRAGQueryEngine(driver, model)
# answer = engine.query("How does the AuthService rate-limiting affect the BillingAPI?")
# print(answer)
# driver.close()
The Cypher query above is the heart of our retrieval:
* `CALL db.index.vector.queryNodes`: Performs the initial vector search to get the top 5 most similar chunks.
* `MATCH (d:Document)-[:HAS_CHUNK]->(similar_chunk)`: Finds the parent document for each retrieved chunk.
* `OPTIONAL MATCH ... (other_chunk)`: Traverses back to the document and then out to its other chunks, gathering local context.
* `OPTIONAL MATCH ... (related_chunk)`: Performs a multi-hop traversal. It finds entities mentioned in the primary chunk, then finds other chunks anywhere in the graph that also mention those same entities. This is how we bridge context across disparate documents.
* `collect(DISTINCT ...)`: Aggregates the expanded context for each initial hit.
This single, powerful query provides a rich, structured context far superior to a simple list of semantically similar chunks.
Production Considerations, Edge Cases, and Performance
Deploying a Graph-RAG system requires careful planning.
1. Scalability of Ingestion:
* Problem: The LLM-based extraction is a bottleneck. Running it on millions of documents is slow and expensive.
* Solution:
* Batch Processing: Use a job queue (e.g., Celery, RabbitMQ) to process documents asynchronously (see the sketch after this list).
* Fine-tuning: Fine-tune a smaller, open-source model (like a member of the Llama or Mistral families) on a high-quality dataset of `(text, graph_json)` pairs generated by a more powerful model like GPT-4. This drastically reduces per-document cost and latency.
* Parallelization: Use tools like Ray or Spark to distribute the extraction and embedding process across multiple workers.
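As a rough illustration of the batch-processing idea (independent of any particular queue), the sketch below fans `extract_graph_from_document` out over a thread pool; a real pipeline would add retries, rate-limit handling, and persistence of intermediate results. The function name is illustrative:
from concurrent.futures import ThreadPoolExecutor, as_completed

def extract_graphs_in_bulk(documents: dict, max_workers: int = 8) -> dict:
    """Run LLM extraction over many documents concurrently.
    `documents` maps a document id to its raw text."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract_graph_from_document, text): doc_id
                   for doc_id, text in documents.items()}
        for future in as_completed(futures):
            doc_id = futures[future]
            results[doc_id] = future.result()  # extraction errors already return an empty graph
    return results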
2. Graph Maintenance and Updates:
* Problem: Documents get updated or deleted. How do you keep the graph consistent?
* Solution:
* Versioning: Add a `version` or `last_updated` property to `Document` nodes. Your ingestion pipeline should check this before processing.
* Stale Component Deletion: For updates, detach and delete old chunks and relationships associated with a `Document` before ingesting the new version. This can be done in a single transaction to maintain consistency (see the sketch after this list).
* TTL (Time-To-Live): For transient data, consider using APOC's TTL feature to automatically expire old nodes.
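Here is a minimal sketch of the stale-component deletion step, reusing `KnowledgeGraph._ingest_graph` from Step 3. It deletes the document's old chunks and re-ingests the new version in one transaction; entity nodes are left in place because other documents may still reference them. The helper name is illustrative:
def replace_document(driver, doc_id: str, new_graph_data: dict, new_chunks: list):
    """Atomically remove a document's old chunks, then re-ingest the new version."""
    with driver.session() as session:
        def _replace(tx):
            # Detach-delete the old chunks; the Document node itself is kept
            tx.run("""
                MATCH (d:Document {id: $doc_id})-[:HAS_CHUNK]->(c:Chunk)
                DETACH DELETE c
            """, doc_id=doc_id)
            # Re-ingest within the same transaction for consistency
            KnowledgeGraph._ingest_graph(tx, new_graph_data, new_chunks)
        session.execute_write(_replace)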
3. Query Performance Optimization:
* Problem: Complex Cypher queries can be slow on large graphs.
* Solution:
* Profiling: Prefix your Cypher queries with `PROFILE` or `EXPLAIN` to understand their execution plan. Look for full graph scans (`NodeByLabelScan`) and high database hits.
* Schema Indexes: Create traditional indexes on node properties used in `MATCH` clauses, like `id` and `type`. `CREATE INDEX FOR (d:Document) ON (d.id)` and `CREATE INDEX FOR (e:Entity) ON (e.id)` are essential.
* Query Parameterization: Always use parameters (like `$query_vector`) instead of string formatting. This allows Neo4j to cache the query plan for much faster subsequent executions.
* Limiting Traversal Depth: In highly connected graphs, unbounded traversals can explode. Use variable-length path limits, e.g., `MATCH (a)-[*1..3]-(b)`, to cap traversals at 3 hops.
4. Handling Low-Quality Initial Retrieval:
* Problem: What if the initial vector search returns irrelevant chunks? The subsequent graph traversal will explore the wrong part of the graph.
* Solution:
* Hybrid Search Fallback: Combine vector search with keyword-based full-text search. Neo4j supports full-text indexes. You can run both searches and combine the results.
* Re-ranking: Retrieve more initial candidates than you need (e.g., top 20) and then use a more sophisticated re-ranking model (like a cross-encoder, sketched after this list) or business logic to select the best starting points for graph traversal.
* Query Expansion: Use an LLM to rewrite the user's query into several variations or to extract key entities from the query itself, which can then be used to directly look up nodes in the graph, bypassing the initial vector search entirely for certain query types.
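As an illustration of the re-ranking idea, here is a small sketch using a cross-encoder from the `sentence-transformers` library; the model name, candidate count, and helper name are illustrative choices rather than requirements:
from sentence_transformers import CrossEncoder

# Illustrative model choice; any cross-encoder trained for passage ranking works
reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

def rerank_chunks(query: str, candidate_chunks: list, top_k: int = 5) -> list:
    """Score (query, chunk_text) pairs with a cross-encoder and keep the best top_k.
    `candidate_chunks` are dicts with a 'text' key, e.g. the top 20 vector-search hits."""
    pairs = [(query, chunk['text']) for chunk in candidate_chunks]
    scores = reranker.predict(pairs)
    ranked = sorted(zip(candidate_chunks, scores), key=lambda item: item[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]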
Conclusion: The Next Frontier of RAG
By moving from flat vector stores to structured knowledge graphs, we fundamentally upgrade the capabilities of our RAG systems. This Graph-RAG approach transforms retrieval from a simple similarity search into a sophisticated reasoning process over the relationships inherent in our data.
The implementation is more complex than a standard RAG pipeline, but the payoff is a system that can answer nuanced, multi-part questions that are impossible for vector-only systems. It provides more accurate, explainable, and contextually aware responses, pushing the boundaries of what's possible with LLM-powered applications. The future of RAG is not just about finding similar text; it's about understanding the connections between documents.