GraphRAG: Implementing Neo4j Graph Indexing for Context-Rich QA

Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Contextual Blind Spot of Vector-Based RAG

Retrieval-Augmented Generation (RAG) has become a cornerstone for grounding Large Language Models (LLMs) in factual data. The standard implementation—chunking documents, generating vector embeddings, and performing cosine similarity searches—excels at finding semantically similar text passages. However, for senior engineers building systems on top of complex, interconnected knowledge bases, this approach reveals a critical flaw: it is fundamentally context-blind.

A standard vector database sees a knowledge base as a flat collection of isolated text chunks. It has no inherent understanding of the relationships between these chunks. Consider a query against an internal software engineering knowledge base: "Which microservices, written in Go, will be affected by the upcoming deprecation of the v2 auth-lib?"

A traditional RAG system might retrieve chunks mentioning:

• The auth-lib deprecation notice.
• A document listing services written in Go.
• The README for a service that happens to use auth-lib.

The LLM is then left to piece together these disparate fragments, often failing to establish a definitive causal link. It cannot traverse the dependency graph because, in a vector-only world, that graph doesn't explicitly exist. This is the contextual blind spot we will address.

    This article details a production-ready pattern, GraphRAG, that fuses the semantic search capabilities of vectors with the explicit relational power of graph databases. We will use Neo4j to model our knowledge base as a graph of interconnected entities and implement sophisticated retrieval strategies that provide the LLM with a rich, multi-hop contextual understanding, leading to vastly more accurate and insightful answers.


    Section 1: Knowledge Graph Construction from Unstructured Data

    Our first task is to transform unstructured documents into a structured knowledge graph. The goal is not just to store text but to extract and represent entities (like services, libraries, and deprecation notices) and the relationships between them (e.g., DEPENDS_ON, DEPRECATES, AFFECTS).

    We will use an LLM's function-calling or tool-using capabilities to act as a highly sophisticated information extraction engine. By defining a Pydantic schema for our desired graph structure, we can instruct the model to parse documents and return structured data ready for ingestion into Neo4j.

    Defining the Graph Schema

    Let's define a schema for our software engineering knowledge base. We'll use Pydantic for clear, type-safe definitions that can be easily passed to an LLM.

    python
    # model.py
    from typing import List, Optional
    from pydantic import BaseModel, Field
    
    class Library(BaseModel):
        name: str = Field(..., description="The unique name of the library, e.g., 'auth-lib-v2'")
        language: str = Field(..., description="The programming language of the library, e.g., 'Go'")
    
    class Service(BaseModel):
        name: str = Field(..., description="The unique name of the microservice, e.g., 'user-service'")
        language: str = Field(..., description="The primary programming language of the service, e.g., 'Python'")
        dependencies: List[str] = Field(default_factory=list, description="A list of library names this service depends on.")
    
    class Deprecation(BaseModel):
        library_name: str = Field(..., description="The name of the library being deprecated.")
        description: str = Field(..., description="A detailed description of the deprecation, its reasons, and migration path.")
        affected_services: List[str] = Field(default_factory=list, description="A list of service names known to be affected.")
    
    class KnowledgeGraph(BaseModel):
        """A model for extracting a complete knowledge graph from a document."""
        services: List[Service] = Field(default_factory=list)
        libraries: List[Library] = Field(default_factory=list)
        deprecations: List[Deprecation] = Field(default_factory=list)
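
    To make the target shape concrete, here is a minimal sketch, with hypothetical values, of a populated KnowledgeGraph instance; the LLM's job in the next step is to return JSON that validates against exactly this structure.

    python
    # schema_demo.py
    # A quick sanity check of the schema above -- the values here are hypothetical.
    from model import KnowledgeGraph, Service, Library, Deprecation

    kg = KnowledgeGraph(
        services=[Service(name="user-service", language="Python", dependencies=["auth-lib-v2"])],
        libraries=[Library(name="auth-lib-v2", language="Go")],
        deprecations=[Deprecation(
            library_name="auth-lib-v2",
            description="Deprecated due to security vulnerabilities.",
            affected_services=["user-service"],
        )],
    )

    # This JSON shape is what the extraction tool call must return
    print(kg.model_dump_json(indent=2))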

    The LLM-Powered Extraction Pipeline

    Now, we'll create a Python script that uses this schema to instruct an LLM (we'll use OpenAI's API for this example) to extract structured data from a raw text document.

    python
    # graph_extractor.py
    import os
    from openai import OpenAI
    from pydantic import BaseModel
    from typing import Type
    import json
    
    from model import KnowledgeGraph # Import our Pydantic model
    
    # Ensure you have OPENAI_API_KEY set in your environment
    client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
    
    def extract_graph_from_document(document_text: str, model_class: Type[BaseModel]) -> BaseModel:
        """Uses OpenAI's function calling to extract structured data from text."""
        response = client.chat.completions.create(
            model="gpt-4-1106-preview",
            messages=[
                {
                    "role": "system",
                    "content": "You are an expert system designed to extract entities and relationships from software engineering documents into a structured knowledge graph. Extract all available information accurately."
                },
                {
                    "role": "user",
                    "content": f"Extract the knowledge graph from the following document:\n\n{document_text}"
                }
            ],
            tools=[
                {
                    "type": "function",
                    "function": {
                        "name": "extract_knowledge_graph",
                        "description": "Extracts a knowledge graph of services, libraries, and their relationships.",
                        "parameters": model_class.model_json_schema()
                    }
                }
            ],
            tool_choice={"type": "function", "function": {"name": "extract_knowledge_graph"}}
        )
    
        tool_calls = response.choices[0].message.tool_calls
        if not tool_calls:
            raise ValueError("The model did not return any tool calls. Check the input document.")
    
        # In a production scenario, you'd add more robust error handling here
        function_args = json.loads(tool_calls[0].function.arguments)
        return model_class(**function_args)
    
    # Example Usage
    sample_document = """
    # System Architecture Overview - Q4 2023
    
    ## Services
    - **User Service**: Written in Python, this service handles all user authentication and profile management. It depends on `auth-lib-v2` and `request-validator`.
    - **Billing Service**: A Go-based service for processing payments. It leverages `stripe-sdk` and `auth-lib-v2` for secure access.
    - **Notification Service**: A Node.js service for sending emails and push notifications. It has no external library dependencies from our core set.
    
    ## Core Libraries
    - **auth-lib-v2**: Our internal Go authentication library.
    - **request-validator**: A Python library for input validation.
    - **stripe-sdk**: A Go wrapper for the Stripe API.
    
    ## Deprecation Notices
    **ID: D-2023-001**
    The `auth-lib-v2` is being deprecated due to security vulnerabilities. All services must migrate to `auth-lib-v3` by Q1 2024. The Billing Service and User Service are known consumers.
    """
    
    if __name__ == "__main__":
        extracted_data = extract_graph_from_document(sample_document, KnowledgeGraph)
        print(extracted_data.model_dump_json(indent=2))
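
    The inline comment in the extractor glosses over error handling. A minimal hardening sketch, assuming the same extract_graph_from_document signature as above, is to catch validation failures and retry:

    python
    # robust_extraction.py
    # A retry/validation wrapper around the extractor above -- a sketch, not a
    # canonical pipeline; tune max_attempts and logging for your environment.
    import json

    from pydantic import ValidationError

    from graph_extractor import extract_graph_from_document
    from model import KnowledgeGraph

    def extract_with_retries(document_text: str, max_attempts: int = 3) -> KnowledgeGraph:
        last_error = None
        for attempt in range(1, max_attempts + 1):
            try:
                # The extractor parses the tool-call arguments into the Pydantic
                # model, so malformed output surfaces here as an exception.
                return extract_graph_from_document(document_text, KnowledgeGraph)
            except (ValidationError, ValueError, json.JSONDecodeError) as exc:
                last_error = exc
                print(f"Extraction attempt {attempt} failed: {exc}")
        raise RuntimeError(f"Extraction failed after {max_attempts} attempts") from last_error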

    Ingesting into Neo4j

    With the structured data extracted, the final step is to write it into Neo4j using Cypher queries. We'll use the neo4j Python driver. The key here is using MERGE to avoid creating duplicate nodes and relationships, making the ingestion process idempotent.

    python
    # graph_ingestor.py
    import os
    from neo4j import GraphDatabase
    from model import KnowledgeGraph
    from graph_extractor import extract_graph_from_document, sample_document
    
    NEO4J_URI = os.environ.get("NEO4J_URI", "bolt://localhost:7687")
    NEO4J_USER = os.environ.get("NEO4J_USER", "neo4j")
    NEO4J_PASSWORD = os.environ.get("NEO4J_PASSWORD", "password")
    
    class Neo4jIngestor:
        def __init__(self, uri, user, password):
            self.driver = GraphDatabase.driver(uri, auth=(user, password))
    
        def close(self):
            self.driver.close()
    
        def ingest_data(self, graph_data: KnowledgeGraph):
            with self.driver.session() as session:
                # Use a single transaction for atomicity
                session.execute_write(self._create_graph_tx, graph_data)
                print("Ingestion complete.")
    
        @staticmethod
        def _create_graph_tx(tx, graph_data: KnowledgeGraph):
            # Create constraints for uniqueness (best practice)
            tx.run("CREATE CONSTRAINT service_name IF NOT EXISTS FOR (s:Service) REQUIRE s.name IS UNIQUE")
            tx.run("CREATE CONSTRAINT library_name IF NOT EXISTS FOR (l:Library) REQUIRE l.name IS UNIQUE")
    
            # Ingest libraries
            for lib in graph_data.libraries:
                tx.run("MERGE (l:Library {name: $name}) SET l.language = $language", 
                         name=lib.name, language=lib.language)
    
            # Ingest services and dependencies
            for service in graph_data.services:
                tx.run("MERGE (s:Service {name: $name}) SET s.language = $language", 
                         name=service.name, language=service.language)
                for dep_name in service.dependencies:
                    tx.run("""
                    MATCH (s:Service {name: $service_name})
                    MATCH (l:Library {name: $lib_name})
                    MERGE (s)-[:DEPENDS_ON]->(l)
                    """, service_name=service.name, lib_name=dep_name)
    
            # Ingest deprecations and link them
            for dep in graph_data.deprecations:
                tx.run("""
                MERGE (d:Deprecation {library_name: $lib_name})
                SET d.description = $desc
                WITH d
                MATCH (l:Library {name: $lib_name})
                MERGE (d)-[:DEPRECATES]->(l)
                """, lib_name=dep.library_name, desc=dep.description)
                for service_name in dep.affected_services:
                    tx.run("""
                    MATCH (d:Deprecation {library_name: $lib_name})
                    MATCH (s:Service {name: $service_name})
                    MERGE (d)-[:AFFECTS]->(s)
                    """, lib_name=dep.library_name, service_name=service_name)
    
    if __name__ == "__main__":
        # 1. Extract data from document
        extracted_data = extract_graph_from_document(sample_document, KnowledgeGraph)
        
        # 2. Ingest into Neo4j
        ingestor = Neo4jIngestor(NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD)
        ingestor.ingest_data(extracted_data)
        ingestor.close()

    After running this, our Neo4j database now contains a structured, queryable representation of our document, ready for advanced retrieval.
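
    Before building retrieval on top of it, it is worth a quick sanity check. A minimal verification sketch, assuming the connection constants from graph_ingestor.py, counts nodes per label and relationships per type:

    python
    # verify_graph.py
    # Post-ingestion sanity check: count nodes per label and relationships per type.
    from neo4j import GraphDatabase

    from graph_ingestor import NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD

    driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
    with driver.session() as session:
        for record in session.run("MATCH (n) RETURN labels(n) AS labels, count(*) AS count"):
            print(record["labels"], record["count"])
        for record in session.run("MATCH ()-[r]->() RETURN type(r) AS type, count(*) AS count"):
            print(record["type"], record["count"])
    driver.close()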


    Section 2: Augmenting the Graph with Vector Embeddings

    The graph structure gives us explicit relationships, but we still need to handle semantic meaning for unstructured text properties, like the description of a deprecation. This is where the hybrid approach shines. We can store vector embeddings directly on the nodes in our graph.

    Neo4j supports vector indexing and similarity search, allowing us to perform a semantic search that is scoped by the graph structure. This is a powerful combination.

    Generating and Storing Embeddings

    We'll update our ingestion process to generate an embedding for the Deprecation node's description property and store it.

    python
    # vector_augmentation.py
    # (Add this to the Neo4jIngestor class)
    from openai import OpenAI
    
    client = OpenAI()
    
    def get_embedding(text: str, model="text-embedding-3-small") -> list[float]:
       text = text.replace("\n", " ")
       return client.embeddings.create(input=[text], model=model).data[0].embedding
    
    class Neo4jIngestor:
        # ... (previous methods) ...
    
        @staticmethod
        def _create_graph_tx(tx, graph_data: KnowledgeGraph):
            # ... (previous ingestion logic for services, libraries) ...
    
            # Ingest deprecations with embeddings
            for dep in graph_data.deprecations:
                description_embedding = get_embedding(dep.description)
                tx.run("""
                MERGE (d:Deprecation {library_name: $lib_name})
                SET d.description = $desc,
                    d.embedding = $embedding
                WITH d
                MATCH (l:Library {name: $lib_name})
                MERGE (d)-[:DEPRECATES]->(l)
                """, 
                lib_name=dep.library_name, 
                desc=dep.description, 
                embedding=description_embedding)
                # ... (linking affected services) ...
    
        def create_vector_index(self):
            with self.driver.session() as session:
                session.run("""
                CREATE VECTOR INDEX deprecation_embeddings IF NOT EXISTS
                FOR (d:Deprecation)
                ON (d.embedding)
                OPTIONS {indexConfig: {
                    `vector.dimensions`: 1536, 
                    `vector.similarity_function`: 'cosine'
                }}
                """)
                print("Vector index created successfully.")
    
    # --- In main execution block ---
    # ingestor.ingest_data(extracted_data)
    # ingestor.create_vector_index() # Add this call
    # ingestor.close()

    We now have a graph where nodes are linked by explicit relationships, and certain nodes also contain rich semantic information in the form of vector embeddings. We've created the foundation for our advanced retrieval system.
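
    A quick way to confirm the index works end to end is to query it directly. This is a smoke-test sketch, reusing get_embedding and the connection constants from the earlier scripts:

    python
    # index_smoke_test.py
    # Query the vector index directly and print the top matches with scores.
    from neo4j import GraphDatabase

    from graph_ingestor import NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD
    from vector_augmentation import get_embedding

    driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
    query_embedding = get_embedding("security vulnerability requiring migration")
    with driver.session() as session:
        result = session.run("""
        CALL db.index.vector.queryNodes('deprecation_embeddings', 3, $embedding)
        YIELD node, score
        RETURN node.library_name AS library, score
        """, embedding=query_embedding)
        for record in result:
            print(record["library"], round(record["score"], 3))
    driver.close()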


    Section 3: Advanced Retrieval: Fusing Graph Traversal and Vector Search

    This is the core of the GraphRAG pattern. We will now design retrieval functions that leverage both the graph structure and the vector indexes to construct a highly relevant context for the LLM.

    Strategy A: Vector-Initiated Graph Traversal

    This strategy is ideal for queries that are broad or semantic in nature. The process is:

    • Embed the user's query.
    • Perform a vector similarity search on the graph to find the most relevant starting node(s).
    • From these starting nodes, execute a Cypher query to traverse the graph and collect the surrounding neighborhood of interconnected nodes and relationships.
    • Serialize this subgraph into a text format for the LLM context.

    Let's implement this for a query like: "Tell me about the security issue that requires a migration by Q1 2024."

    python
    # retrieval_strategies.py
    import os
    from neo4j import GraphDatabase
    from typing import List, Dict, Any

    # Assume get_embedding is available
    from vector_augmentation import get_embedding

    NEO4J_URI = os.environ.get("NEO4J_URI", "bolt://localhost:7687")
    NEO4J_USER = os.environ.get("NEO4J_USER", "neo4j")
    NEO4J_PASSWORD = os.environ.get("NEO4J_PASSWORD", "password")
    
    class GraphRetriever:
        def __init__(self, uri, user, password):
            self.driver = GraphDatabase.driver(uri, auth=(user, password))
    
        def close(self):
            self.driver.close()
    
        def vector_then_graph_retrieval(self, query: str) -> str:
            query_embedding = get_embedding(query)
    
            cypher_query = """
            CALL db.index.vector.queryNodes('deprecation_embeddings', 1, $embedding) YIELD node, score
            MATCH (node)-[r*1..2]-(related_node)
            RETURN node, r, related_node
            """
    
            with self.driver.session() as session:
                result = session.run(cypher_query, embedding=query_embedding)
                return self._format_graph_result(result)
    
        def _format_graph_result(self, result) -> str:
            # A simple but effective serialization format for the LLM
            context = """
            Here is the relevant information from the knowledge graph:
            """
            nodes = {}
            rels = []
    
            for record in result:
                start_node = record['node']
                relationships = record['r']
                end_node = record['related_node']
    
                nodes[start_node.element_id] = self._node_to_dict(start_node)
                nodes[end_node.element_id] = self._node_to_dict(end_node)

                for rel in relationships:
                    rels.append(f"({rel.start_node.element_id})-[:{rel.type}]->({rel.end_node.element_id})")
    
            context += "\n### Nodes\n"
            for node_id, node_props in nodes.items():
                context += f"- Node {node_id}: {node_props}\n"
    
            context += "\n### Relationships\n"
            for rel_str in set(rels):
                context += f"- {rel_str}\n"
    
            return context
        
        def _node_to_dict(self, node) -> Dict[str, Any]:
            props = dict(node.items())
            props['labels'] = list(node.labels)
            # Avoid sending large embeddings to the LLM context
            props.pop('embedding', None)
            return props
    
    # Example Usage
    if __name__ == "__main__":
        retriever = GraphRetriever(NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD)
        user_query = "Tell me about the security issue that requires a migration by Q1 2024."
        retrieved_context = retriever.vector_then_graph_retrieval(user_query)
        print(retrieved_context)
        retriever.close()

    The retrieved_context will contain not just the deprecation notice but also the library it deprecates and the services it affects, all explicitly linked—a far richer context than a simple text chunk.

    Strategy B: Entity-Driven Path Finding

    This strategy is powerful for queries that mention specific entities. The process is:

    • Use an LLM or simpler NLP techniques (like NER) to extract key entities from the user's query.
    • Construct a Cypher query to find paths or relationships between these entities in the graph.
    • Serialize the resulting path(s) for the LLM context.

    Let's implement this for our original query: "Which microservices, written in Go, will be affected by the upcoming deprecation of the v2 auth-lib?"

    Here, the entities are microservices, Go, and auth-lib-v2.

    python
    # In GraphRetriever class
    
    def entity_path_retrieval(self) -> str:
        # For this specific, hard-coded query. In a real system, you'd extract entities dynamically.
        cypher_query = """
        MATCH path = (s:Service {language: 'Go'})<-[:AFFECTS]-(d:Deprecation)-[:DEPRECATES]->(l:Library {name: 'auth-lib-v2'})
        RETURN path
        """
    
        with self.driver.session() as session:
            result = session.run(cypher_query)
            # We can reuse the formatter, but a path-specific one might be better
            return self._format_path_result(result)
    
    def _format_path_result(self, result) -> str:
        context = "Found the following path(s) in the knowledge graph answering the query:\n"
        for record in result:
            path = record['path']
            context += "\n--- Path ---\n"
            nodes = [self._node_to_dict(n) for n in path.nodes]
            rels = [r.type for r in path.relationships]

            # Deprecation nodes carry 'library_name' rather than 'name'
            labels = [n.get('name') or n.get('library_name', '?') for n in nodes]
            path_str = f"({labels[0]})"
            for i, rel in enumerate(rels):
                # Use an undirected dash: relationships along a path may point either way
                path_str += f"-[:{rel}]-({labels[i + 1]})"
            context += path_str + "\n"
            for label, node in zip(labels, nodes):
                context += f"  - Details for {label}: {node}\n"
        return context
    
    # Example Usage
    if __name__ == "__main__":
        retriever = GraphRetriever(NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD)
        retrieved_context = retriever.entity_path_retrieval()
        print(retrieved_context)
        retriever.close()

    This retrieval is surgical. It doesn't just find related information; it finds the precise chain of relationships that directly answers the user's question. The context provided to the LLM is not a guess—it's a proven path through the knowledge base.
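
    To make this strategy dynamic rather than hard-coded, the same function-calling pattern from Section 1 can extract entities from the query itself. A sketch, where QueryEntities is a hypothetical schema (not part of model.py) and the extracted values feed a parameterized Cypher query:

    python
    # query_entity_extraction.py
    # A sketch of dynamic entity extraction for Strategy B. QueryEntities is a
    # hypothetical schema; extract_graph_from_document is reused here because it
    # is generic over the Pydantic model it receives, though in practice you
    # would give the tool a query-specific name and description.
    from typing import List

    from pydantic import BaseModel, Field

    from graph_extractor import extract_graph_from_document

    class QueryEntities(BaseModel):
        """Entities mentioned in a user query."""
        library_names: List[str] = Field(default_factory=list)
        service_names: List[str] = Field(default_factory=list)
        languages: List[str] = Field(default_factory=list)

    user_query = "Which microservices, written in Go, will be affected by the deprecation of auth-lib-v2?"
    entities = extract_graph_from_document(user_query, QueryEntities)

    # Feed the extracted entities into a parameterized path query
    cypher_query = """
    MATCH path = (s:Service)<-[:AFFECTS]-(d:Deprecation)-[:DEPRECATES]->(l:Library {name: $lib_name})
    WHERE s.language = $language
    RETURN path
    """
    # session.run(cypher_query, lib_name=entities.library_names[0], language=entities.languages[0])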


    Section 4: Production Patterns and Performance

    Deploying a GraphRAG system requires attention to performance, scalability, and edge cases.

    Context Serialization and Token Management

    The context we pass to the LLM must be both informative and concise. The simple serialization format above is a good start, but consider these optimizations:

    * Summarization Layer: For very dense graph neighborhoods, you could use another LLM call to summarize the properties of the retrieved nodes before passing them to the final answer-generation model.

    * Pruning: Limit the graph traversal depth (`[r*1..2]`) based on a token budget. If the retrieved context is too large, prune less relevant or more distant nodes; a minimal budget check is sketched after this list.

    * Structured Formats: Experiment with passing context as JSON or even a simplified Cypher-like syntax that the LLM might better understand for reasoning about paths.
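
    Here is the token-budget check referenced in the pruning point. It is a sketch, assuming the tiktoken package and an illustrative 4,000-token budget; in practice, naive truncation should give way to distance-based pruning:

    python
    # token_budget.py
    # Minimal token-budget enforcement for retrieved context (sketch).
    import tiktoken

    def trim_context_to_budget(context: str, budget_tokens: int = 4000) -> str:
        enc = tiktoken.get_encoding("cl100k_base")
        tokens = enc.encode(context)
        if len(tokens) <= budget_tokens:
            return context
        # Naive pruning: keep the first budget_tokens tokens. A smarter version
        # would drop the most distant nodes from the serialized subgraph first.
        return enc.decode(tokens[:budget_tokens])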

    Cypher Query Optimization

    Slow Cypher queries will bottleneck your retrieval. Always profile your queries in a production environment.

    * Use PROFILE: Prefix your Cypher queries with PROFILE in the Neo4j Browser to see the query plan. Look for full scans (NodeByLabelScan) and high DB hits. A driver-side example follows this list.

    * Create Indexes: Beyond the vector index, create traditional indexes on frequently queried properties, like :Service(name) and :Library(name). Our ingestion script already included constraints, which automatically create backing indexes.

    * Parameterize Queries: Always use parameters (e.g., $embedding, $name) as shown in the examples. This allows Neo4j to cache query plans, significantly speeding up repeated queries.
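
    PROFILE output is easiest to read in the Neo4j Browser, but the Python driver exposes the plan as well. A sketch, assuming the connection constants from the earlier scripts:

    python
    # profile_query.py
    # Run a profiled query from Python and inspect the plan summary (sketch).
    from neo4j import GraphDatabase

    from graph_ingestor import NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD

    driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
    with driver.session() as session:
        result = session.run(
            "PROFILE MATCH (s:Service {name: $name})-[:DEPENDS_ON]->(l:Library) RETURN l.name",
            name="Billing Service",
        )
        summary = result.consume()  # exhausts the stream and returns the summary
        # summary.profile is a nested dict; watch for NodeByLabelScan and high dbHits
        print(summary.profile["operatorType"], "dbHits:", summary.profile.get("dbHits"))
    driver.close()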

    Edge Case: Ambiguity and Disconnected Subgraphs

    * Query Ambiguity: What if a vector search returns multiple, equally relevant nodes? Your retrieval logic has a few options:

    1. Traverse from all top-k nodes and merge the resulting contexts.

    2. Present the options to the user for disambiguation (if in an interactive setting).

    3. Add a summarization step to find the common theme among the starting nodes.

    * No Path Found: An empty result from a Cypher query is not an error; it's data. It means the relationship the user is asking about does not exist in the knowledge base. Your system should be designed to report this accurately, e.g., "I found no evidence that Service A depends on Library B," rather than letting the LLM hallucinate a connection; a minimal guard is sketched below.
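
    A minimal guard for the no-path case, assuming the connection constants from the earlier scripts:

    python
    # no_path_guard.py
    # Report an absent relationship explicitly instead of letting the LLM guess.
    from neo4j import GraphDatabase

    from graph_ingestor import NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD

    def check_dependency(service_name: str, library_name: str) -> str:
        driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
        with driver.session() as session:
            record = session.run(
                """
                MATCH (s:Service {name: $service})-[:DEPENDS_ON]->(l:Library {name: $library})
                RETURN s.name AS service
                """,
                service=service_name, library=library_name,
            ).single()
        driver.close()
        if record is None:
            # An empty result is data: the relationship does not exist in the graph.
            return f"I found no evidence that {service_name} depends on {library_name}."
        return f"{service_name} depends on {library_name}."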


    Section 5: Synthesizing the Final Answer

    The final step is to combine the retrieved context with the user's query in a prompt and send it to the LLM for generation.

    python
    # full_rag_chain.py
    from openai import OpenAI

    from retrieval_strategies import GraphRetriever, NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD

    client = OpenAI()
    
    def run_full_graph_rag_chain(query: str) -> str:
        # In a real app, you'd have logic to choose the best retrieval strategy
        # based on the query structure. Here we'll default to vector-first.
        retriever = GraphRetriever(NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD)
        context = retriever.vector_then_graph_retrieval(query)
        retriever.close()
    
        # You can also use the entity-based retrieval for more specific queries.
        # context = retriever.entity_path_retrieval() # For the specific Go service query
    
        prompt = f"""
        You are a helpful AI assistant with access to a knowledge graph.
        Answer the user's question based *only* on the provided context.
        If the context does not contain the answer, state that you cannot answer.
    
        --- CONTEXT ---
        {context}
        --- END CONTEXT ---
    
        USER QUESTION: {query}
        
        ANSWER:
        """
    
        response = client.chat.completions.create(
            model="gpt-4-turbo-preview",
            messages=[
                {"role": "user", "content": prompt}
            ],
            temperature=0.0 # Be factual
        )
    
        return response.choices[0].message.content
    
    # --- Example Comparisons ---
    
    # Query 1: The specific, entity-driven query
    query_1 = "Which microservices, written in Go, will be affected by the upcoming deprecation of the v2 auth-lib?"
    # For this, we'd ideally use entity_path_retrieval to get the most precise context.
    # The context would show the path: (Billing Service)<-[:AFFECTS]-(Deprecation)-[:DEPRECATES]->(auth-lib-v2)
    # Expected Answer: "Based on the knowledge graph, the 'Billing Service', which is written in Go, will be affected by the deprecation of 'auth-lib-v2'."
    
    # Query 2: The semantic query
    query_2 = "Tell me about the security issue that requires a migration by Q1 2024."
    # For this, we use vector_then_graph_retrieval.
    # Expected Answer: "The security issue is related to the deprecation of 'auth-lib-v2'. The deprecation notice states it is due to security vulnerabilities and requires migration by Q1 2024. This deprecation affects the 'User Service' and 'Billing Service'."
    
    if __name__ == "__main__":
        # Let's run the second query through the full chain
        final_answer = run_full_graph_rag_chain(query_2)
        print(f"\n--- Final Answer for Query 2 ---\n{final_answer}")

    Conclusion

    By representing knowledge as a graph, we elevate our RAG system from a simple semantic search tool to a genuine reasoning engine. The GraphRAG pattern, combining vector search for semantic entry points and graph traversal for contextual exploration, provides a robust solution for answering complex questions over interconnected data. While the implementation is more involved than a standard vector-only approach, the resulting accuracy and depth of understanding are essential for building next-generation, production-grade AI applications that can navigate the intricate relationships within any complex domain.
