GraphRAG: Advanced Knowledge Retrieval with Neo4j and LLMs

20 min read
Goh Ling Yong
Technology enthusiast and software architect specializing in AI-driven development tools and modern software engineering practices. Passionate about the intersection of artificial intelligence and human creativity in building tomorrow's digital solutions.

The Relational Blind Spot of Vector-Based RAG

Retrieval-Augmented Generation (RAG) has become the de facto standard for grounding Large Language Models (LLMs) in factual, private data. The dominant pattern involves embedding document chunks into a vector space and performing semantic similarity searches to retrieve relevant context. While effective for unstructured text, this approach possesses a significant blind spot: it fundamentally misunderstands and flattens structured, relational data.

Consider a typical enterprise scenario: a complex ecosystem of microservices, libraries, teams, and deployments. A senior engineer might ask, "Which services, maintained by the platform-alpha team, are indirectly exposed to the recent Log4Shell vulnerability through a transitive dependency?"

A standard vector RAG system would likely fail here. It might retrieve documents mentioning "Log4Shell," "platform-alpha team," and various services, but it cannot perform the multi-hop traversal required to connect these entities: Team -> MAINTAINS -> Service -> USES_LIBRARY -> Library -> USES_LIBRARY -> vulnerable Library -> HAS_VULNERABILITY -> Log4Shell.

Stuffing more context into the LLM is not a scalable solution; it's brute force. The real solution is to retrieve context that mirrors the relational structure of the domain. This is where GraphRAG—the fusion of knowledge graphs and LLMs—moves from a theoretical concept to a production necessity. This article details the implementation of a robust GraphRAG system using Neo4j for the graph database, an LLM as a Cypher query generator, and Python for the orchestration logic.

We will not cover the basics of RAG or graph databases. We assume you understand why you'd choose a graph and are here to solve the hard problem: reliably translating human questions into precise graph traversals.


1. Modeling the Domain: The Knowledge Graph Schema

Before we can query, we must model. A well-defined graph schema is the foundation of an effective GraphRAG system. It provides the structural constraints the LLM needs to generate valid and efficient queries. For our microservice dependency example, our model will consist of the following entities and relationships:

* Nodes:

* (:Service {name: string, language: string})

* (:Library {name: string, version: string})

* (:Vulnerability {id: string, severity: string, summary: string})

* (:Team {name: string})

* Relationships:

* (:Service)-[:USES_LIBRARY]->(:Library)

* (:Library)-[:USES_LIBRARY]->(:Library) (transitive dependencies)

* (:Team)-[:MAINTAINS]->(:Service)

* (:Library)-[:HAS_VULNERABILITY]->(:Vulnerability)

Let's formalize this in Neo4j by setting up constraints. These are critical for data integrity and query performance.

cypher
// Enforce uniqueness for our primary entities
CREATE CONSTRAINT service_name_unique IF NOT EXISTS FOR (s:Service) REQUIRE s.name IS UNIQUE;
CREATE CONSTRAINT library_name_version_unique IF NOT EXISTS FOR (l:Library) REQUIRE (l.name, l.version) IS UNIQUE;
CREATE CONSTRAINT vulnerability_id_unique IF NOT EXISTS FOR (v:Vulnerability) REQUIRE v.id IS UNIQUE;
CREATE CONSTRAINT team_name_unique IF NOT EXISTS FOR (t:Team) REQUIRE t.name IS UNIQUE;
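
Before loading data, you can confirm the constraints took effect:

cypher
// List all constraints to verify they were created
SHOW CONSTRAINTS;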

Now, let's ingest some sample data to create a multi-hop scenario. Notice the transitive dependency: auth-service uses framework-core, which in turn uses the vulnerable log4j library.

cypher
// Create all nodes and relationships in a single statement;
// variables like t_alpha do not survive across `;` statement boundaries
CREATE
  (t_alpha:Team {name: 'platform-alpha'}),
  (t_beta:Team {name: 'data-services'}),

  (s_auth:Service {name: 'auth-service', language: 'Java'}),
  (s_billing:Service {name: 'billing-service', language: 'Python'}),
  (s_reporting:Service {name: 'reporting-service', language: 'Java'}),

  (l_framework:Library {name: 'framework-core', version: '1.2.0'}),
  (l_log4j:Library {name: 'log4j', version: '2.14.1'}),
  (l_requests:Library {name: 'requests', version: '2.25.1'}),

  (v_log4shell:Vulnerability {id: 'CVE-2021-44228', severity: 'Critical', summary: 'Remote code execution in Log4j 2'}),

  // Team ownership
  (t_alpha)-[:MAINTAINS]->(s_auth),
  (t_alpha)-[:MAINTAINS]->(s_billing),
  (t_beta)-[:MAINTAINS]->(s_reporting),

  // Direct library usage
  (s_auth)-[:USES_LIBRARY]->(l_framework),
  (s_reporting)-[:USES_LIBRARY]->(l_log4j),
  (s_billing)-[:USES_LIBRARY]->(l_requests),

  // The critical transitive dependency
  (l_framework)-[:USES_LIBRARY]->(l_log4j),

  (l_log4j)-[:HAS_VULNERABILITY]->(v_log4shell);

This graph now contains the precise relationships a vector database would miss. Our task is to empower an LLM to navigate it.
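
Before involving any LLM, it's worth verifying the model by hand. This hand-written query confirms the transitive exposure is reachable:

cypher
// Sanity check: which platform-alpha services can reach log4j?
MATCH (t:Team {name: 'platform-alpha'})-[:MAINTAINS]->(s:Service)-[:USES_LIBRARY*1..5]->(l:Library {name: 'log4j'})
RETURN s.name AS service_name;

With the sample data above, this returns only auth-service, reached via framework-core.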


2. The Core Logic: LLM-Powered Cypher Generation

The central challenge of GraphRAG is converting a natural language question into a syntactically correct and semantically appropriate Cypher query. This is a translation task, perfectly suited for a powerful instruction-following LLM like GPT-4 or Claude 3 Opus. The quality of this translation hinges almost entirely on the quality of the system prompt.

A naive prompt like "You are a Cypher expert, answer the user's question" will fail spectacularly. A production-grade prompt must be engineered with three key components:

* Schema Definition: Explicitly provide the LLM with the node labels, their properties, and the relationship types with their directions. This is the grammar of our graph.

* Few-Shot Examples: Provide 2-3 examples of complex questions and the ideal Cypher queries to answer them. This primes the model on the expected query structure and complexity.

* Strict Instructions: Define the output format (e.g., only the Cypher query, no explanations), what to do in case of ambiguity (e.g., state what is unclear), and how to handle certain keywords.

Here is a production-ready system prompt for our use case:

text
# System Prompt for Cypher Generation

You are an expert Neo4j developer. Your goal is to write Cypher queries that answer user questions against a knowledge graph.

## 1. Schema
The graph schema is as follows:
- **Nodes**:
  - `Service` with properties: `name` (string), `language` (string)
  - `Library` with properties: `name` (string), `version` (string)
  - `Vulnerability` with properties: `id` (string), `severity` (string), `summary` (string)
  - `Team` with properties: `name` (string)
- **Relationships**:
  - `(:Team)-[:MAINTAINS]->(:Service)`
  - `(:Service)-[:USES_LIBRARY]->(:Library)`
  - `(:Library)-[:USES_LIBRARY]->(:Library)`
  - `(:Library)-[:HAS_VULNERABILITY]->(:Vulnerability)`

Note that the `USES_LIBRARY` relationship can be chained for transitive dependencies (e.g., `(:Service)-[:USES_LIBRARY]->(:Library)-[:USES_LIBRARY]->(:Library)`).

## 2. Few-Shot Examples

**Question**: "Which services use the log4j library directly?"
**Query**:
MATCH (s:Service)-[:USES_LIBRARY]->(l:Library {name: 'log4j'})
RETURN s.name AS service_name;

**Question**: "What is the summary for CVE-2021-44228?"
**Query**:
MATCH (v:Vulnerability {id: 'CVE-2021-44228'})
RETURN v.summary AS summary;

**Question**: "Find all Java services maintained by the platform-alpha team."
**Query**:
MATCH (t:Team {name: 'platform-alpha'})-[:MAINTAINS]->(s:Service {language: 'Java'})
RETURN s.name AS service_name;

## 3. Instructions
- Only respond with the Cypher query. Do not include any explanations, introductory text, or markdown code fences.
- If the user's question is ambiguous or cannot be answered by the provided schema, respond with only the text: "QUERY_GENERATION_FAILED: Ambiguous question."
- Do not generate queries that modify the graph (e.g., CREATE, MERGE, SET, DELETE, REMOVE). If a modification is requested, respond with "QUERY_GENERATION_FAILED: Write operations are not allowed."
- Pay close attention to property names and relationship directions.

Now let's implement the Python code to use this prompt.

python
import os

from neo4j import GraphDatabase
from openai import OpenAI

# It's best practice to use environment variables for credentials
NEO4J_URI = os.getenv("NEO4J_URI", "bolt://localhost:7687")
NEO4J_USER = os.getenv("NEO4J_USER", "neo4j")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD", "password")

# The client reads OPENAI_API_KEY from the environment if not passed explicitly
llm_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

CYPHER_GENERATION_PROMPT = """... # Paste the full system prompt from above here ..."""

class GraphRAGPipeline:
    def __init__(self, neo4j_driver, llm_client):
        self.driver = neo4j_driver
        self.llm_client = llm_client
        self.cypher_prompt = CYPHER_GENERATION_PROMPT

    def generate_cypher(self, question: str) -> str | None:
        """Step 1: Use an LLM to generate a Cypher query."""
        try:
            response = self.llm_client.chat.completions.create(
                model="gpt-4-turbo-preview",
                messages=[
                    {"role": "system", "content": self.cypher_prompt},
                    {"role": "user", "content": question},
                ],
                temperature=0.0,  # We want deterministic, precise queries
            )
            generated_query = response.choices[0].message.content.strip()

            if generated_query.startswith("QUERY_GENERATION_FAILED"):
                print(f"Cypher generation failed: {generated_query}")
                return None

            return generated_query
        except Exception as e:
            print(f"An error occurred during Cypher generation: {e}")
            return None

# Example usage
if __name__ == '__main__':
    neo4j_driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
    pipeline = GraphRAGPipeline(neo4j_driver, llm_client)

    question = "Which services, maintained by the platform-alpha team, are exposed to the Log4Shell vulnerability (CVE-2021-44228)?"

    generated_query = pipeline.generate_cypher(question)

    if generated_query:
        print("--- Generated Cypher Query ---")
        print(generated_query)

    neo4j_driver.close()

When we run this with our complex question, the LLM, guided by the detailed prompt, should generate a query that performs the required multi-hop traversal:

cypher
MATCH (t:Team {name: 'platform-alpha'})-[:MAINTAINS]->(s:Service)-[:USES_LIBRARY*1..5]->(l:Library)-[:HAS_VULNERABILITY]->(v:Vulnerability {id: 'CVE-2021-44228'})
RETURN DISTINCT s.name AS service_name

This query is exactly what's needed: it finds services maintained by the target team, follows a variable-length path (*1..5) of USES_LIBRARY relationships to account for transitive dependencies, and connects to the specific vulnerability.
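
When a generated query misbehaves, returning the whole path makes the traversal visible instead of just its endpoints. A handy variation for interactive debugging:

cypher
// Show the full dependency chain for each affected service
MATCH path = (t:Team {name: 'platform-alpha'})-[:MAINTAINS]->(s:Service)-[:USES_LIBRARY*1..5]->(l:Library)-[:HAS_VULNERABILITY]->(v:Vulnerability {id: 'CVE-2021-44228'})
RETURN [n IN nodes(path) | coalesce(n.name, n.id)] AS chain;

Against our sample data this yields ['platform-alpha', 'auth-service', 'framework-core', 'log4j', 'CVE-2021-44228'], making the transitive hop explicit.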


3. Execution, Augmentation, and Synthesis

Generating the query is only the first step. We now need to execute it, format the results into useful context, and feed that context to another LLM call to generate the final, human-readable answer.

First, let's add the execution logic to our pipeline.

python
# Add this method to the GraphRAGPipeline class

    def execute_cypher(self, query: str) -> list[dict]:
        """Step 2: Execute the Cypher query against the Neo4j database."""
        try:
            with self.driver.session() as session:
                result = session.run(query)
                # Convert Neo4j records to a list of dictionaries
                return [record.data() for record in result]
        except Exception as e:
            # This will catch CypherSyntaxError, etc.
            print(f"Failed to execute Cypher query: {e}")
            return []

Next, we need to synthesize the final answer. The raw, JSON-like output from the database ([{'service_name': 'auth-service'}]) is not ideal context for an LLM. It's better to format it into a clear, textual statement.
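
The method below keeps formatting to a simple str() join; for richer result sets, a small helper pays off. A sketch (the format_context name is ours, not part of the pipeline above):

python
def format_context(records: list[dict]) -> str:
    """Render each result record as a readable key-value line, e.g.
    [{'service_name': 'auth-service'}] -> 'service_name: auth-service'."""
    lines = []
    for record in records:
        lines.append(", ".join(f"{key}: {value}" for key, value in record.items()))
    return "\n".join(lines)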

python
# Add this method to the GraphRAGPipeline class

    def synthesize_answer(self, question: str, context: list[dict]) -> str:
        """Step 3: Use an LLM to synthesize a natural language answer from the context."""
        if not context:
            return "I couldn't find any information in the knowledge graph to answer your question."

        # Simple context formatting for this example
        context_str = "\n".join([str(item) for item in context])

        synthesis_prompt = f"""
You are a helpful assistant.
Based on the following context retrieved from a knowledge graph, please provide a concise and direct answer to the user's question.

**Question**:
{question}

**Context from Knowledge Graph**:
{context_str}

**Answer**:
"""

        try:
            response = self.llm_client.chat.completions.create(
                model="gpt-3.5-turbo",  # A cheaper model is fine for synthesis
                messages=[
                    {"role": "system", "content": "You are a helpful assistant providing answers based on retrieved data."},
                    {"role": "user", "content": synthesis_prompt},
                ],
                temperature=0.7,
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"An error occurred during answer synthesis: {e}")
            return "Sorry, I encountered an error while formulating the final answer."

Finally, let's tie it all together in a single run method.

python
# Add this method to the GraphRAGPipeline class

    def run(self, question: str) -> str:
        """Run the full GraphRAG pipeline."""
        print(f"Processing question: {question}")

        # Step 1: Generate Cypher
        generated_query = self.generate_cypher(question)
        if not generated_query:
            return "I was unable to generate a valid query for your question."

        print(f"\n--- Generated Cypher Query ---\n{generated_query}")

        # Step 2: Execute Cypher
        retrieved_context = self.execute_cypher(generated_query)
        if not retrieved_context:
            return "I found no results in the knowledge graph for your question."

        print(f"\n--- Retrieved Context ---\n{retrieved_context}")

        # Step 3: Synthesize Answer
        final_answer = self.synthesize_answer(question, retrieved_context)
        print(f"\n--- Final Answer ---\n{final_answer}")

        return final_answer

# Updated main execution block
if __name__ == '__main__':
    neo4j_driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
    pipeline = GraphRAGPipeline(neo4j_driver, llm_client)

    question = "Which services, maintained by the platform-alpha team, are exposed to the Log4Shell vulnerability (CVE-2021-44228)?"

    pipeline.run(question)

    neo4j_driver.close()

This complete pipeline now correctly identifies auth-service as exposed through the transitive framework-core -> log4j dependency (reporting-service also uses the vulnerable library directly, but it is maintained by data-services, so the team filter rightly excludes it), a result unattainable with simple vector search.


4. Advanced Patterns and Production Hardening

A prototype is one thing; a production system is another. Here are critical considerations and patterns for making this system robust.

A. Query Repair Loop for Error Handling

The LLM will not always generate perfect Cypher. It might hallucinate a property name or get a relationship direction wrong. A production system cannot simply fail. We must implement a retry mechanism that feeds the database error back to the LLM for self-correction.

python
# Add this method to the GraphRAGPipeline class

    def run_with_repair(self, question: str, max_retries: int = 2) -> str:
        """Run the pipeline with a query repair loop."""
        print(f"Processing question: {question}")

        generated_query = self.generate_cypher(question)
        if not generated_query:
            return "I was unable to generate a valid query for your question."

        retrieved_context = None
        for i in range(max_retries + 1):
            print(f"\n--- Attempt {i+1} ---")
            print(f"Generated Cypher Query:\n{generated_query}")

            try:
                with self.driver.session() as session:
                    result = session.run(generated_query)
                    retrieved_context = [record.data() for record in result]

                print(f"\n--- Query Successful ---\nRetrieved Context:\n{retrieved_context}")
                break  # Exit loop on success

            except Exception as e:
                error_message = str(e)
                print(f"\n--- Query Failed ---\nError: {error_message}")

                if i >= max_retries:
                    return "The generated query failed multiple times. Please rephrase your question."

                # Self-correction step: feed the database error back to the LLM
                print("\n--- Attempting Query Repair ---")
                repair_prompt = f"The previous Cypher query failed with the following error:\n{error_message}\n\nPlease correct the query. The original question was: {question}\n\nCorrected Cypher Query:"

                # We reuse the generation logic, but with more context
                response = self.llm_client.chat.completions.create(
                    model="gpt-4-turbo-preview",
                    messages=[
                        {"role": "system", "content": self.cypher_prompt},
                        {"role": "user", "content": f"The user's question is: {question}"},
                        {"role": "assistant", "content": generated_query},
                        {"role": "user", "content": repair_prompt},
                    ],
                    temperature=0.0,
                )
                generated_query = response.choices[0].message.content.strip()

        if retrieved_context is None:
            return "I was unable to execute a query successfully after several attempts."

        final_answer = self.synthesize_answer(question, retrieved_context)
        print(f"\n--- Final Answer ---\n{final_answer}")
        return final_answer

This loop significantly improves the reliability of the system by letting the LLM learn from its mistakes in real time.

B. Performance Optimization with Indexes and Profiling

As the graph grows, query performance becomes paramount. The LLM has no inherent knowledge of database optimization.

* Indexing: Ensure all properties used in MATCH clauses are backed by an index. In our schema, the uniqueness constraints from Section 1 already create backing indexes for Service(name), Library(name, version), Vulnerability(id), and Team(name), so explicit indexes are only needed for other frequently filtered properties:

cypher
// Properties covered by uniqueness constraints are already index-backed;
// add explicit indexes only for additional filter properties
CREATE INDEX service_language_index IF NOT EXISTS FOR (s:Service) ON (s.language);
CREATE INDEX vulnerability_severity_index IF NOT EXISTS FOR (v:Vulnerability) ON (v.severity);

* Query Profiling: When you observe a slow query, use Neo4j's PROFILE keyword. Prepending PROFILE to a Cypher query in the Neo4j Browser shows the executed plan and highlights bottlenecks, such as a full label scan where an index seek was expected. You can then either add a missing index or, in extreme cases, add hints to the Cypher generation prompt to guide the LLM toward more performant query patterns, as shown below.
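
Profiling the multi-hop query generated earlier takes one extra keyword:

cypher
// Look for NodeIndexSeek (good) rather than NodeByLabelScan (bad) in the plan
PROFILE
MATCH (t:Team {name: 'platform-alpha'})-[:MAINTAINS]->(s:Service)-[:USES_LIBRARY*1..5]->(l:Library)-[:HAS_VULNERABILITY]->(v:Vulnerability {id: 'CVE-2021-44228'})
RETURN DISTINCT s.name AS service_name;
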
C. Hybrid Search: Combining Graph and Vector

Sometimes a query has both a semantic and a structural component, for example: "Find services related to 'payment processing' that are affected by critical vulnerabilities." 'Payment processing' is a fuzzy, semantic concept, while 'affected by critical vulnerabilities' is a structural graph traversal.

The optimal solution is a hybrid approach:

* Vector Search: Use a vector index (Neo4j has built-in vector indexing capabilities) on node properties like service descriptions to find initial candidate nodes (e.g., services with descriptions semantically similar to 'payment processing').

* Graph Traversal: Use the results of the vector search as the starting point for a precise Cypher traversal that satisfies the structural part of the query.

This pattern leverages the best of both worlds, using semantic search for discovery and graph search for precise, relational exploration, as sketched below.
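
A minimal sketch of this two-step flow, assuming a vector index named service_descriptions over an embedding property on Service nodes (neither exists in our toy schema) and a Neo4j version with native vector index support:

cypher
// $question_embedding is the embedding of "payment processing",
// computed beforehand by your embedding model
CALL db.index.vector.queryNodes('service_descriptions', 5, $question_embedding)
YIELD node AS s, score
// Keep only candidates that can reach a Critical vulnerability
MATCH (s)-[:USES_LIBRARY*1..5]->(:Library)-[:HAS_VULNERABILITY]->(v:Vulnerability {severity: 'Critical'})
RETURN s.name AS service_name, score, collect(DISTINCT v.id) AS vulnerabilities
ORDER BY score DESC;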

D. Security: The Risk of Executing Generated Code

Executing LLM-generated code against a database is inherently risky. A malicious or poorly phrased user prompt could trick the LLM into generating a destructive query (DELETE, REMOVE). Our prompt engineering already includes a guardrail against this, but it's not foolproof.

Production-grade security requires multiple layers:

* Read-Only User: The application should connect to Neo4j as a user with read-only permissions (e.g., the built-in reader role). This is the single most effective mitigation.

* Query Inspection: Before execution, parse the generated Cypher with a Cypher parser (e.g., libcypher-parser) and check the abstract syntax tree for forbidden command types. This is more robust than simple string matching; a lightweight keyword-based fallback is sketched after this list.

* Schema Scoping: The schema provided in the prompt acts as a soft boundary. Do not include sensitive nodes or properties in the prompt's schema definition if you don't want the LLM to access them.
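
As a lightweight complement to full parsing, even a keyword blocklist over the generated query catches the obvious cases. A sketch (a blocklist can be fooled, e.g., by keywords inside string literals, which is why the read-only user remains the primary defense):

python
import re

# Clause keywords that indicate a write or administrative operation
FORBIDDEN_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|SET|DELETE|DETACH|REMOVE|DROP)\b",
    re.IGNORECASE,
)

def is_read_only(query: str) -> bool:
    """Return True only if the query contains no write clauses."""
    return FORBIDDEN_CLAUSES.search(query) is None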

Conclusion

GraphRAG represents a significant evolution beyond standard vector search-based RAG systems. By leveraging the explicit, structured relationships within a knowledge graph, we can answer complex, multi-hop questions that are intractable for systems that treat data as a flat collection of text chunks.

The core implementation challenge lies in reliably translating natural language to a graph query language like Cypher. As we've demonstrated, this is achievable through meticulous prompt engineering (schema definition and few-shot examples) combined with robust error-handling patterns like the query repair loop.

While not a replacement for vector RAG, GraphRAG is a powerful, complementary tool. For domains defined by their relationships (software dependencies, financial networks, biomedical research, supply chains), it provides the retrieval mechanism needed to unlock a deeper, more contextual understanding for Large Language Models.
