GraphRAG: Advanced Knowledge Retrieval with Neo4j and LLMs
The Relational Blind Spot of Vector-Based RAG
Retrieval-Augmented Generation (RAG) has become the de facto standard for grounding Large Language Models (LLMs) in factual, private data. The dominant pattern involves embedding document chunks into a vector space and performing semantic similarity searches to retrieve relevant context. While effective for unstructured text, this approach possesses a significant blind spot: it fundamentally misunderstands and flattens structured, relational data.
Consider a typical enterprise scenario: a complex ecosystem of microservices, libraries, teams, and deployments. A senior engineer might ask, "Which services, maintained by the platform-alpha team, are indirectly exposed to the recent Log4Shell vulnerability through a transitive dependency?"
A standard vector RAG system would likely fail here. It might retrieve documents mentioning "Log4Shell," "platform-alpha team," and various services, but it cannot perform the multi-hop traversal required to connect these entities: Team -[:MAINTAINS]-> Service -[:USES_LIBRARY]-> Library -[:USES_LIBRARY]-> vulnerable Library -[:HAS_VULNERABILITY]-> Log4Shell.
Stuffing more context into the LLM is not a scalable solution; it's brute force. The real solution is to retrieve context that mirrors the relational structure of the domain. This is where GraphRAG—the fusion of knowledge graphs and LLMs—moves from a theoretical concept to a production necessity. This article details the implementation of a robust GraphRAG system using Neo4j for the graph database, an LLM as a Cypher query generator, and Python for the orchestration logic.
We will not cover the basics of RAG or graph databases. We assume you understand why you'd choose a graph and are here to solve the hard problem: reliably translating human questions into precise graph traversals.
1. Modeling the Domain: The Knowledge Graph Schema
Before we can query, we must model. A well-defined graph schema is the foundation of an effective GraphRAG system. It provides the structural constraints the LLM needs to generate valid and efficient queries. For our microservice dependency example, our model will consist of the following entities and relationships:
* Nodes:
* (:Service {name: string, language: string})
* (:Library {name: string, version: string})
* (:Vulnerability {id: string, severity: string, summary: string})
* (:Team {name: string})
* Relationships:
* (:Service)-[:USES_LIBRARY]->(:Library)
* (:Team)-[:MAINTAINS]->(:Service)
* (:Library)-[:HAS_VULNERABILITY]->(:Vulnerability)
Let's formalize this in Neo4j by setting up constraints. These are critical for data integrity and query performance.
// Enforce uniqueness for our primary entities
CREATE CONSTRAINT service_name_unique IF NOT EXISTS FOR (s:Service) REQUIRE s.name IS UNIQUE;
CREATE CONSTRAINT library_name_version_unique IF NOT EXISTS FOR (l:Library) REQUIRE (l.name, l.version) IS UNIQUE;
CREATE CONSTRAINT vulnerability_id_unique IF NOT EXISTS FOR (v:Vulnerability) REQUIRE v.id IS UNIQUE;
CREATE CONSTRAINT team_name_unique IF NOT EXISTS FOR (t:Team) REQUIRE t.name IS UNIQUE;
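If you manage the schema from application code, the same statements can be applied idempotently at startup. A minimal sketch using the official neo4j Python driver (the helper name is illustrative; IF NOT EXISTS makes re-runs safe):

from neo4j import GraphDatabase

SCHEMA_STATEMENTS = [
    "CREATE CONSTRAINT service_name_unique IF NOT EXISTS FOR (s:Service) REQUIRE s.name IS UNIQUE",
    "CREATE CONSTRAINT library_name_version_unique IF NOT EXISTS FOR (l:Library) REQUIRE (l.name, l.version) IS UNIQUE",
    "CREATE CONSTRAINT vulnerability_id_unique IF NOT EXISTS FOR (v:Vulnerability) REQUIRE v.id IS UNIQUE",
    "CREATE CONSTRAINT team_name_unique IF NOT EXISTS FOR (t:Team) REQUIRE t.name IS UNIQUE",
]

def apply_schema(driver) -> None:
    """Create each constraint; IF NOT EXISTS makes this safe to run on every startup."""
    for statement in SCHEMA_STATEMENTS:
        driver.execute_query(statement)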
Now, let's ingest some sample data to create a multi-hop scenario. Notice the transitive dependency: auth-service uses framework-core, which in turn uses the vulnerable log4j library.
// Create nodes and relationships in a single statement so the node variables stay in scope
CREATE (t_alpha:Team {name: 'platform-alpha'}),
       (t_beta:Team {name: 'data-services'}),
       (s_auth:Service {name: 'auth-service', language: 'Java'}),
       (s_billing:Service {name: 'billing-service', language: 'Python'}),
       (s_reporting:Service {name: 'reporting-service', language: 'Java'}),
       (l_framework:Library {name: 'framework-core', version: '1.2.0'}),
       (l_log4j:Library {name: 'log4j', version: '2.14.1'}),
       (l_requests:Library {name: 'requests', version: '2.25.1'}),
       (v_log4shell:Vulnerability {id: 'CVE-2021-44228', severity: 'Critical', summary: 'Remote code execution in Log4j 2'})
// Create relationships
CREATE (t_alpha)-[:MAINTAINS]->(s_auth),
       (t_alpha)-[:MAINTAINS]->(s_billing),
       (t_beta)-[:MAINTAINS]->(s_reporting),
       (s_auth)-[:USES_LIBRARY]->(l_framework),
       (s_reporting)-[:USES_LIBRARY]->(l_log4j),
       (s_billing)-[:USES_LIBRARY]->(l_requests)
// The critical transitive dependency
CREATE (l_framework)-[:USES_LIBRARY]->(l_log4j),
       (l_log4j)-[:HAS_VULNERABILITY]->(v_log4shell);
This graph now contains the precise relationships a vector database would miss. Our task is to empower an LLM to navigate it.
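Before handing query generation to an LLM, it is worth confirming by hand that the transitive path really exists. A quick sanity check with the official neo4j Python driver (5.x; the connection defaults below are assumptions for a local instance):

import os
from neo4j import GraphDatabase

URI = os.getenv("NEO4J_URI", "bolt://localhost:7687")
AUTH = (os.getenv("NEO4J_USER", "neo4j"), os.getenv("NEO4J_PASSWORD", "password"))

# Does auth-service reach a vulnerability through a chain of USES_LIBRARY hops?
SANITY_CHECK = """
MATCH path = (:Service {name: 'auth-service'})-[:USES_LIBRARY*1..5]->(:Library)-[:HAS_VULNERABILITY]->(v:Vulnerability)
RETURN v.id AS cve, length(path) AS hops
"""

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    records, _, _ = driver.execute_query(SANITY_CHECK)
    for record in records:
        print(record["cve"], record["hops"])  # Expect: CVE-2021-44228 3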
2. The Core Logic: LLM-Powered Cypher Generation
The central challenge of GraphRAG is converting a natural language question into a syntactically correct and semantically appropriate Cypher query. This is a translation task, perfectly suited for a powerful instruction-following LLM like GPT-4 or Claude 3 Opus. The quality of this translation hinges almost entirely on the quality of the system prompt.
A naive prompt like "You are a Cypher expert, answer the user's question" will fail spectacularly. A production-grade prompt must be engineered with three key components: an explicit description of the graph schema, a handful of few-shot question-to-Cypher examples, and strict instructions that constrain the output format and forbid write operations.
Here is a production-ready system prompt for our use case:
# System Prompt for Cypher Generation
You are an expert Neo4j developer and your goal is to write Cypher queries to answer user questions against a knowledge graph.
## 1. Schema
The graph schema is as follows:
- **Nodes**:
- `Service` with properties: `name` (string), `language` (string)
- `Library` with properties: `name` (string), `version` (string)
- `Vulnerability` with properties: `id` (string), `severity` (string), `summary` (string)
- `Team` with properties: `name` (string)
- **Relationships**:
- `(:Team)-[:MAINTAINS]->(:Service)`
- `(:Service)-[:USES_LIBRARY]->(:Library)`
- `(:Library)-[:HAS_VULNERABILITY]->(:Vulnerability)`
Note that the `USES_LIBRARY` relationship can be chained for transitive dependencies (e.g., `(:Service)-[:USES_LIBRARY]->(:Library)-[:USES_LIBRARY]->(:Library)`).
## 2. Few-Shot Examples
**Question**: "Which services use the log4j library directly?"
**Query**:
MATCH (s:Service)-[:USES_LIBRARY]->(l:Library {name: 'log4j'})
RETURN s.name AS service_name;
**Question**: "What is the summary for CVE-2021-44228?"
**Query**:
MATCH (v:Vulnerability {id: 'CVE-2021-44228'})
RETURN v.summary AS summary;
**Question**: "Find all Java services maintained by the platform-alpha team."
**Query**:
MATCH (t:Team {name: 'platform-alpha'})-[:MAINTAINS]->(s:Service {language: 'Java'})
RETURN s.name AS service_name;
## 3. Instructions
- Only respond with the Cypher query. Do not include any explanations, introductory text, or markdown code fences such as ```cypher.
- If the user's question is ambiguous or cannot be answered by the provided schema, respond with only the text: "QUERY_GENERATION_FAILED: Ambiguous question."
- Do not generate queries that modify the graph (e.g., CREATE, MERGE, SET, DELETE, REMOVE). If a modification is requested, respond with "QUERY_GENERATION_FAILED: Write operations are not allowed."
- Pay close attention to property names and relationship directions.
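One refinement worth considering: rather than hard-coding the Schema section, you can generate it from the live database at startup so the prompt never drifts out of sync with the graph. A rough sketch (the introspection queries and formatting are illustrative; the relationship-pattern query samples actual data, which is fine for small graphs):

def build_schema_summary(driver) -> str:
    """Build the '## 1. Schema' block of the prompt from the live graph."""
    node_query = """
    CALL db.schema.nodeTypeProperties()
    YIELD nodeLabels, propertyName, propertyTypes
    RETURN nodeLabels, collect(propertyName + ' (' + propertyTypes[0] + ')') AS props
    """
    rel_query = """
    MATCH (a)-[r]->(b)
    RETURN DISTINCT labels(a)[0] AS source, type(r) AS rel, labels(b)[0] AS target
    """
    lines = ["- **Nodes**:"]
    records, _, _ = driver.execute_query(node_query)
    for rec in records:
        lines.append(f"  - `{':'.join(rec['nodeLabels'])}` with properties: {', '.join(rec['props'])}")
    lines.append("- **Relationships**:")
    records, _, _ = driver.execute_query(rel_query)
    for rec in records:
        lines.append(f"  - `(:{rec['source']})-[:{rec['rel']}]->(:{rec['target']})`")
    return "\n".join(lines)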
Now let's implement the Python code to use this prompt.
import os
from neo4j import GraphDatabase
from openai import OpenAI
# It's best practice to use environment variables for credentials
NEO4J_URI = os.getenv("NEO4J_URI", "bolt://localhost:7687")
NEO4J_USER = os.getenv("NEO4J_USER", "neo4j")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD", "password")
# The OpenAI client (openai>=1.0) reads its key from the environment
llm_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
CYPHER_GENERATION_PROMPT = """... # Paste the full system prompt from above here ..."""
class GraphRAGPipeline:
def __init__(self, neo4j_driver, llm_client):
self.driver = neo4j_driver
self.llm_client = llm_client
self.cypher_prompt = CYPHER_GENERATION_PROMPT
def generate_cypher(self, question: str) -> str | None:
"""Step 1: Use an LLM to generate a Cypher query."""
try:
response = self.llm_client.chat.completions.create(
model="gpt-4-turbo-preview",
messages=[
{"role": "system", "content": self.cypher_prompt},
{"role": "user", "content": question},
],
temperature=0.0 # We want deterministic, precise queries
)
generated_query = response.choices[0].message.content.strip()
if generated_query.startswith("QUERY_GENERATION_FAILED"):
print(f"Cypher generation failed: {generated_query}")
return None
return generated_query
except Exception as e:
print(f"An error occurred during Cypher generation: {e}")
return None
# Example Usage
if __name__ == '__main__':
neo4j_driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
    pipeline = GraphRAGPipeline(neo4j_driver, llm_client)
question = "Which services, maintained by the platform-alpha team, are exposed to the Log4Shell vulnerability (CVE-2021-44228)?"
generated_query = pipeline.generate_cypher(question)
if generated_query:
print("--- Generated Cypher Query ---")
print(generated_query)
neo4j_driver.close()
When we run this with our complex question, the LLM, guided by the detailed prompt, should generate a query that performs the required multi-hop traversal:
MATCH (t:Team {name: 'platform-alpha'})-[:MAINTAINS]->(s:Service)-[:USES_LIBRARY*1..5]->(l:Library)-[:HAS_VULNERABILITY]->(v:Vulnerability {id: 'CVE-2021-44228'})
RETURN DISTINCT s.name AS service_name
This query is exactly what's needed. It finds services maintained by the target team, follows a variable-length path (*1..5) of USES_LIBRARY relationships to account for transitive dependencies, and connects to the specific vulnerability.
3. Execution, Augmentation, and Synthesis
Generating the query is only the first step. We now need to execute it, format the results into a useful context, and feed that context to another LLM call to generate the final, human-readable answer.
First, let's add the execution logic to our pipeline.
# Add this method to the GraphRAGPipeline class
def execute_cypher(self, query: str) -> list[dict]:
"""Step 2: Execute the Cypher query against the Neo4j database."""
try:
with self.driver.session() as session:
result = session.run(query)
# Convert Neo4j records to a list of dictionaries
return [record.data() for record in result]
except Exception as e:
# This will catch CypherSyntaxError, etc.
print(f"Failed to execute Cypher query: {e}")
return []
Next, we need to synthesize the final answer. The raw result from the database, a list of dictionaries such as [{'service_name': 'auth-service'}], is not ideal context for an LLM. It's better to format it into a clear, textual statement.
# Add this method to the GraphRAGPipeline class
def synthesize_answer(self, question: str, context: list[dict]) -> str:
"""Step 3: Use an LLM to synthesize a natural language answer from the context."""
if not context:
return "I couldn't find any information in the knowledge graph to answer your question."
# Simple context formatting for this example
context_str = "\n".join([str(item) for item in context])
synthesis_prompt = f"""
You are a helpful assistant.
Based on the following context retrieved from a knowledge graph, please provide a concise and direct answer to the user's question.
**Question**:
{question}
**Context from Knowledge Graph**:
{context_str}
**Answer**:
"""
try:
response = self.llm_client.chat.completions.create(
model="gpt-3.5-turbo", # A cheaper model is fine for synthesis
messages=[
{"role": "system", "content": "You are a helpful assistant providing answers based on retrieved data."},
{"role": "user", "content": synthesis_prompt},
],
temperature=0.7
)
return response.choices[0].message.content
except Exception as e:
print(f"An error occurred during answer synthesis: {e}")
return "Sorry, I encountered an error while formulating the final answer."
Finally, let's tie it all together in a single run method.
# Add this method to the GraphRAGPipeline class
def run(self, question: str) -> str:
"""Run the full GraphRAG pipeline."""
print(f"Processing question: {question}")
# Step 1: Generate Cypher
generated_query = self.generate_cypher(question)
if not generated_query:
return "I was unable to generate a valid query for your question."
print(f"\n--- Generated Cypher Query ---\n{generated_query}")
# Step 2: Execute Cypher
retrieved_context = self.execute_cypher(generated_query)
if not retrieved_context:
return "I found no results in the knowledge graph for your question."
print(f"\n--- Retrieved Context ---\n{retrieved_context}")
# Step 3: Synthesize Answer
final_answer = self.synthesize_answer(question, retrieved_context)
print(f"\n--- Final Answer ---\n{final_answer}")
return final_answer
# Updated main execution block
if __name__ == '__main__':
neo4j_driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
    pipeline = GraphRAGPipeline(neo4j_driver, llm_client)
question = "Which services, maintained by the platform-alpha team, are exposed to the Log4Shell vulnerability (CVE-2021-44228)?"
pipeline.run(question)
neo4j_driver.close()
With our sample data, this complete pipeline correctly identifies auth-service, which is exposed only transitively through framework-core. (reporting-service uses the vulnerable log4j library directly, but it is maintained by data-services rather than platform-alpha, so it is correctly excluded.) This is a result unattainable with simple vector search.
4. Advanced Patterns and Production Hardening
A prototype is one thing; a production system is another. Here are critical considerations and patterns for making this system robust.
A. Query Repair Loop for Error Handling
The LLM will not always generate perfect Cypher. It might hallucinate a property name or get a relationship direction wrong. A production system cannot simply fail. We must implement a retry mechanism that feeds the database error back to the LLM for self-correction.
# In GraphRAGPipeline class, modify the run method
def run_with_repair(self, question: str, max_retries: int = 2) -> str:
"""Run the pipeline with a query repair loop."""
print(f"Processing question: {question}")
generated_query = self.generate_cypher(question)
if not generated_query:
return "I was unable to generate a valid query for your question."
retrieved_context = None
for i in range(max_retries + 1):
print(f"\n--- Attempt {i+1} ---")
print(f"Generated Cypher Query:\n{generated_query}")
try:
with self.driver.session() as session:
result = session.run(generated_query)
retrieved_context = [record.data() for record in result]
print(f"\n--- Query Successful ---\nRetrieved Context:\n{retrieved_context}")
break # Exit loop on success
except Exception as e:
error_message = str(e)
print(f"\n--- Query Failed ---\nError: {error_message}")
if i >= max_retries:
return "The generated query failed multiple times. Please rephrase your question."
# Self-correction step
print("\n--- Attempting Query Repair ---")
repair_prompt = f"The previous Cypher query failed with the following error:\n{error_message}\n\nPlease correct the query. The original question was: {question}\n\nCorrected Cypher Query:"
# We reuse the generation logic, but with more context
response = self.llm_client.chat.completions.create(
model="gpt-4-turbo-preview",
messages=[
{"role": "system", "content": self.cypher_prompt},
{"role": "user", "content": f"The user's question is: {question}"},
{"role": "assistant", "content": generated_query},
{"role": "user", "content": repair_prompt}
],
temperature=0.0
)
generated_query = response.choices[0].message.content.strip()
if retrieved_context is None:
return "I was unable to execute a query successfully after several attempts."
final_answer = self.synthesize_answer(question, retrieved_context)
print(f"\n--- Final Answer ---\n{final_answer}")
return final_answer
This loop significantly improves the reliability of the system by allowing the LLM to learn from its mistakes in real-time.
B. Performance Optimization with Indexes and Profiling
As the graph grows, query performance becomes paramount. The LLM has no inherent knowledge of database optimization.
* Index the properties used for lookups. Every property the generated queries filter on in a MATCH clause should be backed by an index. The uniqueness constraints from Section 1 already create backing indexes on Service(name), Vulnerability(id), and Team(name); add explicit indexes for anything they do not cover, such as lookups on Library.name alone or on Vulnerability.severity:
CREATE INDEX library_name_index IF NOT EXISTS FOR (l:Library) ON (l.name);
CREATE INDEX vulnerability_severity_index IF NOT EXISTS FOR (v:Vulnerability) ON (v.severity);
* Profile slow queries. Prepending PROFILE to a Cypher query in the Neo4j Browser or cypher-shell shows the executed plan and highlights bottlenecks, such as a full node scan where an index seek was expected. Use this information to add a missing index or, in extreme cases, add hints to the Cypher generation prompt to guide the LLM toward more performant query patterns; a programmatic variant is sketched below.
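The same plan information can be captured from Python. A rough sketch using the driver; note that the exact field names in the profile dictionary can vary slightly across server and driver versions, so treat the printer as illustrative:

def profile_query(driver, cypher: str) -> None:
    """Run the query under PROFILE and print each operator with its db hits."""
    with driver.session() as session:
        summary = session.run("PROFILE " + cypher).consume()

    def walk(operator: dict, depth: int = 0) -> None:
        name = operator.get("operatorType", "?")
        db_hits = operator.get("dbHits", 0)
        print(f"{'  ' * depth}{name} (dbHits={db_hits})")
        for child in operator.get("children", []):
            walk(child, depth + 1)

    if summary.profile:
        walk(summary.profile)

# Look for NodeByLabelScan operators where you expected a NodeIndexSeek.
# profile_query(neo4j_driver, generated_query)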
C. Hybrid Search: Combining Graph and Vector
Sometimes a query has both a semantic and a structural component. For example: "Find services related to 'payment processing' that are affected by critical vulnerabilities." 'Payment processing' is a fuzzy, semantic concept, while 'affected by critical vulnerabilities' is a structural graph traversal.
The optimal solution is a hybrid approach:
* Semantic entry points: embed a short description of each service at ingestion time, store the vector on the node, and use vector similarity search to find candidate Service nodes related to 'payment processing'.
* Structural expansion: starting from those candidates, run a Cypher traversal over USES_LIBRARY and HAS_VULNERABILITY relationships, filtering on severity: 'Critical' (a sketch follows below).
This pattern leverages the best of both worlds, using semantic search for discovery and graph search for precise, relational exploration.
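Here is a sketch of the pattern. It assumes a Neo4j 5.11+ vector index named service_description_index over a Service.embedding property populated at ingestion time, and uses the OpenAI embeddings API for the query vector; the index name and embedding model are assumptions, not something created earlier in this article:

def hybrid_search(driver, llm_client, concept: str) -> list[dict]:
    """Vector search for semantically related services, then graph traversal
    from those entry points to critical vulnerabilities."""
    # 1. Semantic step: embed the fuzzy concept and find candidate Service nodes.
    #    Assumes Service nodes carry an 'embedding' property covered by the
    #    hypothetical 'service_description_index' vector index.
    embedding = llm_client.embeddings.create(
        model="text-embedding-3-small", input=concept
    ).data[0].embedding

    query = """
    CALL db.index.vector.queryNodes('service_description_index', 5, $embedding)
    YIELD node AS s, score
    // 2. Structural step: expand from the candidates through the dependency graph.
    MATCH (s)-[:USES_LIBRARY*1..5]->(:Library)-[:HAS_VULNERABILITY]->(v:Vulnerability {severity: 'Critical'})
    RETURN DISTINCT s.name AS service_name, v.id AS vulnerability, score
    ORDER BY score DESC
    """
    records, _, _ = driver.execute_query(query, embedding=embedding)
    return [record.data() for record in records]

# hybrid_search(neo4j_driver, llm_client, "payment processing")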
D. Security: The Risk of Executing Generated Code
Executing LLM-generated code against a database is inherently risky. A malicious or poorly phrased user prompt could trick the LLM into generating a destructive query (DELETE, REMOVE). Our prompt engineering already includes a guardrail against this, but it's not foolproof.
Production-grade security requires multiple layers:
* Read-only credentials: execute LLM-generated queries with a Neo4j user that has only read privileges (for example, the built-in reader role), and open the session in read access mode so the server itself rejects writes (see the sketch below).
* Query validation: parse the generated query with a Cypher parser library such as cypher-parser and check the abstract syntax tree for forbidden command types. This is more robust than simple string matching.
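As a complement to prompt guardrails and AST validation, the database connection itself can refuse writes. A minimal sketch, assuming Neo4j 4+ where the server rejects write clauses in a read-mode transaction; the keyword denylist is a deliberately crude extra check, weaker than real parser-based validation:

import re
from neo4j import READ_ACCESS

# Crude denylist as a first line of defense; prefer AST-level validation in production.
FORBIDDEN = re.compile(r"\b(CREATE|MERGE|SET|DELETE|REMOVE|DROP)\b", re.IGNORECASE)

def execute_read_only(driver, query: str) -> list[dict]:
    """Reject obviously destructive queries, then run in read access mode so the
    server itself refuses any write that slips through."""
    if FORBIDDEN.search(query):
        raise ValueError("Write operations are not allowed in generated queries.")
    with driver.session(default_access_mode=READ_ACCESS) as session:
        return [record.data() for record in session.run(query)]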
Conclusion
GraphRAG represents a significant evolution beyond standard vector-search-based RAG systems. By leveraging the explicit, structured relationships within a knowledge graph, we can answer complex, multi-hop questions that are intractable for systems that treat data as a flat collection of text chunks.
The core implementation challenge lies in the reliable translation of natural language to a graph query language like Cypher. As we've demonstrated, this is achievable through meticulous prompt engineering, schema definition, few-shot examples, and robust error-handling patterns like the query repair loop.
While not a replacement for vector RAG, GraphRAG is a powerful, complementary tool. For domains defined by their relationships—be it software dependencies, financial networks, biomedical research, or supply chains—it provides the necessary retrieval mechanism to unlock a deeper, more contextual understanding for Large Language Models.