Knowledge Graph Generator: Extract Entities and Visualize Relationships with AI
- Published on
- Authors
  - Pranav Reddy (@saipranav14)
Knowledge Graph Generator: AI-Powered Entity Extraction
Transform unstructured text into structured knowledge! In this post, I'll show you how to build a knowledge graph generator that extracts entities, detects relationships, and creates interactive visualizations.
What It Does
Input text:
"Apple Inc. was founded by Steve Jobs in Cupertino, California.
The company revolutionized personal computing with the Macintosh."
Output graph:
Entities:
- Apple Inc. (ORGANIZATION)
- Steve Jobs (PERSON)
- Cupertino (LOCATION)
- California (LOCATION)
- Macintosh (PRODUCT)
Relationships:
- Steve Jobs → FOUNDED → Apple Inc.
- Apple Inc. → LOCATED_IN → Cupertino
- Cupertino → LOCATED_IN → California
- Apple Inc. → CREATED → Macintosh
Architecture
Text Input
     │
     ▼
┌───────────────┐
│     spaCy     │   Named Entity Recognition
└───────┬───────┘
        │
        ▼
┌───────────────┐
│     GPT-4     │   Relationship Extraction
└───────┬───────┘
        │
        ▼
┌───────────────┐
│   NetworkX    │   Graph Construction
└───────┬───────┘
        │
        ▼
┌───────────────┐
│ Visualization │   Pyvis / D3.js
└───────────────┘
Implementation
1. Entity Extraction with spaCy
spaCy excels at Named Entity Recognition:
from typing import Dict, List

import networkx as nx
import spacy
# LangChain / OpenAI imports (exact paths may differ slightly across LangChain versions)
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pyvis.network import Network
# `settings` is the project's configuration object (model name, Neo4j credentials, etc.)


class KnowledgeGraphGenerator:
    """Generate knowledge graphs from unstructured text."""

    def __init__(self):
        # Load spaCy model
        self.nlp = spacy.load("en_core_web_lg")
        # Initialize LLM for relationships
        self.llm = ChatOpenAI(
            model=settings.LLM_MODEL,
            temperature=0.0
        )
        self.graph = nx.DiGraph()

    def _extract_entities(self, text: str) -> List[Dict]:
        """Extract named entities using spaCy."""
        doc = self.nlp(text)
        entities = []
        seen = set()
        for ent in doc.ents:
            if ent.text not in seen:
                entities.append({
                    "id": str(len(entities) + 1),
                    "name": ent.text,
                    "type": ent.label_,
                    "start": ent.start_char,
                    "end": ent.end_char
                })
                seen.add(ent.text)
        return entities
Entity Types spaCy Recognizes:
- PERSON - People, including fictional
- ORG - Companies, institutions
- GPE - Countries, cities, states
- LOC - Non-GPE locations
- DATE - Absolute or relative dates
- MONEY - Monetary values
- PRODUCT - Objects, vehicles, foods, etc.
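To see these labels in action, here is a minimal check you can run on the example sentence from earlier. It assumes en_core_web_lg has already been downloaded, and the exact labels can vary slightly between model versions:

# Quick sanity check of spaCy's NER labels on the example sentence.
import spacy

nlp = spacy.load("en_core_web_lg")
doc = nlp("Apple Inc. was founded by Steve Jobs in Cupertino, California.")
for ent in doc.ents:
    print(ent.text, "->", ent.label_)
# Typical output (model-dependent):
#   Apple Inc. -> ORG
#   Steve Jobs -> PERSON
#   Cupertino -> GPE
#   California -> GPE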
2. Relationship Extraction with GPT
Use an LLM to identify connections between the extracted entities:
    def _extract_relationships(
        self,
        text: str,
        entities: List[Dict]
    ) -> List[Dict]:
        """Extract relationships using LLM."""
        if len(entities) < 2:
            return []
        entity_names = [e["name"] for e in entities]
        prompt = ChatPromptTemplate.from_messages([
            ("system", """Extract relationships between entities from text.
Return as: Entity1 | RELATIONSHIP_TYPE | Entity2
One per line. Only use entities from the provided list.
Common relationship types:
- FOUNDED, CREATED, INVENTED
- WORKS_FOR, EMPLOYED_BY
- LOCATED_IN, BASED_IN
- ACQUIRED, MERGED_WITH
- RELATED_TO, ASSOCIATED_WITH"""),
            ("user", """Text: {text}
Entities: {entities}
Relationships:""")
        ])
        chain = prompt | self.llm
        response = chain.invoke({
            "text": text[:2000],
            "entities": ", ".join(entity_names)
        })
        relationships = []
        for line in response.content.strip().split('\n'):
            parts = [p.strip() for p in line.split('|')]
            if len(parts) == 3:
                from_ent, rel_type, to_ent = parts
                # Find entity IDs
                from_id = next((e["id"] for e in entities
                                if e["name"] == from_ent), None)
                to_id = next((e["id"] for e in entities
                              if e["name"] == to_ent), None)
                if from_id and to_id:
                    relationships.append({
                        "from": from_id,
                        "to": to_id,
                        "type": rel_type,
                        "strength": 0.8
                    })
        return relationships
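To make the parsing step concrete, here is what a pipe-delimited reply might look like and how the loop above splits it. The reply text is illustrative only, not a captured GPT-4 response:

# Illustrative (not real) model reply in the "Entity1 | TYPE | Entity2" format.
sample_reply = """Steve Jobs | FOUNDED | Apple Inc.
Apple Inc. | LOCATED_IN | Cupertino"""

for line in sample_reply.strip().split("\n"):
    parts = [p.strip() for p in line.split("|")]
    if len(parts) == 3:
        print(parts)
# ['Steve Jobs', 'FOUNDED', 'Apple Inc.']
# ['Apple Inc.', 'LOCATED_IN', 'Cupertino']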
3. Graph Construction
Build NetworkX graph:
    def _build_graph(
        self,
        entities: List[Dict],
        relationships: List[Dict]
    ):
        """Build NetworkX graph."""
        self.graph.clear()
        # Add nodes
        for entity in entities:
            self.graph.add_node(
                entity["id"],
                name=entity["name"],
                type=entity["type"]
            )
        # Add edges
        for rel in relationships:
            self.graph.add_edge(
                rel["from"],
                rel["to"],
                type=rel["type"],
                weight=rel["strength"]
            )

    def generate(self, text: str) -> Dict:
        """Generate knowledge graph from text."""
        # Extract entities
        entities = self._extract_entities(text)
        # Extract relationships
        relationships = self._extract_relationships(text, entities)
        # Build graph
        self._build_graph(entities, relationships)
        return {
            "entities": entities,
            "relationships": relationships,
            "stats": {
                "num_entities": len(entities),
                "num_relationships": len(relationships),
                "num_nodes": self.graph.number_of_nodes(),
                "num_edges": self.graph.number_of_edges()
            }
        }
4. Interactive Visualization
Create beautiful visualizations with Pyvis:
    def visualize(self, graph_data: Dict, output: str = "graph.html"):
        """Create interactive visualization."""
        net = Network(
            height="750px",
            width="100%",
            directed=True,
            notebook=False
        )
        # Customize appearance
        net.set_options("""
        {
          "physics": {
            "forceAtlas2Based": {
              "gravitationalConstant": -50,
              "centralGravity": 0.01,
              "springLength": 200
            },
            "solver": "forceAtlas2Based"
          }
        }
        """)
        # Add nodes with colors by type
        for entity in graph_data["entities"]:
            net.add_node(
                entity["id"],
                label=entity["name"],
                title=f"{entity['type']}: {entity['name']}",
                color=self._get_color(entity["type"]),
                size=25
            )
        # Add edges with labels
        for rel in graph_data["relationships"]:
            net.add_edge(
                rel["from"],
                rel["to"],
                title=rel["type"],
                label=rel["type"],
                arrows="to"
            )
        net.save_graph(output)
        return output

    def _get_color(self, entity_type: str) -> str:
        """Get color for entity type."""
        colors = {
            "PERSON": "#FF6B6B",    # Red
            "ORG": "#4ECDC4",       # Teal
            "GPE": "#45B7D1",       # Blue
            "LOCATION": "#96CEB4",  # Green
            "DATE": "#FFEAA7",      # Yellow
            "PRODUCT": "#DFE6E9",   # Gray
            "MONEY": "#74B9FF"      # Light Blue
        }
        return colors.get(entity_type, "#95A5A6")
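Putting the pieces together, a minimal usage sketch might look like this (assuming an OpenAI API key is configured and pyvis is installed; apple_graph.html is just an example filename):

# Build a graph from a sentence and write the interactive HTML visualization.
kg = KnowledgeGraphGenerator()
data = kg.generate("Apple Inc. was founded by Steve Jobs in Cupertino, California.")
html_path = kg.visualize(data, output="apple_graph.html")
print(f"Open {html_path} in a browser to explore the graph.")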
FastAPI Integration
REST API for graph generation:
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
kg_gen = KnowledgeGraphGenerator()


class TextInput(BaseModel):
    text: str


@app.post("/api/generate")
async def generate_graph(input: TextInput):
    """Generate knowledge graph from text."""
    result = kg_gen.generate(input.text)
    # Also create visualization
    html_file = kg_gen.visualize(result)
    return {
        **result,
        "visualization": html_file
    }
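A quick way to try the endpoint, assuming the app is served locally on port 8000 (host and port are illustrative):

# Hypothetical client call against the /api/generate endpoint.
import requests

resp = requests.post(
    "http://localhost:8000/api/generate",
    json={"text": "Apple Inc. was founded by Steve Jobs in Cupertino, California."},
)
print(resp.json()["stats"])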
Advanced Features
Graph Analytics
Analyze the knowledge graph:
    def analyze_graph(self) -> Dict:
        """Compute graph analytics."""
        return {
            "density": nx.density(self.graph),
            "num_components": nx.number_weakly_connected_components(self.graph),
            "avg_degree": sum(dict(self.graph.degree()).values()) / self.graph.number_of_nodes(),
            "central_entities": self._get_central_entities()
        }

    def _get_central_entities(self, top_k: int = 5) -> List[Dict]:
        """Find most central entities."""
        centrality = nx.degree_centrality(self.graph)
        sorted_entities = sorted(
            centrality.items(),
            key=lambda x: x[1],
            reverse=True
        )[:top_k]
        return [
            {
                "id": entity_id,
                "name": self.graph.nodes[entity_id]["name"],
                "centrality": score
            }
            for entity_id, score in sorted_entities
        ]
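A short usage sketch, assuming kg is a KnowledgeGraphGenerator on which generate() has already been called:

# Inspect the structure of the graph built by the last generate() call.
stats = kg.analyze_graph()
print(f"Density: {stats['density']:.3f}")
for entity in stats["central_entities"]:
    print(entity["name"], round(entity["centrality"], 3))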
Neo4j Integration
Store graphs in Neo4j for advanced queries:
from neo4j import GraphDatabase


class Neo4jConnector:
    def __init__(self):
        self.driver = GraphDatabase.driver(
            settings.NEO4J_URI,
            auth=(settings.NEO4J_USER, settings.NEO4J_PASSWORD)
        )

    def save_graph(self, entities: List[Dict], relationships: List[Dict]):
        """Save knowledge graph to Neo4j."""
        with self.driver.session() as session:
            # Create entities
            for entity in entities:
                session.run(
                    """
                    CREATE (e:Entity {
                        id: $id,
                        name: $name,
                        type: $type
                    })
                    """,
                    id=entity["id"],
                    name=entity["name"],
                    type=entity["type"]
                )
            # Create relationships
            for rel in relationships:
                session.run(
                    """
                    MATCH (a:Entity {id: $from})
                    MATCH (b:Entity {id: $to})
                    CREATE (a)-[r:RELATIONSHIP {
                        type: $type,
                        strength: $strength
                    }]->(b)
                    """,
                    **rel
                )

    def query_path(self, entity1: str, entity2: str):
        """Find shortest path between entities."""
        with self.driver.session() as session:
            result = session.run(
                """
                MATCH path = shortestPath(
                    (a:Entity {name: $entity1})-[*]-(b:Entity {name: $entity2})
                )
                RETURN path
                """,
                entity1=entity1,
                entity2=entity2
            )
            return result.single()
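A usage sketch for the connector, assuming a reachable Neo4j instance, NEO4J_* values in settings, and that result holds the output of an earlier kg.generate(...) call:

# Persist the generated graph and look for a path between two entities.
connector = Neo4jConnector()
connector.save_graph(result["entities"], result["relationships"])
record = connector.query_path("Steve Jobs", "California")
print(record)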
Real-World Example
Processing a research paper abstract:
text = """
The transformer architecture, introduced by Vaswani et al. in 2017,
revolutionized natural language processing. Google developed BERT
based on transformers, which OpenAI later built upon with GPT-3.
These models use attention mechanisms invented at Google Brain.
"""
kg = KnowledgeGraphGenerator()
result = kg.generate(text)
# Output:
{
    "entities": [
        {"name": "Vaswani", "type": "PERSON"},
        {"name": "2017", "type": "DATE"},
        {"name": "Google", "type": "ORG"},
        {"name": "BERT", "type": "PRODUCT"},
        {"name": "OpenAI", "type": "ORG"},
        {"name": "GPT-3", "type": "PRODUCT"},
        {"name": "Google Brain", "type": "ORG"}
    ],
    "relationships": [
        {"from": "Vaswani", "to": "transformer", "type": "INTRODUCED"},
        {"from": "Google", "to": "BERT", "type": "DEVELOPED"},
        {"from": "OpenAI", "to": "GPT-3", "type": "CREATED"},
        {"from": "GPT-3", "to": "BERT", "type": "BUILT_UPON"}
    ]
}
Challenges & Solutions
Challenge 1: Entity Disambiguation
Problem: "Apple" could refer to the fruit or the company.
Solution:
- Use context from surrounding text
- Implement entity linking to knowledge bases
- Allow manual disambiguation
Challenge 2: Relationship Accuracy
Problem: The LLM can hallucinate relationships.
Solution:
- Restrict to entities from the text only
- Use a low temperature (0.0)
- Validate against text spans (a minimal grounding check is sketched below)
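A minimal version of that grounding check: keep only relationships whose endpoints literally appear in the source text. The helper name filter_grounded is hypothetical; real span validation would also compare character offsets:

# Hypothetical helper: drop relationships whose endpoints are not present in the text.
def filter_grounded(relationships, entities, text):
    names_by_id = {e["id"]: e["name"] for e in entities}
    grounded = []
    for rel in relationships:
        from_name = names_by_id.get(rel["from"], "")
        to_name = names_by_id.get(rel["to"], "")
        if from_name and to_name and from_name in text and to_name in text:
            grounded.append(rel)
    return grounded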
Challenge 3: Scalability
Problem: Large documents slow processing.
Solution:
- Process the text in chunks
- Build the graph incrementally
- Cache entity extractions (see the chunking sketch below)
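A sketch of the chunked approach, assuming kg is a KnowledgeGraphGenerator; generate_in_chunks and the 2000-character chunk size are illustrative. Entities are merged by name so the same entity from different chunks collapses into one node:

# Hypothetical helper: process a long document chunk by chunk and merge the results.
import networkx as nx

def generate_in_chunks(kg, text, chunk_size=2000):
    merged = nx.DiGraph()
    for start in range(0, len(text), chunk_size):
        result = kg.generate(text[start:start + chunk_size])
        names = {e["id"]: e["name"] for e in result["entities"]}
        for e in result["entities"]:
            merged.add_node(e["name"], type=e["type"])
        for r in result["relationships"]:
            merged.add_edge(names[r["from"]], names[r["to"]], type=r["type"])
    return merged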
Use Cases
- Research: Map citations and concepts in papers
- Business: Analyze company relationships
- Legal: Track case connections
- Journalism: Investigate networks
- Education: Visualize concept relationships
Tech Stack
- NLP: spaCy (en_core_web_lg)
- LLM: GPT-4 for relationships
- Graph Library: NetworkX
- Visualization: Pyvis, D3.js
- Graph DB: Neo4j (optional)
- Backend: FastAPI
Resources
Next in Series: AI Research Agent - Build autonomous agents that conduct research and generate reports.