Knowledge Graph Entity Linking
Go beyond flat semantic search. Automatically extract named entities from every stored memory and link them into a traversable knowledge graph — enabling multi-hop reasoning your vector store simply cannot do.
- A running Dakera server (see the Quickstart guide)
- Familiarity with graph concepts (nodes, edges, traversal depth)
- Dakera Python, TypeScript, Rust, or Go SDK installed
The Problem with Flat Memory
Vector search finds semantically similar memories but cannot reason about relationships. Answering "Who works at the company providing our cloud hosting?" requires chaining facts stored in separate memories: which vendors we have contracts with, which of those vendors handles cloud hosting, and which people work at that vendor. A flat vector store returns the chunks most similar to the question, but no single chunk contains the answer and nothing links them together. You need a graph.
Manually building and maintaining entity graphs adds weeks of infrastructure work — entity extraction pipelines, deduplication logic, relationship schemas, traversal APIs. Dakera handles all of this automatically every time you call store_memory.
Dakera runs GLiNER (a zero-shot NER model served via ONNX) on-device for every stored memory. No external API calls, no data leaving your instance. Extraction takes ~4ms per memory and supports 18 built-in entity types including Person, Organization, Location, Product, Event, and Technology.
Architecture: How the Graph Is Built
Every call to store_memory triggers a three-stage pipeline: extract entities from the content, normalize them against existing nodes (resolving "Alex Chen" and "A. Chen" to the same node), then link the new entities to existing ones using inferred relationship edges. The result is a live knowledge graph that grows with your agent's memory.
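The three stages can be illustrated with a small in-memory model. This is a minimal Python sketch, not Dakera's implementation: a hypothetical `KNOWN_MENTIONS` lookup stands in for GLiNER, normalization is a plain alias table (`ALIASES`, also hypothetical), and linking simply connects every pair of entities that co-occur in one memory instead of inferring typed relationships.

```python
from collections import defaultdict

# Hypothetical stand-in for GLiNER: known surface forms mapped to entity types.
KNOWN_MENTIONS = {
    "Sarah Kim": "Person",
    "Marcus Lee": "Person",
    "Acme Corp": "Organization",
    "Acme": "Organization",
    "AWS": "Technology",
}
# Hypothetical alias table used by the normalization stage.
ALIASES = {"Acme": "Acme Corp"}


class ToyGraph:
    def __init__(self):
        self.nodes = {}                 # canonical label -> entity type
        self.edges = defaultdict(set)   # canonical label -> linked labels

    def extract(self, text):
        """Stage 1: return (mention, type) pairs found in the text."""
        return [(m, t) for m, t in KNOWN_MENTIONS.items() if m in text]

    def normalize(self, mention):
        """Stage 2: map a surface form onto an existing canonical node."""
        return ALIASES.get(mention, mention)

    def ingest(self, text):
        """Stage 3: add nodes, then link every entity pair in this memory."""
        labels = set()
        for mention, etype in self.extract(text):
            label = self.normalize(mention)
            self.nodes.setdefault(label, etype)
            labels.add(label)
        for a in labels:
            for b in labels:
                if a != b:
                    self.edges[a].add(b)
        return labels


g = ToyGraph()
g.ingest("Sarah Kim is VP Sales at Acme Corp")
g.ingest("Marcus Lee evaluates AWS for Acme")
print(sorted(g.edges["Acme Corp"]))  # ['AWS', 'Marcus Lee', 'Sarah Kim']
```

Because "Acme" is normalized to "Acme Corp" before linking, the second memory attaches to the existing node instead of creating a duplicate.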
Entity Graph: CRM Agent Example
Edge Types Built Into Dakera
| Edge Type | Connects | Direction |
|---|---|---|
| WORKS_AT | Person → Organization | Directional |
| LOCATED_IN | Organization → Location | Directional |
| RELATES_TO | Project ↔ Technology | Bidirectional |
| PART_OF | Feature → Product | Directional |
| KNOWS | Person ↔ Person | Bidirectional |
| OWNS_DEAL | Person → Deal | Directional |
| EVALUATING | Person/Org → Technology | Directional |
| USES_VENDOR | Deal/Project → Organization | Directional |
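One way to honor the Direction column is to store bidirectional edge types in both directions of an adjacency map. This minimal Python sketch assumes edges arrive as (source, type, target) triples; `build_adjacency` is a hypothetical helper and the layout is illustrative, not Dakera's storage format.

```python
from collections import defaultdict

# Edge types marked bidirectional in the table above; all others are directional.
BIDIRECTIONAL = {"RELATES_TO", "KNOWS"}


def build_adjacency(edges):
    """Turn (source, edge_type, target) triples into an outgoing-edge map.

    Bidirectional edge types are stored in both directions so that a
    traversal can follow them from either endpoint.
    """
    adj = defaultdict(list)
    for src, etype, dst in edges:
        adj[src].append((etype, dst))
        if etype in BIDIRECTIONAL:
            adj[dst].append((etype, src))
    return adj


adj = build_adjacency([
    ("Marcus Lee", "KNOWS", "Sarah Kim"),
    ("Sarah Kim", "WORKS_AT", "Acme Corp"),
])
print(adj["Sarah Kim"])  # [('KNOWS', 'Marcus Lee'), ('WORKS_AT', 'Acme Corp')]
print(adj["Acme Corp"])  # []
```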
Entity Resolution Flow
Before a new entity is added to the graph, Dakera runs a three-step normalization pipeline to prevent duplicate nodes. "Alex Chen", "A. Chen (Acme)", and "Alex C." all resolve to the same node via fuzzy name matching and co-occurrence context.
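Dakera's exact matching rules are not documented here. As one plausible illustration, this Python sketch treats initials and truncated tokens as compatible, falls back to `difflib.SequenceMatcher` for near-identical spellings, and uses a known employer as the co-occurrence context that can block a merge. `FUZZY_THRESHOLD`, `resolve`, and the node-id format are hypothetical.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.85  # assumed cutoff for near-identical spellings


def tokens(name):
    return name.replace(".", "").lower().split()


def names_match(a, b):
    """Initials/truncations ("A. Chen", "Alex C.") or near-identical spellings."""
    ta, tb = tokens(a), tokens(b)
    if len(ta) == len(tb) and all(x.startswith(y) or y.startswith(x) for x, y in zip(ta, tb)):
        return True
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= FUZZY_THRESHOLD


def resolve(nodes, name, org=None):
    """Return an existing node id for this mention, or create a new node.

    `nodes` maps node id -> {"name": str, "org": str or None}. A known,
    conflicting organization (co-occurrence context) blocks the merge.
    """
    for node_id, node in nodes.items():
        if not names_match(node["name"], name):
            continue
        if org and node["org"] and org != node["org"]:
            continue  # same name, different employer: separate people
        if org and not node["org"]:
            node["org"] = org  # remember context learned from this mention
        return node_id
    node_id = f"ent_{len(nodes) + 1:03d}"
    nodes[node_id] = {"name": name, "org": org}
    return node_id


nodes = {}
print(resolve(nodes, "Alex Chen", org="Acme Corp"))  # ent_001
print(resolve(nodes, "A. Chen", org="Acme Corp"))    # ent_001
print(resolve(nodes, "Alex C."))                     # ent_001
print(resolve(nodes, "Alex Chen", org="BetaCo"))     # ent_002
```

The last call mirrors the disambiguation behavior described under Edge Cases: a conflicting organization keeps two people named Alex Chen apart, while a mention with no organization merges into the first compatible node.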
You do not need to label entities or define schemas. Store natural language memories and Dakera extracts entities automatically. The graph grows organically as your agent learns more about the world it operates in.
Real-World Scenario: CRM Agent
A CRM agent needs to answer questions like: "What other deals does the champion on the Acme renewal know about?" or "Which contacts at AWS should we loop in given Marcus Lee's network?" These require multi-hop traversal — impossible with vector search alone.
- Index contact and company data: store meeting notes, email summaries, and deal updates as memories. GLiNER extracts people, companies, and deal names automatically on every store.
- Build the relationship graph over time: as you store "Sarah Kim joined Acme Corp as VP Sales in March" and "Sarah is leading the CloudMigrate Q2 deal", Dakera creates WORKS_AT and OWNS_DEAL edges automatically.
- Traverse for multi-hop answers: query the graph at depth 2 to find all deals owned by contacts at a given company, or all technologies evaluated by contacts in a given city, without writing graph traversal code yourself.
- Combine graph results with semantic recall: use entity traversal to find the relevant entities, then call recall() scoped to them to surface supporting memories. The graph narrows scope; recall provides context.
- Clean stale relationships periodically: when contacts change companies or deals close, update or forget the affected memories. Dakera automatically re-evaluates graph edges when memories are updated or removed.
Implementation
# 1. Store memories — entity extraction and graph linking happen automatically
curl -X POST http://localhost:3300/v1/memory/store \
-H "Authorization: Bearer dk-..." \
-H "Content-Type: application/json" \
-d '{
"agent_id": "crm-agent",
"content": "Sarah Kim is VP Sales at Acme Corp and owns the CloudMigrate Q2 deal worth $240k",
"importance": 0.9,
"memory_type": "semantic",
"tags": ["crm", "contact", "deal"]
}'
# 2. Store a second memory — Dakera links the entities across memories
curl -X POST http://localhost:3300/v1/memory/store \
-H "Authorization: Bearer dk-..." \
-H "Content-Type: application/json" \
-d '{
"agent_id": "crm-agent",
"content": "Marcus Lee is the technical champion at Acme Corp. He knows Sarah Kim and evaluates AWS for the CloudMigrate project.",
"importance": 0.85,
"tags": ["crm", "contact", "champion"]
}'
# 3. Retrieve the knowledge graph
curl http://localhost:3300/v1/agents/crm-agent/graph \
-H "Authorization: Bearer dk-..."
# 4. Traverse from a specific entity (depth 2 = 2 hops)
curl "http://localhost:3300/v1/agents/crm-agent/graph/traverse?entity=Acme+Corp&depth=2" \
-H "Authorization: Bearer dk-..."
# 5. Find the shortest path between two entities
curl "http://localhost:3300/v1/agents/crm-agent/graph/path?from=Sarah+Kim&to=AWS" \
-H "Authorization: Bearer dk-..."

from dakera import DakeraClient
client = DakeraClient(base_url="http://localhost:3300", api_key="dk-...")
# Store memories — entity graph is built automatically
client.store_memory(
agent_id="crm-agent",
content="Sarah Kim is VP Sales at Acme Corp and owns the CloudMigrate Q2 deal worth $240k",
importance=0.9,
memory_type="semantic",
tags=["crm", "contact", "deal"]
)
client.store_memory(
agent_id="crm-agent",
content="Marcus Lee is the technical champion at Acme Corp. He knows Sarah Kim and evaluates AWS for the CloudMigrate project.",
importance=0.85,
tags=["crm", "contact", "champion"]
)
client.store_memory(
agent_id="crm-agent",
content="Acme Corp is headquartered in San Francisco and has 850 employees in the fintech sector.",
importance=0.75,
tags=["crm", "account"]
)
# Retrieve the full knowledge graph
graph = client.knowledge_graph(agent_id="crm-agent")
# Returns: {"entities": [...], "relationships": [...]}
entities = graph["entities"]
# [
# {"id": "ent_001", "label": "Sarah Kim", "type": "Person", "ref_count": 2},
# {"id": "ent_002", "label": "Acme Corp", "type": "Organization", "ref_count": 3},
# {"id": "ent_003", "label": "CloudMigrate Q2", "type": "Deal", "ref_count": 2},
# ...
# ]
# Traverse from Acme Corp outward to depth 2
# Finds: Sarah Kim, Marcus Lee, CloudMigrate Q2, AWS, San Francisco
neighbors = client.graph_traverse(
agent_id="crm-agent",
entity="Acme Corp",
depth=2
)
# Find the shortest relationship path between two entities
path = client.graph_path(
agent_id="crm-agent",
from_entity="Sarah Kim",
to_entity="AWS"
)
# Expected path: Sarah Kim -OWNS_DEAL-> CloudMigrate Q2 -USES_VENDOR-> AWS
# Multi-hop question answering:
# "Who at Acme Corp knows about AWS?"
acme_neighbors = client.graph_traverse(
agent_id="crm-agent", entity="Acme Corp", depth=1
)
# For each person linked to Acme Corp, check whether AWS is a direct neighbor
# (for example, via an EVALUATING edge)
for entity in acme_neighbors["entities"]:
    if entity["type"] == "Person":
        person_graph = client.graph_traverse(
            agent_id="crm-agent", entity=entity["label"], depth=1
        )
        aws_links = [e for e in person_graph["entities"] if e["label"] == "AWS"]
        if aws_links:
            print(f"{entity['label']} is directly linked to AWS")

import { DakeraClient } from '@dakera-ai/dakera';
const client = new DakeraClient({ baseUrl: 'http://localhost:3300', apiKey: 'dk-...' });
// Store contact and deal memories — entity linking happens automatically
await client.storeMemory('crm-agent', {
content: 'Sarah Kim is VP Sales at Acme Corp and owns the CloudMigrate Q2 deal worth $240k',
importance: 0.9,
memoryType: 'semantic',
tags: ['crm', 'contact', 'deal']
});
await client.storeMemory('crm-agent', {
content: 'Marcus Lee is the technical champion at Acme Corp. He knows Sarah Kim and evaluates AWS.',
importance: 0.85,
tags: ['crm', 'contact', 'champion']
});
await client.storeMemory('crm-agent', {
content: 'Acme Corp is headquartered in San Francisco, fintech sector, 850 employees.',
importance: 0.75,
tags: ['crm', 'account']
});
// Retrieve the full knowledge graph
const graph = await client.knowledgeGraph({ agentId: 'crm-agent' });
// { entities: [...], relationships: [...] }
// Traverse: who is connected to Acme Corp within 2 hops?
const neighbors = await client.graphTraverse({
agentId: 'crm-agent',
entity: 'Acme Corp',
depth: 2
});
// Find shortest path between two entities
const path = await client.graphPath({
agentId: 'crm-agent',
fromEntity: 'Sarah Kim',
toEntity: 'AWS'
});
// Build context for LLM from graph + semantic recall combined
async function answerWithGraph(query: string) {
// Step 1: Extract entities from the query
const queryEntities = await client.extractEntities(query);
// Step 2: For each entity, traverse the graph
const graphContext: string[] = [];
for (const entity of queryEntities) {
const subgraph = await client.graphTraverse({
agentId: 'crm-agent',
entity: entity.label,
depth: 2
});
graphContext.push(...subgraph.entities.map((e: any) => e.label));
}
// Step 3: Use semantic recall for supporting memories
const memories = await client.recall('crm-agent', query, { top_k: 6, min_importance: 0.6 });
return {
graphEntities: graphContext,
semanticMemories: memories.memories.map((m: any) => m.content)
};
}

use dakera_rs::{Client, RecallRequest, StoreMemoryRequest};

// Entry point assumes the tokio runtime and that SDK errors convert into Box<dyn Error>
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new("http://localhost:3300", "dk-...");

    // Store memories: GLiNER extracts entities automatically
    client.store_memory("crm-agent", StoreMemoryRequest {
        content: "Sarah Kim is VP Sales at Acme Corp and owns the CloudMigrate Q2 deal worth $240k".into(),
        importance: Some(0.9),
        memory_type: Some("semantic".into()),
        tags: Some(vec!["crm".into(), "contact".into(), "deal".into()]),
        ..Default::default()
    }).await?;
    client.store_memory("crm-agent", StoreMemoryRequest {
        content: "Marcus Lee is the technical champion at Acme Corp evaluating AWS for CloudMigrate.".into(),
        importance: Some(0.85),
        tags: Some(vec!["crm".into(), "champion".into()]),
        ..Default::default()
    }).await?;

    // Knowledge graph and traversal via REST (not yet in Rust SDK)
    // GET /v1/agents/crm-agent/graph
    // GET /v1/agents/crm-agent/graph/traverse?entity=Acme+Corp&depth=2

    // Semantic recall works fully from Rust SDK
    let memories = client.recall("crm-agent", RecallRequest {
        query: "Who at Acme Corp is evaluating cloud vendors?".into(),
        top_k: Some(5),
        min_importance: Some(0.6),
        ..Default::default()
    }).await?;
    for m in &memories.memories {
        println!("{} (score: {:.2})", m.content, m.score);
    }
    Ok(())
}

package main
import (
"context"
"fmt"
dakera "github.com/dakera-ai/dakera-go"
)
func main() {
client := dakera.NewClient("http://localhost:3300", "dk-...")
ctx := context.Background()
// Store CRM memories — entities extracted and linked automatically
client.StoreMemory(ctx, "crm-agent", dakera.StoreMemoryRequest{
Content: "Sarah Kim is VP Sales at Acme Corp and owns the CloudMigrate Q2 deal worth $240k",
Importance: 0.9,
MemoryType: "semantic",
Tags: []string{"crm", "contact", "deal"},
})
client.StoreMemory(ctx, "crm-agent", dakera.StoreMemoryRequest{
Content: "Marcus Lee is technical champion at Acme Corp, knows Sarah Kim, evaluates AWS.",
Importance: 0.85,
Tags: []string{"crm", "champion"},
})
// Traverse graph from Acme Corp (depth 2)
neighbors, err := client.GraphTraverse(ctx, dakera.GraphTraverseRequest{
AgentID: "crm-agent",
Entity: "Acme Corp",
Depth: 2,
})
if err != nil {
panic(err)
}
	for _, entity := range neighbors.Entities {
		fmt.Printf("Entity: %s (%s)\n", entity.Label, entity.Type)
	}
	// Recall with semantic search
	results, err := client.Recall(ctx, "crm-agent", dakera.RecallRequest{
		Query:         "Who evaluates cloud vendors at Acme Corp?",
		TopK:          5,
		MinImportance: 0.6,
	})
	if err != nil {
		panic(err)
	}
	for _, m := range results.Memories {
		fmt.Printf("%.2f: %s\n", m.Score, m.Content)
	}
}

Build your first entity graph in 10 minutes
Dakera's one-line Docker install has you storing memories and querying graphs immediately.
Before / After: Flat Memory vs. Graph-Linked Memory
# Three disconnected memory chunks
# Vector search can find each individually
# but cannot answer relationship questions
Memory 1:
"Sarah Kim is VP Sales at Acme Corp"
Memory 2:
"Marcus Lee evaluates AWS for Acme"
Memory 3:
"CloudMigrate Q2 deal is $240k"
# Query: "Who at Acme knows about AWS?"
# Result: memory 2 returned by similarity
# But: no link to Sarah Kim, no deal context
# Agent cannot chain: Sarah → Acme → Marcus → AWS
# Same three memories, now graph-linked
# Entities auto-extracted on every store
Nodes: Sarah Kim (Person), Marcus Lee (Person),
Acme Corp (Org), AWS (Tech),
CloudMigrate Q2 (Deal)
Edges:
Sarah Kim -WORKS_AT-> Acme Corp
Sarah Kim -OWNS_DEAL-> CloudMigrate Q2
Marcus Lee -WORKS_AT-> Acme Corp
Marcus Lee -KNOWS-> Sarah Kim
Marcus Lee -EVALUATING-> AWS
CloudMigrate Q2 -USES_VENDOR-> AWS
# Query: "Who at Acme knows about AWS?"
# Traverse: Acme Corp (depth=2)
# → Sarah Kim, Marcus Lee
# → Filter EVALUATING AWS → Marcus Lee
# + KNOWS edge → also surfaces Sarah Kim
# Full answer with relationship context
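The graph-linked answer above can be reproduced with plain set operations. This Python sketch hard-codes the six edges from the panel and walks the same hops; `sources` and `knows` are hypothetical helpers, and the code only shows why typed edges make the question answerable, not how Dakera executes queries.

```python
# The six edges from the graph-linked panel: (source, edge type, target)
EDGES = [
    ("Sarah Kim", "WORKS_AT", "Acme Corp"),
    ("Sarah Kim", "OWNS_DEAL", "CloudMigrate Q2"),
    ("Marcus Lee", "WORKS_AT", "Acme Corp"),
    ("Marcus Lee", "KNOWS", "Sarah Kim"),
    ("Marcus Lee", "EVALUATING", "AWS"),
    ("CloudMigrate Q2", "USES_VENDOR", "AWS"),
]


def sources(edge_type, target):
    """Entities that have an `edge_type` edge pointing at `target`."""
    return {s for s, t, d in EDGES if t == edge_type and d == target}


def knows(person):
    """KNOWS is bidirectional, so match the person on either end."""
    return {d if s == person else s
            for s, t, d in EDGES
            if t == "KNOWS" and person in (s, d)}


staff = sources("WORKS_AT", "Acme Corp")           # hop 1: Acme Corp -> its people
evaluators = staff & sources("EVALUATING", "AWS")  # hop 2: people -> AWS
contacts = set().union(*(knows(p) for p in evaluators)) & staff  # KNOWS adds context

print(sorted(staff))       # ['Marcus Lee', 'Sarah Kim']
print(sorted(evaluators))  # ['Marcus Lee']
print(sorted(contacts))    # ['Sarah Kim']
```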
Performance Considerations
Entity extraction and graph operations are designed to add minimal overhead to the memory storage path. In production CRM deployments with 50k+ memories, typical latencies are listed in the benchmark table below.
At depth=3 and beyond on dense graphs (100k+ nodes), traversal can return thousands of entities and take 40ms+. Keep depth at 2 for most queries. Use entity filtering parameters to scope traversal to specific edge types or entity types when querying large graphs.
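The advice to stay at depth 2 follows from how a traversal fans out. As a rough worked example, if no entity had more than d neighbors, a depth-k traversal could return at most d + d² + ... + d^k entities. The sketch below evaluates that upper bound for a hypothetical maximum degree of 20; shared neighbors make real counts lower than the bound, but the growth per extra hop is the point.

```python
def max_reachable(degree, depth):
    """Upper bound on entities a depth-limited traversal can return
    when no entity has more than `degree` neighbors."""
    return sum(degree ** hop for hop in range(1, depth + 1))


for depth in (1, 2, 3):
    print(depth, max_reachable(20, depth))
# 1 20
# 2 420
# 3 8420
```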
Latency benchmarks (single Dakera instance, cpx21)
| Operation | p50 | p99 | Notes |
|---|---|---|---|
| store_memory (with entity extraction) | 18ms | 34ms | Includes GLiNER + graph write |
| graph_traverse depth=1 | 3ms | 9ms | Direct neighbors only |
| graph_traverse depth=2 | 7ms | 22ms | Recommended maximum for UX |
| graph_path (shortest path) | 12ms | 40ms | BFS across full graph |
| knowledge_graph (full export) | 85ms | 240ms | Avoid in hot path; use for analytics |
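The table describes graph_path as a BFS across the full graph. Here is a minimal Python sketch of unweighted breadth-first shortest path over the six edges from the Before/After example, assuming KNOWS and RELATES_TO are stored in both directions as the edge-type table indicates. It illustrates the technique, not Dakera's code; `shortest_path` and the returned list format are hypothetical.

```python
from collections import deque

BIDIRECTIONAL = {"KNOWS", "RELATES_TO"}
EDGES = [
    ("Sarah Kim", "WORKS_AT", "Acme Corp"),
    ("Sarah Kim", "OWNS_DEAL", "CloudMigrate Q2"),
    ("Marcus Lee", "WORKS_AT", "Acme Corp"),
    ("Marcus Lee", "KNOWS", "Sarah Kim"),
    ("Marcus Lee", "EVALUATING", "AWS"),
    ("CloudMigrate Q2", "USES_VENDOR", "AWS"),
]


def shortest_path(edges, start, goal):
    """Unweighted shortest path by breadth-first search.

    Returns ["A", "-TYPE->", "B", ...], or None if goal is unreachable.
    """
    adj = {}
    for src, etype, dst in edges:
        adj.setdefault(src, []).append((etype, dst))
        if etype in BIDIRECTIONAL:
            adj.setdefault(dst, []).append((etype, src))
    parent = {start: None}  # node -> (previous node, edge type)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for etype, nxt in adj.get(node, []):
            if nxt not in parent:  # first visit is the shortest in BFS
                parent[nxt] = (node, etype)
                queue.append(nxt)
    if goal not in parent:
        return None
    path, node = [goal], goal
    while parent[node] is not None:  # walk back to the start
        prev, etype = parent[node]
        path[:0] = [prev, f"-{etype}->"]
        node = prev
    return path


print(shortest_path(EDGES, "Sarah Kim", "AWS"))
# ['Sarah Kim', '-OWNS_DEAL->', 'CloudMigrate Q2', '-USES_VENDOR->', 'AWS']
```

Two routes of equal length exist here (via CloudMigrate Q2, or via the KNOWS edge to Marcus Lee); this sketch breaks the tie by edge insertion order, and how Dakera breaks such ties is not stated in this guide.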
Edge Cases
1. Entity Disambiguation
Two different people named "Alex Chen" working at different companies will be correctly separated if their memories include disambiguating context (employer, location, role). Without context, Dakera may merge them. Solution: always include organizational context in memories containing person names.
# Ambiguous — may merge two different Alex Chens
client.store_memory("crm-agent", content="Alex Chen called today", importance=0.5)
# Unambiguous — includes employer context for correct node separation
client.store_memory("crm-agent",
content="Alex Chen (Acme Corp, VP Eng) reviewed the proposal",
importance=0.8, tags=["contact", "acme"]
)
client.store_memory("crm-agent",
content="Alex Chen (BetaCo, Director of Finance) declined the meeting",
importance=0.8, tags=["contact", "betaco"]
)
2. Merging Duplicate Nodes
Aliases like "AWS", "Amazon Web Services", and "Amazon AWS" are common in CRM data. Dakera normalizes known technology and organization names via a built-in alias dictionary. For domain-specific aliases not in the dictionary, tag memories with a canonical entity ID so Dakera can force-merge during normalization.
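The normalization order described above (an explicit canonical tag first, then the alias dictionary, then the raw mention) can be sketched as follows. The dictionary contents and the `entity:<Canonical Name>` tag form are hypothetical, since this guide does not specify Dakera's alias data or tag syntax.

```python
# Hypothetical alias dictionary: lowercased surface form -> canonical label
BUILTIN_ALIASES = {
    "aws": "AWS",
    "amazon web services": "AWS",
    "amazon aws": "AWS",
}
CANONICAL_TAG = "entity:"  # hypothetical tag prefix that forces a merge


def canonical_entity(mention, tags=()):
    """Pick the canonical node label for a mention.

    A canonical-ID tag overrides everything; otherwise fall back to the
    alias dictionary, then to the mention itself.
    """
    for tag in tags:
        if tag.startswith(CANONICAL_TAG):
            return tag[len(CANONICAL_TAG):]
    cleaned = mention.strip()
    return BUILTIN_ALIASES.get(cleaned.lower(), cleaned)


print(canonical_entity("Amazon Web Services"))                        # AWS
print(canonical_entity("CM Q2 deal", tags=["entity:CloudMigrate Q2"]))  # CloudMigrate Q2
```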
3. Stale Relationship Cleanup
When a contact changes companies, old WORKS_AT edges become stale. Update the affected memories to trigger edge re-evaluation, or use forget() to remove stale memories entirely. Dakera prunes orphaned graph edges automatically when the source memory is deleted.
# Update memory when Sarah Kim changes employer
client.update_memory(
agent_id="crm-agent",
memory_id="mem_sarah_employer",
content="Sarah Kim is VP Sales at NewCo (formerly Acme Corp)",
importance=0.9
)
# Dakera re-runs entity extraction and updates graph edges automatically
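Automatic pruning is easiest to reason about if every edge records which memories asserted it. This Python sketch is a conceptual model under that assumption (the `ProvenanceGraph` class is hypothetical): forgetting a memory removes its support, an edge disappears only when no memory supports it, and an update is modeled as forget followed by re-extraction.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Edges remember which memories asserted them (illustrative model)."""

    def __init__(self):
        self.sources = defaultdict(set)  # (src, type, dst) -> memory ids

    def add_memory(self, memory_id, edges):
        for edge in edges:
            self.sources[edge].add(memory_id)

    def forget(self, memory_id):
        """Drop a memory's support; prune edges that no memory supports."""
        for edge in list(self.sources):
            self.sources[edge].discard(memory_id)
            if not self.sources[edge]:
                del self.sources[edge]

    def update(self, memory_id, new_edges):
        """Re-extraction: remove the old edges, then add the new ones."""
        self.forget(memory_id)
        self.add_memory(memory_id, new_edges)

    def edges(self):
        return set(self.sources)


g = ProvenanceGraph()
g.add_memory("mem_sarah_employer", [("Sarah Kim", "WORKS_AT", "Acme Corp")])
g.update("mem_sarah_employer", [("Sarah Kim", "WORKS_AT", "NewCo")])
print(g.edges())  # {('Sarah Kim', 'WORKS_AT', 'NewCo')}
```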
4. Cross-Agent Entity Sharing
Entities extracted by one agent are scoped to that agent's namespace. If two agents (e.g., a research agent and a CRM agent) both know about "Acme Corp", they maintain separate graph nodes. Use a shared namespace to publish canonical entity knowledge that multiple agents can build upon.
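The shared-namespace API is not shown in this guide, so the following is only a conceptual Python sketch: each namespace holds its own nodes, and a lookup that falls back from the agent's namespace to a hypothetical `"shared"` namespace is one way agents could build on canonical entities without merging their private graphs. `lookup_entity` and the dictionary layout are assumptions.

```python
def lookup_entity(graphs, agent_ns, label, shared_ns="shared"):
    """Resolve an entity label, preferring the agent's own namespace.

    `graphs` maps namespace -> {label: node}. Returns (namespace, node)
    or None when neither namespace knows the label.
    """
    for ns in (agent_ns, shared_ns):
        node = graphs.get(ns, {}).get(label)
        if node is not None:
            return ns, node
    return None


graphs = {
    "crm-agent": {"Acme Corp": {"type": "Organization", "ref_count": 3}},
    "research-agent": {"Acme Corp": {"type": "Organization", "ref_count": 1}},
    "shared": {"AWS": {"type": "Technology"}},
}
print(lookup_entity(graphs, "research-agent", "AWS"))  # ('shared', {'type': 'Technology'})
```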
5. High-Cardinality Entity Types
Storing thousands of memories that mention unique transaction IDs, ticket numbers, or timestamps creates very large graphs full of low-value nodes. Keep them out with importance thresholds: reserve importance >= 0.6 for memories whose entities belong in the graph, and store transactional data as ephemeral memories (with ttl_seconds).
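A client-side routing rule along these lines might look like the sketch below. It assumes, as the text implies, that importance and ttl_seconds are store parameters; the ID regex, the 0.3 cap, the one-day TTL, and `store_options` itself are hypothetical choices, not Dakera defaults.

```python
import re

GRAPH_IMPORTANCE_CUTOFF = 0.6  # threshold suggested in the text
EPHEMERAL_IMPORTANCE = 0.3     # hypothetical cap for transactional data
# Hypothetical patterns for high-cardinality identifiers
ID_PATTERN = re.compile(r"\b(?:TXN|TICKET|INV)-\d+\b", re.IGNORECASE)


def store_options(content, importance, ttl_seconds=86_400):
    """Choose store parameters so transactional data stays out of the graph."""
    transactional = bool(ID_PATTERN.search(content))
    if importance >= GRAPH_IMPORTANCE_CUTOFF and not transactional:
        return {"importance": importance}  # graph-worthy, no expiry
    # Keep it below the cutoff and let it expire
    return {"importance": min(importance, EPHEMERAL_IMPORTANCE), "ttl_seconds": ttl_seconds}


print(store_options("Payment TXN-88213 settled", 0.9))
# {'importance': 0.3, 'ttl_seconds': 86400}
```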
SDK Reference
| Method | SDK | Purpose |
|---|---|---|
| store_memory(agent_id, content, importance, memory_type, tags) | Python | Store memory; triggers entity extraction and graph linking |
| storeMemory(agentId, {content, importance, memoryType, tags}) | TypeScript | Store memory with entity auto-extraction |
| store_memory("agent", StoreMemoryRequest{...}).await? | Rust | Store memory (entity extraction included server-side) |
| StoreMemory(ctx, "agent", StoreMemoryRequest{...}) | Go | Store memory with entity linking |
| client.knowledge_graph(agent_id) | Python | Export full entity graph for an agent |
| client.knowledgeGraph({agentId}) | TypeScript | Export full entity graph for an agent |
| client.graph_traverse(agent_id, entity, depth) | Python | BFS traversal from a named entity |
| client.graphTraverse({agentId, entity, depth}) | TypeScript | BFS traversal from a named entity |
| client.graph_path(agent_id, from_entity, to_entity) | Python | Shortest relationship path between two entities |
| client.recall(agent_id, query, top_k, min_importance) | Python | Semantic recall; use alongside graph traversal |
| client.forget(agent_id, memory_id) | Python | Remove memory; triggers graph edge cleanup |
| client.update_memory(agent_id, memory_id, ...) | Python | Update memory content; re-runs entity extraction |
Advanced Configuration: Tuning Entity Extraction
Custom entity types
Beyond the 18 built-in types, you can configure Dakera to extract domain-specific entities. Add custom types to your server config:
# docker/.env or server config
# (comments on their own lines: some env-file parsers keep inline "# ..." text as part of the value)
DAKERA_ENTITY_TYPES=Person,Organization,Location,Technology,Deal,Product,Feature,Contract
# Minimum extraction confidence (default: 0.70)
DAKERA_ENTITY_MIN_CONFIDENCE=0.72
# Maximum traversal depth allowed by the API
DAKERA_GRAPH_MAX_DEPTH=4
# Fuzzy match score for alias merging
DAKERA_GRAPH_NORMALIZE_THRESHOLD=0.85
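To show what the two extraction settings do, here is a hypothetical Python sketch that parses DAKERA_ENTITY_TYPES and DAKERA_ENTITY_MIN_CONFIDENCE and applies them to zero-shot NER output. The (text, label, score) span shape and both helper names are assumptions; only the variable names and the 0.70 default come from the configuration above.

```python
import os


def load_entity_config(env=None):
    """Read the extraction settings shown above (illustrative parser)."""
    env = os.environ if env is None else env
    types = [t.strip() for t in env.get("DAKERA_ENTITY_TYPES", "").split(",") if t.strip()]
    min_conf = float(env.get("DAKERA_ENTITY_MIN_CONFIDENCE", "0.70"))  # documented default
    return types, min_conf


def filter_spans(spans, types, min_conf):
    """Keep spans whose label is configured and whose score clears the cutoff.

    An empty type list is treated as "allow every label".
    """
    allowed = set(types)
    return [(text, label, score) for text, label, score in spans
            if (not allowed or label in allowed) and score >= min_conf]


types, min_conf = load_entity_config({
    "DAKERA_ENTITY_TYPES": "Person,Organization,Deal,Contract",
    "DAKERA_ENTITY_MIN_CONFIDENCE": "0.72",
})
spans = [("Sarah Kim", "Person", 0.95), ("Q2", "Deal", 0.71),
         ("San Francisco", "Location", 0.90), ("MSA-2024", "Contract", 0.80)]
print(filter_spans(spans, types, min_conf))
# [('Sarah Kim', 'Person', 0.95), ('MSA-2024', 'Contract', 0.8)]
```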
Importance-based graph indexing
Entity extraction runs on all memories by default. To skip graph indexing for low-importance ephemeral data:
# Only index entities from memories with importance >= 0.5
DAKERA_GRAPH_MIN_IMPORTANCE=0.5
Traversal edge-type filtering
Scope traversal to specific relationship types to reduce result size:
# REST: filter to only WORKS_AT and KNOWS edges
curl "http://localhost:3300/v1/agents/crm-agent/graph/traverse?entity=Acme+Corp&depth=2&edge_types=WORKS_AT,KNOWS" \
-H "Authorization: Bearer dk-..."
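An edge-type filter can be pictured as pruning the adjacency before a depth-limited breadth-first walk. In this minimal Python sketch the edge list reuses the CRM example, `traverse` is a hypothetical helper, and treating every edge as walkable in both directions is an assumption; restricting to WORKS_AT and KNOWS shrinks the depth-2 result from five entities to two.

```python
from collections import deque

EDGES = [
    ("Sarah Kim", "WORKS_AT", "Acme Corp"),
    ("Sarah Kim", "OWNS_DEAL", "CloudMigrate Q2"),
    ("Marcus Lee", "WORKS_AT", "Acme Corp"),
    ("Marcus Lee", "KNOWS", "Sarah Kim"),
    ("Marcus Lee", "EVALUATING", "AWS"),
    ("CloudMigrate Q2", "USES_VENDOR", "AWS"),
    ("Acme Corp", "LOCATED_IN", "San Francisco"),
]


def traverse(start, depth, edge_types=None):
    """Depth-limited BFS that ignores edge direction and, optionally,
    only follows edges whose type is in `edge_types`."""
    adj = {}
    for src, etype, dst in EDGES:
        if edge_types is None or etype in edge_types:
            adj.setdefault(src, []).append(dst)
            adj.setdefault(dst, []).append(src)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] < depth:
            for nxt in adj.get(node, []):
                if nxt not in hops:
                    hops[nxt] = hops[node] + 1
                    queue.append(nxt)
    return sorted(n for n in hops if n != start)


print(traverse("Acme Corp", 2))
# ['AWS', 'CloudMigrate Q2', 'Marcus Lee', 'San Francisco', 'Sarah Kim']
print(traverse("Acme Corp", 2, edge_types={"WORKS_AT", "KNOWS"}))
# ['Marcus Lee', 'Sarah Kim']
```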
When to Use This Pattern
- CRM systems tracking contacts, companies, and deal history across hundreds of conversations
- Research assistants connecting papers, authors, institutions, and concepts
- Legal or compliance agents reasoning over parties, contracts, and regulatory relationships
- Any application where multi-hop questions ("who knows who in X organization?") must be answered accurately
- Organizational knowledge maps built from unstructured meeting notes and emails
Graph traversal narrows scope; semantic recall provides context. The recommended pattern: traverse to find the right entities, then call recall() filtered to those entities to retrieve the supporting memories. You get structured relationship reasoning plus natural language context in one pipeline.
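A toy version of that two-step pattern, under loud assumptions: `MEMORIES` and `GRAPH` are hypothetical in-memory stand-ins, "traversal" is a one-hop lookup, and word overlap replaces embedding similarity. It shows only the division of labor (the graph filters, recall ranks), not Dakera's recall() behavior or its filter parameters.

```python
import re

MEMORIES = [
    "Sarah Kim is VP Sales at Acme Corp and owns the CloudMigrate Q2 deal worth $240k",
    "Marcus Lee is the technical champion at Acme Corp and evaluates AWS",
    "Acme Corp is headquartered in San Francisco",
    "BetaCo renewed its support contract in January",
]
GRAPH = {  # entity -> directly linked entities (hypothetical, undirected)
    "Acme Corp": {"Sarah Kim", "Marcus Lee", "San Francisco"},
    "Marcus Lee": {"Acme Corp", "Sarah Kim", "AWS"},
}


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def graph_then_recall(query, start_entity, top_k=2):
    # 1. Graph narrows scope: the start entity plus its direct neighbors
    scope = {start_entity} | GRAPH.get(start_entity, set())
    candidates = [m for m in MEMORIES if any(e in m for e in scope)]
    # 2. Recall provides context: rank in-scope memories by word overlap,
    #    a crude stand-in for embedding similarity
    q = words(query)
    ranked = sorted(candidates, key=lambda m: len(q & words(m)), reverse=True)
    return ranked[:top_k]


for memory in graph_then_recall("Who at Acme Corp is evaluating AWS?", "Acme Corp"):
    print(memory)
```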
Ready to link your agent's knowledge?
Dakera is open-core and self-hosted. SDKs are MIT-licensed. One Docker command gets you a full memory and knowledge graph engine in your infrastructure.