Advanced Retrieval

Knowledge Graph Entity Linking

~45 min to implement 📦 Requires: Dakera v0.11+

Go beyond flat semantic search. Automatically extract named entities from every stored memory and link them into a traversable knowledge graph — enabling multi-hop reasoning your vector store simply cannot do.

Prerequisites
  • Running Dakera server (Quickstart guide)
  • Familiarity with graph concepts (nodes, edges, traversal depth)
  • Dakera Python, TypeScript, Rust, or Go SDK installed

The Problem with Flat Memory

Vector search finds semantically similar memories but cannot reason about relationships. The query "Who works at the company providing our cloud hosting?" requires chaining three facts across separate memories: a person's employer, that employer's vendor contracts, and which vendor handles cloud. A flat vector store returns unrelated chunks. You need a graph.
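
To make the multi-hop requirement concrete, here is a minimal sketch in plain Python. It does not call Dakera. The sample facts, the relation names `HOLDS_CONTRACT` and `COVERS`, and the helper `who_works_at_provider_of` are all hypothetical; they only illustrate how three separately stored facts chain into one answer once they are expressed as edges.

```python
from collections import defaultdict

# Hypothetical facts, each of which would live in a separate memory.
FACTS = [
    ("Dana Ortiz", "WORKS_AT", "Skyline Hosting"),
    ("Skyline Hosting", "HOLDS_CONTRACT", "Contract 17"),
    ("Contract 17", "COVERS", "cloud hosting"),
    ("Lee Park", "WORKS_AT", "Initech"),
    ("Initech", "HOLDS_CONTRACT", "Contract 9"),
    ("Contract 9", "COVERS", "payroll"),
]


def build_reverse_index(facts):
    """Map (relation, target) -> set of sources so edges can be followed backwards."""
    index = defaultdict(set)
    for source, relation, target in facts:
        index[(relation, target)].add(source)
    return index


def who_works_at_provider_of(service, facts):
    """Chain three hops: service <- contract <- vendor <- person."""
    index = build_reverse_index(facts)
    people = set()
    for contract in index[("COVERS", service)]:
        for vendor in index[("HOLDS_CONTRACT", contract)]:
            people |= index[("WORKS_AT", vendor)]
    return sorted(people)


answer = who_works_at_provider_of("cloud hosting", FACTS)
print(answer)  # ['Dana Ortiz']
```

No single fact contains both "Dana Ortiz" and "cloud hosting", so a similarity search over the individual facts has nothing to match the full question against. The answer only appears when the three edges are followed in sequence.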

Manually building and maintaining entity graphs adds weeks of infrastructure work — entity extraction pipelines, deduplication logic, relationship schemas, traversal APIs. Dakera handles all of this automatically every time you call store_memory.

How Dakera extracts entities

Dakera runs GLiNER (a zero-shot NER model served via ONNX) on-device for every stored memory. No external API calls, no data leaving your instance. Extraction takes ~4ms per memory and supports 18 built-in entity types including Person, Organization, Location, Product, Event, and Technology.

Architecture: How the Graph Is Built

Every call to store_memory triggers a three-stage pipeline: extract entities from the content, normalize them against existing nodes (resolving "Alex Chen" and "A. Chen" to the same node), then link the new entities to existing ones using inferred relationship edges. The result is a live knowledge graph that grows with your agent's memory.

Entity Graph: CRM Agent Example

[Figure: entity graph for the CRM agent example. Nodes: Sarah Kim (Person, VP Sales), Acme Corp (Organization), CloudMigrate Q2 (Deal, $240k), AWS (Technology / Vendor), Marcus Lee (Person, Champion), San Francisco (Location). Edge labels: WORKS_AT, OWNS_DEAL, EVALUATING, KNOWS, LOCATED_IN, USES_VENDOR. Legend: primary subject, linked entity, inferred edge, cross-entity edge (depth 2).]

Edge Types Built Into Dakera

| Edge Type | Example | Direction |
| --- | --- | --- |
| WORKS_AT | Person → Organization | Directional |
| LOCATED_IN | Organization → Location | Directional |
| RELATES_TO | Project ↔ Technology | Bidirectional |
| PART_OF | Feature → Product | Directional |
| KNOWS | Person ↔ Person | Bidirectional |
| OWNS_DEAL | Person → Deal | Directional |
| EVALUATING | Person/Org → Technology | Directional |
| USES_VENDOR | Deal/Project → Organization | Directional |
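
The difference between directional and bidirectional edge types can be sketched in a few lines of Python. This is an illustration of the semantics only, not Dakera's storage model: the `holds` helper and the set-of-tuples representation are assumptions made for the example.

```python
# Edge types listed as bidirectional in the table; all others are one-way.
BIDIRECTIONAL = {"RELATES_TO", "KNOWS"}


def holds(edges, source, edge_type, target):
    """Return True if the relationship holds from source to target."""
    if (source, edge_type, target) in edges:
        return True
    # Symmetric types also hold when they were stored in the opposite order.
    return edge_type in BIDIRECTIONAL and (target, edge_type, source) in edges


edges = {
    ("Marcus Lee", "KNOWS", "Sarah Kim"),
    ("Sarah Kim", "WORKS_AT", "Acme Corp"),
}

print(holds(edges, "Sarah Kim", "KNOWS", "Marcus Lee"))    # True
print(holds(edges, "Acme Corp", "WORKS_AT", "Sarah Kim"))  # False
```

Direction here describes what the relationship means (a company does not "work at" a person). It says nothing about whether a traversal may walk an edge backwards to discover neighbors.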

Entity Resolution Flow

Before a new entity is added to the graph, Dakera normalizes it in three steps (fuzzy name matching, co-occurrence context, and alias resolution) to prevent duplicate nodes. "Alex Chen", "A. Chen (Acme)", and "Alex C." all resolve to the same node.

[Figure: Entity Resolution Pipeline, runs on every store_memory() call]
  1. Extract: GLiNER NER, ~4ms / memory
  2. Normalize: fuzzy name match, co-occurrence context, alias resolution
  3. Node exists?
    YES → 3a. Merge: update node attrs, increment refs
    NO → 3b. Create new node: assign canonical ID + type
  4. Link: infer edge type, write to graph store
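
The merge-or-create decision can be sketched in plain Python. This is a toy stand-in, not Dakera's matcher: instead of scored fuzzy matching it uses a simple rule (same number of name tokens, each token equal to or an abbreviation of its counterpart), and the `Node`, `EntityGraph`, and `compatible` names are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str
    entity_type: str
    aliases: set = field(default_factory=set)
    ref_count: int = 0


def tokens(name):
    """Lowercase, drop parenthetical context and punctuation, split into words."""
    name = re.sub(r"\(.*?\)", " ", name)
    return re.findall(r"[a-z]+", name.lower())


def compatible(a, b):
    """Toy matcher: same token count, and each pair is equal or one abbreviates the other."""
    ta, tb = tokens(a), tokens(b)
    if not ta or len(ta) != len(tb):
        return False
    return all(x.startswith(y) or y.startswith(x) for x, y in zip(ta, tb))


class EntityGraph:
    def __init__(self):
        self.nodes = []

    def resolve(self, mention, entity_type):
        """Merge into an existing node when a name matches, otherwise create one."""
        for node in self.nodes:
            if node.entity_type == entity_type and any(
                compatible(mention, known) for known in node.aliases
            ):
                node.aliases.add(mention)   # 3a. merge
                node.ref_count += 1
                return node
        node_id = f"ent_{len(self.nodes) + 1:03d}"
        node = Node(node_id, mention, entity_type, {mention}, 1)  # 3b. create
        self.nodes.append(node)
        return node


graph = EntityGraph()
mentions = ["Alex Chen", "A. Chen (Acme)", "Alex C.", "Maria Lopez"]
ids = [graph.resolve(m, "Person").node_id for m in mentions]
print(ids)  # ['ent_001', 'ent_001', 'ent_001', 'ent_002']
```

Because this rule looks only at the name, it would also merge two different people called Alex Chen. That is exactly the case where co-occurrence context (employer, role, location) is needed to keep nodes apart.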

Tip: Entity extraction is zero-config

You do not need to label entities or define schemas. Store natural language memories and Dakera extracts entities automatically. The graph grows organically as your agent learns more about the world it operates in.

Real-World Scenario: CRM Agent

A CRM agent needs to answer questions like: "What other deals does the champion on the Acme renewal know about?" or "Which contacts at AWS should we loop in given Marcus Lee's network?" These require multi-hop traversal — impossible with vector search alone.

  1. Index contact and company data
    Store meeting notes, email summaries, and deal updates as memories. GLiNER extracts people, companies, and deal names automatically on every store.
  2. Build the relationship graph over time
    As you store "Sarah Kim joined Acme Corp as VP Sales in March" and "Sarah is leading the CloudMigrate Q2 deal", Dakera creates WORKS_AT and OWNS_DEAL edges automatically.
  3. Traverse for multi-hop answers
    Query the graph at depth 2 to find all deals owned by contacts at a given company, or all technologies evaluated by contacts in a given city — without writing graph traversal code yourself.
  4. Combine graph results with semantic recall
    Use entity traversal to find the relevant entities, then call recall() for each to surface supporting memories. Graph narrows scope; recall provides context.
  5. Clean stale relationships periodically
    When contacts change companies or deals close, update or forget the affected memories. Dakera automatically re-evaluates graph edges when memories are updated or removed.

Implementation

cURL

# 1. Store memories — entity extraction and graph linking happen automatically
curl -X POST http://localhost:3300/v1/memory/store \
  -H "Authorization: Bearer dk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "crm-agent",
    "content": "Sarah Kim is VP Sales at Acme Corp and owns the CloudMigrate Q2 deal worth $240k",
    "importance": 0.9,
    "memory_type": "semantic",
    "tags": ["crm", "contact", "deal"]
  }'

# 2. Store a second memory — Dakera links the entities across memories
curl -X POST http://localhost:3300/v1/memory/store \
  -H "Authorization: Bearer dk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "crm-agent",
    "content": "Marcus Lee is the technical champion at Acme Corp. He knows Sarah Kim and evaluates AWS for the CloudMigrate project.",
    "importance": 0.85,
    "tags": ["crm", "contact", "champion"]
  }'

# 3. Retrieve the knowledge graph
curl http://localhost:3300/v1/agents/crm-agent/graph \
  -H "Authorization: Bearer dk-..."

# 4. Traverse from a specific entity (depth 2 = 2 hops)
curl "http://localhost:3300/v1/agents/crm-agent/graph/traverse?entity=Acme+Corp&depth=2" \
  -H "Authorization: Bearer dk-..."

# 5. Find the shortest path between two entities
curl "http://localhost:3300/v1/agents/crm-agent/graph/path?from=Sarah+Kim&to=AWS" \
  -H "Authorization: Bearer dk-..."

Python

from dakera import DakeraClient

client = DakeraClient(base_url="http://localhost:3300", api_key="dk-...")

# Store memories — entity graph is built automatically
client.store_memory(
    agent_id="crm-agent",
    content="Sarah Kim is VP Sales at Acme Corp and owns the CloudMigrate Q2 deal worth $240k",
    importance=0.9,
    memory_type="semantic",
    tags=["crm", "contact", "deal"]
)

client.store_memory(
    agent_id="crm-agent",
    content="Marcus Lee is the technical champion at Acme Corp. He knows Sarah Kim and evaluates AWS for the CloudMigrate project.",
    importance=0.85,
    tags=["crm", "contact", "champion"]
)

client.store_memory(
    agent_id="crm-agent",
    content="Acme Corp is headquartered in San Francisco and has 850 employees in the fintech sector.",
    importance=0.75,
    tags=["crm", "account"]
)

# Retrieve the full knowledge graph
graph = client.knowledge_graph(agent_id="crm-agent")
# Returns: {"entities": [...], "relationships": [...]}

entities = graph["entities"]
# [
#   {"id": "ent_001", "label": "Sarah Kim", "type": "Person", "ref_count": 2},
#   {"id": "ent_002", "label": "Acme Corp", "type": "Organization", "ref_count": 3},
#   {"id": "ent_003", "label": "CloudMigrate Q2", "type": "Deal", "ref_count": 2},
#   ...
# ]

# Traverse from Acme Corp outward to depth 2
# Finds: Sarah Kim, Marcus Lee, CloudMigrate Q2, AWS, San Francisco
neighbors = client.graph_traverse(
    agent_id="crm-agent",
    entity="Acme Corp",
    depth=2
)

# Find the shortest relationship path between two entities
path = client.graph_path(
    agent_id="crm-agent",
    from_entity="Sarah Kim",
    to_entity="AWS"
)
# path = ["Sarah Kim", -OWNS_DEAL-> "CloudMigrate Q2", -USES_VENDOR-> "AWS"]

# Multi-hop question answering:
# "Who at Acme Corp knows about AWS?"
acme_neighbors = client.graph_traverse(
    agent_id="crm-agent", entity="Acme Corp", depth=1
)
# Keep persons that are directly linked to AWS (for example via an EVALUATING edge)
for entity in acme_neighbors["entities"]:
    if entity["type"] == "Person":
        person_graph = client.graph_traverse(
            agent_id="crm-agent", entity=entity["label"], depth=1
        )
        aws_links = [e for e in person_graph["entities"] if e["label"] == "AWS"]
        if aws_links:
            print(f"{entity['label']} is linked to AWS")

TypeScript

import { DakeraClient } from '@dakera-ai/dakera';

const client = new DakeraClient({ baseUrl: 'http://localhost:3300', apiKey: 'dk-...' });

// Store contact and deal memories — entity linking happens automatically
await client.storeMemory('crm-agent', {
  content: 'Sarah Kim is VP Sales at Acme Corp and owns the CloudMigrate Q2 deal worth $240k',
  importance: 0.9,
  memoryType: 'semantic',
  tags: ['crm', 'contact', 'deal']
});

await client.storeMemory('crm-agent', {
  content: 'Marcus Lee is the technical champion at Acme Corp. He knows Sarah Kim and evaluates AWS.',
  importance: 0.85,
  tags: ['crm', 'contact', 'champion']
});

await client.storeMemory('crm-agent', {
  content: 'Acme Corp is headquartered in San Francisco, fintech sector, 850 employees.',
  importance: 0.75,
  tags: ['crm', 'account']
});

// Retrieve the full knowledge graph
const graph = await client.knowledgeGraph({ agentId: 'crm-agent' });
// { entities: [...], relationships: [...] }

// Traverse: who is connected to Acme Corp within 2 hops?
const neighbors = await client.graphTraverse({
  agentId: 'crm-agent',
  entity: 'Acme Corp',
  depth: 2
});

// Find shortest path between two entities
const path = await client.graphPath({
  agentId: 'crm-agent',
  fromEntity: 'Sarah Kim',
  toEntity: 'AWS'
});

// Build context for LLM from graph + semantic recall combined
async function answerWithGraph(query: string) {
  // Step 1: Extract entities from the query
  const queryEntities = await client.extractEntities(query);

  // Step 2: For each entity, traverse the graph
  const graphContext: string[] = [];
  for (const entity of queryEntities) {
    const subgraph = await client.graphTraverse({
      agentId: 'crm-agent',
      entity: entity.label,
      depth: 2
    });
    graphContext.push(...subgraph.entities.map((e: any) => e.label));
  }

  // Step 3: Use semantic recall for supporting memories
  const memories = await client.recall('crm-agent', query, { top_k: 6, min_importance: 0.6 });

  return {
    graphEntities: graphContext,
    semanticMemories: memories.memories.map((m: any) => m.content)
  };
}

Rust

use dakera_rs::{Client, StoreMemoryRequest, RecallRequest};

let client = Client::new("http://localhost:3300", "dk-...");

// Store memories — GLiNER extracts entities automatically
client.store_memory("crm-agent", StoreMemoryRequest {
    content: "Sarah Kim is VP Sales at Acme Corp and owns the CloudMigrate Q2 deal worth $240k".into(),
    importance: Some(0.9),
    memory_type: Some("semantic".into()),
    tags: Some(vec!["crm".into(), "contact".into(), "deal".into()]),
    ..Default::default()
}).await?;

client.store_memory("crm-agent", StoreMemoryRequest {
    content: "Marcus Lee is the technical champion at Acme Corp evaluating AWS for CloudMigrate.".into(),
    importance: Some(0.85),
    tags: Some(vec!["crm".into(), "champion".into()]),
    ..Default::default()
}).await?;

// Knowledge graph and traversal via REST (not yet in Rust SDK)
// GET /v1/agents/crm-agent/graph
// GET /v1/agents/crm-agent/graph/traverse?entity=Acme+Corp&depth=2

// Semantic recall works fully from Rust SDK
let memories = client.recall("crm-agent", RecallRequest {
    query: "Who at Acme Corp is evaluating cloud vendors?".into(),
    top_k: Some(5),
    min_importance: Some(0.6),
    ..Default::default()
}).await?;

for m in &memories.memories {
    println!("{} (score: {:.2})", m.content, m.score);
}

Go

package main

import (
    "context"
    "fmt"
    dakera "github.com/dakera-ai/dakera-go"
)

func main() {
    client := dakera.NewClient("http://localhost:3300", "dk-...")
    ctx := context.Background()

    // Store CRM memories — entities extracted and linked automatically
    client.StoreMemory(ctx, "crm-agent", dakera.StoreMemoryRequest{
        Content:    "Sarah Kim is VP Sales at Acme Corp and owns the CloudMigrate Q2 deal worth $240k",
        Importance: 0.9,
        MemoryType: "semantic",
        Tags:       []string{"crm", "contact", "deal"},
    })

    client.StoreMemory(ctx, "crm-agent", dakera.StoreMemoryRequest{
        Content:    "Marcus Lee is technical champion at Acme Corp, knows Sarah Kim, evaluates AWS.",
        Importance: 0.85,
        Tags:       []string{"crm", "champion"},
    })

    // Traverse graph from Acme Corp (depth 2)
    neighbors, err := client.GraphTraverse(ctx, dakera.GraphTraverseRequest{
        AgentID: "crm-agent",
        Entity:  "Acme Corp",
        Depth:   2,
    })
    if err != nil {
        panic(err)
    }

    for _, entity := range neighbors.Entities {
        fmt.Printf("Entity: %s (%s)\n", entity.Label, entity.Type)
    }

    // Recall with semantic search
    results, err := client.Recall(ctx, "crm-agent", dakera.RecallRequest{
        Query:         "Who evaluates cloud vendors at Acme Corp?",
        TopK:          5,
        MinImportance: 0.6,
    })
    if err != nil {
        panic(err)
    }

    for _, m := range results.Memories {
        fmt.Printf("%.2f: %s\n", m.Score, m.Content)
    }
}

Before / After: Flat Memory vs. Graph-Linked Memory

Before: Flat memory store
# Three disconnected memory chunks
# Vector search can find each individually
# but cannot answer relationship questions

Memory 1:
"Sarah Kim is VP Sales at Acme Corp"

Memory 2:
"Marcus Lee evaluates AWS for Acme"

Memory 3:
"CloudMigrate Q2 deal is $240k"

# Query: "Who at Acme knows about AWS?"
# Result: memory 2 returned by similarity
# But: no link to Sarah Kim, no deal context
# Agent cannot chain: Sarah → Acme → Marcus → AWS

After: Graph-linked memory
# Same three memories, now graph-linked
# Entities auto-extracted on every store

Nodes: Sarah Kim (Person), Marcus Lee (Person),
       Acme Corp (Org), AWS (Tech),
       CloudMigrate Q2 (Deal)

Edges:
Sarah Kim  -WORKS_AT->    Acme Corp
Sarah Kim  -OWNS_DEAL->   CloudMigrate Q2
Marcus Lee -WORKS_AT->    Acme Corp
Marcus Lee -KNOWS->       Sarah Kim
Marcus Lee -EVALUATING->  AWS
CloudMigrate Q2 -USES_VENDOR-> AWS

# Query: "Who at Acme knows about AWS?"
# Traverse: Acme Corp (depth=2)
# → Sarah Kim, Marcus Lee
# → Filter EVALUATING AWS → Marcus Lee
# + KNOWS edge → also surfaces Sarah Kim
# Full answer with relationship context

Performance Considerations

Entity extraction and graph operations are optimized to add minimal overhead to the memory storage path. In production CRM deployments with 50k+ memories, typical numbers look like:

  • ~4ms: GLiNER entity extraction per memory
  • <2ms: node normalization + graph write
  • <8ms: graph traversal at depth=2 (10k nodes)

Watch: traversal depth at scale

At depth=3 and beyond on dense graphs (100k+ nodes), traversal can return thousands of entities and take 40ms+. Keep depth at 2 for most queries. Use entity filtering parameters to scope traversal to specific edge types or entity types when querying large graphs.
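
To see why depth and edge-type filters control result size, here is a minimal depth-bounded breadth-first traversal over the CRM example graph. It is a sketch under stated assumptions, not Dakera's implementation: the `traverse` function is hypothetical, and every edge is treated as walkable in both directions for the purposes of the example.

```python
from collections import deque

EDGES = [
    ("Sarah Kim", "WORKS_AT", "Acme Corp"),
    ("Sarah Kim", "OWNS_DEAL", "CloudMigrate Q2"),
    ("Marcus Lee", "WORKS_AT", "Acme Corp"),
    ("Marcus Lee", "KNOWS", "Sarah Kim"),
    ("Marcus Lee", "EVALUATING", "AWS"),
    ("CloudMigrate Q2", "USES_VENDOR", "AWS"),
    ("Acme Corp", "LOCATED_IN", "San Francisco"),
]

# Assumption for this sketch: neighbors are reachable across an edge in either direction.
adjacency = {}
for source, edge_type, target in EDGES:
    adjacency.setdefault(source, []).append((edge_type, target))
    adjacency.setdefault(target, []).append((edge_type, source))


def traverse(adjacency, start, depth, edge_types=None):
    """Breadth-first traversal that stops expanding at `depth` hops.

    Returns {node: hop_count} for every node reached, excluding the start.
    When edge_types is given, only edges of those types are followed.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # do not expand beyond the requested depth
        for edge_type, neighbor in adjacency.get(node, []):
            if edge_types is not None and edge_type not in edge_types:
                continue
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]
    return seen


print(len(traverse(adjacency, "Acme Corp", depth=1)))  # 3
print(len(traverse(adjacency, "Acme Corp", depth=2)))  # 5
print(traverse(adjacency, "Acme Corp", depth=2, edge_types={"WORKS_AT", "KNOWS"}))
```

Even in this seven-edge graph, one extra hop grows the result from three entities to five, and restricting edge types brings it back down to two. On a dense graph each additional hop multiplies the frontier, which is why a filter is usually a better lever than a larger depth.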

Throughput benchmarks (single Dakera instance, cpx21)

| Operation | p50 | p99 | Notes |
| --- | --- | --- | --- |
| store_memory (with entity extraction) | 18ms | 34ms | Includes GLiNER + graph write |
| graph_traverse depth=1 | 3ms | 9ms | Direct neighbors only |
| graph_traverse depth=2 | 7ms | 22ms | Recommended maximum for UX |
| graph_path (shortest path) | 12ms | 40ms | BFS across full graph |
| knowledge_graph (full export) | 85ms | 240ms | Avoid in hot path; use for analytics |
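
The table can be turned into a rough latency budget. The sketch below compares two ways of answering a two-hop question: one depth=1 call for the company followed by one depth=1 call per person, versus a single depth=2 call. It uses only the p50 figures from the table; the function names are hypothetical, network overhead is ignored, and adding medians is only a first-order estimate.

```python
# p50 latencies in milliseconds, copied from the benchmark table.
P50_MS = {
    "graph_traverse_depth_1": 3,
    "graph_traverse_depth_2": 7,
}


def per_person_loop_ms(person_count):
    """One depth=1 call for the company, then one depth=1 call per person, run sequentially."""
    return P50_MS["graph_traverse_depth_1"] * (1 + person_count)


def single_call_ms():
    """One depth=2 call covering the same neighborhood in a single request."""
    return P50_MS["graph_traverse_depth_2"]


for people in (5, 20, 100):
    print(people, per_person_loop_ms(people), single_call_ms())
```

With 20 people the sequential loop costs about 63ms at p50 against 7ms for one depth=2 call, so a per-entity loop is worth replacing whenever the depth=2 response carries the relationships you need.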

Edge Cases

1. Entity Disambiguation

Two different people named "Alex Chen" working at different companies will be correctly separated if their memories include disambiguating context (employer, location, role). Without context, Dakera may merge them. Solution: always include organizational context in memories containing person names.

# Ambiguous — may merge two different Alex Chens
client.store_memory("crm-agent", content="Alex Chen called today", importance=0.5)

# Unambiguous — includes employer context for correct node separation
client.store_memory("crm-agent",
    content="Alex Chen (Acme Corp, VP Eng) reviewed the proposal",
    importance=0.8, tags=["contact", "acme"]
)
client.store_memory("crm-agent",
    content="Alex Chen (BetaCo, Director of Finance) declined the meeting",
    importance=0.8, tags=["contact", "betaco"]
)

2. Merging Duplicate Nodes

Aliases like "AWS", "Amazon Web Services", and "Amazon AWS" are common in CRM data. Dakera normalizes known technology and organization names via a built-in alias dictionary. For domain-specific aliases not in the dictionary, tag memories with a canonical entity ID so Dakera can force-merge during normalization.
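
Alias normalization amounts to a lookup from a cleaned-up mention to a canonical label. The sketch below shows the idea with a hypothetical alias table; Dakera's built-in dictionary and its canonical-ID tagging mechanism are not reproduced here.

```python
# Hypothetical alias table; keys are lowercased with whitespace collapsed.
ALIASES = {
    "aws": "AWS",
    "amazon web services": "AWS",
    "amazon aws": "AWS",
}


def canonical_label(mention, aliases=ALIASES):
    """Return the canonical label for a mention, or the trimmed mention if unknown."""
    key = " ".join(mention.lower().split())
    return aliases.get(key, mention.strip())


labels = {canonical_label(m) for m in ["AWS", "Amazon Web Services", " amazon  AWS "]}
print(labels)  # {'AWS'}

# Domain-specific aliases can be layered on top of the shared table.
custom = {**ALIASES, "cmq2": "CloudMigrate Q2"}
print(canonical_label("CMQ2", custom))  # CloudMigrate Q2
```

Unknown mentions pass through unchanged, so a missing alias produces a separate node and never a wrong merge.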

3. Stale Relationship Cleanup

When a contact changes companies, old WORKS_AT edges become stale. Update the affected memories to trigger edge re-evaluation, or use forget() to remove stale memories entirely. Dakera prunes orphaned graph edges automatically when the source memory is deleted.

# Update memory when Sarah Kim changes employer
client.update_memory(
    agent_id="crm-agent",
    memory_id="mem_sarah_employer",
    content="Sarah Kim is VP Sales at NewCo (formerly Acme Corp)",
    importance=0.9
)
# Dakera re-runs entity extraction and updates graph edges automatically
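
One way to picture orphan pruning is to have each edge remember which memories support it, and to drop the edge once that set is empty. This is an assumption about how such cleanup could work, not a description of Dakera's internals; `Edge` and `forget_memory` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    source: str
    edge_type: str
    target: str
    memory_ids: set = field(default_factory=set)  # memories that support this edge


def forget_memory(edges, memory_id):
    """Remove memory_id from every edge and return only edges that still have support."""
    kept = []
    for edge in edges:
        edge.memory_ids.discard(memory_id)
        if edge.memory_ids:
            kept.append(edge)
    return kept


edges = [
    Edge("Sarah Kim", "WORKS_AT", "Acme Corp", {"mem_1", "mem_2"}),
    Edge("Sarah Kim", "OWNS_DEAL", "CloudMigrate Q2", {"mem_1"}),
]

edges = forget_memory(edges, "mem_1")
remaining = [(e.source, e.edge_type, e.target) for e in edges]
print(remaining)  # [('Sarah Kim', 'WORKS_AT', 'Acme Corp')]
```

The WORKS_AT edge survives because a second memory still supports it. That is worth remembering when cleaning up after a job change: every memory that states the old employer needs to be updated or forgotten.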

4. Cross-Agent Entity Sharing

Entities extracted by one agent are scoped to that agent's namespace. If two agents (e.g., a research agent and a CRM agent) both know about "Acme Corp", they maintain separate graph nodes. Use a shared namespace to publish canonical entity knowledge that multiple agents can build upon.

5. High-Cardinality Entity Types

Storing thousands of memories mentioning many unique transaction IDs, ticket numbers, or timestamps will create very large graphs with low-value nodes. Filter these with importance thresholds: only store memories with importance >= 0.6 for graph-indexed entities, and use ephemeral memories (with ttl_seconds) for transactional data.
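
A small pre-store check can route transactional content to low-importance, short-lived memories. The sketch below is illustrative: the regular expression, the 0.4 cap, and the one-day TTL are hypothetical choices, and it assumes the store call accepts `ttl_seconds` as a keyword argument.

```python
import re

# Hypothetical patterns for low-value, high-cardinality identifiers.
TRANSACTIONAL = re.compile(
    r"\b(?:TXN|TICKET|INV)-\d+\b"           # e.g. TXN-48213, TICKET-977
    r"|\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}",     # ISO-8601 style timestamps
    re.IGNORECASE,
)


def storage_options(content, importance, ttl_seconds=86_400):
    """Return keyword arguments for a store call.

    Transactional content is capped at 0.4 importance, below the graph
    thresholds mentioned in this guide, and given a TTL. Everything else
    keeps the importance it was given.
    """
    if TRANSACTIONAL.search(content):
        return {"importance": min(importance, 0.4), "ttl_seconds": ttl_seconds}
    return {"importance": importance}


print(storage_options("Refund issued for TXN-48213", 0.8))
print(storage_options("Sarah Kim joined Acme Corp as VP Sales", 0.8))
```

The returned dictionary could then be unpacked into the store call, for example `client.store_memory(agent_id="crm-agent", content=text, **storage_options(text, 0.8))`.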

SDK Reference

| Method | SDK | Purpose |
| --- | --- | --- |
| store_memory(agent_id, content, importance, memory_type, tags) | Python | Store memory; triggers entity extraction and graph linking |
| storeMemory(agentId, {content, importance, memoryType, tags}) | TypeScript | Store memory with entity auto-extraction |
| store_memory("agent", StoreMemoryRequest{...}).await? | Rust | Store memory (entity extraction included server-side) |
| StoreMemory(ctx, "agent", StoreMemoryRequest{...}) | Go | Store memory with entity linking |
| client.knowledge_graph(agent_id) | Python | Export full entity graph for an agent |
| client.knowledgeGraph({agentId}) | TypeScript | Export full entity graph for an agent |
| client.graph_traverse(agent_id, entity, depth) | Python | BFS traversal from a named entity |
| client.graphTraverse({agentId, entity, depth}) | TypeScript | BFS traversal from a named entity |
| client.graph_path(agent_id, from_entity, to_entity) | Python | Shortest relationship path between two entities |
| client.recall(agent_id, query, top_k, min_importance) | Python | Semantic recall; use alongside graph traversal |
| client.forget(agent_id, memory_id) | Python | Remove memory; triggers graph edge cleanup |
| client.update_memory(agent_id, memory_id, ...) | Python | Update memory content; re-runs entity extraction |

Advanced Configuration: Tuning Entity Extraction

Custom entity types

Beyond the 18 built-in types, you can configure Dakera to extract domain-specific entities. Add custom types to your server config:

# docker/.env or server config
DAKERA_ENTITY_TYPES=Person,Organization,Location,Technology,Deal,Product,Feature,Contract
DAKERA_ENTITY_MIN_CONFIDENCE=0.72   # default: 0.70
DAKERA_GRAPH_MAX_DEPTH=4            # max traversal depth allowed by API
DAKERA_GRAPH_NORMALIZE_THRESHOLD=0.85  # fuzzy match score for alias merging
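
The effect of the type list and the confidence threshold can be sketched as a filter over candidate entities. This is an illustration only: it assumes DAKERA_ENTITY_TYPES acts as an allow-list, and the candidate shape (`label`, `type`, `score`) and both function names are hypothetical.

```python
import os


def load_extraction_config(env=os.environ):
    """Read the two settings, falling back to the documented default of 0.70."""
    types = env.get("DAKERA_ENTITY_TYPES", "")
    return {
        "entity_types": {t.strip() for t in types.split(",") if t.strip()},
        "min_confidence": float(env.get("DAKERA_ENTITY_MIN_CONFIDENCE", "0.70")),
    }


def filter_entities(candidates, config):
    """Keep candidates whose type is enabled and whose score meets the threshold."""
    return [
        c for c in candidates
        if c["type"] in config["entity_types"] and c["score"] >= config["min_confidence"]
    ]


config = load_extraction_config({
    "DAKERA_ENTITY_TYPES": "Person,Organization,Deal",
    "DAKERA_ENTITY_MIN_CONFIDENCE": "0.72",
})
candidates = [
    {"label": "Sarah Kim", "type": "Person", "score": 0.94},
    {"label": "Q2", "type": "Deal", "score": 0.61},
    {"label": "San Francisco", "type": "Location", "score": 0.97},
]
kept = [c["label"] for c in filter_entities(candidates, config)]
print(kept)  # ['Sarah Kim']
```

Raising the threshold trades recall for precision: fewer spurious nodes such as a bare "Q2", at the cost of occasionally missing a real entity that the model scored low.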

Importance-based graph indexing

Entity extraction runs on all memories by default. To skip graph indexing for low-importance ephemeral data:

DAKERA_GRAPH_MIN_IMPORTANCE=0.5   # only index entities from memories >= 0.5 importance

Traversal edge-type filtering

Scope traversal to specific relationship types to reduce result size:

# REST: filter to only WORKS_AT and KNOWS edges
curl "http://localhost:3300/v1/agents/crm-agent/graph/traverse?entity=Acme+Corp&depth=2&edge_types=WORKS_AT,KNOWS" \
  -H "Authorization: Bearer dk-..."
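
When building this URL from code, let the standard library handle the encoding so that entity names with spaces or ampersands stay intact. The `traverse_url` helper below is a hypothetical convenience wrapper; the path and the `entity`, `depth`, and `edge_types` parameters are taken from the curl examples in this guide.

```python
from urllib.parse import quote, urlencode


def traverse_url(base_url, agent_id, entity, depth, edge_types=None):
    """Build the traverse URL with properly encoded query parameters."""
    params = {"entity": entity, "depth": depth}
    if edge_types:
        params["edge_types"] = ",".join(edge_types)
    # safe="," keeps the comma-separated list readable, matching the curl example.
    query = urlencode(params, safe=",")
    return f"{base_url}/v1/agents/{quote(agent_id, safe='')}/graph/traverse?{query}"


url = traverse_url(
    "http://localhost:3300", "crm-agent", "Acme Corp", 2, ["WORKS_AT", "KNOWS"]
)
print(url)
# http://localhost:3300/v1/agents/crm-agent/graph/traverse?entity=Acme+Corp&depth=2&edge_types=WORKS_AT,KNOWS
```

An entity such as "AT&T" is encoded as `AT%26T`, which avoids the server reading the ampersand as a parameter separator.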

When to Use This Pattern

  • CRM systems tracking contacts, companies, and deal history across hundreds of conversations
  • Research assistants connecting papers, authors, institutions, and concepts
  • Legal or compliance agents reasoning over parties, contracts, and regulatory relationships
  • Any application where multi-hop questions ("who knows who in X organization?") must be answered accurately
  • Organizational knowledge maps built from unstructured meeting notes and emails

Graph + recall = best of both worlds

Graph traversal narrows scope; semantic recall provides context. The recommended pattern: traverse to find the right entities, then call recall() filtered to those entities to retrieve the supporting memories. You get structured relationship reasoning plus natural language context in one pipeline.
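
The combination step can be sketched as a client-side filter: keep only the recalled memories that mention an entity returned by traversal. The response shapes below mirror the examples in this guide (`entities` with `label`, `memories` with `content` and `score`) but should be treated as assumptions, and `scope_memories` is a hypothetical helper, not part of any SDK.

```python
def scope_memories(traversal, recalled, limit=5):
    """Keep recalled memories that mention at least one traversed entity.

    traversal: {"entities": [{"label": ...}, ...]}
    recalled:  {"memories": [{"content": ..., "score": ...}, ...]}
    Results are ordered by score, highest first.
    """
    labels = [e["label"].lower() for e in traversal["entities"]]
    hits = [
        m for m in recalled["memories"]
        if any(label in m["content"].lower() for label in labels)
    ]
    hits.sort(key=lambda m: m["score"], reverse=True)
    return hits[:limit]


traversal = {"entities": [{"label": "Marcus Lee"}, {"label": "AWS"}]}
recalled = {"memories": [
    {"content": "Marcus Lee evaluates AWS for the CloudMigrate project.", "score": 0.82},
    {"content": "Quarterly pricing review is scheduled for June.", "score": 0.88},
    {"content": "AWS sent revised terms for the hosting contract.", "score": 0.67},
]}

scoped = scope_memories(traversal, recalled)
print([m["score"] for m in scoped])  # [0.82, 0.67]
```

The highest-scoring memory is dropped because it mentions none of the traversed entities, which is the point of the pattern: similarity alone ranks it first, but the graph says it is out of scope.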

Ready to link your agent's knowledge?

Dakera is open-core and self-hosted. SDKs are MIT-licensed. One Docker command gets you a full memory and knowledge graph engine in your infrastructure.
