Intermediate Lifecycle

Semantic Deduplication

⏰ ~20 min to implement 📦 Requires: Dakera v0.11+

Automatically detect near-duplicate memories using embedding similarity and merge them into single canonical entries. Eliminate recall noise from repeated ingestion, multi-source pipelines, and long-running agents that revisit the same facts.

Prerequisites
  • Running Dakera server (Quickstart)
  • An agent ID to scope memories to
  • Understanding of cosine similarity thresholds (0.85+ = near-duplicate)

The Problem: Redundant Memories Degrade Recall Quality

A research assistant agent processes 50 papers per day. Multiple papers state the same findings in different words. A user preference mentioned in Monday's chat appears again in Thursday's follow-up. A multi-agent pipeline where three agents independently discover and store the same fact creates three near-identical memories.

The result: a recall query for "user's Python preference" returns five nearly identical memories, each taking up a retrieval slot. Diversity of results collapses. The agent surfaces the same information five times instead of five distinct, relevant facts. Deduplication restores recall diversity by ensuring one concept maps to one memory.

Semantic vs. Exact Deduplication

Dakera performs semantic deduplication: it detects near-duplicates based on embedding cosine similarity, not string matching. "User prefers Python" and "User codes primarily in Python" are different strings but carry nearly the same meaning, so their embeddings score as near-duplicates. The default similarity threshold is 0.87 (configurable). Pairs that do not score above the threshold are considered distinct and kept separately.
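To make the threshold check concrete, here is a minimal sketch of cosine similarity compared against the default 0.87 threshold. The three-dimensional vectors are toy stand-ins for real embeddings, and the helper names are hypothetical; this is not Dakera's internal implementation.

```python
import math

THRESHOLD = 0.87  # default threshold quoted in the text


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def is_near_duplicate(a, b, threshold=THRESHOLD):
    """True when the pair scores above the threshold."""
    return cosine_similarity(a, b) > threshold


# Toy vectors standing in for real embeddings (assumption for illustration)
prefers_python = [0.9, 0.1, 0.0]
codes_in_python = [0.8, 0.2, 0.1]
enjoys_hiking = [0.0, 0.1, 0.9]

print(is_near_duplicate(prefers_python, codes_in_python))
print(is_near_duplicate(prefers_python, enjoys_hiking))
```

The two Python statements point in almost the same direction and are flagged as a pair, while the hiking vector is nearly orthogonal to both and stays distinct.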

Architecture

Deduplication works in two stages. First, Dakera compares the agent's memory embeddings by cosine similarity, using approximate nearest-neighbor search rather than exhaustive pairwise comparison (see Performance Considerations). Pairs above the similarity threshold (default: 0.87) are identified as duplicate candidates. Second, candidates are merged into a single canonical memory that retains the highest importance score, the most recent timestamp, and the combined semantic signal from the merged embeddings.

  • Call deduplicate() periodically (after batch ingestion or on a schedule)
  • Merged canonical memory retains the highest importance from all candidates
  • Original duplicate memories are removed from the store
  • Recall diversity improves immediately — one concept, one slot
  • Storage costs decrease proportionally to deduplication rate
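The merge stage can be sketched roughly as follows. This is an illustration under stated assumptions, not Dakera's code: the `Memory` class and `merge_to_canonical` function are hypothetical, the "combined semantic signal" is approximated by averaging the embeddings, and the canonical content is simply taken from the highest-importance candidate (the diagrams on this page show a rewritten canonical sentence instead).

```python
from dataclasses import dataclass


@dataclass
class Memory:
    id: str
    content: str
    importance: float
    timestamp: int  # Unix seconds
    embedding: list


def merge_to_canonical(candidates, canonical_id):
    """Merge a non-empty list of duplicate candidates into one memory."""
    best = max(candidates, key=lambda m: m.importance)
    dims = len(best.embedding)
    # Assumption: average the embeddings to combine their semantic signal
    mean_embedding = [
        sum(m.embedding[i] for m in candidates) / len(candidates)
        for i in range(dims)
    ]
    return Memory(
        id=canonical_id,
        content=best.content,                       # assumption, see lead-in
        importance=best.importance,                 # highest importance wins
        timestamp=max(m.timestamp for m in candidates),  # most recent wins
        embedding=mean_embedding,
    )


candidates = [
    Memory("m-001", "User likes Python", 0.70, 100, [1.0, 0.0]),
    Memory("m-002", "User uses Python", 0.80, 200, [0.8, 0.2]),
    Memory("m-003", "User prefers Python", 0.75, 300, [0.9, 0.1]),
]
canonical = merge_to_canonical(candidates, "m-can-001")
print(canonical.importance, canonical.timestamp)
```

Keeping the maximum importance and the latest timestamp means the canonical entry is never ranked lower or treated as older than any memory it replaces.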

Diagram: Similarity Matrix Heatmap

Similarity matrix: cosine similarity between embeddings. M1: "likes Python", M2: "uses Python", M3: "prefers Python", M4: "enjoys hiking".

|    | M1   | M2   | M3   | M4   |
|----|------|------|------|------|
| M1 | 1.00 | 0.93 | 0.91 | 0.14 |
| M2 | 0.93 | 1.00 | 0.89 | 0.11 |
| M3 | 0.91 | 0.89 | 1.00 | 0.09 |
| M4 | 0.14 | 0.11 | 0.09 | 1.00 |

M1, M2, and M3 form the duplicate cluster (sim > 0.87) and are the dedup target. M4 has low similarity to all of them and stays distinct.
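The grouping shown in the similarity matrix can be reproduced with a small union-find pass over the scores. This is a sketch only: the function name is hypothetical, and linking pairs transitively (single linkage) is an assumption about how clusters are formed rather than documented Dakera behavior.

```python
def cluster_duplicates(labels, sim, threshold=0.87):
    """Group labels whose pairwise similarity exceeds the threshold."""
    parent = list(range(len(labels)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if sim[i][j] > threshold:
                parent[find(j)] = find(i)  # link the two clusters

    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(find(i), []).append(label)
    return list(groups.values())


labels = ["M1", "M2", "M3", "M4"]
sim = [
    [1.00, 0.93, 0.91, 0.14],
    [0.93, 1.00, 0.89, 0.11],
    [0.91, 0.89, 1.00, 0.09],
    [0.14, 0.11, 0.09, 1.00],
]
print(cluster_duplicates(labels, sim))
```

With the matrix values from the diagram, the three Python memories end up in one group and the hiking memory in its own.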

Diagram: Dedup Merge Flow

  1. Candidate memories (stored over time): "User likes Python" (imp=0.70), "User uses Python" (imp=0.80), "User prefers Python" (imp=0.75)
  2. Embedding comparison: embed and compare against the 0.87 threshold; all pairs score above 0.87
  3. Merge to canonical / discard duplicates: canonical memory "User is a Python developer" with importance 0.80 and tags merged from all 3; 2 duplicates removed from store
  4. Clean recall: 1 slot, not 3; diversity restored

Real-World Scenario: Research Assistant Deduplicating Paper Summaries

Scenario: PaperMind AI builds a research assistant that ingests academic paper abstracts and stores one summary memory per paper. Researchers submit the same landmark papers from multiple sources (arXiv, Semantic Scholar, direct PDF upload). Over 3 months, the agent accumulates 4,200 paper summaries, but analysis reveals that 800+ of them are near-duplicates covering just 300 unique papers, all caused by repeated ingestion.

PaperMind runs weekly deduplication with a threshold of 0.88. Each run reduces the store by 15–20%. Recall diversity improves dramatically: a query for "transformer attention mechanisms" previously returned 6 near-identical BERT paper summaries; after dedup, it returns 6 distinct papers covering different angles (BERT, attention variants, efficient transformers, vision transformers, etc.).

Step-by-Step Implementation

  1. Ingest memories normally — dedup runs separately
    Store memories as usual. Do not try to deduplicate before storing (that would require reading all existing memories on every write). Let duplicates accumulate, then clean them in batch runs. The performance cost of a write-time dedup check is prohibitive at scale.
  2. Schedule deduplication as a maintenance job
    Run dedup after large batch ingestion, at daily or weekly intervals, or when the store size crosses a threshold (e.g., every 1,000 new memories). For interactive agents, run it asynchronously — dedup does not block recall during execution.
  3. Choose the right similarity threshold
    The default 0.87 threshold works for general use. For research or technical domains where precise phrasing matters, raise it to 0.91–0.93. For conversational agents where paraphrasing is common, lower it to 0.83–0.85. Test with a sample of 100 memories before running on the full store.
  4. Inspect the merge report
    The deduplicate() response includes a count of merged pairs and the canonical content chosen for each. Review the report periodically to catch over-aggressive merging (e.g., two distinct but similar-sounding papers being merged incorrectly).
  5. Validate recall diversity post-dedup
    After each deduplication run, sample 5–10 recall queries and compare result diversity before and after. A healthy dedup run increases the number of distinct topics returned per query while keeping top_k constant.
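For step 5, one rough way to quantify "result diversity" is to count how many recall results are not near-duplicates of an earlier result in the same list. The sketch below assumes you have the embeddings of the returned memories available; the function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def distinct_results(embeddings, threshold=0.87):
    """Count results that are not near-duplicates of an earlier result."""
    representatives = []
    for emb in embeddings:
        if all(cosine(emb, rep) <= threshold for rep in representatives):
            representatives.append(emb)
    return len(representatives)


# Toy embeddings for the same query with top_k=4, before and after dedup
before = [[1.0, 0.0], [0.98, 0.05], [0.97, 0.1], [0.0, 1.0]]
after = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [0.7, -0.7]]

print(distinct_results(before), distinct_results(after))
```

Tracking this count for a fixed set of sample queries gives a simple before/after number to compare while top_k stays constant.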

Before & After: Memory Store State

Before dedup — 5 memories, 3 duplicates
[
  {
    "id": "m-001",
    "content": "User likes Python",
    "importance": 0.70
  },
  {
    "id": "m-002",
    "content": "User uses Python primarily",
    "importance": 0.80
  },
  {
    "id": "m-003",
    "content": "User prefers Python over JS",
    "importance": 0.75
  },
  {
    "id": "m-004",
    "content": "User enjoys hiking on weekends",
    "importance": 0.65
  },
  {
    "id": "m-005",
    "content": "User works at Acme Corp",
    "importance": 0.90
  }
]
// recall("programming language") returns
// m-001, m-002, m-003 — 3 redundant slots
// hiking and Acme Corp get pushed out

After dedup — 3 memories, canonical
[
  {
    "id": "m-can-001",
    "content": "User is a Python developer",
    "importance": 0.80,
    "merged_from": ["m-001", "m-002", "m-003"],
    "merge_count": 3
  },
  {
    "id": "m-004",
    "content": "User enjoys hiking on weekends",
    "importance": 0.65
  },
  {
    "id": "m-005",
    "content": "User works at Acme Corp",
    "importance": 0.90
  }
]
// recall("programming language") returns
// m-can-001 (1 slot) + diverse facts
// dedup report: { merged: 2, remaining: 3 }

Implementation

REST (curl)

# Trigger semantic deduplication
curl -X POST http://localhost:3300/v1/agents/research-bot/deduplicate \
  -H "Authorization: Bearer dk-..." \
  -H "Content-Type: application/json" \
  -d '{}'

# Response includes dedup report
# {
#   "merged": 12,
#   "remaining": 45,
#   "elapsed_ms": 340,
#   "canonical_ids": ["m-can-001", "m-can-002", ...]
# }

# Verify recall diversity improved
curl "http://localhost:3300/v1/memory/recall?agent_id=research-bot&query=transformer+attention+mechanisms&top_k=6" \
  -H "Authorization: Bearer dk-..."

Python

from dakera import DakeraClient
import time

client = DakeraClient(base_url="http://localhost:3300", api_key="dk-...")
AGENT = "research-bot"

# Ingest paper summaries (duplicates accumulate naturally)
papers = [
    ("BERT: Pre-training of Deep Bidirectional Transformers", 0.85),
    ("BERT language model pre-training for NLP", 0.82),  # near-dup of above
    ("Attention Is All You Need — transformer architecture", 0.90),
    ("The original transformer paper by Vaswani et al.", 0.88),  # near-dup of above
    ("GPT-3: Language Models are Few-Shot Learners", 0.87),
]

for content, importance in papers:
    client.store_memory(
        agent_id=AGENT,
        content=content,
        importance=importance,
        memory_type="semantic",
        tags=["paper", "nlp"]
    )

# Recall before dedup — duplicates consume slots
before = client.recall(agent_id=AGENT, query="transformer architecture NLP", top_k=5)
print(f"Before dedup: {len(before.memories)} results, likely 3-4 near-duplicates")

# Trigger deduplication
result = client.deduplicate(agent_id=AGENT)
print(f"Merged {result.merged} duplicates; {result.remaining} memories remain")
print(f"Dedup elapsed: {result.elapsed_ms}ms")

# Recall after dedup — diverse results
after = client.recall(agent_id=AGENT, query="transformer architecture NLP", top_k=5)
print(f"After dedup: {len(after.memories)} results — diverse topics")
for mem in after.memories:
    print(f"  [{mem.importance:.2f}] {mem.content[:80]}")

# Schedule weekly dedup (example with a simple scheduler)
def run_weekly_dedup():
    while True:
        report = client.deduplicate(agent_id=AGENT)
        print(f"Weekly dedup: merged={report.merged}, remaining={report.remaining}")
        time.sleep(7 * 24 * 3600)  # 7 days

TypeScript

import { DakeraClient } from '@dakera-ai/dakera';

const client = new DakeraClient({ baseUrl: 'http://localhost:3300', apiKey: 'dk-...' });
const AGENT = 'research-bot';

// Ingest papers — near-duplicates accumulate from multiple sources
const papers = [
  { content: 'BERT: Pre-training of Deep Bidirectional Transformers', importance: 0.85 },
  { content: 'BERT language model pre-training for NLP tasks', importance: 0.82 }, // near-dup
  { content: 'Attention Is All You Need — transformer architecture paper', importance: 0.90 },
  { content: 'The original Vaswani et al. transformer paper 2017', importance: 0.88 }, // near-dup
  { content: 'GPT-3: Language Models are Few-Shot Learners', importance: 0.87 },
];

for (const paper of papers) {
  await client.storeMemory(AGENT, {
    content: paper.content,
    importance: paper.importance,
    memoryType: 'semantic',
    tags: ['paper', 'nlp']
  });
}

// Recall before — duplicates waste retrieval slots
const before = await client.recall(AGENT, 'transformer architecture NLP', { top_k: 5 });
console.log('Before:', before.memories.length, 'results (likely 3-4 near-duplicates)');

// Trigger semantic deduplication
const report = await client.deduplicate({ agentId: AGENT });
console.log(`Merged ${report.merged} duplicates, ${report.remaining} memories remain`);
console.log(`Elapsed: ${report.elapsed_ms}ms`);

// Recall after — diverse, distinct papers
const after = await client.recall(AGENT, 'transformer architecture NLP', { top_k: 5 });
console.log('After:', after.memories.length, 'results — diverse topics per slot');
after.memories.forEach(m => console.log(`  [${m.importance}] ${m.content.slice(0, 80)}`));

// Schedule with setInterval (production: use cron or Dakera autopilot)
setInterval(async () => {
  const r = await client.deduplicate({ agentId: AGENT });
  console.log(`Scheduled dedup: merged=${r.merged}, remaining=${r.remaining}`);
}, 7 * 24 * 60 * 60 * 1000); // weekly

Rust

use dakera_rs::{Client, StoreMemoryRequest, RecallRequest};

let client = Client::new("http://localhost:3300", "dk-...");
let agent = "research-bot";

// Store paper summaries (some will be near-duplicates)
let papers = vec![
    ("BERT: Pre-training of Deep Bidirectional Transformers", 0.85f32),
    ("BERT language model for NLP pre-training", 0.82),  // near-dup
    ("Attention Is All You Need — transformer paper", 0.90),
];

for (content, importance) in papers {
    client.store_memory(agent, StoreMemoryRequest {
        content: content.into(),
        importance: Some(importance),
        memory_type: "semantic".into(),
        tags: vec!["paper".into(), "nlp".into()],
        ..Default::default()
    }).await?;
}

// Trigger deduplication via REST
// POST /v1/agents/research-bot/deduplicate
// Returns: { "merged": N, "remaining": M, "elapsed_ms": X }

// Verify recall diversity improved
let results = client.recall(agent, RecallRequest {
    query: "transformer architecture NLP".into(),
    top_k: Some(6),
    ..Default::default()
}).await?;

println!("After dedup: {} diverse results", results.memories.len());

Go

package main

import (
    "context"
    "fmt"
    "log"

    dakera "github.com/dakera-ai/dakera-go"
)

func main() {
    client := dakera.NewClient("http://localhost:3300", "dk-...")
    ctx := context.Background()
    agent := "research-bot"

    // Store papers with potential duplicates
    papers := []struct {
        Content    string
        Importance float64
    }{
        {"BERT: Pre-training of Deep Bidirectional Transformers", 0.85},
        {"BERT language model pre-training for NLP", 0.82}, // near-dup
        {"Attention Is All You Need — transformer architecture", 0.90},
    }

    for _, p := range papers {
        client.StoreMemory(ctx, agent, dakera.StoreMemoryRequest{
            Content:    p.Content,
            Importance: p.Importance,
            MemoryType: "semantic",
            Tags:       []string{"paper", "nlp"},
        })
    }

    // Trigger deduplication and stop on failure instead of printing an empty report
    report, err := client.Deduplicate(ctx, dakera.DeduplicateRequest{AgentID: agent})
    if err != nil {
        log.Fatalf("deduplicate failed: %v", err)
    }
    fmt.Printf("Dedup: merged=%d, remaining=%d, elapsed=%dms\n",
        report.Merged, report.Remaining, report.ElapsedMs)

    // Verify diverse recall
    results, err := client.Recall(ctx, agent, dakera.RecallRequest{
        Query: "transformer architecture NLP",
        TopK:  6,
    })
    if err != nil {
        log.Fatalf("recall failed: %v", err)
    }
    fmt.Printf("After dedup: %d diverse results\n", len(results.Memories))
}


SDK Reference

| Method | SDK | Purpose |
|--------|-----|---------|
| store_memory(agent_id, content, importance, memory_type, tags) | Python | Store memory (dedup runs separately) |
| storeMemory(agentId, {content, importance, memoryType, tags}) | TypeScript | Store memory (dedup runs separately) |
| recall(agent_id, query, top_k) | Python | Recall with improved diversity post-dedup |
| recall(agentId, query, {top_k}) | TypeScript | Recall with improved diversity post-dedup |
| POST /v1/agents/{id}/deduplicate | REST | Trigger semantic deduplication for agent |
| search_memories(agent_id, query) | Python | Search memories before dedup to preview duplicates |
| searchMemories(agentId, query, {top_k}) | TypeScript | Search memories before dedup to preview duplicates |
| forget(agent_id, memory_id) | Python | Manually remove a specific duplicate |

Performance Considerations

  • 340ms: dedup runtime for 500 memories (p50)
  • 30%: typical duplicate rate in multi-source ingestion pipelines
  • 2.1s: dedup runtime for 5,000 memories (O(N log N) clustering)

  • Dedup is O(N log N), not O(N²). Dakera uses approximate nearest-neighbor search (an HNSW index) to find candidates rather than exhaustive pairwise comparison. A store of 10,000 memories dedups in ~8 seconds, not minutes.
  • Run dedup off-peak. Although dedup doesn't block recall, it does spike CPU usage on the Dakera server during embedding comparison. Schedule it at low-traffic times or after batch ingestion completes.
  • Higher thresholds = faster dedup. A threshold of 0.95 only compares the most similar pairs, completing 40% faster than 0.85. Use the higher threshold if you only want to catch near-exact duplicates.
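To see why avoiding exhaustive comparison matters, the following back-of-the-envelope sketch counts how many candidate pairs each approach has to score. The neighbor count `k=10` is a hypothetical value chosen for illustration, and the sketch ignores the cost of searching the index itself, so it is an intuition aid rather than a model of Dakera's runtime.

```python
def exhaustive_pairs(n):
    """Unordered pairs an all-pairs comparison would have to score."""
    return n * (n - 1) // 2


def neighbor_checks(n, k):
    """Pairs scored when each memory is compared only to its k nearest neighbors."""
    return n * k


for n in (500, 5_000, 10_000):
    print(n, exhaustive_pairs(n), neighbor_checks(n, k=10))
```

For 10,000 memories, an all-pairs pass would score 49,995,000 pairs, while checking 10 neighbors per memory scores 100,000.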

Edge Cases

Edge Case 1: Merging Semantically Similar but Factually Distinct Memories

"User worked at Google in 2018" and "User works at Google" may score 0.91 similarity but contain different facts (past vs. present employment). Domain-specific temporal context can be lost in merges. For fact-sensitive domains, raise the threshold to 0.93+ and review merge reports manually before committing for the first few runs.

Edge Case 2: Over-aggressive Dedup on Low-Memory Stores

With fewer than 50 memories, every false-positive merge (two actually distinct memories collapsed into one) removes a noticeable share of the store, and there are few real duplicates to gain from. Only run dedup once the store has 100+ memories. Below that, the performance cost and risk of false merges outweigh the benefit.

Edge Case 3: Dedup Losing Important Tag Diversity

Three memories tagged ["python", "backend"], ["python", "data-science"], and ["python", "ml"] may merge into a canonical with only the first memory's tags. Configure your pipeline to merge tag sets (union) rather than picking one, so the canonical memory is retrievable via all relevant tags.
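A union of tag sets is straightforward to compute in your own pipeline. The sketch below is a minimal illustration with a hypothetical helper name; it keeps first-seen order so the output is stable across runs.

```python
def merge_tags(tag_lists):
    """Union of several tag lists, preserving first-seen order."""
    merged = []
    seen = set()
    for tags in tag_lists:
        for tag in tags:
            if tag not in seen:
                seen.add(tag)
                merged.append(tag)
    return merged


canonical_tags = merge_tags([
    ["python", "backend"],
    ["python", "data-science"],
    ["python", "ml"],
])
print(canonical_tags)
```

With the three example memories, the canonical entry ends up tagged python, backend, data-science, and ml, so a tag filter on any of them still finds it.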

Edge Case 4: Ingesting the Same Paper from Different Sources

Research pipelines often ingest from arXiv, Semantic Scholar, and direct uploads, so the same paper appears with slightly different abstracts per source. For high-frequency ingestion, detect these duplicates at write time with a semantic hash: embed the title only and check its similarity against existing titles before ingesting. For lower-frequency cases, fall back to periodic dedup.
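The title-only check can be sketched as follows. To stay self-contained, the example uses character-trigram counts as a stand-in for a real title embedding, and the 0.9 threshold and function names are hypothetical; in practice you would embed titles with the same model used for memories and tune the threshold on your own data.

```python
import math
from collections import Counter


def title_vector(title):
    """Stand-in embedding: character-trigram counts of the normalized title."""
    text = " ".join(title.lower().split())  # lowercase, collapse whitespace
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def is_known_title(title, existing_titles, threshold=0.9):
    """True when the title closely matches one that is already stored."""
    vec = title_vector(title)
    return any(cosine(vec, title_vector(t)) > threshold for t in existing_titles)


existing = ["Attention Is All You Need"]
print(is_known_title("attention is all  you need", existing))
print(is_known_title("Language Models are Few-Shot Learners", existing))
```

Because only short titles are compared, this check is far cheaper than comparing full summaries on every write, which is what makes it viable at ingestion time.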

Edge Case 5: Coordinating Dedup in Multi-Agent Setups

If multiple agents share a namespace, concurrent dedup calls can create race conditions where the same canonical memory gets created twice. Use an external lock (Redis, database mutex) to serialize dedup runs per namespace, or use Dakera's autopilot scheduling to ensure only one dedup job runs at a time.
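The serialization idea can be illustrated with an in-process lock per namespace. This sketch only protects threads inside one Python process; across processes or hosts you would need the external lock mentioned above. The function name and the "skip if already running" policy are assumptions for illustration.

```python
import threading
from collections import defaultdict

_guard = threading.Lock()                       # protects the lock table itself
_namespace_locks = defaultdict(threading.Lock)  # one lock per namespace


def run_dedup_exclusively(namespace, dedup_fn):
    """Run dedup_fn(namespace) unless a run for that namespace is in progress.

    Returns dedup_fn's result, or None when the run was skipped.
    """
    with _guard:
        lock = _namespace_locks[namespace]
    if not lock.acquire(blocking=False):
        return None  # another run holds the lock; skip instead of racing
    try:
        return dedup_fn(namespace)
    finally:
        lock.release()


def overlapping_attempt(namespace):
    # Simulates a second dedup request arriving while the first is running
    return run_dedup_exclusively(namespace, lambda _: "second run")


print(run_dedup_exclusively("shared", overlapping_attempt))
print(run_dedup_exclusively("shared", lambda ns: "ok"))
```

Skipping rather than queueing keeps a slow run from piling up duplicate jobs; the next scheduled run picks up whatever was missed.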

Advanced Configuration: Threshold Tuning & Scheduling

Threshold Reference by Domain

| Domain | Recommended Threshold | Rationale |
|--------|-----------------------|-----------|
| Conversational / personal assistant | 0.83–0.86 | Paraphrasing is common; be aggressive |
| General knowledge / Q&A | 0.87–0.89 | Default; balanced precision and recall |
| Research papers / technical facts | 0.91–0.93 | Precise phrasing matters; be conservative |
| Legal / compliance documents | 0.95+ | Near-exact only; review manually |
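If thresholds are chosen in code, the table can be encoded as a guard that keeps a requested value inside the recommended range for its domain. This is a sketch with hypothetical names; how the threshold is actually passed to Dakera is not shown on this page and is left out here.

```python
RECOMMENDED_RANGES = {
    "conversational": (0.83, 0.86),
    "general": (0.87, 0.89),
    "research": (0.91, 0.93),
    "legal": (0.95, 1.00),  # the table says "0.95+"; 1.00 is used as the upper bound
}


def clamp_threshold(domain, requested):
    """Pull a requested threshold into the recommended range for the domain."""
    low, high = RECOMMENDED_RANGES[domain]
    return min(max(requested, low), high)


print(clamp_threshold("research", 0.87))
print(clamp_threshold("conversational", 0.90))
```

A guard like this catches the common mistake of reusing the general default of 0.87 for a research or legal store, where it would merge too aggressively.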

Scheduling Recommendation

For high-volume agents (>1,000 memories/day), run dedup after every batch ingestion. For low-volume agents (<100 memories/day), weekly dedup is sufficient. For interactive agents, run dedup asynchronously after every 500 new memories.
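These rules can be folded into a single trigger check that a scheduler calls periodically. The function and parameter names below are hypothetical, and agents between 100 and 1,000 memories per day are not covered by the recommendation, so the sketch falls back to the "every 1,000 new memories" example from the implementation steps for that case.

```python
def should_run_dedup(daily_volume, new_since_last_run, days_since_last_run,
                     batch_just_finished=False, interactive=False):
    """Decide whether to trigger a dedup run now."""
    if interactive:
        # Interactive agents: asynchronously after every 500 new memories
        return new_since_last_run >= 500
    if daily_volume > 1000:
        # High volume: after every batch ingestion
        return batch_just_finished
    if daily_volume < 100:
        # Low volume: weekly is sufficient
        return days_since_last_run >= 7
    # Mid volume (assumption): every 1,000 new memories
    return new_since_last_run >= 1000


print(should_run_dedup(daily_volume=5000, new_since_last_run=1200,
                       days_since_last_run=0, batch_just_finished=True))
print(should_run_dedup(daily_volume=40, new_since_last_run=120,
                       days_since_last_run=3))
```

Keeping the policy in one pure function makes it easy to unit test and to adjust once real duplicate rates are known.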

One Concept, One Memory

Dakera's deduplication engine collapses near-duplicate memories in seconds — restoring recall diversity and cutting storage costs automatically.
