Semantic Deduplication
Automatically detect near-duplicate memories using embedding similarity and merge them into single canonical entries. Eliminate recall noise from repeated ingestion, multi-source pipelines, and long-running agents that revisit the same facts.
Prerequisites
- A running Dakera server (see the Quickstart)
- An agent ID to scope memories to
- A basic understanding of cosine similarity thresholds (scores of roughly 0.85 and above typically indicate near-duplicates)
The Problem: Redundant Memories Degrade Recall Quality
A research assistant agent processes 50 papers per day. Multiple papers state the same findings in different words. A user preference mentioned in Monday's chat appears again in Thursday's follow-up. A multi-agent pipeline where three agents independently discover and store the same fact creates three near-identical memories.
The result: a recall query for "user's Python preference" returns five nearly identical memories, each taking up a retrieval slot. Diversity of results collapses. The agent surfaces the same information five times instead of five distinct, relevant facts. Deduplication restores recall diversity by ensuring one concept maps to one memory.
Dakera performs semantic deduplication: it detects near-duplicates by the cosine similarity of their embeddings, not by string matching. "User prefers Python" and "User codes primarily in Python" are different strings but semantically equivalent. The default similarity threshold is 0.87 (configurable). Memories that score below the threshold are considered distinct and kept as separate entries.
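To make the threshold concrete, here is a minimal pure-Python sketch of the similarity test. Dakera performs this comparison server-side; the function names and the toy three-dimensional vectors below are hypothetical stand-ins for real embeddings, which have hundreds of dimensions.

```python
import math

DEFAULT_THRESHOLD = 0.87  # the default threshold described above


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_near_duplicate(a: list[float], b: list[float], threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Scores below the threshold are treated as distinct memories."""
    return cosine_similarity(a, b) >= threshold


# Toy embeddings (hypothetical values, for illustration only)
prefers = [0.9, 0.1, 0.0]      # "User prefers Python"
codes_in = [0.85, 0.15, 0.05]  # "User codes primarily in Python"
hiking = [0.0, 0.2, 0.95]      # "User enjoys hiking on weekends"

print(is_near_duplicate(prefers, codes_in), is_near_duplicate(prefers, hiking))
```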
Architecture
Deduplication works in two stages. First, Dakera compares the agent's memory embeddings by cosine similarity, using an approximate nearest neighbor index rather than an exhaustive pairwise comparison (see Performance Considerations). Pairs at or above the similarity threshold (default: 0.87) are flagged as duplicate candidates. Second, candidates are merged into a single canonical memory that retains the highest importance score, the most recent timestamp, and the combined semantic signal of the merged embeddings.
- Call deduplicate() periodically (after batch ingestion or on a schedule)
- Merged canonical memory retains the highest importance from all candidates
- Original duplicate memories are removed from the store
- Recall diversity improves immediately — one concept, one slot
- Storage costs decrease proportionally to deduplication rate
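Stage one can be sketched as follows: link every pair at or above the threshold, then collect linked memories into groups with a union-find structure, so that chains such as A~B and B~C end up in one group. Both the transitive grouping and the function names are assumptions made for illustration, not documented Dakera behavior. For clarity the sketch compares all pairs, which is O(N²); Dakera uses an approximate nearest neighbor index instead.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def find_duplicate_groups(embeddings: dict[str, list[float]], threshold: float = 0.87) -> list[list[str]]:
    """Stage 1: link pairs at or above the threshold, then group them transitively."""
    parent = {mid: mid for mid in embeddings}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Exhaustive comparison for readability only (not how Dakera scales).
    for a, b in combinations(embeddings, 2):
        if cosine_similarity(embeddings[a], embeddings[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for mid in embeddings:
        groups.setdefault(find(mid), []).append(mid)
    # Only groups with two or more members are duplicate candidates.
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical toy embeddings keyed by memory ID
EXAMPLE_EMBEDDINGS = {
    "m-001": [1.0, 0.0, 0.0],
    "m-002": [0.95, 0.05, 0.0],
    "m-003": [0.9, 0.1, 0.0],
    "m-004": [0.0, 1.0, 0.0],
    "m-005": [0.0, 0.0, 1.0],
}
print(find_duplicate_groups(EXAMPLE_EMBEDDINGS))
```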
Diagram: Similarity Matrix Heatmap
Diagram: Dedup Merge Flow
Real-World Scenario: Research Assistant Deduplicating Paper Summaries
Scenario: PaperMind AI builds a research assistant that ingests academic paper abstracts and stores one summary memory per paper. Researchers submit the same landmark papers from multiple sources (arXiv, Semantic Scholar, direct PDF upload). Over 3 months, the agent accumulates 4,200 paper summaries, but analysis reveals that 800+ of them are near-duplicates of 300 unique papers due to duplicate ingestion.
PaperMind runs weekly deduplication with a threshold of 0.88. Each run reduces the store by 15–20%. Recall diversity improves dramatically: a query for "transformer attention mechanisms" previously returned 6 near-identical BERT paper summaries; after dedup, it returns 6 distinct papers covering different angles (BERT, attention variants, efficient transformers, vision transformers, etc.).
Step-by-Step Implementation
1. Ingest memories normally; dedup runs separately. Store memories as usual. Don't deduplicate before each write: checking every new memory against all existing memories on every write is prohibitively expensive at scale. Let duplicates accumulate, then clean them up in batch runs.
2. Schedule deduplication as a maintenance job. Run dedup after large batch ingestions, at daily or weekly intervals, or whenever the store grows past a threshold (e.g., every 1,000 new memories). For interactive agents, run it asynchronously; dedup does not block recall while it runs.
3. Choose the right similarity threshold. The default of 0.87 works for general use. For research or technical domains where precise phrasing matters, raise it to 0.91–0.93. For conversational agents where paraphrasing is common, lower it to 0.83–0.85. Test on a sample of 100 memories before running on the full store.
4. Inspect the merge report. The deduplicate() response reports how many memories were merged and the IDs of the resulting canonical memories (canonical_ids). Review the report periodically to catch over-aggressive merging (e.g., two distinct but similar-sounding papers merged incorrectly).
5. Validate recall diversity after each run. Sample 5–10 recall queries and compare result diversity before and after. A healthy dedup run increases the number of distinct topics returned per query while top_k stays constant.
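Step 5 needs some measure of "distinct topics". One hypothetical metric, sketched below, counts results that are not near-duplicates of an earlier result in the same list and averages that count over a sample of queries. It assumes you can get an embedding for each returned memory (for example, by re-embedding the returned content with your own model); that step is not shown, and neither function is part of the Dakera SDK.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def distinct_topics(result_embeddings: list[list[float]], threshold: float = 0.87) -> int:
    """Count results that are not near-duplicates of any earlier result."""
    kept: list[list[float]] = []
    for emb in result_embeddings:
        if all(cosine_similarity(emb, k) < threshold for k in kept):
            kept.append(emb)
    return len(kept)


def mean_diversity(queries_to_results: dict[str, list[list[float]]]) -> float:
    """Average distinct-topic count across a sample of recall queries."""
    counts = [distinct_topics(results) for results in queries_to_results.values()]
    return sum(counts) / len(counts)


# Hypothetical result embeddings for one query, with the same top_k before and after
before = {"q1": [[1.0, 0.0, 0.0], [0.98, 0.02, 0.0], [0.97, 0.03, 0.0], [0.0, 1.0, 0.0]]}
after = {"q1": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.7]]}
print(mean_diversity(before), mean_diversity(after))
```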
Before & After: Memory Store State
Before dedup:

```jsonc
[
  {
    "id": "m-001",
    "content": "User likes Python",
    "importance": 0.70
  },
  {
    "id": "m-002",
    "content": "User uses Python primarily",
    "importance": 0.80
  },
  {
    "id": "m-003",
    "content": "User prefers Python over JS",
    "importance": 0.75
  },
  {
    "id": "m-004",
    "content": "User enjoys hiking on weekends",
    "importance": 0.65
  },
  {
    "id": "m-005",
    "content": "User works at Acme Corp",
    "importance": 0.90
  }
]
// recall("programming language") returns
// m-001, m-002, m-003: 3 redundant slots
// hiking and Acme Corp get pushed out
```

After dedup:

```jsonc
[
  {
    "id": "m-can-001",
    "content": "User is a Python developer",
    "importance": 0.80,
    "merged_from": ["m-001", "m-002", "m-003"],
    "merge_count": 3
  },
  {
    "id": "m-004",
    "content": "User enjoys hiking on weekends",
    "importance": 0.65
  },
  {
    "id": "m-005",
    "content": "User works at Acme Corp",
    "importance": 0.90
  }
]
// recall("programming language") returns
// m-can-001 (1 slot) + diverse facts
// dedup report: { merged: 2, remaining: 3 }
```
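The before/after example can be reproduced with a sketch of stage two. It applies the merge rules described under Architecture (highest importance, most recent timestamp) and derives the report counts: a group of k memories collapses to one, removing k − 1 entries. The timestamp values are hypothetical, since the JSON above does not show timestamps. The sketch reuses the highest-importance member's text as a stand-in, whereas the example above shows newly written canonical text, and it does not model how embeddings are combined.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    id: str
    content: str
    importance: float
    timestamp: int = 0  # hypothetical field (e.g. Unix seconds), not in the JSON above
    merged_from: list[str] = field(default_factory=list)


def merge_group(group: list[Memory], canonical_id: str) -> Memory:
    """Stage 2: collapse one duplicate group into a single canonical memory."""
    best = max(group, key=lambda m: m.importance)
    return Memory(
        id=canonical_id,
        content=best.content,                       # stand-in content rule (assumption)
        importance=best.importance,                 # highest importance wins
        timestamp=max(m.timestamp for m in group),  # most recent timestamp wins
        merged_from=[m.id for m in group],
    )


def dedup_report(store_size: int, groups: list[list[Memory]]) -> dict[str, int]:
    """Each group of k memories becomes 1, so k - 1 entries are removed."""
    merged = sum(len(g) - 1 for g in groups)
    return {"merged": merged, "remaining": store_size - merged}


python_group = [
    Memory("m-001", "User likes Python", 0.70, timestamp=100),
    Memory("m-002", "User uses Python primarily", 0.80, timestamp=200),
    Memory("m-003", "User prefers Python over JS", 0.75, timestamp=300),
]
canonical = merge_group(python_group, "m-can-001")
report = dedup_report(store_size=5, groups=[python_group])
print(canonical.importance, canonical.merged_from, report)
```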
Implementation
```bash
# Trigger semantic deduplication
curl -X POST http://localhost:3300/v1/agents/research-bot/deduplicate \
  -H "Authorization: Bearer dk-..." \
  -H "Content-Type: application/json" \
  -d '{}'

# Response includes a dedup report:
# {
#   "merged": 12,
#   "remaining": 45,
#   "elapsed_ms": 340,
#   "canonical_ids": ["m-can-001", "m-can-002", ...]
# }

# Verify that recall diversity improved
curl "http://localhost:3300/v1/memory/recall?agent_id=research-bot&query=transformer+attention+mechanisms&top_k=6" \
  -H "Authorization: Bearer dk-..."
```

```python
import time

from dakera import DakeraClient

client = DakeraClient(base_url="http://localhost:3300", api_key="dk-...")
AGENT = "research-bot"

# Ingest paper summaries (duplicates accumulate naturally)
papers = [
    ("BERT: Pre-training of Deep Bidirectional Transformers", 0.85),
    ("BERT language model pre-training for NLP", 0.82),  # near-dup of above
    ("Attention Is All You Need — transformer architecture", 0.90),
    ("The original transformer paper by Vaswani et al.", 0.88),  # near-dup of above
    ("GPT-3: Language Models are Few-Shot Learners", 0.87),
]
for content, importance in papers:
    client.store_memory(
        agent_id=AGENT,
        content=content,
        importance=importance,
        memory_type="semantic",
        tags=["paper", "nlp"],
    )

# Recall before dedup: near-duplicates consume result slots
before = client.recall(agent_id=AGENT, query="transformer architecture NLP", top_k=5)
print(f"Before dedup: {len(before.memories)} results, likely including near-duplicate pairs")

# Trigger deduplication
result = client.deduplicate(agent_id=AGENT)
print(f"Merged {result.merged} duplicates; {result.remaining} memories remain")
print(f"Dedup elapsed: {result.elapsed_ms}ms")

# Recall after dedup: more distinct results
after = client.recall(agent_id=AGENT, query="transformer architecture NLP", top_k=5)
print(f"After dedup: {len(after.memories)} results, diverse topics")
for mem in after.memories:
    print(f"  [{mem.importance:.2f}] {mem.content[:80]}")


# Schedule weekly dedup (simple blocking loop; in production prefer cron or a job scheduler)
def run_weekly_dedup():
    while True:
        report = client.deduplicate(agent_id=AGENT)
        print(f"Weekly dedup: merged={report.merged}, remaining={report.remaining}")
        time.sleep(7 * 24 * 3600)  # 7 days
```

```typescript
// Uses top-level await, so run this file as an ES module.
import { DakeraClient } from '@dakera-ai/dakera';

const client = new DakeraClient({ baseUrl: 'http://localhost:3300', apiKey: 'dk-...' });
const AGENT = 'research-bot';

// Ingest papers: near-duplicates accumulate from multiple sources
const papers = [
  { content: 'BERT: Pre-training of Deep Bidirectional Transformers', importance: 0.85 },
  { content: 'BERT language model pre-training for NLP tasks', importance: 0.82 }, // near-dup
  { content: 'Attention Is All You Need — transformer architecture paper', importance: 0.90 },
  { content: 'The original Vaswani et al. transformer paper 2017', importance: 0.88 }, // near-dup
  { content: 'GPT-3: Language Models are Few-Shot Learners', importance: 0.87 },
];
for (const paper of papers) {
  await client.storeMemory(AGENT, {
    content: paper.content,
    importance: paper.importance,
    memoryType: 'semantic',
    tags: ['paper', 'nlp'],
  });
}

// Recall before: duplicates waste retrieval slots
const before = await client.recall(AGENT, 'transformer architecture NLP', { top_k: 5 });
console.log('Before:', before.memories.length, 'results (likely including near-duplicate pairs)');

// Trigger semantic deduplication
const report = await client.deduplicate({ agentId: AGENT });
console.log(`Merged ${report.merged} duplicates, ${report.remaining} memories remain`);
console.log(`Elapsed: ${report.elapsed_ms}ms`);

// Recall after: distinct papers
const after = await client.recall(AGENT, 'transformer architecture NLP', { top_k: 5 });
console.log('After:', after.memories.length, 'results, diverse topics per slot');
after.memories.forEach(m => console.log(`  [${m.importance}] ${m.content.slice(0, 80)}`));

// Schedule with setInterval (production: use cron or Dakera autopilot)
setInterval(async () => {
  const r = await client.deduplicate({ agentId: AGENT });
  console.log(`Scheduled dedup: merged=${r.merged}, remaining=${r.remaining}`);
}, 7 * 24 * 60 * 60 * 1000); // weekly
```

```rust
use dakera_rs::{Client, RecallRequest, StoreMemoryRequest};

// Assumes a tokio async runtime; `?` requires main to return a Result.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new("http://localhost:3300", "dk-...");
    let agent = "research-bot";

    // Store paper summaries (some will be near-duplicates)
    let papers = vec![
        ("BERT: Pre-training of Deep Bidirectional Transformers", 0.85f32),
        ("BERT language model for NLP pre-training", 0.82), // near-dup
        ("Attention Is All You Need — transformer paper", 0.90),
    ];
    for (content, importance) in papers {
        client.store_memory(agent, StoreMemoryRequest {
            content: content.into(),
            importance: Some(importance),
            memory_type: "semantic".into(),
            tags: vec!["paper".into(), "nlp".into()],
            ..Default::default()
        }).await?;
    }

    // Trigger deduplication via REST
    // POST /v1/agents/research-bot/deduplicate
    // Returns: { "merged": N, "remaining": M, "elapsed_ms": X }

    // Verify recall diversity improved
    let results = client.recall(agent, RecallRequest {
        query: "transformer architecture NLP".into(),
        top_k: Some(6),
        ..Default::default()
    }).await?;
    println!("After dedup: {} diverse results", results.memories.len());
    Ok(())
}
```

```go
package main

import (
	"context"
	"fmt"
	"log"

	dakera "github.com/dakera-ai/dakera-go"
)

func main() {
	client := dakera.NewClient("http://localhost:3300", "dk-...")
	ctx := context.Background()
	agent := "research-bot"

	// Store papers with potential duplicates
	papers := []struct {
		Content    string
		Importance float64
	}{
		{"BERT: Pre-training of Deep Bidirectional Transformers", 0.85},
		{"BERT language model pre-training for NLP", 0.82}, // near-dup
		{"Attention Is All You Need — transformer architecture", 0.90},
	}
	for _, p := range papers {
		// Check StoreMemory's returned error in production code.
		client.StoreMemory(ctx, agent, dakera.StoreMemoryRequest{
			Content:    p.Content,
			Importance: p.Importance,
			MemoryType: "semantic",
			Tags:       []string{"paper", "nlp"},
		})
	}

	// Trigger deduplication
	report, err := client.Deduplicate(ctx, dakera.DeduplicateRequest{AgentID: agent})
	if err != nil {
		log.Fatalf("deduplicate: %v", err)
	}
	fmt.Printf("Dedup: merged=%d, remaining=%d, elapsed=%dms\n",
		report.Merged, report.Remaining, report.ElapsedMs)

	// Verify diverse recall
	results, err := client.Recall(ctx, agent, dakera.RecallRequest{
		Query: "transformer architecture NLP",
		TopK:  6,
	})
	if err != nil {
		log.Fatalf("recall: %v", err)
	}
	fmt.Printf("After dedup: %d diverse results\n", len(results.Memories))
}
```

Cut duplicate memories by up to 30% in 340ms
Dakera's dedup engine runs server-side and is non-blocking to recall.
SDK Reference
| Method | SDK | Purpose |
|---|---|---|
| store_memory(agent_id, content, importance, memory_type, tags) | Python | Store memory (dedup runs separately) |
| storeMemory(agentId, {content, importance, memoryType, tags}) | TypeScript | Store memory (dedup runs separately) |
| recall(agent_id, query, top_k) | Python | Recall with improved diversity post-dedup |
| recall(agentId, query, {top_k}) | TypeScript | Recall with improved diversity post-dedup |
| POST /v1/agents/{id}/deduplicate | REST | Trigger semantic deduplication for agent |
| search_memories(agent_id, query) | Python | Search memories before dedup to preview duplicates |
| searchMemories(agentId, query, {top_k}) | TypeScript | Search memories before dedup to preview duplicates |
| forget(agent_id, memory_id) | Python | Manually remove a specific duplicate |
Performance Considerations
- Dedup is O(N log N), not O(N²). Dakera finds candidates with approximate nearest neighbor (ANN) search over an HNSW index rather than an exhaustive pairwise comparison. A store of 10,000 memories dedups in about 8 seconds rather than minutes.
- Run dedup off-peak. Although dedup doesn't block recall, it does spike CPU usage on the Dakera server during embedding comparison. Schedule it at low-traffic times or after batch ingestion completes.
- Higher thresholds = faster dedup. A threshold of 0.95 only compares the most similar pairs, completing 40% faster than 0.85. Use the higher threshold if you only want to catch near-exact duplicates.
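To see why the complexity class matters at the 10,000-memory scale mentioned above, it helps to compare raw counts. The sketch below contrasts the n(n − 1)/2 pairs an exhaustive comparison must score with an n·log₂n figure; the latter is only an order-of-magnitude proxy for ANN search cost with constants ignored, not Dakera's actual cost model, which depends on index parameters.

```python
import math


def exhaustive_pairs(n: int) -> int:
    """Pairs an all-pairs comparison must score: n choose 2."""
    return n * (n - 1) // 2


def n_log_n(n: int) -> int:
    """Order-of-magnitude proxy for ANN-based candidate search (constants ignored)."""
    return round(n * math.log2(n))


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} memories: {exhaustive_pairs(n):>13,} pairs vs ~{n_log_n(n):>9,}")
```

At 10,000 memories the exhaustive approach needs roughly 50 million comparisons, several hundred times the n·log₂n figure.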
Edge Cases
"User worked at Google in 2018" and "User works at Google" may score 0.91 similarity but contain different facts (past vs. present employment). Domain-specific temporal context can be lost in merges. For fact-sensitive domains, raise the threshold to 0.93+ and review merge reports manually before committing for the first few runs.
With fewer than 50 memories, the risk of false-positive merges (merging memories that are actually distinct) becomes significant relative to the benefit. Only run dedup once the store has 100+ memories. Below that, the cost of a run and the risk of false merges outweigh the benefit.
Three memories tagged ["python", "backend"], ["python", "data-science"], and ["python", "ml"] may merge into a canonical memory that keeps only the first memory's tags. Configure your pipeline to merge tag sets (union) rather than picking one, so the canonical memory stays retrievable via all relevant tags.
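A tag union that preserves first-seen order is a one-liner in Python. The helper name is hypothetical, and how you write the merged tags back depends on your pipeline; the SDK reference above lists no tag-update call, so that step is not shown.

```python
def union_tags(tag_lists: list[list[str]]) -> list[str]:
    """Union of tag lists, keeping first-seen order and dropping repeats."""
    # dict.fromkeys preserves insertion order and ignores duplicate keys.
    return list(dict.fromkeys(tag for tags in tag_lists for tag in tags))


merged_tags = union_tags([["python", "backend"], ["python", "data-science"], ["python", "ml"]])
print(merged_tags)  # ['python', 'backend', 'data-science', 'ml']
```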
Research pipelines often ingest from arXiv, Semantic Scholar, and direct uploads, so the same paper appears with a slightly different abstract from each source. For high-frequency ingestion, detect these duplicates at write time with a lightweight semantic hash: embed only the title and check its similarity against existing titles before storing. For lower-frequency sources, rely on periodic dedup instead.
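A write-time title check can be sketched as an in-memory index of normalized title embeddings. This is a hypothetical client-side helper, not a Dakera feature: the class name, the linear scan, and the example threshold of 0.9 are all assumptions, and the toy vectors stand in for title embeddings from your own model. A large corpus would call for an ANN index rather than a linear scan.

```python
import math


class TitleIndex:
    """In-memory title-embedding index for write-time duplicate checks (linear scan)."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self._titles: list[list[float]] = []

    @staticmethod
    def _normalize(vec: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("title embedding must be non-zero")
        return [x / norm for x in vec]

    def check_and_add(self, title_embedding: list[float]) -> bool:
        """Return True if a stored title is at least `threshold` similar.

        Otherwise record this title and return False, meaning it is safe to store.
        """
        vec = self._normalize(title_embedding)
        for existing in self._titles:
            # Both vectors are unit length, so the dot product is the cosine similarity.
            if sum(a * b for a, b in zip(vec, existing)) >= self.threshold:
                return True
        self._titles.append(vec)
        return False


index = TitleIndex(threshold=0.9)  # example threshold (assumption)
for title_vec in ([1.0, 0.0, 0.0], [0.99, 0.05, 0.0], [0.0, 1.0, 0.0]):
    if index.check_and_add(title_vec):
        print("skip: likely duplicate title")
    else:
        print("store")  # call store_memory(...) here
```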
If multiple agents share a namespace, concurrent dedup calls can create race conditions where the same canonical memory gets created twice. Use an external lock (Redis, database mutex) to serialize dedup runs per namespace, or use Dakera's autopilot scheduling to ensure only one dedup job runs at a time.
Advanced Configuration: Threshold Tuning & Scheduling
Threshold Reference by Domain
| Domain | Recommended Threshold | Rationale |
|---|---|---|
| Conversational / personal assistant | 0.83–0.86 | Paraphrasing is common; be aggressive |
| General knowledge / Q&A | 0.87–0.89 | Default; balanced precision and recall |
| Research papers / technical facts | 0.91–0.93 | Precise phrasing matters; be conservative |
| Legal / compliance documents | 0.95+ | Near-exact only; review manually |
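Before settling on a value from the table, you can sweep candidate thresholds over a sample (for example, the 100-memory sample suggested in the implementation steps) and count how many pairs would qualify at each one. The pairs that appear between two adjacent thresholds are the ones worth reviewing by hand. This is an offline approximation using your own embeddings, not Dakera's matching; the function names and the toy unit vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sweep_thresholds(sample: list[list[float]], thresholds: list[float]) -> dict[float, int]:
    """Count the candidate pairs that would qualify at each threshold."""
    sims = [cosine_similarity(a, b) for a, b in combinations(sample, 2)]
    return {t: sum(s >= t for s in sims) for t in thresholds}


def unit(degrees: float) -> list[float]:
    """Toy 2-D unit vector; the angle between two of them sets their similarity."""
    r = math.radians(degrees)
    return [math.cos(r), math.sin(r)]


sample = [unit(0), unit(20), unit(25), unit(90)]
print(sweep_thresholds(sample, [0.83, 0.87, 0.91, 0.95]))
```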
Scheduling Recommendation
For high-volume agents (>1,000 memories/day), run dedup after every batch ingestion. For low-volume agents (<100 memories/day), weekly dedup is sufficient. For interactive agents, run dedup asynchronously after every 500 new memories.
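The "every 500 new memories" rule for interactive agents can be sketched as a small counter that starts a background dedup run each time the count is reached, so the store path never waits on dedup. The class is hypothetical; `run_dedup` would wrap a call such as client.deduplicate(agent_id=...) from the Python example. If a run can take longer than 500 stores, pair this with a per-namespace lock so runs don't overlap.

```python
import threading
from typing import Callable, Optional


class DedupTrigger:
    """Start a background dedup run after every `every` stored memories."""

    def __init__(self, run_dedup: Callable[[], object], every: int = 500):
        if every < 1:
            raise ValueError("every must be >= 1")
        self.run_dedup = run_dedup
        self.every = every
        self._count = 0
        self._lock = threading.Lock()

    def record_store(self) -> Optional[threading.Thread]:
        """Call after each successful store; returns the thread if a run was started."""
        with self._lock:
            self._count += 1
            if self._count < self.every:
                return None
            self._count = 0
        # Start outside the lock so concurrent stores never wait on dedup.
        worker = threading.Thread(target=self.run_dedup, daemon=True)
        worker.start()
        return worker
```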
One Concept, One Memory
Dakera's deduplication engine collapses near-duplicate memories in seconds — restoring recall diversity and cutting storage costs automatically.