AI agents have a memory problem. Without persistence, every conversation starts from scratch — your agent doesn't know that yesterday's user prefers Python over JavaScript, or that the task it completed last week needs a follow-up. Long context windows help, but they're expensive, slow, and still limited to a single session.
The solution is an external memory layer — a dedicated store that persists what agents learn and surfaces the most relevant information at query time. In 2026, a handful of frameworks compete for this role. They differ significantly in architecture, deployment model, retrieval quality, and operational complexity.
This guide covers the five most widely used options: Dakera, Mem0, Letta, Zep, and Hindsight. We'll compare them on the criteria that matter to production teams: benchmark scores, deployment model, retrieval architecture, and total cost of ownership.
This article is written by the Dakera team. We've tried to be accurate about competitors, but you should verify claims independently. Benchmark scores are from public sources cited inline.
* Mem0's 91.6% is from their own benchmark run using a different prompt format and with LLM post-processing enabled. Dakera's 87.6% is evaluated without LLM post-processing. Direct comparison requires methodology alignment.
Dakera is a single Rust binary that gives AI agents persistent memory via hybrid BM25+HNSW retrieval, a knowledge graph, session management, and configurable importance decay. It runs entirely on your infrastructure and ships a native MCP server with 83 tools for immediate integration with Claude, Cursor, and Windsurf.
Unique moat: Access-weighted importance decay with configurable half-life per namespace. Memories decay like human memory — frequently accessed facts stay sharp, stale context fades. No other framework in this comparison offers this out of the box.
Best for: Teams that need GDPR/HIPAA compliance, air-gapped deployment, or MCP integration with Claude Code. The Rust binary eliminates the Python/Node runtime tax and reduces attack surface to near-zero.
Limitations: Newer project with smaller community than Mem0. Cloud-hosted option not yet available for teams that prefer managed services.
```shell
# Self-host Dakera in under 5 minutes
docker run -d \
  --name dakera \
  -p 3300:3300 \
  -e DAKERA_ROOT_API_KEY=my-key \
  ghcr.io/dakera-ai/dakera:latest

curl http://localhost:3300/health
# {"status":"healthy","version":"0.11.55"}
```
Mem0 is the most widely adopted AI memory framework. Its open-source version (mem0) provides Python-based memory management with OpenAI embeddings and a simple store/retrieve API. The commercial hosted version adds team features, dashboards, and a higher benchmark score through LLM post-processing.
Mem0's published 91.6% LoCoMo score makes it the current benchmark leader, though this includes LLM-assisted answer extraction that Dakera excludes by design. For raw recall without post-processing, both systems are competitive.
Best for: Python-first teams that want a quick integration and don't have strict data residency requirements. Mem0's pip install and OpenAI-native workflow is the fastest path to a working prototype.
Limitations: Cloud-first architecture means your memory data goes through Mem0's servers. No MCP server for direct Claude/Cursor integration. Python runtime adds significant overhead vs Rust. No decay engine — all memories are equally weighted indefinitely.
Letta is an agent framework with memory as a first-class primitive. Rather than a standalone memory store, Letta treats memory as part of the agent's architecture — the agent itself manages what to remember and when. This gives Letta unique in-context memory capabilities but at the cost of tighter coupling.
Best for: Teams building with LangChain or LlamaIndex who want memory that's tightly integrated with their agent's reasoning loop. Letta shines when the agent needs to actively decide what to remember.
Limitations: Not a standalone memory server — you're adopting the Letta agent framework. No MCP support. No meaningful benchmark score on LoCoMo. Python-only runtime.
Zep specializes in conversation memory and user-fact extraction. It builds a temporal knowledge graph from conversations, making it good at tracking how user preferences and facts change over time. The community edition is Go-based and self-hostable; the commercial edition adds a managed cloud.
Best for: Customer-facing agents that need to remember what users said across many conversations. Zep's conversation-centric model works well for chat applications.
Limitations: Architecture is heavily conversation-oriented — not designed for multi-agent or cross-agent memory sharing. No MCP server. Limited benchmarking transparency.
Hindsight is a commercial memory layer for enterprise AI agents. It offers a managed API with no self-hosting option. For teams with strict data governance requirements, the cloud-only model is a dealbreaker. No public benchmark results are available.
Best for: Enterprise teams with existing cloud vendor relationships who want a fully managed service and don't have data residency constraints.
Limitations: No self-hosting. No public benchmark scores. Proprietary and expensive at scale. No MCP support.
The most important dimension most comparisons skip is how each framework retrieves memories at query time. This directly determines recall quality on real workloads.
Dakera runs both keyword (BM25) and semantic (HNSW vector) search in a single round-trip and fuses the results. This matters because neither approach alone is sufficient: keyword search misses paraphrases and synonyms, while vector search misses exact identifiers like error codes, ticket numbers, and function names.
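To make the fusion step concrete, here is a minimal sketch using reciprocal rank fusion (RRF), a common way to merge keyword and vector rankings. Dakera's actual fusion logic is internal and may differ; this only illustrates why a document ranked well by either signal surfaces near the top of the hybrid result.

```python
def rrf_fuse(bm25_ranking, vector_ranking, k=60):
    """Merge two ranked lists of doc IDs via reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per document, so a doc
    ranked highly by either retriever ends up near the top.
    """
    scores = {}
    for ranking in (bm25_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "err-0x7f" is an exact token BM25 finds; "setup-guide" is a
# semantic match the vector index surfaces. Fusion keeps both high.
bm25 = ["err-0x7f", "setup-guide", "faq"]
vector = ["setup-guide", "onboarding", "err-0x7f"]
print(rrf_fuse(bm25, vector))
# → ['setup-guide', 'err-0x7f', 'onboarding', 'faq']
```

The constant `k` (60 is a conventional default) damps the influence of top ranks so one retriever cannot dominate the fused list.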
The hybrid approach explains why Dakera scores 73.9% on LoCoMo's Cat3 (temporal inference) — the hardest category — while remaining fast enough for production use.
Mem0 stores user facts via LLM extraction at write time. This compresses memory into facts ("user prefers dark mode") rather than raw episodic content. At recall time, it combines vector similarity with the extracted fact store. This works well for explicit preferences but loses nuance in complex multi-turn interactions.
Zep builds a temporal knowledge graph from conversations, tracking when facts change over time. This is highly effective for chatbot scenarios where you need to know "what did the user say about X in their most recent session" but less flexible for multi-agent episodic memory.
Every memory system faces the same fundamental problem: as memories accumulate, the signal-to-noise ratio degrades. A memory from 6 months ago about a user's project requirements is less relevant than what they said yesterday — but both have the same vector embedding, so pure semantic search can't distinguish them.
Dakera's access-weighted half-life decay solves this: each memory's retrieval score is multiplied by a recency weight that halves after every configurable half-life period, and each access refreshes the clock. Frequently used facts keep full weight; stale context fades without being deleted.
No other framework in this comparison offers this architecture. Mem0, Letta, and Zep treat all stored memories as equally weighted regardless of age or access frequency.
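A minimal sketch of access-weighted half-life decay, assuming a simple exponential model. This illustrates the idea, not Dakera's internal implementation; the function name and parameters are illustrative.

```python
import time

def decayed_score(base_score, last_access_ts, half_life_secs, now=None):
    """Decay a retrieval score exponentially with time since last access.

    The score halves every half_life_secs. Reading a memory updates
    last_access_ts, so frequently accessed facts retain full weight.
    """
    now = time.time() if now is None else now
    age = max(0.0, now - last_access_ts)
    return base_score * 0.5 ** (age / half_life_secs)

# A memory untouched for two half-lives retains 25% of its weight:
print(decayed_score(1.0, last_access_ts=0, half_life_secs=100, now=200))
# → 0.25
```

In a real system this weight would be multiplied into the hybrid retrieval score at query time, per namespace, so decay tuning never requires re-indexing.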
The Model Context Protocol has become the standard integration layer for AI tools in 2026. Claude Desktop, Claude Code, Cursor, and Windsurf all support MCP natively. An MCP server turns Dakera's REST API into a set of tools your AI assistant can call directly — no code changes required.
Only Dakera ships a native MCP server among the frameworks compared here. Mem0, Letta, Zep, and Hindsight require custom code to integrate.
With Dakera's MCP server, your Claude Code session gets:
- `dakera_store`: persist a new memory
- `dakera_recall`: retrieve the most relevant memories for a query
- `dakera_session_start` / `dakera_session_end`: manage session boundaries
- `dakera_hybrid_search`: run combined BM25 + vector search directly
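As a concrete (hypothetical) example, registering an MCP server in Claude Desktop or Claude Code uses an `mcpServers` config entry like the one below. The `command`, `args`, and environment variable names here are assumptions for illustration; check the Dakera quickstart for the actual registration syntax.

```json
{
  "mcpServers": {
    "dakera": {
      "command": "dakera",
      "args": ["mcp"],
      "env": {
        "DAKERA_API_KEY": "my-key",
        "DAKERA_URL": "http://localhost:3300"
      }
    }
  }
}
```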
For production teams, operational simplicity matters as much as feature completeness. Here's how the frameworks compare on the dimensions that drive operational cost:
Dakera's single-binary architecture with S3-compatible object storage means you're running one process with one external dependency. Mem0's self-hosted stack requires Qdrant, Redis, and Postgres — three separate services to operate, monitor, and scale.
LoCoMo (Long-Context Memory) is the standard benchmark for agent memory evaluation. It tests 1,540 questions across four categories; Cat3, temporal inference, is widely regarded as the hardest.
Mem0's published 91.6% overall uses LLM post-processing to clean answers before evaluation. Dakera's 87.6% is evaluated with exact string matching only. Benchmark comparisons between systems must account for methodology differences.
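To see why scoring protocol matters, here is a sketch of exact-string-match evaluation with only light normalization, the stricter protocol described above. The function is illustrative, not the benchmark harness itself.

```python
def exact_match(pred, gold):
    """Exact string match after lowercasing and whitespace collapse.

    No LLM answer cleanup: trailing punctuation or rephrasing fails,
    which is exactly the gap LLM post-processing closes.
    """
    norm = lambda s: " ".join(s.lower().strip().split())
    return norm(pred) == norm(gold)

print(exact_match("Paris", "paris"))   # → True
print(exact_match("Paris.", "paris"))  # → False: punctuation fails
```

Under this protocol, an answer that is semantically correct but formatted differently counts as a miss, which is why a post-processed score and a raw exact-match score are not directly comparable.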
Dakera's benchmark is fully reproducible. The benchmark harness and dataset (LoCoMo standard) are available on GitHub. See full methodology →
There is no universally best AI agent memory framework — the right choice depends on your constraints:
If you're building agents that will run in production — especially agents that handle sensitive user data, need to run offline, or use Claude/Cursor/Windsurf — Dakera's architecture is built for that from the ground up.
Dakera is free to self-host. Get started in under 5 minutes with Docker or Docker Compose. Read the quickstart guide →