Launch May 6, 2026

Introducing Dakera: Production Memory Infrastructure for AI Agents

Today we're launching Dakera in public alpha — a single Rust binary that gives your AI agents persistent, semantically searchable memory with no external services required.

Dakera AI Team 6 min read

For the past year, every production AI agent team we've talked to has hit the same wall: their agents are stateless. Every session starts from zero. The agent that helped a user debug their deployment on Tuesday has no idea that conversation ever happened by Wednesday. Interactions accumulate, context evaporates.

The workarounds are painful. Some teams bolt on Postgres with pgvector. Others stand up Pinecone and manage another API key. A few try Redis. Most end up with a fragile four-service stack that costs more to operate than the agent it supports.

We built Dakera to fix this.

<10ms p99 query latency · 83 built-in MCP tools · 1 binary, zero deps · v0.11 current release

What Dakera is

Dakera is an agent memory server. You run one binary — or pull one Docker image — and your agents get:

  • Persistent memory storage — semantically indexed, decay-weighted, cross-session
  • Hybrid retrieval — HNSW vector search combined with BM25 full-text search for relevance-ranked recall
  • Knowledge graphs — entity-linked memory for multi-hop reasoning across sessions
  • Built-in embeddings — MiniLM, BGE, and E5 models run on-device via the Candle runtime, no external API calls
  • Session management — logical groupings that your agents can open, append to, and close
  • 83 MCP tools — plug directly into Claude Desktop, Cursor, Windsurf, or any MCP-compatible environment

The whole thing ships as a single Rust binary. No Python runtime, no external database, no embedding API, no graph service. You point it at a data directory, give it an API key, and it starts serving in under a second.

# Pull and run in one step
docker run -d \
  -p 3300:3300 \
  -e DAKERA_API_KEY=your-key \
  -v "$(pwd)/data:/data" \
  ghcr.io/dakera-ai/dakera:v0.11.55

# Store a memory
curl -X POST http://localhost:3300/v1/memories \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"my-agent","content":"User prefers Python over TypeScript","importance":0.85}'

# Recall by meaning
curl -X POST http://localhost:3300/v1/memories/recall \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"my-agent","query":"language preferences","top_k":5}'

Why it's built in Rust

Agent memory is on the critical path for every agent invocation. It runs synchronously before your model call: recall relevant context, inject it into the prompt, then generate. If memory retrieval adds 500ms, your users feel it every time.
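
In code, that critical-path loop is three steps. Here's a minimal sketch using the Python SDK shown later in this post; call_model is a placeholder for whatever LLM client you use:

def answer(client, agent_id, user_msg):
    # 1. Recall: fetch the memories most relevant to this message
    memories = client.memories.recall(
        agent_id=agent_id, query=user_msg, top_k=5
    )
    context = "\n".join(m.content for m in memories)

    # 2. Inject: prepend recalled context to the prompt
    prompt = f"Relevant memory:\n{context}\n\nUser: {user_msg}"

    # 3. Generate: call_model stands in for your LLM call
    return call_model(prompt)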

We wrote Dakera in Rust because latency matters. The p99 query latency — including embedding inference, HNSW traversal, BM25 scoring, importance reranking, and response serialization — is under 10ms on commodity hardware. That's fast enough to be invisible.

Rust also means a single statically-linked binary. There's no Python interpreter to manage, no garbage collector to tune, no JVM warmup. The binary starts, the embedding model loads, and you're serving in under a second cold.

The memory model

Dakera's memory model was designed around how agents actually operate in production, not around how vector databases work in theory.

Every memory has an importance score between 0 and 1. Importance decays over time using a configurable half-life — but it's restored on access. A memory that's recalled frequently stays important. A memory that's never recalled after three weeks quietly fades. The result is a system that stays sharp without manual curation: frequently-needed context surfaces reliably, and old noise stops polluting recall.
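
The post doesn't spell out the exact decay curve, but a configurable half-life implies exponential decay. Here's the idea as a Python sketch (the 30-day default is an illustrative assumption, not Dakera's documented value):

import time

def decayed_importance(base, last_access_ts, half_life_days=30.0):
    # Importance halves every half_life_days since last access.
    # (half_life_days is configurable; 30 is assumed for this sketch.)
    elapsed_days = (time.time() - last_access_ts) / 86400
    return base * 0.5 ** (elapsed_days / half_life_days)

# Access restores importance: recalling a memory resets
# last_access_ts to now, so frequently used context never fades.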

Sessions give you logical groupings. An agent can open a session for a conversation, tag memories to that session, and recall in session-scoped or cross-session mode. For long-running agents operating across hundreds of sessions, knowledge graphs let memories link to named entities — so "what did we discuss about Alice last month?" works across session boundaries.
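
Here's what session-scoped versus cross-session recall might look like with the Python SDK introduced below. The session methods and parameters (sessions.open, session_id) are illustrative assumptions; check the API docs for the exact surface:

# Hypothetical session workflow -- names are illustrative
session = client.sessions.open(agent_id="my-agent")

client.memories.store(
    agent_id="my-agent",
    session_id=session.id,
    content="Alice asked about the SSO rollout timeline",
    importance=0.7,
)

# Session-scoped: recall only memories tagged to this session
client.memories.recall(agent_id="my-agent", query="SSO rollout",
                       session_id=session.id, top_k=5)

# Cross-session: entity-linked recall across session boundaries
client.memories.recall(agent_id="my-agent",
                       query="what did we discuss about Alice last month?",
                       top_k=5)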

MCP: memory for Claude, Cursor, and Windsurf

Dakera ships with a native MCP server that exposes 83 tools. Add one entry to your Claude Desktop config and your AI assistant gets persistent memory across every conversation — without writing any code:

// ~/.config/claude/claude_desktop_config.json
{
  "mcpServers": {
    "dakera": {
      "command": "dakera-mcp",
      "env": {
        "DAKERA_URL": "http://localhost:3300",
        "DAKERA_API_KEY": "your-key"
      }
    }
  }
}

Claude will automatically store important context, recall relevant memories before responding, and maintain a persistent knowledge graph about your projects, preferences, and work — across every session, forever.

Native SDKs for Python, TypeScript, Go, and Rust

For developers building agents programmatically, we ship idiomatic SDKs in all four languages:

# Python — pip install dakera
from dakera import DakeraClient

client = DakeraClient(url="http://localhost:3300", api_key="your-key")

# Store memory with importance score
client.memories.store(
    agent_id="my-agent",
    content="Customer is on the Pro plan, billing contact is [email protected]",
    importance=0.9
)

# Hybrid recall (vector + BM25)
results = client.memories.recall(
    agent_id="my-agent",
    query="billing and account details",
    top_k=5
)
for mem in results:
    print(mem.content, mem.score)

Run this in your own setup — Docker, binary, or Helm, deployed in about five minutes, with no managed service required. LangChain, CrewAI, and AutoGen integrations are available. Quickstart guide →

Production-ready from day one

Dakera isn't a prototype. It's been running in production as the memory layer for our own autonomous agent systems since v0.4.0. The current v0.11.55 includes:

  • WAL durability with crash-safe writes
  • AES-256-GCM encryption at rest
  • Multi-tenant namespacing with per-namespace API keys (sketched after this list)
  • Rate limiting per agent and per key
  • Prometheus metrics and OpenTelemetry tracing
  • Horizontal clustering with Raft consensus
  • Zero-downtime rolling upgrades
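
To make the multi-tenancy point concrete: because each API key is scoped to a namespace, two tenants can reuse the same agent IDs without collision. A sketch with the Python SDK (the key values are placeholders):

# Per-namespace keys scope every request; keys here are placeholders
tenant_a = DakeraClient(url="http://localhost:3300", api_key="tenant-a-key")
tenant_b = DakeraClient(url="http://localhost:3300", api_key="tenant-b-key")

# Same agent_id, different namespaces: these stores never collide
tenant_a.memories.store(agent_id="support-bot",
                        content="Tenant A escalates after 24 hours",
                        importance=0.6)
tenant_b.memories.store(agent_id="support-bot",
                        content="Tenant B is billed weekly",
                        importance=0.6)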

Public alpha — The self-hosted binary is available now with no waitlist. Sign up to work directly with us and get priority access to new features as we expand the platform.

What's next

The roadmap is driven by what production agent teams actually need. We're actively working on higher LoCoMo benchmark scores across all recall categories, and deeper integrations with popular agent frameworks.

We'll be publishing technical deep-dives on this blog as we build — starting with how the hybrid retrieval engine works and why naive vector search isn't enough for production agents.


Deploy persistent memory for your agents

Self-hosted is live — deploy anywhere now, no waitlist. Dakera Cloud (managed hosting, SLA) is coming next.
