v0.11.55 · 87.6% LoCoMo · 1,540 questions · standard eval, no LLM post-processing
ذاكرة · Dhākira · Arabic for memory

The infrastructure layer
for production AI agents

Not just memory — the complete agent-native data stack. Vector search, hybrid retrieval, knowledge graphs, session management, and built-in embeddings in a single Rust binary. No external services. Your data stays on your stack.

87.6% LoCoMo accuracy · Sub-10ms queries · 83 MCP tools · One binary, zero deps
agent memory
# Store agent memory
POST /v1/memory/store
{
  "agent_id": "assistant-1",
  "content": "User prefers TypeScript",
  "importance": 0.9
}

# Recall by meaning
POST /v1/memory/recall
{
  "query": "language preferences",
  "top_k": 5
}

# → Result
{ "score": 0.97, "content": "User prefers TypeScript" }
<10ms p99 query latency
83 MCP tools built-in
4 Native SDKs
Releases · v0.11.55
Works with
The problem

Your agents forget
everything they learn

Every session starts from zero. Thousands of interactions, zero retained knowledge. You're paying to re-teach your agents the same things over and over.

Sessions are isolated silos
Each conversation starts blank. Your agent can't recall what it learned yesterday, last week, or across 10,000 prior interactions.
Knowledge evaporates at scale
Insights from thousands of users vanish after each session. Your agent never compounds intelligence — it stays perpetually naive.
Context stuffing is a dead end
Cramming history into prompts burns tokens, inflates costs, and hits a hard ceiling. It's duct tape, not architecture.
agent session
agent.recall("user preferences")
Error: No memory found.
Context window empty.
agent.sessions
1,847 sessions completed
0 memories persisted
agent.monthly_cost
$4,200/mo on context stuffing
0 knowledge retained
0% Retention
$50k Wasted / year
Capabilities

Everything agents need
to remember

Six core capabilities that turn stateless AI into agents with genuine, compounding memory. A short usage sketch follows the cards.

Vector + Hybrid Search
Find memories by meaning, not just keywords. HNSW, BM25, and hybrid search with temporal re-ranking and tunable weights.
Persistent Agent Memory
Store, recall, consolidate, and forget. Four memory types — episodic, semantic, procedural, working — with automatic importance decay.
Built-in Embeddings
Text is auto-embedded on store and query. No OpenAI calls, no external APIs. HuggingFace models ship inside the binary.
MCP Native (83 tools)
Drop into Claude, Cursor, or Windsurf instantly. Memory, search, and knowledge graph exposed as 83 callable MCP tools.
Knowledge Graph
Automatically connects related memories into a queryable graph. Entity extraction, similarity edges, cluster summaries, and semantic deduplication.
Dashboard + CLI
Visual admin dashboard for exploring memories, running queries, and monitoring agents. Plus a full dk CLI for automation.
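
To make the hybrid search and memory cards above concrete, here is a rough sketch using the Python SDK shown later on this page. Only store/recall with agent_id, content, importance, query, and top_k appear in the official examples; the memory_type field and the weight arguments below are illustrative assumptions standing in for the tunable hybrid weights.

from dakera import DakeraClient

client = DakeraClient(base_url="http://localhost:3300", api_key="your-key")

# Store a typed memory; embedding and importance handling happen server-side.
# `memory_type` is an assumed parameter name, used here for illustration only.
client.memories.store(
    agent_id="assistant-1",
    content="User prefers TypeScript and concise answers",
    memory_type="semantic",
    importance=0.9,
)

# Hybrid recall: vector similarity fused with keyword relevance.
# The weight arguments are assumptions standing in for "tunable weights".
memories = client.memories.recall(
    agent_id="assistant-1",
    query="how should replies to this user be formatted?",
    top_k=5,
    vector_weight=0.7,
    keyword_weight=0.3,
)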
Framework & Tool Integrations

Plug into the frameworks
you already use

Dakera ships native integrations for every major agent framework. Five lines of code to add persistent memory to your existing LangChain, LlamaIndex, CrewAI, or AutoGen pipeline.

Drop-in DakeraMemory and DakeraVectorStore classes. Your chain gets persistent cross-session memory with semantic recall in three lines.
pip install langchain-dakera
Dakera-backed VectorStore for LlamaIndex pipelines. Server-side embeddings mean zero OpenAI dependency for your RAG index.
pip install llama-index-dakera
Give your CrewAI agents a shared long-term memory store. Agents recall each other's findings across tasks — your crew compounds knowledge instead of starting fresh every run.
pip install crewai-dakera
Persistent memory across multi-agent AutoGen conversations. Each agent has its own memory namespace — shared recall, isolated writes. No conversation resets between runs.
pip install autogen-dakera
83 MCP tools available natively. One line in your IDE config and Claude, Cursor, or Windsurf gets persistent memory across every session — zero code changes required.
config-only · no code
REST API & gRPC for any language.
Native SDKs for Python, TypeScript, Go, Rust.
All integrations →
# pip install langchain-dakera dakera
from langchain_dakera import DakeraMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI

# Persistent memory backed by Dakera — survives process restarts
memory = DakeraMemory(
    api_url="http://localhost:3300",
    agent_id="my-assistant",
    recall_k=5,
)

chain = ConversationChain(llm=ChatOpenAI(), memory=memory)

# → Memory persists across restarts. Agent remembers every prior conversation.
// .mcp.json — add to Claude Desktop, Cursor, or Windsurf
{
  "mcpServers": {
    "dakera": {
      "command": "dakera",
      "args": ["mcp"]
    }
  }
}
// → Claude gets 83 memory tools. Remembers everything across all sessions.
# pip install crewai-dakera
from crewai_dakera import DakeraStorage
from crewai import Crew, Agent, Task

# Shared memory across your entire crew — every agent reads what others stored
storage = DakeraStorage(api_url="http://localhost:3300")

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    memory=storage,
)

# → writer agent recalls researcher's findings — no repeated tool calls

Already running a pipeline? Add Dakera memory in under 5 minutes.

Deploy in 5 min →
SDKs

Integrate in minutes

Native SDKs for Python, TypeScript, Go, and Rust. Plus REST and gRPC for everything else. Five lines to first memory.

Store & Recall
Semantic memory with automatic embedding and importance scoring
Session Lifecycle
Context persists across every conversation automatically
Multi-Agent
Isolated namespaces for hundreds of agents at once
MCP Ready
83 tools for Claude, Cursor, and Windsurf out of the box
from dakera import DakeraClient

client = DakeraClient(
    base_url="http://localhost:3300",
    api_key="your-key"
)

# Store agent memory
client.memories.store(
    agent_id="assistant-1",
    content="User prefers TypeScript",
    importance=0.9
)

# Recall by meaning
memories = client.memories.recall(
    agent_id="assistant-1",
    query="language preferences",
    top_k=5
)
import { DakeraClient } from "dakera"

const client = new DakeraClient({
  baseUrl: "http://localhost:3300",
  apiKey: "your-key"
})

await client.memories.store({
  agentId: "assistant-1",
  content: "User prefers TypeScript",
  importance: 0.9
})

const memories = await client.memories.recall({
  agentId: "assistant-1",
  query: "language preferences",
  topK: 5
})
import "github.com/dakera-ai/dakera-go"

client := dakera.NewClient(dakera.Config{
    BaseURL: "http://localhost:3300",
    APIKey:  "your-key",
})

client.Memories.Store(ctx, dakera.StoreMemoryRequest{
    AgentID:    "assistant-1",
    Content:    "User prefers TypeScript",
    Importance: 0.9,
})

memories, _ := client.Memories.Recall(ctx, dakera.RecallRequest{
    AgentID: "assistant-1", Query: "language preferences", TopK: 5,
})
use dakera_client::{DakeraClient, Config, StoreMemoryRequest};

let client = DakeraClient::new(Config {
    base_url: "http://localhost:3300".into(),
    api_key:  "your-key".into(),
    ..Default::default()
});

// Store agent memory
client.memories().store(StoreMemoryRequest {
    agent_id:   "assistant-1".into(),
    content:    "User prefers TypeScript".into(),
    importance: 0.9,
    ..Default::default()
}).await?;

// Recall by meaning
let memories = client
    .memories().recall("assistant-1", "language preferences", 5)
    .await?;
# Store memory
curl -X POST localhost:3300/v1/memory/store \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-key" \
  -d '{"agent_id":"assistant-1","content":"User prefers TS","importance":0.9}'

# Recall
curl -X POST localhost:3300/v1/memory/recall \
  -H "Authorization: Bearer your-key" \
  -d '{"agent_id":"assistant-1","query":"language preferences","top_k":5}'

# Text search with auto-embedding
curl -X POST localhost:3300/v1/namespaces/docs/query-text \
  -H "Authorization: Bearer your-key" \
  -d '{"text":"semantic search systems","top_k":5}'
Architecture

Five Rust crates. One binary.

6 index algorithms, 3 storage tiers, built-in ML inference, and a production-grade API layer — compiled into a single deployable artifact. 118µs queries. 27.4M inserts per second.

5 Rust crates
6 Index algorithms
118µs p50 query latency
27.4M inserts/s Peak throughput
3 Storage tiers
01
dakera-api
6 components
Production-grade REST & gRPC API layer with authentication, observability, and rate control
Axum · Tonic · Auth · Prometheus · OpenTelemetry
REST API
Full CRUD with batch upsert, multi-namespace support, and streaming responses. Axum-based with tower middleware.
JSON + CBOR
gRPC
High-performance binary protocol via Tonic. Bi-directional streaming for real-time indexing and search operations.
Protobuf v3
Auth & API Keys
Multi-tenant token authentication with per-key permissions, namespace isolation, and configurable RBAC policies.
Rate Limiting
Per-key sliding window rate limiting with burst allowance. Configurable per endpoint, per namespace, or globally (see the sketch after this component list).
Audit Logging
Structured JSON operation logs with request tracing, latency breakdown, and compliance-ready event stream.
Prometheus + OTel
Built-in /metrics endpoint with histogram latencies, request counts, and distributed tracing via OpenTelemetry SDK.
Pull + Push
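
The sliding-window scheme named above is a standard technique. A minimal, self-contained Python sketch of the idea follows; it illustrates the general algorithm only and is not Dakera's Rust implementation, and the class and parameter names are made up for this example.

import time
from collections import deque

class SlidingWindowLimiter:
    """Allow `limit` requests per `window` seconds per key, plus a burst allowance."""

    def __init__(self, limit: int, window: float, burst: int = 0):
        self.ceiling = limit + burst           # hard cap inside any one window
        self.window = window
        self.hits: dict[str, deque] = {}

    def allow(self, api_key: str) -> bool:
        now = time.monotonic()
        q = self.hits.setdefault(api_key, deque())
        while q and now - q[0] > self.window:  # evict timestamps that slid out
            q.popleft()
        if len(q) >= self.ceiling:
            return False                       # over the limit for this key
        q.append(now)
        return True

limiter = SlidingWindowLimiter(limit=100, window=60.0, burst=20)
print(limiter.allow("key-abc"))                # True until the key exceeds 120 hits/minute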
02
dakera-engine
12 components
Six index algorithms, hybrid search, auto-index selection, and distributed clustering with Raft consensus
HNSW · IVF · SPFresh · BM25 · Hybrid · Raft
HNSW
Hierarchical navigable small-world graph for sub-millisecond approximate nearest neighbor queries at scale.
8.5K qps @ 99% recall
IVF
Inverted file index with configurable nprobe for high-throughput batch indexing with tunable recall trade-offs.
877K vectors/s insert
SPFresh
Real-time streaming index optimized for continuous ingestion. LSM-tree-inspired design with background compaction.
27.4M inserts/s peak
PQ + SQ
Product quantization (4-16 sub-vectors) and scalar quantization for 8-32x memory compression with minimal recall loss.
8-32x compression
BM25
Full-text keyword search with configurable k1/b parameters, stemming, stop words, and multi-language tokenization.
Hybrid Search
Reciprocal Rank Fusion (RRF) combining vector similarity and keyword relevance into a single ranked result set (see the RRF sketch after this component list).
RRF fusion
Auto-Index
Analyzes dataset characteristics (cardinality, dimensionality, distribution) and selects the optimal index strategy.
Agent Memory
Importance-weighted memory with consolidation, decay scoring, and semantic deduplication for AI agent workflows.
Knowledge Graph
Entity-relationship graph with typed edges, traversal queries, and automatic relationship extraction from text.
Gossip Protocol
SWIM-based protocol for cluster membership, failure detection, and metadata propagation across nodes.
Protocol: SWIM
Leader Election
Raft-based consensus for partition leader assignment, log replication, and automatic failover with quorum writes.
Raft consensus
Sharding
Consistent hashing with virtual nodes for automatic data distribution and rebalancing across cluster members.
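
Reciprocal Rank Fusion, mentioned in the Hybrid Search card above, is a simple published formula: each document earns 1/(k + rank) from every ranked list it appears in, and the sums decide the final order. A small illustrative sketch of the idea, not Dakera's engine code:

def rrf_fuse(result_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists (e.g. vector hits and BM25 hits) with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranked in result_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

vector_hits  = ["m42", "m17", "m03"]   # ranked by cosine similarity
keyword_hits = ["m17", "m88", "m42"]   # ranked by BM25 score
print(rrf_fuse([vector_hits, keyword_hits]))
# m17 and m42 bubble up because both retrievers agree on them

Because RRF scores come only from ranks, not raw scores, the fusion needs no normalization between vector similarity and BM25 scales.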
03
dakera-inference
6 components
Rust-native ML embedding pipeline with Candle runtime — no Python, no ONNX, no external dependencies
Candle · MiniLM · BGE · E5 · Metal · CUDA
Candle Runtime
Pure Rust ML inference engine by Hugging Face. Zero-copy tensor ops, no Python runtime needed, WASM-compatible.
Pure Rust
MiniLM-L6
384-dim embeddings optimized for speed. Ideal for real-time agent memory with low-latency requirements.
384 dims · 22M params
BGE-Small
BAAI General Embedding for high-accuracy semantic search. Strong retrieval quality at 33M params, with competitive accuracy on MTEB/BEIR retrieval benchmarks.
384 dims · 33M params
E5-Small
Microsoft's E5 model with instruction-tuned embeddings. Excellent for query-document asymmetric search patterns.
384 dims · 33M params
Batch Processing
Dynamic batching with configurable batch size and timeout. Amortizes model overhead for bulk ingestion workloads (sketched after this component list).
Up to 64 per batch
CPU / CUDA / Metal
Automatic hardware detection with Metal on macOS, CUDA on Linux/Windows, and optimized AVX2/NEON CPU fallback.
Auto-detect
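
The Batch Processing card above describes a common pattern: gather requests until the batch is full or a short timeout fires, then run the model once. Below is a toy Python sketch of that idea; the embed_batch placeholder stands in for the real Candle model call and is not part of any Dakera API.

import queue
import threading

def embed_batch(texts):
    # Placeholder for a real model call (MiniLM, BGE, or E5); returns 384-dim vectors.
    return [[0.0] * 384 for _ in texts]

def batching_loop(inbox: queue.Queue, max_batch: int = 64, max_wait: float = 0.01):
    """Collect requests until the batch is full or the wait budget expires, then embed once."""
    while True:
        text, reply = inbox.get()                 # block for the first request
        batch = [(text, reply)]
        while len(batch) < max_batch:
            try:
                batch.append(inbox.get(timeout=max_wait))
            except queue.Empty:
                break                             # timeout: ship a partial batch
        vectors = embed_batch([t for t, _ in batch])
        for (_, reply), vec in zip(batch, vectors):
            reply.put(vec)                        # hand each caller its embedding

inbox: queue.Queue = queue.Queue()
threading.Thread(target=batching_loop, args=(inbox,), daemon=True).start()
reply: queue.Queue = queue.Queue()
inbox.put(("User prefers TypeScript", reply))
print(len(reply.get()))                           # 384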
04
dakera-storage
9 components
Three-tier persistence engine — hot memory, warm filesystem, cold S3 — with WAL durability and background compaction
Memory · Filesystem · S3 · WAL · Snapshots · Compaction
Memory Tier
Lock-free concurrent hashmap with arena allocation. Sub-microsecond reads for hot data and active agent sessions.
Sub-µs reads
Filesystem Tier
Memory-mapped file storage with LSM-tree compaction. Handles datasets larger than RAM with predictable tail latency.
mmap + LSM
S3 / MinIO
Cloud object storage backend for cold data archival. Automatic tiering moves data down based on access frequency.
Auto-tier
Write-Ahead Log
Append-only WAL with fsync durability guarantees. Crash recovery replays the log to reconstruct a consistent state (see the sketch after this component list).
fsync durable
Snapshots
Point-in-time consistent snapshots with copy-on-write semantics. Export to local disk or stream directly to S3.
Compaction
Background merge of sorted runs with configurable size ratios. Reclaims space from tombstones and overwrites.
Delta Encoding
Stores only vector deltas for versioned data. Reduces storage by 40-70% for frequently updated embeddings.
40-70% savings
TTL
Per-record and per-namespace time-to-live with lazy expiration. Background sweeper reclaims expired entries.
Encryption at Rest
AES-256-GCM encryption for filesystem and S3 tiers. Key rotation support with zero-downtime re-encryption.
AES-256-GCM
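
The Write-Ahead Log card above follows a textbook pattern: append and fsync every mutation before applying it, then rebuild state after a crash by replaying the file. A toy Python sketch of the pattern; Dakera's actual on-disk format is not public and is not assumed here.

import json
import os

class TinyWAL:
    """Append-only log: fsync each record before applying it, replay to recover."""

    def __init__(self, path: str):
        self.path = path

    def append(self, record: dict) -> None:
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())          # durable before in-memory state changes

    def replay(self) -> dict:
        state: dict = {}
        if not os.path.exists(self.path):
            return state
        with open(self.path) as f:
            for line in f:
                rec = json.loads(line)
                state[rec["key"]] = rec["value"]
        return state

wal = TinyWAL("memories.wal")
wal.append({"key": "assistant-1:pref", "value": "User prefers TypeScript"})
print(wal.replay())                       # state rebuilt purely from the log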
05
dakera-common
6 components
Shared type system, error taxonomy, configuration, and cross-crate utilities used by all other crates
Types · Errors · Config · Serde · Validation
Shared Types
Strongly-typed domain models for vectors, memories, namespaces, and search results. Zero-cost serde serialization.
Error Taxonomy
Hierarchical error types with context propagation, HTTP status mapping, and structured error responses for clients.
Configuration
Layered config from defaults → TOML → env vars → CLI flags. Hot reload for runtime-tunable parameters.
Hot reload
Validation
Input validation with dimension checks, UTF-8 enforcement, payload size limits, and custom constraint rules.
Serialization
Zero-copy deserialization with serde. Supports JSON, CBOR, MessagePack, and custom binary format for vectors.
Zero-copy
Telemetry
Shared tracing subscriber with span propagation, structured logging (JSON + pretty), and metric type definitions.
Ecosystem
MCP Server · 83 tools
CLI · dk
Dashboard · Leptos
Python SDK
TypeScript SDK
Go SDK
Rust SDK
How it works

Three steps to persistent intelligence

From raw conversation to compounding knowledge — your agent's memory grows with every interaction.

01
Store
Your agent stores conversations, decisions, and preferences as embedded memories — each with an importance score and type label. Embeddings happen automatically inside the binary.
memory.store("User prefers TypeScript", importance=0.9)
Auto-embedding · Importance scoring · 4 memory types
02
Recall
Before each response, the agent retrieves the most relevant memories — combining vector similarity, keyword matching, and graph traversal into a single ranked result.
memory.recall("language preferences", top_k=5)
Hybrid search · <10ms p99 · Graph traversal
03
Learn
Over time, overlapping memories merge automatically. Importance decays, facts deduplicate, and related concepts connect. Your agent builds compounding intelligence — not a growing pile of text.
memory.consolidate("agent-1", strategy="merge")
Auto-consolidation · Importance decay · Deduplication
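
The decay in the Learn step is easiest to picture as a recency weighting: a memory's effective importance shrinks with age, so stale facts drop out of recall while fresh ones stay prominent. A toy exponential-decay illustration in Python follows; the actual decay function and half-life Dakera applies are not specified here, this is only the general idea.

import math
import time

def decayed_importance(importance: float, stored_at: float,
                       half_life_days: float = 30.0) -> float:
    """Exponentially decay a memory's importance based on its age in days."""
    age_days = (time.time() - stored_at) / 86_400
    return importance * math.exp(-math.log(2) * age_days / half_life_days)

stored_at = time.time() - 60 * 86_400      # stored 60 days ago
print(decayed_importance(0.9, stored_at))  # about a quarter of the original 0.9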
Use Cases

What developers build
with persistent memory

From solo agent projects to production multi-agent pipelines — here's exactly what becomes possible when your agents remember.

Agents with persistent memory
A customer support agent learns user preferences on session 1 and applies them on session 100 — without any re-prompting. Dakera stores, recalls, and consolidates knowledge across every conversation automatically.
Session memory · Auto-consolidation
Multi-agent knowledge pipelines
A researcher agent stores findings to Dakera; a writer agent recalls them by meaning in the next task — no context passing, no redundant tool calls. Your crew shares a living knowledge base, not just message history.
Shared namespace · CrewAI · AutoGen
RAG with decay-weighted recall
A research assistant surfaces fresh sources and deprioritizes stale ones — automatically. Dakera's decay engine reduces the importance of old memories over time, so your retrieval stays relevant without manual curation.
Importance decay · Hybrid search
Chatbots that remember preferences
A product chatbot recalls that this user prefers detailed explanations, dislikes upsells, and last asked about billing — across sessions, weeks apart. Personalization that compounds without any prompt engineering.
User profiles · Cross-session recall
Copilots that learn your workflows
A developer copilot that knows your codebase naming conventions, your team's architecture decisions, and which library patterns you actually use — accumulated silently from every session via MCP. No onboarding docs, no prompt files.
MCP tools · IDE native
Team memory for LLM dev tools
Your internal LLM tooling accumulates institutional knowledge — decisions made, patterns adopted, incidents resolved. New engineers query the same memory store that experienced ones have been building. Onboarding as a side-effect of usage.
Knowledge graph · Multi-tenant
Who builds with Dakera

Built for the engineers who
ship production agents

Not a tool for demos. Dakera is built for developers who are deploying intelligent agents into production and need real infrastructure underneath.

AI / ML Engineers
Building production agent pipelines
You ship LangChain or AutoGen agents that need to remember state across thousands of sessions — without duct-taping Redis, Pinecone, and a custom decay script together.
One binary replaces your entire memory stack — vector store, embeddings, session store, knowledge graph
87.6% LoCoMo recall accuracy with hybrid retrieval you can tune per query type
Native integrations: langchain-dakera, crewai-dakera, autogen-dakera — drop-in memory classes
Backend Engineers
Adding memory to LLM features
You're adding an AI feature to an existing product and need a reliable memory layer — not a research project. You care about latency, auth, multi-tenancy, and zero new infra to maintain.
REST API + gRPC: integrate from any language in under an hour, no Python runtime required
Namespace isolation per user, key-based auth, and rate limiting built-in — production-ready on day one
Sub-10ms p99 query latency — won't add latency to your LLM call chain
Framework Builders
Integrating LangChain, CrewAI, or LlamaIndex
You build tools on top of agent frameworks and need a memory backend that works across all of them — consistent API, framework-agnostic, and fast enough for tool-calling loops.
Identical REST/gRPC API across Python, TypeScript, Go, and Rust SDKs
MCP protocol for LLM tool integration — same backend serves IDE, API, and framework use cases
Open core: integrate the public API surface without worrying about vendor lock-in on internals
Platform Teams
Deploying agent infrastructure at scale
You run the platform that dozens of internal teams build agents on. You need multi-tenancy, observability, horizontal scaling, and security posture — not a managed service with opaque pricing.
One instance serves hundreds of agents — namespaced, rate-limited, and AES-256-GCM encrypted
Prometheus metrics + OpenTelemetry tracing out of the box — plug into your existing stack
Raft consensus clustering: add nodes, data rebalances automatically — no manual sharding

Your team. Your infrastructure. No managed service required.

Read the docs → Deploy in 5 min ↗
Architecture

One binary.
Everything included.

Most memory setups require assembling multiple services. Dakera ships embeddings, vector indexing, knowledge graph, and session storage in a single Rust binary — zero external dependencies required.

Typical setup
Vector store
Embedding service
Knowledge graph
Session store
~1–2 GB · 3–5 services
Dakera
dakera
~44 MB · 1 binary
LoCoMo Benchmark
87.6%
Long-context memory accuracy — standard industry evaluation across 1,540 questions
Dakera scores 87.6% on the full LoCoMo dataset (50 sessions, 1,540 questions) without LLM post-processing — the standard benchmark for long-context agent recall across temporal, multi-hop, entity, and implicit reasoning.
Full benchmark results → Methodology
Capability · Built in
Runtime · Rust, single binary
Embedding models · Candle — on-device, no API calls
Index algorithms · HNSW, IVF, SPFresh, BM25, Hybrid
MCP server · 83 tools, native
Knowledge graph · Built-in, auto-extraction
Tiered storage · Memory → Filesystem → S3/MinIO
External dependencies · Zero
Open core

Open at the edges.
Closed at the core.

We open everything you need to integrate. We keep what makes us fast.

Open — MIT Licensed
pip install dakera
npm install dakera
go get github.com/dakera-ai/dakera-go
Add dakera to Cargo.toml
Shell-scriptable admin and query interface
83 tools for Claude, Cursor, Windsurf
All repos on GitHub →
Closed — Proprietary
Memory Engine dakera
The Rust server: HNSW+BM25 hybrid retrieval, importance decay, knowledge graphs, AES-256 encryption, Raft clustering. Provided as a binary and Docker image. Source is not public.
Dashboard dakera-dashboard
Web UI for monitoring agents, sessions, memory health, and real-time analytics. Proprietary.
You can self-host the engine. The binary is yours to run on your own infrastructure — no phone-home, no external dependencies. What's closed is the source code, not your right to deploy it.
Full breakdown →
Dual Offering

Self-hosted is live.
Cloud is coming next.

Self-hosted is live — deploy anywhere now, no waitlist. Dakera Cloud is coming next: managed hosting, SLA, and team monitoring. Join the waitlist to lock in founder pricing.

WAL-durable · Horizontal scaling · AES-256-GCM · Multi-tenant
FAQ

Common
questions

Everything you need to know about Dakera. Can't find what you're looking for? Reach out on GitHub.

Ask on GitHub
Is Dakera a vector database?
No. Dakera is an AI agent memory platform. It gives your agents persistent, session-aware, cross-agent memory with intelligent importance decay. The underlying retrieval engine (HNSW, IVF, BM25, hybrid search) is just how memories are recalled fast — the product is agents that remember, not a database you query.
Do I need an OpenAI API key for embeddings?
No. Text is embedded automatically on store and query using built-in models (MiniLM, BGE, E5) powered by the Candle runtime. No external calls, no additional cost.
Is Dakera production-ready?
Yes. WAL durability, snapshots, AES-256-GCM encryption, multi-tenant auth, rate limiting, Prometheus, and OpenTelemetry are all included. Designed for production from day one.
Can I use it with Claude, Cursor, or Windsurf?
Yes. Dakera ships as a native MCP server with 83 tools. Add it to your Claude Desktop config, Cursor settings, or Windsurf configuration. Your AI assistant gets persistent memory across all sessions — zero code changes required.
What does Dakera replace in my stack?
Dakera replaces the entire memory infrastructure layer: vector store, embedding service, knowledge graph, and session store — all compiled into one binary. No Docker Compose, no external API keys, no separate services to operate.
How does Dakera handle scaling?
A single Dakera instance handles millions of vectors comfortably. For horizontal scaling, Dakera supports distributed clustering with Raft consensus, consistent-hash sharding, and automatic rebalancing. Add nodes — the data redistributes automatically.
What languages and SDKs are supported?
Native SDKs for Python, TypeScript, Go, and Rust. Plus a REST API (JSON) and gRPC (Protobuf) for any other language. MCP protocol for AI tool integration. Five lines of code to store your first memory.
Is Dakera open source or open core?
Open core, not fully open source. The integration layer — SDKs (Python, TypeScript, Go, Rust), CLI, and MCP server — is MIT-licensed and on GitHub. The memory engine is proprietary: you self-host the binary on your own infrastructure with full data ownership, but the engine source is not public. Commercial-friendly: no usage fees on self-hosted deployments. Full breakdown →
Is Dakera self-hostable?
Yes. Dakera ships as a single binary or Docker image with zero external runtime dependencies. Your data never leaves your infrastructure. Pull the image, set DAKERA_API_KEY, and you're live in under a minute on any Linux server or Kubernetes cluster.
When will Dakera be generally available?
Dakera is live in public alpha. You can deploy the self-hosted binary today — pull the Docker image, set an API key, and you're running in under a minute. Dakera Cloud (managed hosting, SLA, team monitoring) is coming next — join the waitlist below to lock in founder pricing.
What's the pricing model?
The self-hosted binary is free to run on your own infrastructure — no usage fees, no call-home. Dakera Cloud (managed hosting, SLA, team monitoring) is priced separately — join the waitlist to lock in founder pricing before public launch.
Is Dakera an MCP memory server?
Yes. Dakera is a self-hosted MCP memory server with 83 built-in MCP tools — store, recall, hybrid search, knowledge graph, sessions, decay control, namespaces, and consolidation. Connect it to Claude Desktop, Cursor, or Windsurf with one config block. No code changes required. All memory stays on your infrastructure.
From the blog

Engineering insights

View all posts →
Engineering 2026-05-07 New
Dakera as an MCP Memory Server: 83 tools for Persistent Agent Memory
Every Dakera instance ships a built-in MCP server. Connect Claude Desktop, Cursor, or any MCP agent and get full persistent memory — store, recall, knowledge graphs, sessions, and decay — with one config block.
Read post
Engineering 2026-05-06
How Agent Memory Actually Works: Hybrid Retrieval and Importance Decay
A technical look at how Dakera combines HNSW vector search with BM25 full-text search, why naive cosine similarity fails for agent workloads, and how decay keeps recall sharp.
Read post
Benchmarks 2026-05-07 New
How We Benchmark Memory: Dakera on LoCoMo
A complete breakdown of Dakera's 87.6% LoCoMo score — the four question categories, methodology, and how to run the evaluation against your own instance.
Read post