Intermediate Architecture

RAG-Augmented Memory

~35 min to implement 📦 Requires: Dakera v0.11+

Pure RAG knows your documents but forgets your users. Pure memory knows your users but not your documents. Combine both in a single retrieval pipeline so your agent is simultaneously accurate and personal.

Prerequisites
  • Running Dakera server (see Quickstart guide)
  • Document chunking pipeline or existing vector store with content to index
  • LLM API access (Anthropic Claude, OpenAI GPT-4, etc.) for response generation
  • Dakera Python, TypeScript, Rust, or Go SDK installed

The Problem with Choosing Between RAG and Memory

Traditional RAG retrieves document chunks on every query — great for factual accuracy over a static knowledge base, but it has no awareness of who is asking, what they asked before, or what they already know. Conversely, a pure persistent memory system is personal and accumulates context over time, but has no access to your internal documentation or knowledge base.

Enterprise knowledge base assistants need both. A support agent should know that this particular user has a Pro subscription, already tried rebooting, and prefers concise answers — while simultaneously retrieving the correct troubleshooting steps from official documentation.

Dakera's approach: one store, two retrieval paths

Rather than maintaining a separate vector store for documents and a memory system for users, Dakera indexes both in the same store. Document chunks are stored as memories with metadata marking their source. The recall() call returns agent memories and document chunks ranked together by relevance — no merge code to write.

Architecture: Hybrid Retrieval Pipeline

On every query, two parallel retrieval paths execute and their results are ranked together before injection into the LLM prompt. This diagram shows the full pipeline from user query to LLM response:

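Pipeline stages (from the original diagram):

  1. User query: "How do I enable SSO for my plan?"
  2. Two recalls run in parallel:
       • Memory recall: recall(agent_id, query) returns user prefs, history, and context (~12ms p50)
       • Doc retrieval: recall(agent_id, query, memory_type="doc") returns chunked documentation (~80ms p50). The implementation below selects doc chunks with tags=["doc"] instead.
  3. Merge + rank: deduplicate, rank by score, trim to the token budget (~5ms p50)
  4. Prompt builder: inject context, then the LLM responds (~600ms)
  5. Feedback loop: new facts learned are written back with store_memory()

Total pipeline as stated in the diagram: ~97ms p50, ~190ms p99, excluding the LLM call. Note that 97ms equals 12 + 80 + 5, the sum of the three stages. Because the two recalls run in parallel, the recall step is bounded by the slower path, which the Performance Considerations table lists at 82ms p50.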

Latency Breakdown

  • Memory recall (agent history + prefs): 12ms p50
  • Document chunk retrieval: 80ms p50
  • Merge, deduplicate, rank results: 5ms
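
The stage figures above combine differently depending on whether the two recalls run one after the other or concurrently. The following is a minimal arithmetic sketch using only the p50 numbers quoted in this section. The function name is hypothetical, and real percentile latencies do not add this cleanly, so treat the results as rough estimates.

```python
def retrieval_latency_ms(memory_ms: float, doc_ms: float, merge_ms: float,
                         parallel: bool = True) -> float:
    """Estimate retrieval overhead before the LLM call.

    Parallel recalls cost as much as the slower path; sequential recalls
    cost the sum of both. The merge step is added in either case.
    """
    recall_ms = max(memory_ms, doc_ms) if parallel else memory_ms + doc_ms
    return recall_ms + merge_ms


sequential = retrieval_latency_ms(12, 80, 5, parallel=False)
concurrent = retrieval_latency_ms(12, 80, 5, parallel=True)
print(sequential, concurrent)
```

Under these assumptions the sequential estimate is 97ms and the concurrent estimate is 85ms, which is why running the two recalls in parallel matters.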

Memory Update Loop: Learning from RAG Results

When the RAG pipeline surfaces a fact new to the agent's memory — a policy change, a product update, a user correction — that fact should be stored back into agent memory so future queries answer faster without re-retrieval. This "memory crystallization" loop is the key differentiator of RAG-augmented memory over plain RAG.

Memory crystallization loop (from the original diagram):

  1. A doc chunk is returned by recall().
  2. Is it a novel fact?
       • Yes: call store_memory() with importance=0.75, memory_type="semantic", tags=["learned", "doc-derived"]. The fact is crystallized in Dakera memory for future recall, so the next query answers from memory with no doc retrieval needed.
       • No: pass the result through only, with no memory write.
  3. Auto-decay: importance drops if the doc is updated or the TTL expires.

New facts from RAG become persistent agent knowledge.

Tip: use semantic deduplication before crystallizing

Before storing a learned fact to memory, call search_memories() to check if a similar fact already exists. Dakera's hybrid search will surface near-duplicates with a cosine similarity score. Only store if the top result scores below 0.85 — this prevents memory bloat from near-identical document chunks being crystallized repeatedly.
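
To make the 0.85 gate concrete, here is a small local sketch of the decision. In practice the similarity score would come back from search_memories(); the toy vectors and both function names below are illustrative assumptions, not part of the Dakera SDK.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_crystallize(candidate: list[float], existing: list[list[float]],
                       threshold: float = 0.85) -> bool:
    """Return True only when no stored vector reaches the similarity threshold."""
    best = max((cosine_similarity(candidate, v) for v in existing), default=0.0)
    return best < threshold


# A near-identical vector blocks the write; an unrelated one does not.
print(should_crystallize([1.0, 0.0], [[1.0, 0.1]]))
print(should_crystallize([1.0, 0.0], [[0.0, 1.0]]))
```

The threshold is a tuning knob: a lower value stores fewer facts and risks dropping genuinely new ones, while a higher value lets more near-duplicates through.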

Real-World Scenario: Enterprise Knowledge Base Assistant

An internal knowledge base assistant at a SaaS company handles HR policy questions, IT support, and product documentation queries from employees. The critical requirement: every answer must be both factually accurate (from official docs) and personally aware (remembering this employee's role, past questions, and preferences).

  1. Index company documentation at startup
    Chunk all policy documents, runbooks, and product guides into segments of roughly 400-600 tokens (the chunking guidance under Edge Cases narrows this to 350–550 tokens with 50-token overlaps). Store each chunk as a memory with memory_type="semantic" and metadata marking the source document, version, and department. Dakera indexes them in the same store as user memories.
  2. Store user context on first interaction
    When an employee first uses the assistant, store their role, department, technical level, and communication preferences as high-importance semantic memories scoped to their agent ID. These persist across all future sessions.
  3. Run parallel retrieval on every query
    Execute two concurrent recall() calls: one filtered to user memories (preferences, past questions, their specific environment), and one filtered to documentation chunks. Run them in parallel — total added latency is the slower of the two, not the sum.
  4. Merge, rank, and trim to token budget
    Sort all retrieved results by relevance score. Apply token budget (typically 2000 tokens for context). Prioritize user-memory results over generic doc chunks when scores are close — personalization wins ties.
  5. Crystallize new facts learned from successful answers
    After the LLM generates a high-confidence answer that includes a specific policy fact, store that condensed fact back into the user's memory with TTL matching the document's expected update frequency. Future identical or similar questions skip doc retrieval entirely.
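
Step 4 says personalization should win when scores are close, not only when they are exactly equal. One way to sketch that, assuming each result is a dict with "score" and "source" keys as in the merge code later in this guide, is to give memory results a small ranking bonus. The function name and the 0.02 margin are assumptions chosen for illustration.

```python
def rank_with_memory_preference(items: list[dict], tie_margin: float = 0.02) -> list[dict]:
    """Sort by score descending, letting memory items outrank doc items
    whose score is higher by less than tie_margin."""
    def sort_key(item: dict) -> float:
        bonus = tie_margin if item["source"] == "memory" else 0.0
        return item["score"] + bonus

    return sorted(items, key=sort_key, reverse=True)


results = [
    {"content": "SSO setup guide", "score": 0.81, "source": "doc"},
    {"content": "Alice uses Okta", "score": 0.80, "source": "memory"},
    {"content": "Pro plan feature list", "score": 0.90, "source": "doc"},
]
ranked = rank_with_memory_preference(results)
print([r["content"] for r in ranked])
```

Here the memory result at 0.80 moves ahead of the doc chunk at 0.81, while the clearly stronger doc chunk at 0.90 stays on top.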

Ship a RAG+Memory pipeline in under an hour

Dakera handles chunked doc indexing, semantic recall, and memory persistence in one self-hosted API.


Implementation

# 1. Index a document chunk as a memory
curl -X POST http://localhost:3300/v1/memory/store \
  -H "Authorization: Bearer dk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "kb-assistant",
    "content": "SSO is available on Pro and Enterprise plans. To enable, go to Settings > Security > Single Sign-On and upload your IdP metadata XML.",
    "importance": 0.85,
    "memory_type": "semantic",
    "tags": ["doc", "sso", "security", "settings"],
    "metadata": {
      "source": "doc",
      "doc_id": "help-sso-setup",
      "version": "2024-Q4",
      "department": "IT"
    }
  }'

# 2. Store user context
curl -X POST http://localhost:3300/v1/memory/store \
  -H "Authorization: Bearer dk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "kb-assistant",
    "content": "User is a Systems Administrator on the Pro plan. Prefers step-by-step instructions with CLI examples where available.",
    "importance": 0.95,
    "memory_type": "semantic",
    "tags": ["user-profile", "user:alice"]
  }'

# 3. Recall combines both — sorted by relevance
curl "http://localhost:3300/v1/memory/recall?agent_id=kb-assistant&query=how+do+I+enable+SSO&top_k=8" \
  -H "Authorization: Bearer dk-..."

# 4. Recall only docs (for explicit doc-only path)
curl "http://localhost:3300/v1/memory/recall?agent_id=kb-assistant&query=SSO+setup&top_k=5&tags=doc" \
  -H "Authorization: Bearer dk-..."

# Python SDK
import asyncio
from dakera import DakeraClient

client = DakeraClient(base_url="http://localhost:3300", api_key="dk-...")

# ── Step 1: Index documents at startup ──────────────────────────────────────

def index_document(doc_id: str, chunks: list[str], department: str, version: str):
    """Index a document as a series of memory chunks."""
    for chunk in chunks:
        client.store_memory(
            agent_id="kb-assistant",
            content=chunk,
            importance=0.82,
            memory_type="semantic",
            # Tag each chunk with its document ID and version so stale chunks
            # can be found and replaced when the document is re-indexed
            tags=["doc", department.lower(), f"doc:{doc_id}", f"version:{version}"],
            # TTL: doc chunks auto-expire after 90 days; re-index on update
            ttl_seconds=90 * 24 * 3600
        )

# Index your SSO documentation
sso_chunks = [
    "SSO (Single Sign-On) is available on Pro and Enterprise plans only. Starter plan users cannot enable SSO.",
    "To enable SSO, navigate to Settings > Security > Single Sign-On. Upload your IdP metadata XML. Supported IdPs: Okta, Azure AD, Google Workspace, OneLogin.",
    "After uploading IdP metadata, test the SSO flow with a sandbox account before enabling for your entire organization. SSO enforcement will lock out users without IdP accounts.",
]
index_document("help-sso-setup", sso_chunks, "IT", "2024-Q4")

# ── Step 2: Store user preferences on first interaction ─────────────────────

client.store_memory(
    agent_id="kb-assistant",
    content="Alice Chen is a Systems Administrator on the Pro plan. Prefers step-by-step instructions. Has SAML experience. Uses Okta as IdP.",
    importance=0.95,
    memory_type="semantic",
    tags=["user-profile", "user:alice"]
)

# ── Step 3: Parallel retrieval on each query ────────────────────────────────

async def retrieve_context(query: str, user_tag: str) -> dict:
    """Run memory recall and doc recall in parallel."""
    # Both calls to the same Dakera instance; run concurrently
    mem_task = asyncio.to_thread(
        client.recall,
        agent_id="kb-assistant",
        query=query,
        top_k=4,
        min_importance=0.7,
        tags=[user_tag]  # user-specific memories only
    )
    doc_task = asyncio.to_thread(
        client.recall,
        agent_id="kb-assistant",
        query=query,
        top_k=6,
        min_importance=0.6,
        tags=["doc"]  # documentation chunks only
    )
    mem_results, doc_results = await asyncio.gather(mem_task, doc_task)
    return {"memories": mem_results["memories"], "docs": doc_results["memories"]}

# ── Step 4: Merge and rank results ──────────────────────────────────────────

def merge_context(memories: list, docs: list, token_budget: int = 2000) -> str:
    """Merge memory + doc results, rank by score, trim to token budget."""
    # Tag source for the LLM
    tagged = [
        {"content": m["content"], "score": m["score"], "source": "memory"}
        for m in memories
    ] + [
        {"content": d["content"], "score": d["score"], "source": "doc"}
        for d in docs
    ]
    # Sort by score descending; user memories win ties (memory source sorted first)
    tagged.sort(key=lambda x: (x["score"], x["source"] == "memory"), reverse=True)

    context_parts = []
    token_count = 0
    for item in tagged:
        tokens = len(item["content"].split()) * 1.3  # rough estimate
        if token_count + tokens > token_budget:
            break
        prefix = "[USER CONTEXT]" if item["source"] == "memory" else "[DOCUMENTATION]"
        context_parts.append(f"{prefix}\n{item['content']}")
        token_count += tokens

    return "\n\n".join(context_parts)

# ── Step 5: Crystallize learned facts back to memory ────────────────────────

def crystallize_fact(agent_id: str, fact: str, user_tag: str, ttl_days: int = 30):
    """Store a learned fact from RAG results into persistent memory."""
    # Check for near-duplicate first
    existing = client.search_memories(agent_id=agent_id, query=fact)
    if existing["memories"] and existing["memories"][0]["score"] > 0.85:
        return  # Already known — skip

    client.store_memory(
        agent_id=agent_id,
        content=fact,
        importance=0.75,
        memory_type="semantic",
        tags=["learned", "doc-derived", user_tag],
        ttl_seconds=ttl_days * 24 * 3600
    )

# Example usage
async def answer_question(user_query: str):
    context_data = await retrieve_context(user_query, "user:alice")
    context_str = merge_context(context_data["memories"], context_data["docs"])

    # Build system prompt with merged context
    system_prompt = f"""You are a knowledge base assistant.
Use the context below to answer accurately and personally.

{context_str}"""

    # ... call your LLM here ...
    # After response, crystallize high-value facts:
    for doc in context_data["docs"]:
        if doc["score"] > 0.90:  # High-confidence relevant doc chunk
            crystallize_fact("kb-assistant", doc["content"], "user:alice")

// TypeScript SDK
import { DakeraClient } from '@dakera-ai/dakera';
import Anthropic from '@anthropic-ai/sdk';

const client = new DakeraClient({ baseUrl: 'http://localhost:3300', apiKey: 'dk-...' });
const anthropic = new Anthropic();

// ── Index documentation chunks ──────────────────────────────────────────────

async function indexDocument(docId: string, chunks: string[], tags: string[]) {
  await Promise.all(chunks.map((chunk, i) =>
    client.storeMemory('kb-assistant', {
      content: chunk,
      importance: 0.82,
      memoryType: 'semantic',
      tags: ['doc', ...tags],
      ttl_seconds: 90 * 24 * 3600,
    })
  ));
}

// Index SSO docs
await indexDocument('help-sso-setup', [
  'SSO is available on Pro and Enterprise plans. Go to Settings > Security > SSO to enable.',
  'Supported IdPs: Okta, Azure AD, Google Workspace, OneLogin. Upload your IdP metadata XML.',
  'Test with a sandbox account before enforcing SSO organization-wide to avoid lockouts.',
], ['it', 'sso', 'security']);

// ── Store user profile ──────────────────────────────────────────────────────

await client.storeMemory('kb-assistant', {
  content: 'Alice Chen is a Systems Administrator on Pro plan. Uses Okta IdP. Prefers step-by-step instructions.',
  importance: 0.95,
  memoryType: 'semantic',
  tags: ['user-profile', 'user:alice'],
});

// ── Parallel retrieval ──────────────────────────────────────────────────────

async function retrieveContext(query: string, userTag: string) {
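  // The recall options listed in the SDK reference expose no tag filter, so both
  // calls search the whole store and results are filtered by tag client-side below.
  // top_k applies before that filter, so raise it if one path comes back empty.
  const [memResults, docResults] = await Promise.all([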
    client.recall('kb-assistant', query, { top_k: 4, min_importance: 0.7, memory_type: 'semantic' }),
    client.recall('kb-assistant', query, { top_k: 6, min_importance: 0.6, memory_type: 'semantic' }),
  ]);

  return {
    memories: memResults.memories.filter((m: any) => m.tags?.includes(userTag)),
    docs: docResults.memories.filter((m: any) => m.tags?.includes('doc')),
  };
}

// ── Merge results ───────────────────────────────────────────────────────────

function mergeContext(
  memories: any[],
  docs: any[],
  tokenBudget = 2000
): string {
  const tagged = [
    ...memories.map(m => ({ ...m, source: 'memory' as const })),
    ...docs.map(d => ({ ...d, source: 'doc' as const })),
  ].sort((a, b) => b.score - a.score);

  const parts: string[] = [];
  let tokens = 0;
  for (const item of tagged) {
    const est = item.content.split(' ').length * 1.3;
    if (tokens + est > tokenBudget) break;
    const prefix = item.source === 'memory' ? '[USER CONTEXT]' : '[DOCUMENTATION]';
    parts.push(`${prefix}\n${item.content}`);
    tokens += est;
  }
  return parts.join('\n\n');
}

// ── Crystallize facts ───────────────────────────────────────────────────────

async function crystallizeFact(fact: string, userTag: string) {
  const existing = await client.searchMemories('kb-assistant', fact, { top_k: 1 });
  if ((existing.memories[0]?.score ?? 0) > 0.85) return; // Already known

  await client.storeMemory('kb-assistant', {
    content: fact,
    importance: 0.75,
    memoryType: 'semantic',
    tags: ['learned', 'doc-derived', userTag],
    ttl_seconds: 30 * 24 * 3600,
  });
}

// ── Full pipeline ───────────────────────────────────────────────────────────

async function answerQuestion(userQuery: string) {
  const { memories, docs } = await retrieveContext(userQuery, 'user:alice');
  const context = mergeContext(memories, docs);

  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    system: `You are a knowledge base assistant. Use the context below.

${context}`,
    messages: [{ role: 'user', content: userQuery }],
  });

  // Crystallize highly relevant doc chunks for faster future recall
  for (const doc of docs.filter((d: any) => d.score > 0.90)) {
    await crystallizeFact(doc.content, 'user:alice');
  }

  return response.content[0].type === 'text' ? response.content[0].text : '';
}

// Rust SDK
use dakera_rs::{Client, StoreMemoryRequest, RecallRequest};
use tokio::join;

let client = Client::new("http://localhost:3300", "dk-...");

// Index document chunks
client.store_memory("kb-assistant", StoreMemoryRequest {
    content: "SSO is available on Pro and Enterprise plans. Settings > Security > SSO.".into(),
    importance: Some(0.82),
    memory_type: Some("semantic".into()),
    tags: Some(vec!["doc".into(), "sso".into()]),
    ttl_seconds: Some(90 * 24 * 3600),
    ..Default::default()
}).await?;

// Store user context
client.store_memory("kb-assistant", StoreMemoryRequest {
    content: "Alice is a sysadmin on Pro plan, uses Okta, prefers step-by-step guides.".into(),
    importance: Some(0.95),
    memory_type: Some("semantic".into()),
    tags: Some(vec!["user-profile".into(), "user:alice".into()]),
    ..Default::default()
}).await?;

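// Parallel recall: user memories + doc chunks.
// Note: neither request below filters by tag, so each can return a mix of
// user memories and doc chunks. Filter the results by tag before labeling them.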
let query = "How do I enable SSO?";
let (mem_results, doc_results) = join!(
    client.recall("kb-assistant", RecallRequest {
        query: query.into(),
        top_k: Some(4),
        min_importance: Some(0.7),
        ..Default::default()
    }),
    client.recall("kb-assistant", RecallRequest {
        query: query.into(),
        top_k: Some(6),
        min_importance: Some(0.6),
        ..Default::default()
    })
);

let (mem, docs) = (mem_results?, doc_results?);

// Build context string
let mut context = String::new();
for m in &mem.memories {
    context.push_str(&format!("[USER CONTEXT]\n{}\n\n", m.content));
}
for d in &docs.memories {
    context.push_str(&format!("[DOCUMENTATION]\n{}\n\n", d.content));
}

println!("Context for LLM:\n{}", context);

// Go SDK
package main

import (
    "context"
    "fmt"
    "sync"
    dakera "github.com/dakera-ai/dakera-go"
)

func main() {
    client := dakera.NewClient("http://localhost:3300", "dk-...")
    ctx := context.Background()

    // Index documentation chunk
    client.StoreMemory(ctx, "kb-assistant", dakera.StoreMemoryRequest{
        Content:    "SSO available on Pro and Enterprise. Settings > Security > SSO. Upload IdP metadata XML.",
        Importance: 0.82,
        MemoryType: "semantic",
        Tags:       []string{"doc", "sso"},
        TTLSeconds: 90 * 24 * 3600,
    })

    // Store user profile
    client.StoreMemory(ctx, "kb-assistant", dakera.StoreMemoryRequest{
        Content:    "Alice Chen: sysadmin, Pro plan, Okta IdP, prefers step-by-step instructions.",
        Importance: 0.95,
        MemoryType: "semantic",
        Tags:       []string{"user-profile", "user:alice"},
    })

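    // Parallel retrieval.
    // Note: neither request below filters by tag, so each can return a mix of
    // user memories and doc chunks. Filter the results by tag before labeling them.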
    var (
        memResults *dakera.RecallResponse
        docResults *dakera.RecallResponse
        wg         sync.WaitGroup
    )
    query := "How do I enable SSO?"

    wg.Add(2)
    go func() {
        defer wg.Done()
        memResults, _ = client.Recall(ctx, "kb-assistant", dakera.RecallRequest{
            Query:         query,
            TopK:          4,
            MinImportance: 0.7,
        })
    }()
    go func() {
        defer wg.Done()
        docResults, _ = client.Recall(ctx, "kb-assistant", dakera.RecallRequest{
            Query:         query,
            TopK:          6,
            MinImportance: 0.6,
        })
    }()
    wg.Wait()

    // Build merged context
    contextStr := ""
    for _, m := range memResults.Memories {
        contextStr += fmt.Sprintf("[USER CONTEXT]\n%s\n\n", m.Content)
    }
    for _, d := range docResults.Memories {
        contextStr += fmt.Sprintf("[DOCUMENTATION]\n%s\n\n", d.Content)
    }
    fmt.Println("Merged context:", contextStr)
}

Before / After: Pure RAG vs. RAG + Dakera Memory

Before: Pure RAG only
Query: "How do I enable SSO?"

Retrieved chunks:
- SSO setup guide (generic)
- SSO troubleshooting (generic)
- IdP metadata format spec

Generated answer:
"To enable SSO, go to Settings >
Security > Single Sign-On and
upload your IdP metadata XML.
Supported IdPs: Okta, Azure AD,
Google Workspace, OneLogin."

Problems:
- Doesn't know Alice is already
  on Pro plan (no upsell friction)
- Doesn't know she uses Okta
  (misses Okta-specific steps)
- Same generic answer every time
- No memory of past questions
  (may answer the same thing 5x)

After: RAG + Dakera Memory
Query: "How do I enable SSO?"

Memory recall:
- Alice: Pro plan, Okta IdP,
  prefers step-by-step guides
- Alice asked about SSO last week;
  showed her the Settings page

Doc recall:
- SSO setup guide (generic)
- Okta-specific SAML config steps
- Pro plan feature confirmation

Generated answer:
"Since you're on Pro plan, SSO is
available. For Okta specifically:
1. In Okta Admin, create a new
   SAML 2.0 application
2. Download the metadata XML
3. In Dakera Settings > Security >
   SSO, upload the XML
Last time we spoke about this you
were on the Settings page — the
SSO tab is in the left nav."

Personal, accurate, and contextual.

Edge Cases

1. Conflicting RAG vs. Memory Facts

A policy document might say "SSO requires Enterprise plan" while a crystallized memory from 6 months ago says "SSO is available on Pro plan" (because the policy changed). When scores are close and sources conflict, always prefer the document chunk — it reflects the current source of truth. Flag the stale memory for update using update_importance() to demote it.

# Detect conflict: doc says X, memory says Y on same topic
# Demote stale memory importance so doc wins future rankings
client.update_importance(
    agent_id="kb-assistant",
    memory_id="mem_stale_sso_policy",
    importance=0.2  # demote so doc chunk ranks higher
)

2. Document Staleness After Updates

Documents change. Crystallized memories derived from outdated chunks persist until their TTL expires or you force-update them. Set TTL on all doc-derived memories to match your document update cadence. For fast-changing docs (weekly releases), use 7-day TTLs. For stable policy docs, 90 days is safe.

3. Chunking Strategy Affects Recall Quality

Chunks that are too small (under 100 tokens) lack enough context for accurate semantic matching. Chunks that are too large (over 800 tokens) dilute the embedding and return irrelevant sentences. The optimal range is 350–550 tokens with 50-token overlaps between adjacent chunks to prevent context loss at boundaries.

4. Token Budget Overflow

When both memory recall and doc retrieval return high-scoring results, the merged context can overflow the LLM's context window. Implement strict token budgets: allocate 40% to user memories (high personalization value), 50% to doc chunks (factual grounding), and reserve 10% for the system prompt and user message.
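
The 40/50/10 split can be sketched as two small helpers: one that divides the total budget, and one that trims a ranked list to its share. Both function names are hypothetical, and the token estimate reuses the rough words × 1.3 heuristic from the merge code above rather than a real tokenizer.

```python
def split_token_budget(total_tokens: int, memory_pct: int = 40, doc_pct: int = 50) -> dict:
    """Divide a context budget into memory, doc, and reserved portions."""
    memory = total_tokens * memory_pct // 100
    docs = total_tokens * doc_pct // 100
    # Whatever remains is held back for the system prompt and user message.
    return {"memory": memory, "docs": docs, "reserved": total_tokens - memory - docs}


def trim_to_budget(texts: list[str], budget: float) -> list[str]:
    """Keep texts in ranked order until the estimated token cost would exceed budget."""
    kept, used = [], 0.0
    for text in texts:
        cost = len(text.split()) * 1.3  # rough estimate, not a tokenizer
        if used + cost > budget:
            break
        kept.append(text)
        used += cost
    return kept


budget = split_token_budget(2000)
print(budget)
```

Trimming each source against its own share, instead of trimming the merged list once, keeps a burst of high-scoring doc chunks from crowding out the user context entirely.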

5. Cold Start: No User Memory Yet

On a user's first interaction, memory recall returns nothing. Fall back gracefully to doc-only retrieval with a wider top_k. After the first turn, store a minimal user profile memory to bootstrap personalization for the second turn. Never show degraded behavior to the user — the transition should be seamless.
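
A cold-start fallback can be as small as the sketch below. It assumes the memory recall result is a plain list, and it builds a profile payload whose keys mirror the store_memory() parameters used elsewhere in this guide. The function names, the widened top_k of 10, and the starting importance of 0.6 are all illustrative assumptions.

```python
def plan_retrieval(user_memories: list, base_top_k: int = 6,
                   cold_start_top_k: int = 10) -> dict:
    """Widen doc retrieval when there is no user memory to draw on."""
    cold = len(user_memories) == 0
    return {"cold_start": cold, "doc_top_k": cold_start_top_k if cold else base_top_k}


def bootstrap_profile(user_tag: str, first_query: str) -> dict:
    """Build a minimal profile memory to store after the first turn."""
    return {
        "content": f"First question from this user: {first_query}",
        "importance": 0.6,  # assumed starting value; raise it as more is learned
        "memory_type": "semantic",
        "tags": ["user-profile", user_tag],
    }


plan = plan_retrieval([])
profile = bootstrap_profile("user:bob", "How do I enable SSO?")
print(plan, profile["tags"])
```

The returned payload could then be passed to store_memory() as keyword arguments alongside the agent ID once the first response has been sent.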

Performance Considerations

| Operation | p50 | p99 | Optimization |
|---|---|---|---|
| Memory recall (user context) | 12ms | 28ms | Tag filter reduces search space |
| Doc chunk recall (1k chunks indexed) | 80ms | 160ms | Hybrid BM25+vector search |
| Doc chunk recall (50k chunks indexed) | 95ms | 220ms | Sub-linear scaling with HNSW index |
| Parallel recall (both paths) | 82ms | 165ms | Parallelism eliminates additive cost |
| Merge + deduplicate + rank | 5ms | 12ms | In-process; no network hop |
| Crystallization (store_memory) | 18ms | 40ms | Async post-response for zero UX impact |

Crystallize asynchronously

Always run the crystallization store_memory() call after returning the response to the user, not before. It adds 18-40ms to the user-facing latency if you block on it. Queue it as a background task — the user should never wait for memory writes.
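
One way to keep the write off the user-facing path is to schedule it as a background task and return immediately. The sketch below uses only asyncio; store_fact() is a stand-in for the real memory write, and every name in it is hypothetical.

```python
import asyncio

# Hold references to in-flight tasks so they are not garbage-collected early.
background_tasks: set[asyncio.Task] = set()


async def store_fact(fact: str, sink: list[str]) -> None:
    """Stand-in for the memory write; a real version would call store_memory()."""
    await asyncio.sleep(0.02)  # simulate a write in the 18-40ms range
    sink.append(fact)


async def answer(query: str, sink: list[str]) -> str:
    response = f"answer to: {query}"  # placeholder for the LLM call
    task = asyncio.create_task(store_fact("SSO is available on Pro", sink))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return response  # returned before the write has finished


async def demo() -> tuple[str, int, int]:
    sink: list[str] = []
    response = await answer("How do I enable SSO?", sink)
    written_at_return = len(sink)  # nothing stored yet
    await asyncio.gather(*background_tasks)  # drain pending writes, e.g. at shutdown
    return response, written_at_return, len(sink)


response, before, after = asyncio.run(demo())
print(response, before, after)
```

If the SDK client is synchronous, as the Python examples above suggest, wrapping the call in asyncio.to_thread() inside store_fact() would achieve the same effect without blocking the event loop.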

SDK Reference

| Method | SDK | Purpose |
|---|---|---|
| store_memory(agent_id, content, importance, memory_type, tags, ttl_seconds) | Python | Index doc chunks or store user memories |
| storeMemory(agentId, {content, importance, memoryType, tags, ttl_seconds}) | TypeScript | Index doc chunks or store user memories |
| recall(agent_id, query, top_k, min_importance) | Python | Retrieve ranked memories + doc chunks |
| recall(agentId, query, {top_k, min_importance, memory_type}) | TypeScript | Retrieve ranked memories + doc chunks |
| search_memories(agent_id, query) | Python | Semantic search to detect near-duplicates before crystallization |
| searchMemories(agentId, query, {top_k}) | TypeScript | Semantic search for deduplication check |
| update_importance(agent_id, memory_id, importance) | Python | Demote stale crystallized facts when docs are updated |
| updateImportance(agentId, request) | TypeScript | Demote stale memories |
| forget(agent_id, memory_id) | Python | Remove outdated doc chunks when document is replaced |
| batch_recall(request) | Python | Recall from multiple agent IDs in one call for multi-user scenarios |

Advanced Configuration: Chunking, TTL, and Index Tuning

Recommended chunking parameters

from dakera import DakeraClient
import tiktoken

client = DakeraClient(base_url="http://localhost:3300", api_key="dk-...")
enc = tiktoken.get_encoding("cl100k_base")

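def chunk_document(text: str, chunk_tokens: int = 450, overlap_tokens: int = 50):
    """Chunk text with overlap to prevent context boundary loss."""
    if not 0 <= overlap_tokens < chunk_tokens:
        # An overlap >= chunk size would never advance and loop forever
        raise ValueError("overlap_tokens must be non-negative and smaller than chunk_tokens")
    tokens = enc.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_tokens, len(tokens))
        chunks.append(enc.decode(tokens[start:end]))
        if end == len(tokens):
            break  # last chunk reached; avoid emitting an overlap-only tail
        start += chunk_tokens - overlap_tokens  # slide with overlap
    return chunks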

TTL strategy by document type

# Changelog / release notes: short TTL — changes frequently
client.store_memory("kb-assistant", content=chunk, importance=0.8, ttl_seconds=7*24*3600)

# Policy documents: medium TTL — quarterly updates
client.store_memory("kb-assistant", content=chunk, importance=0.85, ttl_seconds=90*24*3600)

# Core product docs: long TTL — stable unless major version change
client.store_memory("kb-assistant", content=chunk, importance=0.85, ttl_seconds=365*24*3600)

# Crystallized facts learned from high-score RAG results: 30 days
client.store_memory("kb-assistant", content=fact, importance=0.75, ttl_seconds=30*24*3600)

Hybrid search tuning

# In docker/.env: tune BM25 vs vector weight for doc retrieval
# Higher BM25 weight = better for keyword-heavy technical docs
# Higher vector weight = better for conversational / semantic queries
DAKERA_HYBRID_BM25_WEIGHT=0.4
DAKERA_HYBRID_VECTOR_WEIGHT=0.6
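
To build intuition for what these two weights do, the sketch below assumes the scores are normalized to the 0-1 range and combined as a weighted sum. That linear fusion is an assumption made for illustration; this guide does not document how Dakera combines the two signals internally, and the function name is hypothetical.

```python
def hybrid_score(bm25_score: float, vector_score: float,
                 bm25_weight: float = 0.4, vector_weight: float = 0.6) -> float:
    """Weighted sum of a keyword score and a vector score (both assumed in 0-1)."""
    return bm25_weight * bm25_score + vector_weight * vector_score


# Chunk A matches the query keywords strongly; chunk B is semantically closer.
chunk_a = {"bm25": 0.9, "vector": 0.3}
chunk_b = {"bm25": 0.2, "vector": 0.8}

for bm25_w, vec_w in [(0.4, 0.6), (0.7, 0.3)]:
    a = hybrid_score(chunk_a["bm25"], chunk_a["vector"], bm25_w, vec_w)
    b = hybrid_score(chunk_b["bm25"], chunk_b["vector"], bm25_w, vec_w)
    print(f"bm25={bm25_w} vector={vec_w}: winner is {'A' if a > b else 'B'}")
```

With the default 0.4/0.6 weights the semantically closer chunk ranks first; shifting weight toward BM25 flips the order in favor of the keyword match, which is the trade-off the comments in the config describe.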

Combine your docs with your users' context

Dakera makes it trivial to index document chunks alongside agent memories and retrieve both in a single ranked call — no separate vector store required.
