Intermediate Retrieval

Cross-Session Context

Bridge the gap between disconnected conversations. Let your agent pick up exactly where it left off — days or weeks later — without asking users to repeat themselves.

⏱ ~20 min to implement 📦 Requires: Dakera v0.11+

Prerequisites

Running Dakera server (Quickstart guide)
Understanding of session start/end lifecycle
Familiarity with episodic vs. semantic memory types

The Problem

Modern AI assistants are amnesiac by default. A customer opens a support ticket on Monday, describes their entire infrastructure setup, gets partial help, and closes the chat. On Wednesday they return with a follow-up. From the AI's perspective, it is a complete stranger. The customer must re-explain their stack, their constraints, and the context of the problem — a deeply frustrating experience that destroys trust.

The underlying cause is that LLM context windows only span a single session. Appending entire conversation histories to every request is token-expensive, hits limits fast, and fills the context with noise rather than signal. What you need is a curated context bridge: a compact, high-signal summary of prior sessions that can be recalled in milliseconds.

Session vs. Memory: Key Distinction

Dakera sessions track when interactions happened — they are time-bounded containers. Memories are the durable, queryable facts extracted from those sessions. This pattern uses both: sessions to provide temporal structure, and memories to store the cross-session context that persists indefinitely.

Architecture: Session Timeline with Context Bridging

This pattern builds a session bridge: at the end of each session, key decisions, open questions, and state changes are distilled and stored as memories. At the start of any new session, a targeted recall surfaces the most relevant prior context, rehydrating the agent's working memory in under 20 ms.

Step-by-Step Implementation

1. Design your agent ID scheme

Use a stable, per-user agent ID like support-customer-{customerId}. This is the persistent identity that all session memories accumulate under. Do not use session-scoped IDs for the agent; they would prevent cross-session retrieval entirely.

2. Recall prior context at session start

Before the user's first message hits your LLM, fire a recall with a broad query ("recent project progress, open tasks, constraints") and inject the results into your system prompt. Use min_importance=0.72 to capture medium-confidence context without pulling in low-signal memories.

3. Select what to store during the session

Not every exchange warrants storage. Focus on: decisions made, open problems not yet resolved, technical constraints stated, and state transitions (completed X, started Y). Skip: pleasantries, clarifying back-and-forth, and content the user will re-state naturally next time.

4. Store granular facts at session end

Store 3–6 separate memories rather than one large summary blob. Each memory retrieves independently, so granular facts score more precisely against specific queries. Assign importance 0.75–0.90 based on how critical the fact is for future sessions.

5. Decay old sessions with summarization

After 20+ sessions, older memories become noisy. Pair this pattern with Summarization & Decay to consolidate old sessions into a compact historical summary and reduce the importance of archived memories so fresh context always wins.
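
The selection and storage steps can be sketched as a small rule-based distiller. Everything named here (`Outcome`, `KIND_IMPORTANCE`, `distill`) is hypothetical and not part of the Dakera SDK; the importance values are assumptions picked from inside the 0.75–0.90 range suggested above.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from fact kind to importance, chosen inside the
# 0.75-0.90 range recommended above. Tune these for your own product.
KIND_IMPORTANCE = {
    "open-task": 0.90,
    "decision": 0.85,
    "constraint": 0.82,
    "progress": 0.78,
}


@dataclass
class Outcome:
    content: str
    kind: str  # a key of KIND_IMPORTANCE; any other value is skipped
    tags: list = field(default_factory=list)


def distill(candidates, max_memories=6):
    """Keep only storable kinds, rank by importance, cap at max_memories."""
    kept = [c for c in candidates if c.kind in KIND_IMPORTANCE]
    kept.sort(key=lambda c: KIND_IMPORTANCE[c.kind], reverse=True)
    return [
        {
            "content": c.content,
            "importance": KIND_IMPORTANCE[c.kind],
            "tags": ["session-context", c.kind, *c.tags],
        }
        for c in kept[:max_memories]
    ]
```

The returned dictionaries use the same content / importance / tags keys that the end-of-session storage code in the Implementation section reads, so the list can be passed to it directly.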

Implementation

# --- END OF SESSION 1 ---
# Store distilled context: one memory per key fact

curl -X POST http://localhost:3300/v1/memory/store \
  -H "Authorization: Bearer dk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "support-customer-9182",
    "content": "Customer runs FastAPI backend with PostgreSQL 14 on AWS ECS.",
    "importance": 0.85,
    "memory_type": "episodic",
    "tags": ["session-context", "infrastructure"]
  }'

curl -X POST http://localhost:3300/v1/memory/store \
  -H "Authorization: Bearer dk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "support-customer-9182",
    "content": "Auth module complete (JWT HS256 + refresh tokens). Next open task: rate limiting middleware.",
    "importance": 0.90,
    "memory_type": "episodic",
    "tags": ["session-context", "progress", "open-task"]
  }'

curl -X POST http://localhost:3300/v1/memory/store \
  -H "Authorization: Bearer dk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "support-customer-9182",
    "content": "Constraint: must stay within AWS free tier. No external paid rate-limiting services.",
    "importance": 0.82,
    "memory_type": "semantic",
    "tags": ["session-context", "constraint", "budget"]
  }'

# --- START OF SESSION 2 (2 days later) ---
curl "http://localhost:3300/v1/memory/recall?agent_id=support-customer-9182&query=project+progress+open+tasks+constraints&top_k=8&min_importance=0.72" \
  -H "Authorization: Bearer dk-..."

from dakera import DakeraClient

client = DakeraClient(base_url="http://localhost:3300", api_key="dk-...")

class CrossSessionAgent:
    def __init__(self, customer_id: str):
        self.agent_id = f"support-customer-{customer_id}"

    def start_session(self, user_message: str) -> str:
        """Recall prior context and build system prompt."""
        prior = client.recall(
            agent_id=self.agent_id,
            query=f"project progress, open tasks, decisions, constraints: {user_message}",
            top_k=8,
            min_importance=0.72
        )
        if not prior.get("memories"):
            return "You are a helpful support agent. First conversation with this customer."

        ctx_lines = "\n".join(f"- {m['content']}" for m in prior["memories"])
        return (
            "You are a support agent with memory of prior sessions.\n\n"
            f"PRIOR SESSION CONTEXT:\n{ctx_lines}\n\n"
            "Continue from where we left off. Do not ask the user to repeat themselves."
        )

    def end_session(self, outcomes: list) -> None:
        """Store distilled session outcomes for future recall."""
        for o in outcomes:
            client.store_memory(
                agent_id=self.agent_id,
                content=o["content"],
                memory_type="episodic",
                importance=o.get("importance", 0.80),
                tags=o.get("tags", ["session-context"])
            )

    def get_session_list(self) -> list:
        """List all sessions for this agent."""
        return client.list_sessions()

# --- SESSION 1 (Monday) ---
agent = CrossSessionAgent("9182")
print(agent.start_session("help with FastAPI"))
# Output: first-conversation message

# ... LLM conversation happens ...

agent.end_session([
    {"content": "Customer runs FastAPI + PostgreSQL 14 on AWS ECS",
     "importance": 0.85, "tags": ["infrastructure"]},
    {"content": "Auth module complete (JWT HS256). Rate limiting is next.",
     "importance": 0.90, "tags": ["progress", "open-task"]},
    {"content": "Constraint: AWS free tier only, no paid services",
     "importance": 0.82, "tags": ["constraint"]},
])

# --- SESSION 2 (Wednesday, new process) ---
agent2 = CrossSessionAgent("9182")
system2 = agent2.start_session("I am back to add rate limiting")
print(system2)
# Output includes all 3 prior facts, agent picks up seamlessly

import { DakeraClient } from '@dakera-ai/dakera';

const client = new DakeraClient({ baseUrl: 'http://localhost:3300', apiKey: 'dk-...' });

interface SessionOutcome {
  content: string;
  importance?: number;
  tags?: string[];
}

class CrossSessionAgent {
  private agentId: string;

  constructor(customerId: string) {
    this.agentId = `support-customer-${customerId}`;
  }

  async startSession(userMessage: string): Promise<string> {
    const prior = await client.recall(
      this.agentId,
      `project progress, open tasks, decisions, constraints: ${userMessage}`,
      { top_k: 8, min_importance: 0.72 }
    );

    if (!prior.memories.length) {
      return 'You are a helpful support agent. First conversation with this customer.';
    }

    const ctxLines = prior.memories.map(m => `- ${m.content}`).join('\n');
    return `You are a support agent with memory of prior sessions.

PRIOR CONTEXT:
${ctxLines}

Continue seamlessly without asking the user to repeat information.`;
  }

  async endSession(outcomes: SessionOutcome[]): Promise<void> {
    // Parallel store for speed
    await Promise.all(outcomes.map(o =>
      client.storeMemory(this.agentId, {
        content: o.content,
        memoryType: 'episodic',
        importance: o.importance ?? 0.80,
        tags: o.tags ?? ['session-context']
      })
    ));
  }

  async getSessionHistory() {
    return client.listSessions();
  }
}

// SESSION 1
const agent = new CrossSessionAgent('9182');
const sys1 = await agent.startSession('help with FastAPI project');

// ... conversation ...

await agent.endSession([
  { content: 'Customer runs FastAPI + PostgreSQL 14 on AWS ECS', importance: 0.85, tags: ['infrastructure'] },
  { content: 'Auth complete (JWT HS256). Rate limiting is next open task.', importance: 0.90, tags: ['open-task'] },
  { content: 'Constraint: AWS free tier only, no paid external services', importance: 0.82, tags: ['constraint'] },
]);

// SESSION 2 (Wednesday)
const agent2 = new CrossSessionAgent('9182');
const sys2 = await agent2.startSession('back to add rate limiting');
// sys2 contains all prior context — user never has to repeat themselves

use dakera_rs::{Client, StoreMemoryRequest, RecallRequest};

let client = Client::new("http://localhost:3300", "dk-...");
let agent_id = "support-customer-9182";

// --- END OF SESSION 1: store distilled facts ---
let outcomes = vec![
    ("Customer runs FastAPI + PostgreSQL 14 on AWS ECS", 0.85f32, vec!["infrastructure"]),
    ("Auth module complete (JWT HS256). Rate limiting is next open task.", 0.90, vec!["open-task"]),
    ("Constraint: AWS free tier only, no paid external services.", 0.82, vec!["constraint"]),
];

for (content, importance, tags) in &outcomes {
    client.store_memory(agent_id, StoreMemoryRequest {
        content: content.to_string(),
        memory_type: "episodic".into(),
        importance: Some(*importance),
        tags: tags.iter().map(|t| t.to_string()).collect(),
        ..Default::default()
    }).await?;
}

// --- START OF SESSION 2: recall and rehydrate ---
let prior = client.recall(agent_id, RecallRequest {
    query: "project progress open tasks constraints".into(),
    top_k: Some(8),
    min_importance: Some(0.72),
    ..Default::default()
}).await?;

let ctx_lines: Vec<String> = prior.memories
    .iter()
    .map(|m| format!("- {}", m.content))
    .collect();

let system_prompt = format!(
    "You are a support agent with prior session memory.\n\nPRIOR CONTEXT:\n{}\n\nContinue seamlessly.",
    ctx_lines.join("\n")
);

package main

import (
    "context"
    "fmt"
    "strings"
    dakera "github.com/dakera-ai/dakera-go"
)

func endSession(ctx context.Context, client *dakera.Client, agentID string, outcomes []map[string]interface{}) {
    for _, o := range outcomes {
        importance, _ := o["importance"].(float64)
        tags, _ := o["tags"].([]string)
        client.StoreMemory(ctx, agentID, dakera.StoreMemoryRequest{
            Content:    o["content"].(string),
            MemoryType: "episodic",
            Importance: importance,
            Tags:       tags,
        })
    }
}

func startSession(ctx context.Context, client *dakera.Client, agentID, userMsg string) string {
    const fallback = "You are a helpful support agent. First conversation with this customer."

    prior, err := client.Recall(ctx, agentID, dakera.RecallRequest{
        Query:         fmt.Sprintf("project progress open tasks constraints: %s", userMsg),
        TopK:          8,
        MinImportance: 0.72,
    })
    // If recall fails, fall back to the stateless prompt instead of reading an empty result.
    if err != nil || len(prior.Memories) == 0 {
        return fallback
    }

    lines := make([]string, 0, len(prior.Memories))
    for _, m := range prior.Memories {
        lines = append(lines, "- "+m.Content)
    }

    return fmt.Sprintf(
        "You are a support agent with memory of prior sessions.\n\nPRIOR CONTEXT:\n%s\n\nContinue without asking the user to repeat themselves.",
        strings.Join(lines, "\n"),
    )
}

func main() {
    client := dakera.NewClient("http://localhost:3300", "dk-...")
    ctx := context.Background()
    agentID := "support-customer-9182"

    // SESSION 2: recall prior context
    system := startSession(ctx, client, agentID, "I am back to add rate limiting")
    _ = system // inject into LLM call
}

Before & After: Memory State

Contrast the agent's context at the start of Session 2 with and without this pattern.

Before: Stateless Agent (Session 2)

{
  // Agent has NO memory of Session 1.
  // System prompt is generic.
  // Agent asks user to re-explain stack.

  system_prompt: "You are a helpful
    support agent.",

  // User is forced to repeat:
  // "I told you on Monday I use
  //  FastAPI and PostgreSQL..."

  memories_recalled: 0,
  session_bridged: false
}

After: Cross-Session Context (Session 2)

{
  // Agent recalls 3 memories from
  // Session 1 instantly.

  system_prompt: "You are a support agent
    with prior session memory.

    PRIOR CONTEXT:
    - FastAPI + PostgreSQL 14, AWS ECS
    - Auth done. Rate limiting is next.
    - Constraint: AWS free tier only",

  memories_recalled: 3,
  recall_latency_ms: 14,
  session_bridged: true
  // User says "I'm back for rate
  // limiting" — agent starts coding.
}

Real-World Example: Long-Running Customer Support Relationship

Scenario: Helios Support AI powers enterprise customer support for a developer tooling company. Customers frequently open multiple tickets across weeks or months as they work through large integrations. Without cross-session context, every ticket starts from scratch. A customer who has been using the product for 6 months must still explain their infrastructure on every ticket.

The 6-Month Customer Journey

Month 1, Session 1: Customer (enterprise account, 50-seat team) explains they run a microservices architecture on Kubernetes with 12 services. The agent stores: Kubernetes microservices, 12 services, GKE cluster.

Month 2, Session 4: Customer reports a performance issue with the search service. The agent recalls their full stack context instantly and asks targeted questions about search service load rather than generic "what is your setup?" questions. Session concludes with root cause: N+1 query pattern. Stored: Search service had N+1 query bug, resolved by dataloader in service #7.

Month 4, Session 11: Customer asks about upgrading from v2 to v3 API. The agent recalls all prior context — their 12-service architecture, the dataloader fix in service #7, their Kubernetes setup — and proactively flags that the v3 migration has a breaking change that will affect their search service implementation. The customer calls this the most impressive AI support experience they have had. They renew their annual contract that week.

Pro Tip: Store Decisions, Not Conversations

The biggest mistake with cross-session context is storing raw conversation turns. A 40-turn conversation produces 40 noisy memories that compete with each other during recall. Instead, distill each session into 3–6 high-signal facts: what was decided, what was completed, what is still open. The distillation step — whether done by LLM or rule-based logic — is what makes cross-session recall precise rather than overwhelming.

Context Bridge: Session-to-Session Handoff Diagram

This diagram shows how distilled session memories flow through the context bridge, enabling seamless continuity across disconnected conversations.

Performance Characteristics

Session context recall p99 (top_k=8): <20 ms
Optimal memories to store per session: ~3–6
Sessions per user supported without decay: 500+

Cross-session recall adds 10–20 ms to session startup, which is negligible next to LLM inference latency (200–2000 ms). When the recall query does not depend on the user's first message, run it in parallel with session initialization instead of waiting for that message to arrive. At 500+ sessions per user, consider applying the Summarization & Decay pattern to compress historical memories and keep retrieval fast.
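
A minimal sketch of running recall concurrently with other startup work, using asyncio. Both `recall_context` and `init_session` are hypothetical stand-ins that only sleep; swap in your own async recall call and startup logic. This only applies when the recall query does not need the user's first message.

```python
import asyncio


async def recall_context(agent_id):
    """Stand-in for an async recall call; replace with your client's recall."""
    await asyncio.sleep(0.02)  # simulate ~20 ms of recall latency
    return [f"memory for {agent_id}"]


async def init_session(agent_id):
    """Stand-in for other startup work (auth, loading a user profile, ...)."""
    await asyncio.sleep(0.02)
    return {"agent_id": agent_id, "ready": True}


async def start(agent_id):
    # Both coroutines run concurrently, so the total wait is roughly the
    # slower of the two rather than their sum.
    memories, session = await asyncio.gather(
        recall_context(agent_id), init_session(agent_id)
    )
    return session, memories
```

Calling `asyncio.run(start("support-customer-9182"))` returns the session dictionary and the recalled list together.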

Edge Cases & Developer Gotchas

Gotcha 1: Session End Hook Not Firing

Users frequently close browser tabs or apps without a formal session end event. If your end_session() logic only fires on clean disconnects, you will lose context from abruptly terminated sessions. Solution: Store context incrementally during the session (after each significant exchange), not just at the end. Use a webhook or background job to handle orphaned sessions.
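
One way to sketch incremental storage is a small wrapper that persists each significant fact immediately and tracks idle time so a background job can spot orphaned sessions. `IncrementalSessionStore` and its `store_fn` parameter are hypothetical; `store_fn` is assumed to accept the same keyword arguments that the Python example passes to client.store_memory.

```python
import time


class IncrementalSessionStore:
    """Persists each significant fact as soon as it is known."""

    def __init__(self, store_fn, agent_id):
        self.store_fn = store_fn  # e.g. a thin wrapper around client.store_memory
        self.agent_id = agent_id
        self.last_activity = time.monotonic()
        self.stored = 0

    def record(self, content, importance=0.80, tags=None):
        """Store one fact right away instead of waiting for session end."""
        self.store_fn(
            agent_id=self.agent_id,
            content=content,
            memory_type="episodic",
            importance=importance,
            tags=tags or ["session-context"],
        )
        self.stored += 1
        self.last_activity = time.monotonic()

    def is_orphaned(self, idle_seconds, now=None):
        """True when the session has been idle longer than idle_seconds."""
        now = time.monotonic() if now is None else now
        return now - self.last_activity > idle_seconds
```

A periodic job can call `is_orphaned(1800)` on open sessions and run whatever cleanup the clean-disconnect path would have run.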

Gotcha 2: Context Injection Inflates Token Count

Injecting 8 memories averaging 80 tokens each adds 640 tokens to the system prompt of every session. At scale this adds meaningful cost. Solution: Use min_importance to gate which memories enter the system prompt. A threshold of 0.80 typically yields 3–5 memories, which keeps injection at roughly 400 tokens or less. Use top_k=3 for cost-sensitive deployments.
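
Beyond min_importance and top_k, a client-side budget check is one more option. The sketch below is an assumption-laden illustration: `fit_to_budget` is a hypothetical helper, and the words-to-tokens ratio is a rough heuristic, not a real tokenizer. Use your model's tokenizer if you need exact counts.

```python
def fit_to_budget(memories, max_tokens=400, tokens_per_word=1.3):
    """Keep the highest-importance memories whose estimated tokens fit the budget."""
    ranked = sorted(memories, key=lambda m: m.get("importance", 0.0), reverse=True)
    kept, used = [], 0
    for m in ranked:
        # Rough estimate: word count scaled by an assumed tokens-per-word ratio.
        cost = int(len(m["content"].split()) * tokens_per_word) + 1
        if used + cost > max_tokens:
            continue  # skip this one; a shorter memory may still fit
        kept.append(m)
        used += cost
    return kept, used
```

The function returns both the trimmed list and the estimated token total, so the total can be logged alongside per-session cost metrics.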

Gotcha 3: Stale Technical Context After Migrations

A customer migrated from PostgreSQL to MySQL last month, but the memory still says "PostgreSQL." The agent confidently gives wrong advice. Solution: When a state change is detected ("we just switched from X to Y"), explicitly call update_memory or forget on the outdated fact. Add state-change detection to your memory extraction logic as a first-class concern.
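
A minimal sketch of rule-based state-change detection. The regular expression, `detect_state_change`, and `stale_memory_ids` are hypothetical and only catch simple "switched from X to Y" phrasing; an LLM-based extractor would cover more cases. Memories are assumed to be dictionaries with "id" and "content" keys, as in the summarization example.

```python
import re

# Hypothetical pattern for phrases like "switched from X to Y".
STATE_CHANGE = re.compile(
    r"\b(?:switched|migrated|moved|upgraded)\s+from\s+(?P<old>[\w+#-]+)"
    r"\s+to\s+(?P<new>[\w+#-]+)",
    re.IGNORECASE,
)


def detect_state_change(message):
    """Return (old, new) if the message announces a migration, else None."""
    m = STATE_CHANGE.search(message)
    return (m.group("old"), m.group("new")) if m else None


def stale_memory_ids(memories, old_value):
    """IDs of recalled memories that still mention the superseded value."""
    needle = old_value.lower()
    return [m["id"] for m in memories if needle in m["content"].lower()]
```

Each returned ID is then a candidate for the update_memory or forget call listed in the SDK Reference.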

Gotcha 4: Query Specificity Affects Recall Quality

Using a generic query like "recent context" returns the top memories by recency-weighted score, which may not be the most relevant memories for the current conversation topic. Solution: Incorporate the user's opening message into the recall query. If the user asks about billing, query for "billing, payment, subscription context." This dramatically improves precision.
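
A small sketch of folding the opening message into the recall query. `BASE_QUERY` and `build_recall_query` are hypothetical names; the 200-character cap is an arbitrary assumption to keep very long first messages from dominating the query.

```python
BASE_QUERY = "project progress, open tasks, decisions, constraints"


def build_recall_query(opening_message, max_chars=200):
    """Combine the standing query with the user's opening message, if any."""
    text = " ".join((opening_message or "").split())  # collapse whitespace
    if not text:
        return BASE_QUERY
    return f"{BASE_QUERY}: {text[:max_chars]}"
```

With no opening message the helper falls back to the broad query, which matches the pre-message recall described in the implementation steps.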

Gotcha 5: Recalling Another User's Context

If you build agent IDs dynamically, a bug such as concatenating "support-customer-" + req.customerId from unvalidated input can recall the wrong customer's context. Solution: Always validate and sanitize the customer ID before constructing the agent ID. Combine with Namespace Isolation for hard tenant boundaries at the infrastructure level.
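
A minimal validation sketch. The allowed ID format (1 to 32 ASCII letters, digits, hyphens, or underscores) and the `agent_id_for` helper are assumptions; adapt the pattern to whatever your customer IDs actually look like, and prefer taking the ID from the authenticated session rather than from the request body.

```python
import re

# Assumed ID format: 1-32 ASCII letters, digits, hyphens, or underscores.
_CUSTOMER_ID = re.compile(r"[A-Za-z0-9_-]{1,32}")


def agent_id_for(customer_id):
    """Build the per-customer agent ID, rejecting anything off-format."""
    if not isinstance(customer_id, str) or not _CUSTOMER_ID.fullmatch(customer_id):
        raise ValueError(f"invalid customer id: {customer_id!r}")
    return f"support-customer-{customer_id}"
```

Raising instead of silently cleaning the input keeps a malformed ID from mapping onto some other customer's agent.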

SDK Reference

Operation	Python	TypeScript	Purpose
Store session fact	`client.store_memory(agent_id, content, importance, memory_type, tags)`	`client.storeMemory(agentId, {content, importance, memoryType, tags})`	Persist a session outcome
Recall prior context	`client.recall(agent_id, query, top_k, min_importance)`	`client.recall(agentId, query, {top_k, min_importance})`	Rehydrate cross-session context
List sessions	`client.list_sessions()`	`client.listSessions()`	Get session history for user
Get session memories	`client.session_memories(session_id)`	`client.sessionMemories(sessionId)`	Retrieve all memories from a session
Update stale fact	`client.update_memory(agent_id, memory_id, ...)`	`client.updateMemory(agentId, memoryId, request)`	Correct outdated session context
Delete outdated fact	`client.forget(agent_id, memory_id)`	`client.forget(agentId, memoryId)`	Remove a superseded memory
Batch recall (multi-topic)	`client.batch_recall(request)`	`client.batchRecall(request)`	Fan out multiple recall queries in one round-trip

Advanced Configuration

Parallel Multi-Topic Recall at Session Start

When you need context across multiple domains simultaneously, batch the recall to avoid sequential round-trips:

results = client.batch_recall({
    "queries": [
        {"agent_id": "support-customer-9182", "query": "infrastructure stack architecture", "top_k": 4},
        {"agent_id": "support-customer-9182", "query": "open tasks and unresolved issues", "top_k": 4},
        {"agent_id": "support-customer-9182", "query": "technical constraints and budget", "top_k": 3}
    ]
})
# Merge all results into system prompt
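
The merge step might look like the sketch below. The response shape of batch_recall is not shown here, so `merge_recalls` is a hypothetical helper that takes plain lists of memory dictionaries (each assumed to carry "id" and "importance" keys); extracting those lists from the actual response is left to the caller.

```python
def merge_recalls(memory_lists, limit=8):
    """Flatten several recall result lists, drop duplicate IDs, keep the best."""
    seen, merged = set(), []
    for memories in memory_lists:
        for m in memories:
            if m["id"] in seen:
                continue  # the same memory matched more than one query
            seen.add(m["id"])
            merged.append(m)
    merged.sort(key=lambda m: m.get("importance", 0.0), reverse=True)
    return merged[:limit]
```

Deduplicating by ID matters because one memory can match several topic queries, and injecting it twice wastes prompt tokens.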

Automatic Session Summarization on High Volume

When a customer exceeds 50 sessions, automatically trigger a historical summarization:

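session_count = len(client.list_sessions())
if session_count > 50:
    # Pull candidate memories to consolidate. Recall ranks by score, not
    # strictly by age, so filter by timestamp if you need the oldest sessions.
    old_memories = client.recall(
        agent_id=agent_id,
        query="oldest project history and decisions",
        top_k=40,
        min_importance=0.0  # include all
    )
    # Summarize exactly the memories that will be forgotten.
    to_archive = old_memories["memories"][:30]
    # llm_summarize is your own helper (not part of the SDK) that returns a string.
    summary = llm_summarize(to_archive)
    # importance=0.6 sits below the 0.72 session-start threshold; lower
    # min_importance on recall if the summary should be injected.
    client.store_memory(agent_id, summary, importance=0.6,
                        memory_type="semantic", tags=["historical-summary"])
    for m in to_archive:
        client.forget(agent_id, m["id"])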

TTL on Transient Session Context

Some session context is only relevant for a short time window. Use TTL to auto-expire it:

client.store_memory(
    agent_id=agent_id,
    content="Customer is attending our live webinar this week",
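    importance=0.70,  # below the 0.72 recall threshold used earlier; lower min_importance on recall if this should surface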
    memory_type="episodic",
    tags=["transient", "event"],
    ttl_seconds=604800  # auto-expires in 7 days
)

When to Use This Pattern

Customer support AIs serving recurring customers over weeks or months
Project management assistants tracking long-running work items
Sales AIs that remember prospect context across multiple touchpoints
Onboarding assistants that track setup progress across multiple sessions
Coding assistants working on large, multi-session feature implementations
Any product where "the AI remembers your situation" is a retention driver
