Intermediate Lifecycle

Conversation Summarization & Decay

⏰ ~25 min to implement 📦 Requires: Dakera v0.11+

Transform long conversation histories into compact, persistent summaries while raw transcripts decay naturally. Ideal for legal, support, and long-running project workflows where outcomes matter more than raw dialogue.

Prerequisites
  • Running Dakera server (Quickstart)
  • An agent ID scoped per user or case
  • Optional: LLM client (OpenAI, Anthropic) for custom summarization prompts

The Problem: Unbounded Conversation Growth

Raw conversation logs are verbose. A single 90-minute legal consultation can generate 8,000+ tokens of dialogue. Storing every turn verbatim creates three compounding problems: recall noise (low-value messages compete with critical facts for retrieval slots), unbounded storage growth, and context window bloat when reconstructing agent state.

Human memory solves this naturally — you remember that a client wants to sue for wrongful termination, not the exact words they used in the second sentence of the third paragraph. Your AI agent needs the same ability: compress the transcript, persist the insight.

Pattern Context

This pattern works on two timescales simultaneously: real-time decay (raw messages fade as importance drops) and batch summarization (periodic consolidation compresses accumulated turns into a durable summary). Use both together for the best results.

Architecture Overview

The pattern combines two Dakera primitives: importance-weighted storage and built-in consolidation. Raw conversation turns are stored with low importance (0.2–0.4) so they decay and become invisible to recall within days or weeks. LLM-generated summaries are stored with high importance (0.85–0.95) and persist indefinitely.

  • Store each conversation turn immediately with memory_type: "episodic" and low importance
  • After every N turns (typically 10–20), generate an LLM summary and store it at high importance
  • Tag summaries with "summary" and "session-{id}" for targeted recall
  • Call consolidate() to let Dakera's autopilot merge related low-importance memories automatically
  • At recall time, use min_importance=0.7 to surface only summaries and key facts

Diagram: Summarization Pipeline

[Diagram: raw conversation turns (importance 0.2–0.4, decaying over time) are passed to an LLM summarizer (GPT / Claude, prompt template). The resulting summary memory (importance: 0.92, memory_type: semantic, tags: ["summary","case-247"], ttl: none) persists across sessions and is returned by recall with min_importance=0.7. consolidate() merges the raw turns automatically.]

Diagram: Decay Curves by Importance Level

[Chart: effective importance (y-axis, 0.1 to 1.0) against time since storage (x-axis: now, 1 week, 1 month, 3 months). Curves: 0.92 (summary), 0.75 (key fact), 0.45 (working note), 0.25 (raw turn, fast decay). A horizontal line marks the min_importance=0.4 recall threshold.]
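
For intuition, the shape of these curves can be approximated with a toy model. The sketch below assumes exponential decay whose half-life grows with the stored importance, and treats anything at or above 0.85 as permanent. The function name, the half-life formula, and every constant are illustrative assumptions, not Dakera's actual decay implementation.

```python
def effective_importance(importance: float, age_days: float,
                         base_half_life_days: float = 3.0,
                         permanent_at: float = 0.85) -> float:
    """Toy decay model (assumption): higher stored importance decays more slowly."""
    if importance >= permanent_at:
        return importance  # summaries and constants are treated as non-decaying
    # Hypothetical half-life: grows quickly as importance approaches 1.0
    half_life = base_half_life_days / (1.0 - importance) ** 2
    return importance * 0.5 ** (age_days / half_life)

for stored in (0.25, 0.45, 0.75, 0.92):
    week = effective_importance(stored, 7)
    quarter = effective_importance(stored, 90)
    print(f"stored={stored:.2f}  after 1 week={week:.2f}  after 3 months={quarter:.2f}")
```

Under these assumed constants a raw turn stored at 0.25 drops below a 0.4 recall threshold immediately and keeps fading, while a summary stored at 0.92 never moves.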

Real-World Scenario: Legal Case Assistant

Scenario: Lawbridge AI builds a legal assistant that attends client consultations. Each consultation generates 50–200 conversation turns. Attorneys need the case summary to persist across months; small talk and pleasantries should vanish within days.

Lawbridge stores each consultation turn at importance 0.25–0.35. After each consultation, they run a GPT-4 summarization prompt that extracts: client claims, relevant dates, requested remedies, and attorney notes. That summary is stored at importance 0.92 with tags ["summary", "case-{id}", "client-{id}"]. When an attorney opens a case six months later, recall(min_importance=0.8) returns only summaries — no raw dialogue noise.

Result: attorneys retrieve complete case context in under 120ms, with storage costs reduced by 87% vs. storing raw transcripts.

Step-by-Step Implementation

  1. Store each conversation turn with low importance
    As the conversation progresses, write each user/assistant turn to Dakera with memory_type: "episodic" and importance 0.2–0.4. These are raw, verbose, and will decay. Do not set a TTL — let importance-based decay handle removal naturally.
  2. Detect summarization trigger
    Trigger summarization after every 15–20 turns, at session end, or when a topic shift is detected. For legal workflows, always summarize at the explicit close of each consultation, not mid-conversation.
  3. Generate summary with an LLM
    Retrieve recent turns using search_memories or by recalling with low min_importance. Pass them to your LLM with a structured extraction prompt: "Extract key facts, decisions, dates, and action items." Validate the output before storing.
  4. Store the summary at high importance
    Store the LLM summary with importance: 0.9, memory_type: "semantic", and structured tags. High importance ensures these memories survive decay indefinitely and surface first in recall.
  5. Call consolidate() to trigger autopilot merging
    After storing the summary, call consolidate(). Dakera's autopilot scans for overlapping low-importance episodic memories and merges them into higher-level representations, further compressing the store.
  6. Recall with min_importance filter
    At the start of any future session, recall with min_importance=0.7 and relevant tags. This returns only durable summaries and key facts — no ephemeral noise from old conversations.
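
The six steps can be traced end to end without a running server. The sketch below uses a small in-memory stand-in (InMemoryStore and Memory are hypothetical classes, not the Dakera client) and a placeholder function in place of the LLM call, so only the control flow should be taken from it.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    content: str
    importance: float
    memory_type: str
    tags: list[str] = field(default_factory=list)

class InMemoryStore:
    """Hypothetical stand-in for a memory backend; not the Dakera client."""

    def __init__(self) -> None:
        self.memories: list[Memory] = []

    def store(self, memory: Memory) -> None:
        self.memories.append(memory)

    def recall(self, tag: str, min_importance: float) -> list[Memory]:
        return [m for m in self.memories
                if tag in m.tags and m.importance >= min_importance]

def placeholder_summarize(turns: list[str]) -> str:
    """Stands in for the LLM call in step 3."""
    return f"{len(turns)} turns, ending with: {turns[-1]}"

def flush_summary(store: InMemoryStore, case_id: str, buffer: list[str]) -> None:
    summary = placeholder_summarize(buffer)                        # step 3
    store.store(Memory(f"CASE SUMMARY [{case_id}]: {summary}",     # step 4
                       0.9, "semantic", ["summary", case_id]))
    buffer.clear()  # so the same turns are not summarized twice
    # Step 5 (consolidate) is a server-side call and is omitted here.

def run_session(store: InMemoryStore, case_id: str,
                turns: list[str], trigger: int = 15) -> None:
    buffer: list[str] = []
    for turn in turns:
        store.store(Memory(turn, 0.3, "episodic", ["turn", case_id]))  # step 1
        buffer.append(turn)
        if len(buffer) >= trigger:                                     # step 2
            flush_summary(store, case_id, buffer)
    if buffer:  # session end: summarize whatever is left
        flush_summary(store, case_id, buffer)

store = InMemoryStore()
run_session(store, "case-247", [f"CLIENT: statement {i}" for i in range(1, 36)])
durable = store.recall("case-247", min_importance=0.7)                 # step 6
print(len(store.memories), len(durable))
```

Clearing the buffer after each summary is a design choice in this sketch: it keeps the end-of-session summary from covering turns that an earlier summary already captured.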

Before & After: Memory State

Before — raw turns (6 memories)
[
  {
    "id": "m-001",
    "content": "Client: Hi, thanks for seeing me",
    "importance": 0.2,
    "memory_type": "episodic",
    "created_at": "2026-03-10T09:00:00Z"
  },
  {
    "id": "m-002",
    "content": "Attorney: Of course, what brings you in?",
    "importance": 0.15,
    "memory_type": "episodic"
  },
  {
    "id": "m-003",
    "content": "Client: I was fired last Tuesday...",
    "importance": 0.35,
    "memory_type": "episodic"
  },
  {
    "id": "m-004",
    "content": "...after 8 years at the company.",
    "importance": 0.35,
    "memory_type": "episodic"
  },
  {
    "id": "m-005",
    "content": "Client: No severance was offered.",
    "importance": 0.4,
    "memory_type": "episodic"
  },
  {
    "id": "m-006",
    "content": "Attorney: Was there a written contract?",
    "importance": 0.25,
    "memory_type": "episodic"
  }
]
// Recall noise: 6 low-importance entries
// compete with critical facts
After — summarized (1 memory)
[
  {
    "id": "m-sum-001",
    "content": "CASE SUMMARY — March 10 2026: Client Jane Doe, employed at Acme Corp for 8 years, terminated March 4 2026 without cause or severance. No written contract. Client seeks wrongful termination claim. Attorney requested: employment records, last 3 pay stubs, termination letter. Next step: review docs by March 20.",
    "importance": 0.92,
    "memory_type": "semantic",
    "tags": [
      "summary",
      "case-247",
      "client-jane-doe",
      "wrongful-termination"
    ],
    "created_at": "2026-03-10T10:15:00Z"
  }
]
// Clean recall: 1 high-signal memory
// surfaces immediately at min_imp=0.8

Implementation

cURL
# 1. Store a conversation turn (low importance, decays naturally)
curl -X POST http://localhost:3300/v1/memory/store \
  -H "Authorization: Bearer dk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "legal-assistant",
    "content": "Client was terminated March 4 after 8 years. No severance.",
    "importance": 0.35,
    "memory_type": "episodic",
    "tags": ["turn", "case-247"]
  }'

# 2. Store LLM-generated summary (high importance — persists permanently)
curl -X POST http://localhost:3300/v1/memory/store \
  -H "Authorization: Bearer dk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "legal-assistant",
    "content": "CASE SUMMARY — Jane Doe v Acme Corp: wrongful termination, 8yr tenure, no severance, no written contract. Docs requested: employment records, pay stubs, termination letter. Review by March 20.",
    "importance": 0.92,
    "memory_type": "semantic",
    "tags": ["summary", "case-247", "client-jane-doe"]
  }'

# 3. Trigger autopilot consolidation
curl -X POST http://localhost:3300/v1/agents/legal-assistant/consolidate \
  -H "Authorization: Bearer dk-..." \
  -H "Content-Type: application/json" \
  -d '{}'

# 4. Recall only summaries at next session
curl "http://localhost:3300/v1/memory/recall?agent_id=legal-assistant&query=Jane+Doe+case+details&min_importance=0.8&top_k=5" \
  -H "Authorization: Bearer dk-..."

Python
from dakera import DakeraClient
import openai

client = DakeraClient(base_url="http://localhost:3300", api_key="dk-...")
oai = openai.OpenAI()

AGENT = "legal-assistant"
CASE_ID = "case-247"
SUMMARY_TRIGGER = 15  # summarize every 15 turns

conversation_turns = []

def store_turn(role: str, content: str):
    """Store a single conversation turn at low importance."""
    turn_text = f"{role.upper()}: {content}"
    conversation_turns.append(turn_text)
    client.store_memory(
        agent_id=AGENT,
        content=turn_text,
        importance=0.3,
        memory_type="episodic",
        tags=["turn", CASE_ID]
    )
    # Check summarization trigger
    if len(conversation_turns) % SUMMARY_TRIGGER == 0:
        summarize_and_store()

def summarize_and_store():
    """Generate an LLM summary and store at high importance."""
    turns_text = "\n".join(conversation_turns[-SUMMARY_TRIGGER:])
    response = oai.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": "You are a legal case summarizer. Extract: client name, key claims, relevant dates, requested documents, and next steps. Be concise and factual."
        }, {
            "role": "user",
            "content": f"Summarize this consultation segment:\n\n{turns_text}"
        }]
    )
    summary = response.choices[0].message.content

    # Store summary with high importance — survives indefinitely
    client.store_memory(
        agent_id=AGENT,
        content=f"CASE SUMMARY [{CASE_ID}]: {summary}",
        importance=0.92,
        memory_type="semantic",
        tags=["summary", CASE_ID, "auto-generated"]
    )
    # Trigger Dakera's consolidation to clean up raw turns
    client.consolidate(agent_id=AGENT)

def recall_case_context(query: str):
    """Recall only persistent summaries, filtering out raw turns."""
    return client.recall(
        agent_id=AGENT,
        query=query,
        min_importance=0.8,
        top_k=5
    )

# --- Simulation ---
store_turn("client", "I was terminated on March 4 after 8 years. No severance was offered.")
store_turn("attorney", "Was there a written employment contract?")
store_turn("client", "No written contract. I was at-will.")
# ... more turns ...

# At session end: force final summarization regardless of trigger
summarize_and_store()

# Next session — retrieve only clean summaries
context = recall_case_context("Jane Doe wrongful termination case")
for mem in context.memories:
    print(f"[{mem.importance:.2f}] {mem.content[:120]}")

TypeScript
import { DakeraClient } from '@dakera-ai/dakera';
import OpenAI from 'openai';

const client = new DakeraClient({ baseUrl: 'http://localhost:3300', apiKey: 'dk-...' });
const oai = new OpenAI();

const AGENT = 'legal-assistant';
const CASE_ID = 'case-247';
const SUMMARY_TRIGGER = 15;

const conversationTurns: string[] = [];

async function storeTurn(role: 'client' | 'attorney', content: string) {
  const turnText = `${role.toUpperCase()}: ${content}`;
  conversationTurns.push(turnText);

  await client.storeMemory(AGENT, {
    content: turnText,
    importance: 0.3,
    memoryType: 'episodic',
    tags: ['turn', CASE_ID]
  });

  if (conversationTurns.length % SUMMARY_TRIGGER === 0) {
    await summarizeAndStore();
  }
}

async function summarizeAndStore() {
  const recentTurns = conversationTurns.slice(-SUMMARY_TRIGGER).join('\n');
  const response = await oai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: 'You are a legal case summarizer. Extract: client name, key claims, relevant dates, documents requested, and next steps. Be concise.'
      },
      { role: 'user', content: `Summarize this consultation:\n\n${recentTurns}` }
    ]
  });

  const summary = response.choices[0].message.content ?? '';

  // High importance summary — persists indefinitely
  await client.storeMemory(AGENT, {
    content: `CASE SUMMARY [${CASE_ID}]: ${summary}`,
    importance: 0.92,
    memoryType: 'semantic',
    tags: ['summary', CASE_ID, 'auto-generated']
  });

  // Trigger consolidation to merge raw turns
  await client.consolidate(AGENT);
}

async function recallCaseContext(query: string) {
  return client.recall(AGENT, query, {
    min_importance: 0.8,
    top_k: 5
  });
}

// Usage
await storeTurn('client', 'I was terminated on March 4 after 8 years. No severance.');
await storeTurn('attorney', 'Was there a written contract?');
await storeTurn('client', 'No written contract. I was at-will.');

// Force final summary at session end
await summarizeAndStore();

// Retrieve clean context next session
const context = await recallCaseContext('Jane Doe wrongful termination');
context.memories.forEach(m => console.log(`[${m.importance}] ${m.content.slice(0, 120)}`));

Rust
use dakera_rs::{Client, StoreMemoryRequest, RecallRequest};

let client = Client::new("http://localhost:3300", "dk-...");
let agent = "legal-assistant";

// Store raw conversation turns at low importance
client.store_memory(agent, StoreMemoryRequest {
    content: "CLIENT: Terminated March 4 after 8 years. No severance offered.".into(),
    importance: Some(0.3),
    memory_type: "episodic".into(),
    tags: vec!["turn".into(), "case-247".into()],
    ..Default::default()
}).await?;

// Store LLM-generated summary at high importance
let summary = generate_case_summary(&turns).await?; // your LLM call
client.store_memory(agent, StoreMemoryRequest {
    content: format!("CASE SUMMARY [case-247]: {}", summary),
    importance: Some(0.92),
    memory_type: "semantic".into(),
    tags: vec!["summary".into(), "case-247".into()],
    ..Default::default()
}).await?;

// Trigger consolidation via REST (no direct Rust SDK method)
// POST /v1/agents/legal-assistant/consolidate

// Recall only high-importance summaries next session
let context = client.recall(agent, RecallRequest {
    query: "Jane Doe wrongful termination case details".into(),
    min_importance: Some(0.8),
    top_k: Some(5),
    ..Default::default()
}).await?;

for mem in &context.memories {
    // Truncate by characters: byte slicing can panic on a multi-byte UTF-8 boundary
    println!("[{:.2}] {}", mem.importance, mem.content.chars().take(120).collect::<String>());
}

Go
package main

import (
    "context"
    "fmt"
    dakera "github.com/dakera-ai/dakera-go"
)

func main() {
    client := dakera.NewClient("http://localhost:3300", "dk-...")
    ctx := context.Background()
    agent := "legal-assistant"
    caseID := "case-247"

    // Store raw turn at low importance
    client.StoreMemory(ctx, agent, dakera.StoreMemoryRequest{
        Content:    "CLIENT: Terminated March 4 after 8 years. No severance.",
        Importance: 0.3,
        MemoryType: "episodic",
        Tags:       []string{"turn", caseID},
    })

    // After summarization, store at high importance
    summary := generateCaseSummary(turns) // your LLM call
    client.StoreMemory(ctx, agent, dakera.StoreMemoryRequest{
        Content:    fmt.Sprintf("CASE SUMMARY [%s]: %s", caseID, summary),
        Importance: 0.92,
        MemoryType: "semantic",
        Tags:       []string{"summary", caseID},
    })

    // Trigger consolidation
    client.Consolidate(ctx, agent, dakera.ConsolidateRequest{})

    // Recall only summaries at next session
    result, _ := client.Recall(ctx, agent, dakera.RecallRequest{
        Query:         "Jane Doe wrongful termination case",
        MinImportance: 0.8,
        TopK:          5,
    })

    for _, mem := range result.Memories {
        fmt.Printf("[%.2f] %s\n", mem.Importance, mem.Content[:min(120, len(mem.Content))])
    }
}

Reduce memory storage by up to 87%

Dakera's consolidation engine runs server-side — no infrastructure to manage.


SDK Reference

Method | SDK | Purpose
store_memory(agent_id, content, importance, memory_type, tags) | Python | Store turn or summary with importance score
storeMemory(agentId, {content, importance, memoryType, tags}) | TypeScript | Store turn or summary with importance score
store_memory("agent", StoreMemoryRequest{...}).await? | Rust | Store turn or summary with importance score
StoreMemory(ctx, "agent", StoreMemoryRequest{...}) | Go | Store turn or summary with importance score
recall(agent_id, query, min_importance, top_k) | Python | Retrieve summaries above importance threshold
recall(agentId, query, {min_importance, top_k}) | TypeScript | Retrieve summaries above importance threshold
search_memories(agent_id, query) | Python | Full-text search for recent turns before summarizing
searchMemories(agentId, query, {top_k}) | TypeScript | Full-text search for recent turns before summarizing
POST /v1/agents/{id}/consolidate | REST | Trigger autopilot to merge low-importance episodic memories

Performance Considerations

87%: Storage reduction vs. raw transcript retention
<120ms: Recall latency for filtered summary queries (p95)
3–5s: Consolidation runtime for 500 episodic memories

  • Summarization is the bottleneck, not storage. LLM calls for summarization take 1–4s. Run them asynchronously — store the raw turns immediately, summarize in a background job, and update the store when complete.
  • Recall with min_importance is fast. The importance filter is applied at the index level before semantic scoring, not post-retrieval. Filtering to importance > 0.8 can reduce the candidate set by 70–90%, cutting recall latency significantly.
  • Consolidate in off-peak windows. Consolidation scans all memories for the agent and is O(N log N). For agents with 10,000+ memories, schedule it during low-traffic periods (e.g., overnight).
  • Chunk very long consultations. If a consultation exceeds 50 turns, generate intermediate summaries every 20 turns rather than waiting until the end. This prevents the LLM context window from being overwhelmed.
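
The first point, keeping summarization off the write path, can be sketched with asyncio. Both coroutines below are stand-ins (assumptions) for the real storage write and LLM call, and the sleep is shortened so the example runs quickly.

```python
import asyncio

async def store_turn(turn: str, stored: list[str]) -> None:
    """Stand-in for the fast write of a raw turn."""
    stored.append(turn)

async def summarize(turns: list[str]) -> str:
    """Stand-in for a slow LLM call (1-4 s in practice, shortened here)."""
    await asyncio.sleep(0.05)
    return f"summary of {len(turns)} turns"

async def handle_conversation(turns: list[str],
                              trigger: int = 3) -> tuple[list[str], list[str]]:
    stored: list[str] = []
    pending: list[asyncio.Task] = []
    for i, turn in enumerate(turns, start=1):
        await store_turn(turn, stored)  # the write never waits on a summary
        if i % trigger == 0:
            segment = turns[i - trigger:i]
            # Schedule the summary in the background and keep accepting turns
            pending.append(asyncio.create_task(summarize(segment)))
    summaries = await asyncio.gather(*pending)  # collect at session end
    return stored, list(summaries)

stored, summaries = asyncio.run(
    handle_conversation([f"turn {i}" for i in range(1, 8)])
)
print(len(stored), summaries)
```

In a production service the background work would more likely go to a job queue than to in-process tasks, so that a crash does not lose pending summaries.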

Edge Cases

Edge Case 1: Summarization Failure Mid-Session

If the LLM summarization call fails (timeout, rate limit), the raw turns remain at low importance. Always implement a retry with exponential backoff and a fallback: store the raw turns concatenated as a summary at importance 0.7 if the LLM fails after 3 attempts. Never skip summary storage — raw turns will decay before the next session.
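
A minimal sketch of that retry-then-fallback policy follows. The function name and the injected summarize and sleep callables are assumptions made so the example is testable without an LLM; the importance values 0.92 and 0.7 come from this guide.

```python
import time
from typing import Callable

def summarize_with_fallback(
    turns: list[str],
    summarize: Callable[[list[str]], str],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, float]:
    """Return (text_to_store, importance) after retries or fallback."""
    for attempt in range(max_attempts):
        try:
            return summarize(turns), 0.92
        except Exception:  # real code should catch only timeout / rate-limit errors
            if attempt < max_attempts - 1:
                sleep(base_delay_s * 2 ** attempt)  # 1 s, then 2 s, ...
    # Every attempt failed: keep the raw turns as a stand-in summary
    return "RAW FALLBACK: " + "\n".join(turns), 0.7

def always_fails(turns: list[str]) -> str:
    raise TimeoutError("LLM unavailable")

delays: list[float] = []
text, importance = summarize_with_fallback(
    ["CLIENT: I was terminated on March 4.", "ATTORNEY: Was there a contract?"],
    always_fails,
    sleep=delays.append,  # record delays instead of actually sleeping
)
print(importance, delays)
```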

Edge Case 2: Summarizing Before Enough Context Exists

Triggering summarization on fewer than 5 turns produces shallow summaries like "User greeted attorney." Guard against this with a minimum turn count of 8–10 before the first summarization trigger fires. Check len(conversation_turns) >= min_turns before calling the LLM.
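
One way to express the guard is a small predicate evaluated after each stored turn. The function name and its defaults are illustrative assumptions.

```python
def should_summarize(turn_count: int, session_ended: bool,
                     min_turns: int = 8, trigger: int = 15) -> bool:
    """Decide whether to call the LLM after the latest turn (sketch)."""
    if turn_count < min_turns:
        return False  # too little context for a useful summary
    return session_ended or turn_count % trigger == 0

print(should_summarize(4, session_ended=True))    # short session: skipped
print(should_summarize(15, session_ended=False))  # periodic trigger
print(should_summarize(22, session_ended=True))   # session close
```

Sessions that end below the minimum still need a decision, for example storing the concatenated raw turns as in Edge Case 1, since this predicate alone would leave them without a summary.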

Edge Case 3: Duplicate Summaries from Retries

If a session crashes after storing the summary but before clearing the turn buffer, a retry will generate and store a second summary covering the same turns. Deduplicate by tagging summaries with a deterministic ID: sha256(case_id + turn_range), then check for existing summaries with that tag before storing.
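
The deterministic tag can be derived with the standard library. In the sketch below the helper names, the tag prefix, and the in-memory set that stands in for a tag lookup against the store are all assumptions.

```python
import hashlib

def summary_dedup_tag(case_id: str, first_turn: int, last_turn: int) -> str:
    """Same case and turn range always produce the same tag."""
    key = f"{case_id}:{first_turn}-{last_turn}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"sum-{digest[:16]}"  # shortened for readability

def claim_summary_slot(existing_tags: set[str], tag: str) -> bool:
    """Return True if no summary with this tag exists yet, and record it."""
    if tag in existing_tags:
        return False
    existing_tags.add(tag)
    return True

seen: set[str] = set()
tag = summary_dedup_tag("case-247", 1, 15)
first = claim_summary_slot(seen, tag)   # first attempt stores the summary
retry = claim_summary_slot(seen, tag)   # retry after a crash is skipped
print(tag, first, retry)
```

Against a real store, the set lookup would become a tag-filtered query issued just before the write.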

Edge Case 4: Recalling Raw Turns for Audit

Legal and compliance workflows sometimes need the original verbatim transcript, not just summaries. Store raw turns in a separate namespace (e.g., legal-assistant-audit) at importance 1.0 with a long TTL, outside the decay pipeline. The main agent namespace can decay freely while audit history is preserved.
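
A sketch of the dual write: each turn produces one payload for the decaying working namespace and one for the audit namespace. The payload shape mirrors the REST examples in this guide, but the ttl_seconds field name and the retention period passed in are assumptions to check against your Dakera version and retention policy.

```python
def build_turn_writes(agent_id: str, content: str, case_id: str,
                      audit_ttl_days: int) -> list[dict]:
    """Build the working-copy and audit-copy payloads for one turn."""
    working = {
        "agent_id": agent_id,
        "content": content,
        "importance": 0.3,            # decays with the rest of the raw turns
        "memory_type": "episodic",
        "tags": ["turn", case_id],
    }
    audit = {
        "agent_id": f"{agent_id}-audit",   # separate namespace, outside the decay pipeline
        "content": content,
        "importance": 1.0,
        "memory_type": "episodic",
        "tags": ["audit", case_id],
        "ttl_seconds": audit_ttl_days * 86_400,  # hypothetical field name
    }
    return [working, audit]

writes = build_turn_writes(
    "legal-assistant",
    "CLIENT: No severance was offered.",
    "case-247",
    audit_ttl_days=365,  # placeholder retention period
)
for payload in writes:
    print(payload["agent_id"], payload["importance"])
```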

Edge Case 5: Multi-Topic Sessions

If a single consultation covers multiple unrelated topics (e.g., client discusses both a contract dispute and estate planning), generate separate summaries per topic rather than one combined summary. Use topic detection (keyword-based or LLM-based) to split the turn buffer before summarizing. Tag each summary with its topic for cleaner recall.
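
A keyword-based splitter is the simplest form of topic detection. The sketch below assumes a hand-written keyword map (hypothetical, to be tuned per practice area) and lets turns without any keyword stay with the current topic.

```python
import re

# Hypothetical keyword map; real deployments would tune or replace it
TOPIC_KEYWORDS = {
    "contract-dispute": {"contract", "breach", "clause"},
    "estate-planning": {"estate", "inheritance", "heir"},
}

def split_by_topic(turns: list[str],
                   default_topic: str = "general") -> dict[str, list[str]]:
    """Group turns by topic; a turn with no keyword inherits the current topic."""
    buckets: dict[str, list[str]] = {}
    current = default_topic
    for turn in turns:
        words = set(re.findall(r"[a-z]+", turn.lower()))
        for topic, keywords in TOPIC_KEYWORDS.items():
            if words & keywords:
                current = topic
                break
        buckets.setdefault(current, []).append(turn)
    return buckets

turns = [
    "CLIENT: My employer broke the contract.",
    "ATTORNEY: Which clause do you mean?",
    "CLIENT: It happened in March.",
    "CLIENT: Also, I need help with my mother's estate.",
    "ATTORNEY: Is any heir contesting it?",
]
for topic, group in split_by_topic(turns).items():
    print(topic, len(group))
```

Each bucket can then be summarized separately and tagged with its topic name.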

Advanced Configuration: Decay Strategies & Summarization Tuning

Importance Levels by Memory Type

Memory Type | Recommended Importance | Expected Lifetime
Raw greeting / small talk | 0.1–0.2 | Hours to 2 days
Conversation turn (factual) | 0.3–0.4 | 3–10 days
Working note / partial summary | 0.5–0.6 | 2–4 weeks
Key fact / decision point | 0.7–0.8 | Months
LLM-generated summary | 0.85–0.95 | Permanent (no decay)
Identity / case constants | 1.0 | Never decays
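
Keeping these ranges in one place in code avoids scattering literal importance values. The enum below is a sketch: the member names and the choice of the range midpoint as a default are assumptions, while the ranges themselves come from the table.

```python
from enum import Enum

class MemoryKind(Enum):
    """Recommended (low, high) importance range per kind of memory."""
    SMALL_TALK = (0.1, 0.2)
    FACTUAL_TURN = (0.3, 0.4)
    WORKING_NOTE = (0.5, 0.6)
    KEY_FACT = (0.7, 0.8)
    SUMMARY = (0.85, 0.95)
    CONSTANT = (1.0, 1.0)

def default_importance(kind: MemoryKind) -> float:
    """Midpoint of the recommended range, rounded to two decimals."""
    low, high = kind.value
    return round((low + high) / 2, 2)

def clamp_importance(kind: MemoryKind, value: float) -> float:
    """Keep a caller-supplied score inside the recommended range."""
    low, high = kind.value
    return min(max(value, low), high)

for kind in MemoryKind:
    print(kind.name, default_importance(kind))
```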

Summarization Prompt Template

SYSTEM: You are a precise case-note summarizer for legal consultations.
Extract only facts stated by the client. Include:
1. Client full name (if mentioned)
2. Key legal claim(s) and relevant dates
3. Evidence or documents discussed
4. Attorney's requested next steps
5. Any deadlines or court dates mentioned

Output as a single paragraph under 200 words. Use past tense.
Do not infer, speculate, or add information not present in the transcript.

USER: Summarize the following consultation segment:
{TURNS}
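
Filling the template is a plain string substitution. The sketch below builds a chat-style message list; the function name is an assumption, and the system prompt is passed in so the full text above can be used unchanged.

```python
USER_TEMPLATE = "Summarize the following consultation segment:\n{TURNS}"

def build_messages(system_prompt: str, turns: list[str]) -> list[dict[str, str]]:
    """Return system and user messages with the transcript substituted in."""
    transcript = "\n".join(turns)
    # str.replace rather than str.format, so braces inside the transcript are left alone
    user_content = USER_TEMPLATE.replace("{TURNS}", transcript)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]

messages = build_messages(
    "You are a precise case-note summarizer for legal consultations.",
    ["CLIENT: I was terminated on March 4.", "ATTORNEY: Noted {to verify}."],
)
print(messages[1]["content"])
```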

Consolidation Schedule

For high-volume systems, run consolidation on a schedule rather than after every session. A daily consolidation job at 2AM local time works well for legal workflows. Use a cron trigger or Dakera's autopilot to schedule it automatically.
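
If the job runs inside your own scheduler process rather than cron (where the equivalent schedule expression would be 0 2 * * *), the next run time can be computed as below. This is a sketch using naive local datetimes; the function name is an assumption, and the consolidation call itself is left out.

```python
from datetime import datetime, timedelta

def next_consolidation_run(now: datetime, hour: int = 2) -> datetime:
    """Return the next occurrence of hour:00 strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed
    return candidate

morning = datetime(2026, 3, 10, 9, 0)
run_at = next_consolidation_run(morning)
wait_seconds = (run_at - morning).total_seconds()
print(run_at.isoformat(), wait_seconds)
```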

Build Memory That Thinks Like a Lawyer

Dakera handles decay, consolidation, and recall automatically. Ship your first summarization pipeline in 25 minutes.
