Changelog

Recent Dakera releases. Full history at GitHub Releases and GHCR tags.

Version | Date | Highlights
v0.11.83 LATEST Jun 2026 Docker image variants — :cpu (INT8 quantised, CPU-only) and :gpu/:cuda (FP32 with CUDA base image). Deterministic HNSW build order — DAKERA_HNSW_SEED now effective across restarts. Raw-fs fast write path for ObjectStorage::upsert (~9× storage throughput). O(namespace) scan removed from per-batch list — eliminates per-store N² overhead.
v0.11.82 Jun 2026 Model2Vec static-write tier enabled in production (DAKERA_TIERED=1) — 9.7× ingest throughput. New POST /admin/reembed/drain endpoint for synchronous quality drain without waiting for background cycles. GLiNER entity extraction restored (FP32 model + correct span format). SHA-256 integrity guard for ONNX cold-boot with re-fetch-once on mismatch. Redundant legacy ONNX pool freed in tiered mode.
v0.11.81 Jun 2026 OOM hardening — OnnxBackend pool capped to 1 in GPU mode (eliminates BFCArena fragmentation from concurrent CUDA sessions), CPU memory pattern disabled (.with_memory_pattern(false)), OOM retry depth extended to batch=1. GPU forward passes serialised at allocator level via parking_lot::Mutex, replacing coarser GPU_INFERENCE_SEMAPHORE.
v0.11.80 May 2026 SIMD-accelerated HNSW distance (3–8× throughput on x86_64 + aarch64). GPU semaphore fix — OnnxBackend now serialises CUDA forward passes, eliminating CUBLAS_STATUS_ALLOC_FAILED under parallel ingest. CE-TORA8 confidence gate: Cat3 temporal recall 73.9% (+5.0 pp).
v0.11.79 May 2026 Global GPU_INFERENCE_SEMAPHORE (1 permit) serialises all CUDA forward passes across ONNX + Candle backends — prevents BFCArena VRAM fragmentation under concurrent ingest. Semaphore now acquired before TieredEngine branch in embed_text().
v0.11.78 May 2026 TieredEngine pre-warms at server startup — eliminates 7–10 min first-request stall. ReembedJob batch ANN invalidation: one HNSW rebuild per cycle instead of O(N) (18× recall speedup). store_memory_batch() now routes through TieredEngine write path — GPU-free static ingest for batch operations.
v0.11.77 May 2026 SearchMode::Hybrid is now the server default. TieredEngine end-to-end fully wired — static→transformer upgrade pipeline. docker-compose.yml ships DAKERA_TIERED=1 + DAKERA_SEARCH_MODE=hybrid out of the box.
v0.11.76 May 2026 Binary HNSW overselect formula fixed — Recall@10 restored from 54% → ~100% for DAKERA_SEARCH_MODE=hybrid. SearchMode fallback corrected (unknown values → Float).
v0.11.75 May 2026 Pluggable inference backends (EmbeddingBackend trait): ONNX, Candle, GGUF, Static. StaticBackend Model2Vec ~500× ingest. TieredEngine. Binary HNSW. ModernBertEmbedBase 768d. Batch write method in all 4 SDKs (py/js/rs/go).
v0.11.74 May 2026 NER redesign: lowercase entity tags, entity_types honoured by all extractors, entity dedup, byte-offset span detection, MAX_TEXT_WORDS guard for long inputs.
v0.11.73 May 2026 Dual-layer timestamping — event date extracted at store time into _dakera_content_date metadata for temporal proximity scoring. SharesEntity knowledge-graph edge expansion for entity-linked cross-encoder recall.
v0.11.72 May 2026 temporal_rerank_multiplier raised 8×→12×. GLiNER span detection fix for contractions and multi-token entity names.
v0.11.71 May 2026 FP32 model for GPU/CUDA inference (fixes accuracy on CUDA EP). O(1) session lookup replaces O(N) scan. ONNX default batch size 8→32 for CPU deployments.
v0.11.69 May 2026 Parallel HNSW+BM25 in Hybrid mode re-introduced with full 1540Q bench gate — Cat4 tie-breaking regression resolved. 87.1% LoCoMo, Cat3 73.9%, Cat4 88.8%.
v0.11.68 May 2026 Revert parallel HNSW+BM25 (Cat4 RNG tie-breaking regression under tokio task interleaving). GPU CUDA Execution Provider enabled.
v0.11.67 May 2026 Cross-encoder overload gate (RERANKER_MAX_CONCURRENT=6) prevents queue-saturation cascades — graceful degradation to unranked results when gate fires, no client-side timeout storm.
v0.11.66 May 2026 Batch ONNX cross-encoder — mini-batch size RERANKER_ONNX_BATCH_SIZE=16 within each chunk; each mini-batch padded to its own max seq_len, reducing memory waste and improving throughput under concurrent rerank load.
v0.11.65 May 2026 Cross-encoder session pool (N=4) + adaptive chunk splitting — eliminates contention under concurrent recall. Each cross-encoder call gets a pooled session; chunks re-split for long documents exceeding model max seq_len.
v0.11.64 May 2026 Async metric recording off hot-path — pipeline stage metrics now recorded via tokio::spawn, removing synchronous atomic + alloc overhead from every recall. Fixes -0.7pp regression introduced in v0.11.63.
v0.11.63 May 2026 Full 8-stage recall pipeline instrumentation — Prometheus histograms for each stage (query parse → BM25 → vector → rerank → cross-encoder → dedupe → hydrate → response). Adds /metrics observability with no throughput cost.
v0.11.62 May 2026 GLiNER NER fully restored — text_lengths tensor reshaped rank-1→rank-2 so entity extraction works correctly after v0.11.61 deploy.
v0.11.61 May 2026 GLiNER model path fix — HuggingFace repos renamed (hyphen→underscore); updated to onnx-community/gliner_medium-v2.1. No HF token required. Restores dakera_auto_tag entity extraction.
v0.11.60 May 2026 HF_TOKEN support for HuggingFace model downloads — injects auth header on all download requests. Graceful fallback: no token → unauthenticated (existing behaviour for public models).
v0.11.59 May 2026 ONNX batch size 32→1 — eliminates LME 408 timeout caused by one 512-token text padding 31 peers to max seq_len. Each text now gets its own ONNX call. Session pool provides equivalent throughput.
v0.11.58 May 2026 ONNX session pool N=4 + parallel batch storage upsert — eliminates LME throughput bottleneck. Concurrent callers round-robin across pool slots. buffer_unordered(16) for parallel vector writes.
v0.11.57 May 2026 Docker base image rust:1.92→1.95 (a crate dependency requires rustc ≥ 1.95). Fixes release build failures.
v0.11.56 May 2026 SDK engine parity: admin (cluster, maintenance, quotas, slow queries, backups), ops (diagnostics, jobs, compaction, shutdown), health probes, vector bulk ops (update, delete, count), fulltext stats/delete, TTL stats, storage tiers, memory type stats, agent consolidation, namespace entity config/extractor/dimension migration. All 4 SDKs (py/js/rs/go) at full parity.
dakera-mcp v0.10.8 LATEST MCP May 2026 Maintenance releases v0.10.4–v0.10.8 — stability improvements, updated tool descriptions. See GitHub Releases for full MCP changelog.
dakera-mcp v0.10.2 May 2026 Page size increased from 20 → 100: all profiles now return results in a single page. Reduces round-trips for agents with large memory stores.
dakera-mcp v0.10.0 May 2026 Profile-based tool tiering: 14 core tools by default, power/admin/all profiles for expanded access. Meta-discovery tools (discover_tools, load_tools). ~30K token savings vs loading all tools.
v0.11.55 May 2026 CE-118 temporal hybrid retrieval. Cat2 gate achieved (86.3%). 88.2% LoCoMo overall.
v0.11.54 May 2026 MCP: 14 core tools (86+ available via profiles). DAKERA_ENTITY_VECTOR_SEARCH enabled by default. Cross-encoder pipeline.
v0.11.45 Apr 2026 CE-71: ML routing classifier on by default. TEMPORAL_INFERENCE enabled. 87.1% LoCoMo.
v0.11.27 Apr 2026 HA cluster gossip stability improvements. MinIO throttle tuning.
v0.10.2 Apr 2026 Full stack release: server + py + js + rs + go SDKs simultaneously.
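The padding effect behind the v0.11.59 and v0.11.66 entries can be checked with a little arithmetic. The sketch below is not Dakera code: `padded_tokens` is a hypothetical helper, and the 16-token length for the short texts is an assumed value chosen only for illustration. It counts how many token slots a model processes when every batch is padded to the longest text it contains.

```python
def padded_tokens(lengths, batch_size):
    """Total token slots processed when each batch is padded to its own longest text."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += max(batch) * len(batch)
    return total


# Assumed workload: one 512-token text plus 31 short texts of 16 tokens each.
lengths = [512] + [16] * 31
real_tokens = sum(lengths)                 # 1008 tokens of actual content

one_batch = padded_tokens(lengths, 32)     # all 32 texts padded to 512
mini_batches = padded_tokens(lengths, 16)  # only the first mini-batch pays for 512
per_text = padded_tokens(lengths, 1)       # no padding at all

print(real_tokens, one_batch, mini_batches, per_text)
```

Under these assumptions a single batch of 32 processes 16384 slots for 1008 real tokens, mini-batches of 16 process 8448, and one call per text processes exactly 1008. That is the trade-off the two entries describe: smaller batches waste less on padding, and a session pool recovers throughput through concurrency.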
SDK versions: SDKs track the server minor version. Current: Python / JS / Go / Rust at v0.11.75, server at v0.11.83. Server patches v0.11.76–v0.11.83 are performance, GPU, ingest-tier, and default-config fixes with no API changes, so existing SDK clients work without update. See the Introduction for the full packages table.
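As a rough illustration of the rule in the note above, a client could compare the major and minor components of the two version tags. This is a minimal sketch, assuming that "tracks the server minor version" means the major.minor pair must be equal; `parse_version` and `sdk_matches_server` are hypothetical names, not part of any Dakera SDK.

```python
def parse_version(tag):
    """Parse a tag such as 'v0.11.83' or '0.11.83' into (major, minor, patch)."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def sdk_matches_server(sdk_tag, server_tag):
    """True when SDK and server share the same major and minor version."""
    return parse_version(sdk_tag)[:2] == parse_version(server_tag)[:2]


# Versions taken from the note above: SDKs at v0.11.75, server at v0.11.83.
print(sdk_matches_server("v0.11.75", "v0.11.83"))
# An SDK from an older minor line would not match under this assumed rule.
print(sdk_matches_server("v0.10.2", "v0.11.83"))
```

The first call reports a match because only the patch component differs. The second does not, because the minor versions differ.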

Release cadence

Dakera releases frequently, typically several versions per week during active development. All releases follow semantic versioning, and patch versions are always safe to upgrade to in place.

Release artifacts

GitHub Releases · GHCR · PyPI · Helm (ArtifactHub)