Changelog

Recent Dakera releases. The full history is available on GitHub Releases and in the GHCR tags.

Version	Date	Highlights
`v0.11.83` LATEST	Jun 2026	Docker image variants — `:cpu` (INT8 quantised, CPU-only) and `:gpu`/`:cuda` (FP32 with CUDA base image). Deterministic HNSW build order — `DAKERA_HNSW_SEED` now effective across restarts. Raw-fs fast write path for `ObjectStorage::upsert` (~9× storage throughput). O(namespace) scan removed from per-batch list — eliminates per-store N² overhead.
`v0.11.82`	Jun 2026	Model2Vec static-write tier enabled in production (`DAKERA_TIERED=1`) — 9.7× ingest throughput. New `POST /admin/reembed/drain` endpoint for synchronous quality drain without waiting for background cycles. GLiNER entity extraction restored (FP32 model + correct span format). SHA-256 integrity guard for ONNX cold-boot with re-fetch-once on mismatch. Redundant legacy ONNX pool freed in tiered mode.
`v0.11.81`	Jun 2026	OOM hardening — `OnnxBackend` pool capped to 1 in GPU mode (eliminates BFCArena fragmentation from concurrent CUDA sessions), CPU memory pattern disabled (`.with_memory_pattern(false)`), OOM retry depth extended to batch=1. GPU forward passes serialised at allocator level via `parking_lot::Mutex`, replacing coarser `GPU_INFERENCE_SEMAPHORE`.
`v0.11.80`	May 2026	SIMD-accelerated HNSW distance (3–8× throughput on x86_64 + aarch64). GPU semaphore fix — `OnnxBackend` now serialises CUDA forward passes, eliminating `CUBLAS_STATUS_ALLOC_FAILED` under parallel ingest. CE-TORA8 confidence gate: Cat3 temporal recall 73.9% (+5.0 pp).
`v0.11.79`	May 2026	Global `GPU_INFERENCE_SEMAPHORE` (1 permit) serialises all CUDA forward passes across ONNX + Candle backends — prevents `BFCArena` VRAM fragmentation under concurrent ingest. Semaphore now acquired before `TieredEngine` branch in `embed_text()`.
`v0.11.78`	May 2026	`TieredEngine` pre-warms at server startup — eliminates 7–10 min first-request stall. `ReembedJob` batch ANN invalidation: one HNSW rebuild per cycle instead of O(N) (18× recall speedup). `store_memory_batch()` now routes through `TieredEngine` write path — GPU-free static ingest for batch operations.
`v0.11.77`	May 2026	`SearchMode::Hybrid` is now the server default. TieredEngine end-to-end fully wired — static→transformer upgrade pipeline. `docker-compose.yml` ships `DAKERA_TIERED=1` + `DAKERA_SEARCH_MODE=hybrid` out of the box.
`v0.11.76`	May 2026	Binary HNSW overselect formula fixed — Recall@10 restored from 54% → ~100% for `DAKERA_SEARCH_MODE=hybrid`. `SearchMode` fallback corrected (unknown values → Float).
`v0.11.75`	May 2026	Pluggable inference backends (`EmbeddingBackend` trait): ONNX, Candle, GGUF, Static. `StaticBackend` Model2Vec ~500× ingest. `TieredEngine`. Binary HNSW. `ModernBertEmbedBase` 768d. Batch write method in all 4 SDKs (py/js/rs/go).
`v0.11.74`	May 2026	NER redesign: lowercase entity tags, `entity_types` honoured by all extractors, entity dedup, byte-offset span detection, `MAX_TEXT_WORDS` guard for long inputs.
`v0.11.73`	May 2026	Dual-layer timestamping — event date extracted at store time into `_dakera_content_date` metadata for temporal proximity scoring. `SharesEntity` knowledge-graph edge expansion for entity-linked cross-encoder recall.
`v0.11.72`	May 2026	`temporal_rerank_multiplier` raised 8×→12×. GLiNER span detection fix for contractions and multi-token entity names.
`v0.11.71`	May 2026	FP32 model for GPU/CUDA inference (fixes accuracy on CUDA EP). O(1) session lookup replaces O(N) scan. ONNX default batch size 8→32 for CPU deployments.
`v0.11.69`	May 2026	Parallel HNSW+BM25 in Hybrid mode re-introduced with full 1540Q bench gate — Cat4 tie-breaking regression resolved. 87.1% LoCoMo, Cat3 73.9%, Cat4 88.8%.
`v0.11.68`	May 2026	Revert parallel HNSW+BM25 (Cat4 RNG tie-breaking regression under tokio task interleaving). GPU CUDA Execution Provider enabled.
`v0.11.67`	May 2026	Cross-encoder overload gate (`RERANKER_MAX_CONCURRENT=6`) prevents queue-saturation cascades — graceful degradation to unranked results when gate fires, no client-side timeout storm.
`v0.11.66`	May 2026	Batch ONNX cross-encoder — mini-batch size `RERANKER_ONNX_BATCH_SIZE=16` within each chunk; each mini-batch padded to its own max seq_len, reducing memory waste and improving throughput under concurrent rerank load.
`v0.11.65`	May 2026	Cross-encoder session pool (N=4) + adaptive chunk splitting — eliminates contention under concurrent recall. Each cross-encoder call gets a pooled session; chunks re-split for long documents exceeding model max seq_len.
`v0.11.64`	May 2026	Async metric recording off hot-path — pipeline stage metrics now recorded via `tokio::spawn`, removing synchronous atomic + alloc overhead from every recall. Fixes -0.7pp regression introduced in v0.11.63.
`v0.11.63`	May 2026	Full 8-stage recall pipeline instrumentation — Prometheus histograms for each stage (query parse → BM25 → vector → rerank → cross-encoder → dedupe → hydrate → response). Adds `/metrics` observability with no throughput cost.
`v0.11.62`	May 2026	GLiNER NER fully restored — `text_lengths` tensor reshaped rank-1→rank-2 so entity extraction works correctly after v0.11.61 deploy.
`v0.11.61`	May 2026	GLiNER model path fix — HuggingFace repos renamed (hyphen→underscore); updated to `onnx-community/gliner_medium-v2.1`. No HF token required. Restores `dakera_auto_tag` entity extraction.
`v0.11.60`	May 2026	HF_TOKEN support for HuggingFace model downloads — injects auth header on all download requests. Graceful fallback: no token → unauthenticated (existing behaviour for public models).
`v0.11.59`	May 2026	ONNX batch size 32→1 — eliminates LME 408 timeout caused by one 512-token text padding 31 peers to max seq_len. Each text now gets its own ONNX call. Session pool provides equivalent throughput.
`v0.11.58`	May 2026	ONNX session pool N=4 + parallel batch storage upsert — eliminates LME throughput bottleneck. Concurrent callers round-robin across pool slots. `buffer_unordered(16)` for parallel vector writes.
`v0.11.57`	May 2026	Docker base image rust:1.92→1.95, because a dependency requires rustc ≥ 1.95. Fixes release build failures.
`v0.11.56`	May 2026	SDK engine parity: admin (cluster, maintenance, quotas, slow queries, backups), ops (diagnostics, jobs, compaction, shutdown), health probes, vector bulk ops (update, delete, count), fulltext stats/delete, TTL stats, storage tiers, memory type stats, agent consolidation, namespace entity config/extractor/dimension migration. All 4 SDKs (py/js/rs/go) at full parity.
`dakera-mcp v0.10.8` LATEST MCP	May 2026	Maintenance releases v0.10.4–v0.10.8 — stability improvements, updated tool descriptions. See GitHub Releases for full MCP changelog.
`dakera-mcp v0.10.2`	May 2026	Page size increased from 20 → 100: all profiles now return results in a single page. Reduces round-trips for agents with large memory stores.
`dakera-mcp v0.10.0`	May 2026	Profile-based tool tiering: 14 core tools by default, power/admin/all profiles for expanded access. Meta-discovery tools (`discover_tools`, `load_tools`). ~30K token savings vs loading all tools.
`v0.11.55`	May 2026	CE-118 temporal hybrid retrieval. Cat2 gate achieved (86.3%). 88.2% LoCoMo overall.
`v0.11.54`	May 2026	MCP: 14 core tools (86+ available via profiles). `DAKERA_ENTITY_VECTOR_SEARCH` enabled by default. Cross-encoder pipeline.
`v0.11.45`	Apr 2026	CE-71: ML routing classifier on by default. TEMPORAL_INFERENCE enabled. 87.1% LoCoMo.
`v0.11.27`	Apr 2026	HA cluster gossip stability improvements. MinIO throttle tuning.
`v0.10.2`	Apr 2026	Full stack release: server + py + js + rs + go SDKs simultaneously.
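
The padding effect behind the v0.11.59 and v0.11.66 entries (one long text forcing its batch peers to be padded to the same sequence length) can be illustrated with a little arithmetic. The sketch below is not Dakera's implementation: the helper `padded_tokens` and the 16-token length of the short texts are hypothetical, chosen only to show why smaller batches, each padded to its own maximum length, waste fewer token slots.

```python
from typing import List


def padded_tokens(lengths: List[int], batch_size: int) -> int:
    """Total token slots when each mini-batch is padded to its own longest text.

    Hypothetical helper for illustration; not part of any Dakera API.
    """
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        # Every text in the mini-batch occupies as many slots as the longest one.
        total += max(batch) * len(batch)
    return total


# Assumed workload: one 512-token text among 31 short (16-token) texts.
lengths = [512] + [16] * 31

single_batch = padded_tokens(lengths, batch_size=32)  # all 32 texts padded to 512
mini_batches = padded_tokens(lengths, batch_size=16)  # only the first 16 padded to 512
one_per_call = padded_tokens(lengths, batch_size=1)   # no padding at all

print(single_batch, mini_batches, one_per_call)
```

Under these assumptions, the single batch of 32 spends most of its token slots on padding, while per-text calls spend none. That is the trade-off the batch-size changes in the table navigate, with the session pool compensating for the lost batching.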

SDK versions: SDKs track the server minor version. Current: Python / JS / Go / Rust at v0.11.75, server at v0.11.83. Server patches v0.11.76–v0.11.83 are performance, GPU, ingest-tier, and default-config fixes with no API changes, so existing SDK clients work without an update. See the Introduction for the full packages table.
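
As a rough illustration of "SDKs track the server minor version", a client-side check might compare only the major and minor components of the two version tags. This is a minimal sketch under that assumption; `parse_version` and `same_minor_line` are hypothetical helpers, not functions exposed by any Dakera SDK.

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse a tag such as 'v0.11.83' into (major, minor, patch) integers."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def same_minor_line(sdk: str, server: str) -> bool:
    """True when SDK and server share major.minor; the patch level may differ."""
    return parse_version(sdk)[:2] == parse_version(server)[:2]


# Versions quoted in the note above: SDKs at v0.11.75, server at v0.11.83.
print(same_minor_line("v0.11.75", "v0.11.83"))
# An SDK from the older 0.10 line would not match a 0.11 server.
print(same_minor_line("v0.10.2", "v0.11.83"))
```

Parsing into integer tuples rather than comparing strings avoids ordering mistakes such as treating `0.11.9` as newer than `0.11.83`.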

Release cadence

Dakera releases frequently, typically multiple versions per week during active development. All releases follow semantic versioning. Patch releases are always safe to upgrade to in place.

Release artifacts

GitHub Releases · GHCR · PyPI · Helm (ArtifactHub)