QMD

QMD — Local Hybrid Search for OpenClaw Agent Memory

What it is: QMD is a fully local, on-device search engine for markdown files built by Tobi Lütke (Shopify CEO). It replaces OpenClaw's broken built-in memory search with a three-stage hybrid pipeline: BM25 keyword matching + vector semantic search + LLM re-ranking. No API keys, no cloud calls, no network traffic after initial model download.

Why it matters: OpenClaw's default memory_search uses pure vector embeddings that routinely return semantically adjacent but factually wrong results. QMD fixes this by running BM25 and vector search in parallel across three query variants, fusing the results with Reciprocal Rank Fusion, then re-ranking the top candidates with a local LLM. Community reports claim it cuts token usage by 95%+ compared to context-stuffing approaches.

Install: npm install -g @tobilu/qmd (requires Node.js 22+ or Bun). Three GGUF models (~2.5GB total) auto-download from HuggingFace on first use.

How It Works

Every query goes through this pipeline:

1. Query Expansion — A local 1.7B LLM generates two alternative formulations of your query
2. Parallel Search — All three query variants (original weighted 2×) run through both BM25 full-text search AND vector cosine similarity search simultaneously
3. Reciprocal Rank Fusion (RRF) — Results merge with k=60, original query weighted 2×, top-rank bonus (+0.05 for #1, +0.02 for #2-3)
4. LLM Re-ranking — Top 30 candidates scored by a local reranker model (yes/no + logprob confidence)
5. Position-Aware Blending — Final ranking blends RRF and reranker scores: ranks 1-3 use 75% RRF / 25% reranker, ranks 4-10 use 60/40, ranks 11+ use 40/60

Documents are chunked into ~900-token pieces with 15% overlap, using smart boundary detection that prefers markdown headings. Embeddings and LLM responses are cached in SQLite (~/.cache/qmd/index.sqlite) with content-hash keying, so moving/renaming files doesn't require re-embedding.

Local Models

QMD runs three GGUF models locally via node-llama-cpp.
All models auto-download to ~/.cache/qmd/models/ on first use:

| Purpose | Model | Size | Source |
|---|---|---|---|
| Embedding | embeddinggemma-300M (Google) | ~300MB | HuggingFace |
| Re-ranking | Qwen3-Reranker-0.6B-Q8_0 | ~600MB | HuggingFace |
| Query Expansion | qmd-query-expansion-1.7B (Tobi's fine-tune) | ~1.7GB | HuggingFace |

On Apple Silicon, QMD auto-detects Metal GPU acceleration at startup. Total VRAM/memory footprint is ~2.5GB — negligible on a 128GB Mac Studio.

Swapping the Embedding Model

The embedding model can be overridden via environment variable:

    # Use Qwen3-Embedding for better multilingual (CJK) support
    export QMD_EMBED_MODEL="hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf"

    # After changing model, re-embed all collections:
    qmd embed -f

Note: This still loads a local GGUF file — it doesn't point at a remote API. Vectors are not cross-compatible between models, so you must re-index after switching.

Can I Point Models to a Remote Machine?

Short answer: No — QMD is designed to be fully local.

QMD runs all inference through node-llama-cpp in-process. There is no built-in configuration to route embedding, reranking, or query expansion to a remote API endpoint. The project's tagline is literally "Tracking current sota approaches while being all local."

There is a QMD_OPENAI_BASE_URL environment variable referenced in some third-party integration guides (specifically the ehc-io/qmd fork), but this is not part of Tobi's original QMD and applies to a different use case.

Workarounds if you need remote inference:

- The models are tiny (~2.5GB total). On a 128GB Mac Studio they're negligible — just run them locally alongside your main LLM
- If you absolutely need to offload: QMD exposes an MCP server (qmd mcp) with HTTP transport mode. You could run QMD on a remote machine and connect to it as an MCP service. But the models themselves still run local to wherever QMD is installed
- Fork and modify — QMD is MIT licensed and the embedding/reranking code is in src/.
  You could swap the node-llama-cpp calls for HTTP calls to a remote endpoint, but this is custom development

OpenClaw Integration

Three integration paths, from simplest to most flexible:

1. Built-in Memory Backend (Recommended)

Set memory.backend = "qmd" in ~/.openclaw/openclaw.json:

    {
      "memory": {
        "backend": "qmd",
        "citations": "auto",
        "qmd": {
          "includeDefaultMemory": true,
          "command": "qmd",
          "searchMode": "search",
          "update": { "interval": "5m", "debounceMs": 15000, "onBoot": true, "waitForBootSync": false },
          "limits": { "maxResults": 6, "timeoutMs": 4000 },
          "scope": {
            "default": "deny",
            "rules": [ { "action": "allow", "match": { "chatType": "direct" } } ]
          }
        }
      }
    }

OpenClaw automatically creates collections (e.g. memory-root for memory/**/*.md) and indexes on boot.

2. MCP Server

Run qmd mcp to expose QMD as an MCP tool. Supports stdio and HTTP transport. HTTP daemon mode keeps models warm in VRAM between queries, reducing latency from ~16s to ~10s.

3. openclaw-engram Plugin

A community plugin (github.com/joshuaswarren/openclaw-engram) that uses QMD as the backend for a persistent long-term memory system with LLM-powered extraction.

Search Modes

| Command | Method | Speed | Best For |
|---|---|---|---|
| qmd search | BM25 keyword only | ~50-200ms | You know the exact terms |
| qmd vsearch | Vector similarity only | ~500-1000ms | Semantic matches without reranking |
| qmd query | Full hybrid pipeline | ~3-10s | Highest quality results (recommended) |

Key Features

- Query documents — Structured multi-line queries with typed lines (lex:, vec:, hyde:) combining keyword precision with semantic recall
- Intent parameter — Optional --intent flag disambiguates queries across the entire pipeline. "performance" means different things in different contexts
- Quoted phrases and negation — "C++ performance" -sports -athlete works in lex search
- Content-hash keying — Moving/renaming files doesn't require re-embedding
- LLM response caching — Query expansion and rerank scores cached in SQLite
- Agentic output formats — --json, --files, --md, --full for structured agent consumption
- Collection contexts — Hierarchical semantic metadata (e.g. "Personal notes and meeting logs") improves relevance
- Separate indexes — qmd --index work search "quarterly reports" keeps knowledge bases isolated

Storage

- Index: ~/.cache/qmd/index.sqlite (SQLite with FTS5 + sqlite-vec)
- Models: ~/.cache/qmd/models/ (~2.5GB)
- Config: ~/.config/qmd/index.yml (collection definitions, respects XDG_CONFIG_HOME)

Quick Reference

    # Install
    npm install -g @tobilu/qmd

    # Create a collection
    qmd collection add ~/.openclaw/workspace --name workspace --mask "**/*.md"

    # Generate embeddings
    qmd embed

    # Search
    qmd query "what did I decide about the camera setup"

    # Status
    qmd status

    # Re-index (with git pull for remote repos)
    qmd update --pull

- Version: v1.1.6 (current as of March 2026)
- License: MIT
- GitHub: github.com/tobi/qmd
- npm: @tobilu/qmd
- Runtime: Node.js 22+ or Bun
- Dependencies: node-llama-cpp, sqlite-vec, better-sqlite3

Summary

OpenClaw Memory Enhancement Stack

Four tools that address different layers of OpenClaw's memory problem. Each solves a different failure mode — they're complementary, not competing.
| Tool | What It Does | Runs On | Status |
|---|---|---|---|
| QMD | Hybrid search (BM25 + vector + LLM rerank) over markdown files | Linux agent box (CPU) or Mac (Metal) | Stable — v1.1.6 |
| memory-lancedb-pro | Vector memory plugin with decay, hybrid retrieval, scope isolation | Linux agent box, embeddings on Mac | Stable — npm package |
| OpenViking | Context database with virtual filesystem and tiered token loading | Linux agent box, VLM calls on Mac | ⚠️ Hold — LiteLLM supply chain attack |
| MetaClaw | Continual learning proxy — agent gets smarter over time | Linux agent box, forwards inference to Mac | Beta — v0.4.0, very new |

QMD — Search That Actually Works

- Created by: Tobi Lütke (Shopify CEO)
- GitHub: github.com/tobi/qmd
- License: MIT
- Install: npm install -g @tobilu/qmd

What it solves: OpenClaw's built-in memory_search uses pure vector embeddings that routinely return wrong results. QMD replaces it with a three-stage hybrid pipeline.

How it works:

1. A local 1.7B LLM expands your query into two alternative formulations
2. All three variants run through BM25 keyword search AND vector cosine similarity in parallel
3. Results merge via Reciprocal Rank Fusion (original query weighted 2×)
4. Top 30 candidates scored by a local reranker model
5. Final ranking blends RRF and reranker scores with position-aware weighting

Local models (~2.5GB total, auto-downloaded):

- embeddinggemma-300M (embedding)
- Qwen3-Reranker-0.6B (reranking)
- qmd-query-expansion-1.7B (query expansion — Tobi's custom fine-tune)

Remote inference: Not supported natively. Models run via node-llama-cpp in-process. On a 128GB Mac Studio their memory footprint is negligible. On the Linux agent box they run fine on CPU (~10-15s per query vs ~3-5s on Metal).

OpenClaw integration: Set memory.backend = "qmd" in openclaw.json. OpenClaw creates collections and indexes automatically on boot.
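The fusion and blending steps in the QMD pipeline can be sketched in a few lines of Python. This is a minimal illustration using the constants quoted in these notes (k=60, original query weighted 2×, +0.05 / +0.02 top-rank bonus, 75/25, 60/40 and 40/60 blending), not QMD's actual code. Two details are assumptions made for the sketch: the bonus is keyed on the best rank a document reaches in any single list, and the RRF side of the blend is represented as 1/position. The function names are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, weights, k=60):
    """Weighted Reciprocal Rank Fusion over ranked lists of document ids."""
    scores = defaultdict(float)
    best_rank = {}
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            scores[doc] += weight / (k + rank)
            best_rank[doc] = min(rank, best_rank.get(doc, rank))
    # Assumption: the top-rank bonus depends on the best rank in any one list.
    for doc, rank in best_rank.items():
        if rank == 1:
            scores[doc] += 0.05
        elif rank <= 3:
            scores[doc] += 0.02
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rrf_weight(position):
    """Share of the final score given to RRF at a 1-based fused position."""
    if position <= 3:
        return 0.75
    if position <= 10:
        return 0.60
    return 0.40


def blend(fused_order, rerank_scores):
    """Blend fused position with reranker scores (expected in the range 0..1)."""
    blended = {}
    for position, doc in enumerate(fused_order, start=1):
        w = rrf_weight(position)
        # Assumption: the RRF contribution is modelled as 1 / position.
        blended[doc] = w * (1.0 / position) + (1.0 - w) * rerank_scores.get(doc, 0.0)
    return sorted(blended, key=lambda d: (-blended[d], d))


# Original query (BM25 + vector lists, weight 2) plus one expansion list (weight 1).
fused = rrf_fuse([["a", "b", "c"], ["b", "a", "d"], ["c", "b"]], [2.0, 2.0, 1.0])
final = blend(fused, {"a": 0.2, "b": 0.1, "c": 0.0, "d": 1.0})
```

With these toy inputs the fused order is b, a, c, d, and a strong reranker score lifts d to second place in the final order while b stays on top because the first three positions lean 75% on RRF.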
Key commands:

    qmd query "what did I decide about the camera setup"   # hybrid search (best quality)
    qmd search "frigate RTSP"                              # keyword only (fastest)
    qmd vsearch "home automation preferences"              # vector only
    qmd status                                             # index info
    qmd update --pull                                      # re-index (git pull first)
    qmd embed -f                                           # force re-embed all

memory-lancedb-pro — Long-Term Memory With Decay

- Created by: CortexReach (community)
- GitHub: github.com/CortexReach/memory-lancedb-pro
- License: MIT
- Install: npm i memory-lancedb-pro

What it solves: OpenClaw's default memory has no decay (everything stays equally weighted forever), no hybrid search (vector only), no deduplication, and MEMORY.md dumps its entire contents into every session, wasting tokens.

How it works:

- Auto-capture: Extracts preferences, facts, decisions, and entities from conversations automatically (up to 3 per turn)
- Auto-recall: Injects relevant memories into context before agent responds (up to 3 entries)
- Smart extraction: LLM-powered 6-category classification (Profile, Preferences, Entities, Events, Cases, Patterns) with L0/L1/L2 metadata
- Hybrid retrieval: Vector search + BM25 keyword search fused with RRF, then cross-encoder reranking
- Weibull time decay: Memories that aren't accessed gradually fade from active retrieval
- Three-tier system: Core → Working → Peripheral with automatic promotion/demotion
- Multi-scope isolation: global, agent:, project:, user:, custom:

Remote inference: Yes — embedding uses any OpenAI-compatible /v1/embeddings endpoint. Point baseUrl at Ollama/LM Studio on the Mac. Reranking uses Jina, SiliconFlow, or any compatible reranker API.
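To make the "Weibull time decay" item concrete, here is a minimal sketch of how a Weibull survival curve can scale a memory's retrieval score by the time since it was last accessed. The function names, the 30-day scale and the 1.5 shape are hypothetical values chosen for illustration; the plugin's real parameters and formula are not given in these notes.

```python
import math


def weibull_retention(age_days, scale_days=30.0, shape=1.5):
    """Weibull survival function: exp(-(t / scale) ** shape).

    scale_days and shape are illustrative assumptions, not plugin defaults.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return math.exp(-((age_days / scale_days) ** shape))


def decayed_score(relevance, days_since_access, scale_days=30.0, shape=1.5):
    """Scale a retrieval relevance score by how long the memory sat unused."""
    return relevance * weibull_retention(days_since_access, scale_days, shape)


# A memory touched today keeps its full score; one idle for 60 days fades.
fresh = decayed_score(0.8, 0)
stale = decayed_score(0.8, 60)
```

A shape above 1 means decay starts slowly and then accelerates, which matches the idea that recently used memories stay prominent while long-idle ones drop out of active retrieval.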
Key config (openclaw.json):

    {
      "plugins": {
        "slots": { "memory": "memory-lancedb-pro" },
        "entries": {
          "memory-lancedb-pro": {
            "enabled": true,
            "config": {
              "embedding": { "model": "nomic-embed-text", "baseUrl": "http://:11434/v1", "apiKey": "not-needed" },
              "autoCapture": true,
              "autoRecall": true,
              "smartExtraction": { "enabled": true },
              "retrieval": { "mode": "hybrid", "vectorWeight": 0.7, "bm25Weight": 0.3 }
            }
          }
        }
      }
    }

Key commands:

    openclaw memory-pro list --scope global --limit 20        # list memories
    openclaw memory-pro search "query" --scope global         # search memories
    openclaw memory-pro stats --scope global                  # count, categories, age
    openclaw memory-pro export --output memories.json         # backup
    openclaw memory-pro import memories.json --dry-run        # import (test first)
    openclaw memory-pro reembed                               # after changing embedding model
    openclaw memory-pro delete                                # delete specific memory
    openclaw memory-pro delete-bulk --before "2026-01-01"     # bulk delete
    openclaw memory-pro migrate check --source /path/to/old   # migrate from built-in plugin

After config changes: openclaw config validate then openclaw gateway restart. After editing plugin .ts files: rm -rf /tmp/jiti/ then restart (jiti caches stale compiled code).

OpenViking — Context Database With Virtual Filesystem

- Created by: ByteDance / Volcengine Viking Team
- GitHub: github.com/volcengine/OpenViking
- License: Apache 2.0
- Stars: ~17,900

⚠️ DO NOT INSTALL RIGHT NOW. OpenViking has a dependency on litellm>=1.0.0. LiteLLM was hit by a supply chain attack on March 24, 2026 (TeamPCP backdoored versions 1.82.7-1.82.8). The entire litellm package is currently quarantined on PyPI — fresh installs of anything depending on it will fail. Wait for the quarantine to lift, then install with "provider": "openai" pointing at your local Ollama endpoint to bypass LiteLLM entirely.

What it solves: Replaces flat vector storage with a hierarchical virtual filesystem.
Instead of dumping everything into embeddings and hoping retrieval works, OpenViking organises context into structured directories accessible via viking:// URIs.

How it works:

- Virtual filesystem: Three root directories — viking://resources/ (documents, repos), viking://user/ (preferences, habits), viking://agent/ (skills, task memories)
- Unix-like navigation: ls, find, read, tree, grep against agent context
- L0/L1/L2 tiered loading: L0 = ~100 tokens (abstract), L1 = ~2,000 tokens (overview), L2 = full content on demand. Claims 91-95% token cost reduction vs traditional RAG
- Automatic memory self-iteration: At session end, extracts 6 memory categories and updates the appropriate directories

Remote inference: Yes — configure "provider": "openai" with "api_base" pointing at the Mac's Ollama. This bypasses the LiteLLM dependency entirely for your use case.

Benchmark results with OpenClaw:

- Task completion: 52.08% (vs 35.65% native OpenClaw; reported as a 43% improvement, though these two figures work out to about 46% relative)
- Input tokens: 4.3M (vs 24.6M native; reported as a 91% reduction, though these two figures work out to about 83%)

When to install: After LiteLLM quarantine lifts. Clone repo, strip litellm>=1.0.0 from pyproject.toml if needed, use openai provider only.

MetaClaw — The Agent That Learns From Its Mistakes

- Created by: AIMING Lab, UNC Chapel Hill
- GitHub: github.com/aiming-lab/MetaClaw
- License: Apache 2.0
- Paper: arXiv 2603.17187 (ranked #1 on HuggingFace Daily Papers, March 18 2026)
- Stars: ~2,700

What it solves: The other three tools store and retrieve memories. MetaClaw is the only tool that actually makes the agent smarter over time. It learns from failure patterns and generates corrective behavioural skills.

How it works: MetaClaw sits as an OpenAI-compatible proxy between OpenClaw and your LLM. It intercepts every interaction.

- Skill-driven fast adaptation (gradient-free): Analyses failure trajectories via an LLM evolver and synthesises new behavioural skills that take effect immediately. No GPU needed, no downtime.
  If your agent repeatedly makes the same mistake, MetaClaw generates a corrective skill.
- Opportunistic policy optimisation (gradient-based, optional): Cloud LoRA fine-tuning via RL, triggered only during user-inactive windows. An Opportunistic Meta-Learning Scheduler monitors sleep hours, keyboard inactivity, and calendar to find training windows. Can be skipped entirely.

Results: Skill-driven adaptation improved accuracy by up to 32% relative. Full pipeline advanced Kimi-K2.5 from 21.4% to 40.6% accuracy (approaching GPT-5.2's 41.1%).

Remote inference: Yes — MetaClaw IS a proxy. Run it on the Linux box, point its upstream at the Mac's LM Studio/Ollama endpoint. All inference happens on the Mac. MetaClaw just intercepts and learns.

For your setup: Use skills-only mode (gradient-free). No GPU needed on the Linux box. Since v0.3.3, MetaClaw has a native OpenClaw plugin. Since v0.4.0 (March 25), it includes a "Contexture layer" for cross-session memory.

Caveat: Very new (16 days old, 7 releases). Academically rigorous but least battle-tested of the four tools. Worth running but expect rough edges.

Install Order

1. QMD — npm install -g @tobilu/qmd, set memory.backend = "qmd". Immediate improvement to search/recall.
2. memory-lancedb-pro — npm i memory-lancedb-pro, configure embedding endpoint to Mac. Fixes memory bloat and adds decay.
3. MetaClaw — Install as OpenClaw plugin, configure as proxy. Adds learning over time.
4. OpenViking — Wait for LiteLLM situation to resolve. Then install with openai provider pointing at Mac.

Test each layer before adding the next. If something breaks, you know which component caused it.
Architecture

    ┌─────────────────────────┐         ┌─────────────────────────┐
    │  Linux Machine          │         │  Mac Studio 128GB       │
    │                         │         │                         │
    │  OpenClaw Gateway       │──HTTP──▶│  LM Studio / Ollama     │
    │  QMD (search index)     │         │  - Primary LLM          │
    │  memory-lancedb-pro     │──HTTP──▶│  - Embedding model      │
    │  MetaClaw (proxy)       │──HTTP──▶│    (nomic-embed-text)   │
    │  OpenViking (future)    │──HTTP──▶│                         │
    │  Workspace files        │         │  Headscale connected    │
    │  Headscale connected    │         │                         │
    └─────────────────────────┘         └─────────────────────────┘

All heavy inference on the Mac. All files, indexing, gateway, and agent logic on the Linux box.

OpenViking

OpenViking — Context Database for AI Agents

What it is: OpenViking is an open-source context database from ByteDance's Volcengine team that replaces flat vector storage with a hierarchical virtual filesystem. Instead of dumping everything into embeddings and hoping retrieval works, it organises all agent context into structured directories accessible via viking:// URIs — like a Unix filesystem for your agent's brain.

Why it matters: In benchmarks with OpenClaw, OpenViking improved task completion from 35.65% to 52.08% (reported as a 43% improvement; these two figures work out to about 46% relative) while cutting input tokens from 24.6M to 4.3M (reported as a 91% reduction; these two figures work out to about 83%). The tiered loading system means your agent only pulls in what it needs, when it needs it.

- GitHub: github.com/volcengine/OpenViking
- Stars: ~17,900
- License: Apache 2.0
- Language: Python (core) + Rust (CLI) + C++ (vector extensions) + Go (AGFS server)
- Requires: Python 3.10+

How It Works

Virtual Filesystem

All agent context lives in three root directories:

- viking://resources/ — documents, repos, web pages, project files
- viking://user/ — preferences, habits, personal context
- viking://agent/ — skills, task memories, instructions

You interact with context using Unix-like commands: ls, find, read, tree, grep. The agent navigates its own memory the same way you'd navigate a filesystem.

L0/L1/L2 Tiered Loading

This is the key innovation that solves token bloat.
Instead of loading full documents into context, OpenViking processes everything into three tiers:

- L0 (Abstract): ~100 tokens — one-sentence summary for quick identification
- L1 (Overview): ~2,000 tokens — structured overview with key details
- L2 (Full Content): Complete document, loaded on demand only

The agent starts with L0 summaries, drills into L1 when something looks relevant, and only loads L2 when it genuinely needs the full content. Claims 91-95% token cost reduction vs traditional RAG approaches.

Automatic Memory Self-Iteration

At session end, OpenViking automatically extracts six categories of memory and updates the appropriate directories:

- Profile — factual information about the user
- Preferences — how the user likes things done
- Entities — people, projects, tools referenced
- Events — decisions made, things that happened
- Cases — problems solved, approaches taken
- Patterns — recurring behaviours and workflows

This is significantly more structured than OpenClaw's default "hope the LLM remembers to write to MEMORY.md" approach.

Architecture For Our Setup

OpenViking runs on the Linux agent box. All LLM inference (VLM and embedding) routes to the Mac Studio over Headscale.

    ┌─────────────────────────┐         ┌─────────────────────────┐
    │  Linux Machine          │         │  Mac Studio 128GB       │
    │                         │         │                         │
    │  OpenViking Server      │──HTTP──▶│  Ollama                 │
    │  (port 1933)            │         │  - VLM model            │
    │                         │         │  - Embedding model      │
    │  OpenClaw Gateway       │         │    (nomic-embed-text)   │
    │  (port 18789)           │         │                         │
    │  + openclaw-openviking  │         │                         │
    │    plugin               │         │                         │
    └─────────────────────────┘         └─────────────────────────┘

VLM Providers

OpenViking supports three VLM providers. For our local setup, use openai to bypass the LiteLLM dependency entirely:

| Provider | Description | Use For Our Setup? |
|---|---|---|
| volcengine | ByteDance's Doubao models (cloud) | No — cloud only |
| openai | Any OpenAI-compatible API endpoint | Yes — point at Ollama on Mac |
| litellm | Multi-provider routing via LiteLLM | No — compromised, avoid |

Installation

Prerequisites

    # Install uv (Python package manager used by OpenViking)
    curl -LsSf https://astral.sh/uv/install.sh | sh

    # Install the Rust CLI
    curl -fsSL https://raw.githubusercontent.com/volcengine/OpenViking/main/crates/ov_cli/install.sh | bash

Install OpenViking (with LiteLLM workaround)

    # Clone the repo
    git clone https://github.com/volcengine/OpenViking.git
    cd OpenViking

    # Create virtual environment
    uv venv --python 3.11
    source .venv/bin/activate

    # IMPORTANT: Edit pyproject.toml to remove or comment out litellm dependency
    # Find the line with litellm>=1.0.0 and remove it
    nano pyproject.toml

    # Install
    uv pip install -e .

Install VikingBot (agent framework, optional)

    uv pip install -e ".[bot]"

Install OpenClaw Plugin

    # The official OpenClaw integration plugin
    openclaw plugins install openclaw-openviking-plugin

Configuration

Create the config file at ~/.openviking/ov.conf:

    {
      "storage": { "workspace": "/home/conor/openviking_workspace" },
      "log": { "level": "INFO", "output": "stdout" },
      "embedding": {
        "dense": {
          "api_base": "http://100.64.0.9:11434/v1",
          "api_key": "not-needed",
          "provider": "openai",
          "dimension": 768,
          "model": "nomic-embed-text"
        },
        "max_concurrent": 10
      },
      "vlm": {
        "api_base": "http://100.64.0.9:11434/v1",
        "api_key": "not-needed",
        "provider": "openai",
        "model": "qwen3-coder-next",
        "max_concurrent": 100
      }
    }

Notes:

- 100.64.0.9 is the Mac Studio's Headscale IP
- provider: "openai" works with any OpenAI-compatible endpoint including Ollama
- Embedding dimension must match the model — 768 for nomic-embed-text, 1024 for mxbai-embed-large
- The VLM model handles the intelligent context processing (summarisation, extraction, classification)

Running

    # Start the OpenViking server
    openviking-server
    # Default address: 127.0.0.1:1933

    # Start with VikingBot
    #   enabled (optional)
    openviking-server --with-bot

Key Commands (ov CLI)

Adding Resources

    # Add a GitHub repo
    ov add-resource https://github.com/volcengine/OpenViking

    # Add a local directory
    ov add-resource /home/conor/.openclaw/workspace

    # Wait for processing to complete (otherwise runs async)
    ov add-resource /path/to/docs --wait

Browsing Context

    # List root directories
    ov ls viking://resources/

    # Tree view (2 levels deep)
    ov tree viking://resources/my-project -L 2

    # Read a specific file at L1 (overview)
    ov read viking://resources/my-project/README.md

    # Check server status
    ov status

Searching

    # Semantic search across all context
    ov find "what is the camera VLAN setup"

    # Grep for exact terms within a scope
    ov grep "RTSP" --uri viking://resources/home-automation

Interactive Chat (with VikingBot)

    # Start interactive chat session
    ov chat

OpenClaw Integration

Once the openclaw-openviking-plugin is installed, OpenViking becomes available as a context source for your agent. The plugin connects to the OpenViking server running on the same machine and provides the agent with access to the virtual filesystem commands.

The OpenViking server must be running before the OpenClaw gateway starts.
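One way to enforce that start-up ordering from a script is to poll the server's TCP port before launching the gateway. The sketch below uses only the Python standard library and the default address quoted in these notes (127.0.0.1:1933); the function name, timeout and retry interval are hypothetical, and a successful TCP connect only shows that something is listening, not that OpenViking has finished initialising.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=1933, timeout_s=30.0, interval_s=0.5):
    """Return True once a TCP connection to host:port succeeds, else False.

    Retries every interval_s seconds until timeout_s has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Opening and immediately closing a connection is enough to
            # confirm that a process is accepting connections on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

A wrapper script could call wait_for_port() and start the gateway only when it returns True, exiting with an error otherwise.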
Consider adding it as a systemd service:

    # Example systemd service file
    # /etc/systemd/user/openviking.service
    [Unit]
    Description=OpenViking Context Database
    Before=openclaw.service

    [Service]
    ExecStart=/home/conor/OpenViking/.venv/bin/openviking-server
    WorkingDirectory=/home/conor/OpenViking
    Restart=always
    RestartSec=5

    [Install]
    WantedBy=default.target

    # Enable and start
    systemctl --user enable openviking
    systemctl --user start openviking

How OpenViking Differs From The Other Memory Tools

| Feature | OpenClaw Default | memory-lancedb-pro | QMD | OpenViking |
|---|---|---|---|---|
| Storage model | Flat markdown files | Vector DB + BM25 | SQLite + vector index | Virtual filesystem |
| Token efficiency | Loads everything | Top-N retrieval | Top-N retrieval | L0/L1/L2 tiered loading |
| Context organisation | None | Scopes (global/agent/project) | Collections | Hierarchical directories |
| Auto-categorisation | No | 6 categories | No | 6 categories + directory structure |
| Agent can browse | Read specific files | Search only | Search only | ls, tree, find, grep, read |
| Multi-resource support | Workspace only | Conversation memories | Markdown files | Git repos, URLs, docs, local dirs |

OpenViking and memory-lancedb-pro are complementary, not competing. memory-lancedb-pro handles conversation-level memory (what you said, preferences extracted from chat). OpenViking handles resource-level context (your codebase, documentation, project files, knowledge bases). QMD provides the search layer across markdown files. Together they cover different retrieval needs.

Breaking Changes Warning

From the release notes: after upgrading OpenViking, datasets/indexes generated by previous versions are not compatible.
You must rebuild:

    # Stop the server
    systemctl --user stop openviking

    # Clear workspace (this deletes all indexed data — resources will need re-adding)
    rm -rf /home/conor/openviking_workspace

    # Restart
    systemctl --user start openviking

    # Re-add your resources
    ov add-resource /path/to/your/stuff --wait

Key Files and Paths

- Config: ~/.openviking/ov.conf (override with OPENVIKING_CONFIG_FILE env var)
- Workspace: Configured in ov.conf storage.workspace — stores all indexed data
- CLI binary: ov (Rust, installed via install script)
- Server: openviking-server (Python, from the pip install)
- Default port: 1933

- Version: v0.2.1 (current as of March 2026)
- Status: Alpha — performance and consistency not fully optimised
- GitHub: github.com/volcengine/OpenViking
- Docs: openviking.ai

LanceDB Pro

memory-lancedb-pro

Enhanced LanceDB memory plugin for OpenClaw — community reference guide

Overview

memory-lancedb-pro is a community-developed, production-grade long-term memory plugin for OpenClaw. It replaces the built-in memory-lancedb plugin with a significantly more capable retrieval pipeline, designed for agents that need persistent, high-quality memory across sessions without manual tagging or configuration overhead.

The core problem it solves: standard OpenClaw agents have no memory between sessions. Every conversation starts from zero. memory-lancedb-pro automatically captures what matters from each session and retrieves relevant context in future ones.

Primary upstream repo: CortexReach/memory-lancedb-pro. Several community forks exist (win4r, McBorisson, fryeggs, kvc0769) with varying additions such as Volcengine multimodal embeddings or unified Claude Code/Claude Desktop support.

OpenClaw 2026.3+ compatibility: The CortexReach fork has been updated to use before_prompt_build hooks, replacing the deprecated before_agent_start hook. If you are on 2026.3.24 or later, use this fork. Run openclaw doctor --fix after upgrading.
Feature Comparison

| Feature | Built-in memory-lancedb | memory-lancedb-pro |
|---|---|---|
| Vector search | ✓ | ✓ |
| BM25 full-text search | ✗ | ✓ |
| Hybrid fusion (Vector + BM25) | ✗ | ✓ configurable weights |
| Cross-encoder reranking | ✗ | ✓ Jina, SiliconFlow, Pinecone, etc. |
| Recency / time decay scoring | ✗ | ✓ |
| MMR diversity filtering | ✗ | ✓ |
| Multi-scope isolation | ✗ | ✓ global / agent / project / user |
| Smart LLM extraction | ✗ | ✓ optional, uses any OpenAI-compatible LLM |
| Management CLI | ✗ | ✓ list / search / stats / delete / export / import |
| Auto-capture on session end | ✓ basic | ✓ with deduplication, up to 3 per turn |
| Auto-recall before prompt | ✓ basic | ✓ adaptive — skips trivial/short queries |
| Noise filtering | ✗ | ✓ |
| Migration tool from built-in plugin | — | ✓ |

Retrieval Pipeline

Queries pass through a multi-stage pipeline before results are injected into the agent prompt:

1. Embed query — using the configured OpenAI-compatible embedding provider
2. Parallel search — vector ANN search (cosine distance) + BM25 full-text search run simultaneously
3. Hybrid fusion — vector score used as base; BM25 hits receive a configurable weighted boost
4. Rerank — optional cross-encoder reranking via external API (60% cross-encoder score + 40% fused score)
5. Lifecycle decay scoring — recency boost, time decay, importance weight, length normalisation
6. Filter — hard minimum score, noise filter, MMR diversity deduplication
7. Inject — surviving memories injected as context block

If the reranker API fails, the pipeline degrades gracefully to cosine similarity reranking.

Installation

1. Clone into your OpenClaw workspace

    cd ~/.openclaw/workspace
    git clone https://github.com/CortexReach/memory-lancedb-pro.git plugins/memory-lancedb-pro
    cd plugins/memory-lancedb-pro
    npm install

Common mistake: Cloning the repo somewhere other than your workspace and then using a relative path in plugins.load.paths. Relative paths are resolved from the workspace root. Use an absolute path if cloning elsewhere.

2. Disable the built-in memory plugin

Only one memory plugin can be active at a time.
If you previously used memory-lancedb, disable it before enabling this plugin.

3. Add to openclaw.json

    {
      "plugins": {
        "load": { "paths": ["plugins/memory-lancedb-pro"] },
        "entries": {
          "memory-lancedb-pro": {
            "enabled": true,
            "config": {
              "embedding": {
                "apiKey": "${JINA_API_KEY}",
                "model": "jina-embeddings-v5-text-small",
                "baseURL": "https://api.jina.ai/v1",
                "dimensions": 1024,
                "taskQuery": "retrieval.query",
                "taskPassage": "retrieval.passage",
                "normalized": true
              }
            }
          }
        },
        "slots": { "memory": "memory-lancedb-pro" }
      }
    }

Config changes require a gateway restart. With config watch enabled (default), this happens automatically.

Key Configuration Options

| Option | Default | Notes |
|---|---|---|
| autoCapture | true | Capture memories at session end |
| autoRecall | true | Inject memories before prompt build |
| smartExtraction | true | Use LLM to classify memories instead of regex |
| extractMinMessages | 3 | Minimum messages before extraction runs |
| captureAssistant | true | Set false to only capture user messages |
| retrieval.mode | hybrid | vector, bm25, or hybrid |
| retrieval.vectorWeight | 0.7 | Weight for vector scores in hybrid fusion |
| retrieval.bm25Weight | 0.3 | Weight for BM25 scores in hybrid fusion |
| rerank.enabled | false | Enable cross-encoder reranking |
| rerank.candidatePoolSize | 12 | Candidates passed to reranker |
| rerank.minScore | 0.6 | Soft minimum score post-rerank |
| rerank.hardMinScore | 0.62 | Hard cutoff — below this is always dropped |
| sessionMemory.enabled | true | Store session summaries on /new |
| autoRecall.minPromptLength | 15 (EN) / 6 (CJK) | Skip recall for very short queries |

Management CLI

The plugin ships with a CLI for direct memory management:

    openclaw memory-pro list      # list stored memories
    openclaw memory-pro search    # semantic/keyword search
    openclaw memory-pro stats     # storage stats
    openclaw memory-pro delete    # delete a specific memory
    openclaw memory-pro export    # export all memories
    openclaw memory-pro import    # import memories

Agent Tool Definitions

When loaded, the plugin registers these tools for the agent to use directly:

- memory_recall —
  retrieve relevant memories for a query
- memory_store — explicitly store a memory
- memory_forget — delete a memory by ID or query
- memory_update — update an existing memory

Plus additional management tools exposed via the CLI commands above.

Multi-Scope Isolation

Memories can be scoped to control access between agents and users:

- global — shared across all agents
- agent: — isolated to a specific agent
- project: — shared within a project
- user: — per-user isolation (useful for multi-user bots)
- custom: — arbitrary named scope

Telegram Setup

If running OpenClaw with Telegram, the easiest way to configure the plugin is via the bot directly. Send the following to your main bot:

    Help me connect this memory plugin with the most user-friendly configuration:
    https://github.com/CortexReach/memory-lancedb-pro

    Requirements:
    1. Set it as the only active memory plugin
    2. Use Jina for embedding and reranker
    3. Use gpt-4o-mini for the smart-extraction LLM
    ... (continue with your preferences)

Important Notes

jiti cache: After modifying any .ts file in the plugin, you must clear the jiti cache before restarting the gateway, or OpenClaw will load stale compiled code:

    rm -rf /tmp/jiti/ && openclaw gateway restart

Memory quality guidelines: Never store raw conversation summaries, large blobs, or duplicates. Prefer structured, atomic facts with keywords. On any tool failure or repeated error, call memory_recall with relevant keywords before retrying — the fix may already be stored.

Spaced repetition: Frequently recalled memories decay more slowly, similar to spaced-repetition learning systems.

Notable Community Forks

| Fork | Notable additions |
|---|---|
| CortexReach/memory-lancedb-pro | Primary upstream. Updated for OpenClaw 2026.3+ hook architecture. |
| win4r/memory-lancedb-pro | Widely referenced in docs; standard feature set. |
| fryeggs/memory-lancedb-pro | Unified edition — extends to Claude Code, Codex CLI, and Claude Desktop via shared LanceDB backend. |
| kvc0769/memory-lancedb-pro | Adds Volcengine multimodal embedding support. |
| McBorisson/memory-lancedb-pro | Uses RRF fusion (vs. weighted boost in other forks); includes JSONL distillation pipeline. |

Generated March 2026. Sources: CortexReach/memory-lancedb-pro, openclaw/openclaw docs, LanceDB blog.

Design language

● Design language summary

  Vibe. Dark, minimal, slightly luminous. Information-dense but airy. Color is reserved — accent appears on actions and active states; semantic colors only for status. Most surfaces are low-contrast neutral; the eye is drawn by the one accent.

  Palette. Near-black background #0b0d12, two elevation tiers above it (#14171d panel, #1a1e26 panel-2). Borders are translucent white (7% default, 12% strong). Text is warm-white #e6eef6, muted #8a93a4, super-muted #5e6675. Accent is a blue-violet #4f8cff with #6ea0ff for hover and a 15% soft variant for active backgrounds. A violet #8b5cf6 shows up only inside accent gradients (logos). Status: green #34d399, amber #fbbf24, red #f87171.

  Type. Inter, system fallback. Base 14.5px / 1.5. Headings get tight tracking (-0.01em to -0.02em). Small labels are uppercase, ~0.78rem, 0.06–0.08em letter-spacing, in the muted tone. Numbers always use tabular figures. Antialiased.

  Spacing & shape. Spacing scale of 4/8/12/16/24/32. Radii: 12px panels, 8px controls/inputs, 6px chips, 999px for bars. Two shadow tiers — subtle 0 2px 8px /35% for cards, lifted 0 8px 24px /45% for floating things. Active accents may add a tinted glow (rgba(79,140,255,0.35)).

  Layout. Sticky 240px sidebar + fluid main. On mobile the sidebar collapses to a horizontal top nav. Each page opens with a page-header (title + small meta on the right, divider underneath). Content lives inside card-panel blocks — never floating directly on the bg.

  Components.
  - Stat card: tiny uppercase label with icon → big tabular number → muted sub-line. Border lightens on hover.
  - Sidebar nav-item: icon + label, transparent base, accent-soft background when active. Destructive items shift to red on hover.
  - Segmented control: pill group inside a panel-2 shell; active pill is solid accent with glow.
  - Table: transparent rows, uppercase muted column headers, tabular nums right-aligned, 2% white row-hover.
  - Bar (disk): 6px gradient fill, threshold-based color shift at 75% / 90%.
  - Chip/folder pill: panel-2 with border, small inline icon, 6px radius.

  Iconography. Bootstrap Icons (bi-*), inline with labels at slightly reduced opacity.

  Motion. 0.15s ease for hover/active transitions; nothing splashy. Async sections fade to 50% opacity (is-loading); loading indicators use a single accent-colored spinner.

  Charts. Plotly with transparent bg, Inter font, 5% white gridlines, legend below the plot, custom dark hover label. Series colors come from a fixed 10-stop palette (blue → violet → green → amber → coral …) — never raw Plotly defaults.

  Don'ts. No hard borders. No heavy drop shadows. No mixing accent colors for non-action UI. No raw Bootstrap-default buttons or tables. No solid color blocks for headers — keep them transparent over the panel.
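
  The Charts rules above could translate into a shared Plotly layout dictionary along these lines. This is a sketch, not the project's actual theme: the name DARK_LAYOUT, the legend offset, and the choice of the panel color and strong-border tint for the hover label are assumptions. The 10-stop series palette is left out because its exact stops are not listed here.

```python
# Assumed mapping of the design tokens onto standard Plotly layout keys.
FONT_FAMILY = "Inter, system-ui, sans-serif"
GRID = "rgba(255,255,255,0.05)"  # 5% white gridlines

DARK_LAYOUT = {
    "paper_bgcolor": "rgba(0,0,0,0)",  # transparent outer background
    "plot_bgcolor": "rgba(0,0,0,0)",   # transparent plotting area
    "font": {"family": FONT_FAMILY, "color": "#e6eef6"},
    "xaxis": {"gridcolor": GRID},
    "yaxis": {"gridcolor": GRID},
    # Horizontal legend centred underneath the plot (offset is a guess).
    "legend": {
        "orientation": "h",
        "yanchor": "top",
        "y": -0.2,
        "xanchor": "center",
        "x": 0.5,
    },
    # Dark hover label: panel color with the strong border tint (assumed).
    "hoverlabel": {
        "bgcolor": "#14171d",
        "bordercolor": "rgba(255,255,255,0.12)",
        "font": {"family": FONT_FAMILY, "color": "#e6eef6"},
    },
}
```

  With Plotly installed, a figure would pick this up through fig.update_layout(**DARK_LAYOUT).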