QMD
QMD — Local Hybrid Search for OpenClaw Agent Memory 

 What it is: QMD is a fully local, on-device search engine for markdown files built by Tobi Lütke (Shopify CEO). It replaces OpenClaw's broken built-in memory search with a three-stage hybrid pipeline: BM25 keyword matching + vector semantic search + LLM re-ranking. No API keys, no cloud calls, no network traffic after initial model download. 

 Why it matters: OpenClaw's default memory_search uses pure vector embeddings, which routinely return semantically adjacent but factually wrong results. QMD fixes this by running BM25 and vector search in parallel over three query variants, fusing the results with Reciprocal Rank Fusion, then re-ranking the top candidates with a local LLM. Community reports claim it cuts token usage by 95%+ compared to context-stuffing approaches. 

 Install: npm install -g @tobilu/qmd (requires Node.js 22+ or Bun). Three GGUF models (~2.5GB total) auto-download from HuggingFace on first use. 

 

 How It Works 

 Every query goes through this pipeline: 

 

 1. Query Expansion: A local 1.7B LLM generates two alternative formulations of your query 

 2. Parallel Search: All three query variants (original weighted 2×) run through both BM25 full-text search AND vector cosine similarity search simultaneously 

 3. Reciprocal Rank Fusion (RRF): Results merge with k=60, original query weighted 2×, top-rank bonus (+0.05 for #1, +0.02 for #2-3) 

 4. LLM Re-ranking: Top 30 candidates scored by a local reranker model (yes/no + logprob confidence) 

 5. Position-Aware Blending: Final ranking blends RRF and reranker scores: ranks 1-3 use 75% RRF / 25% reranker, ranks 4-10 use 60/40, ranks 11+ use 40/60 
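The fusion and blending steps above can be sketched in a few lines of Python. This is an illustrative sketch, not QMD's actual code: the function names are made up, the top-rank bonus is assumed to use the best rank a document reached in any single list, and the blend assumes both scores are already normalised to the 0-1 range.

```python
from collections import defaultdict

RRF_K = 60  # smoothing constant quoted in the pipeline description


def rrf_fuse(ranked_lists, weights):
    """Fuse ranked lists of document ids with weighted Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    best_rank = {}
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            scores[doc] += weight / (RRF_K + rank)
            best_rank[doc] = min(rank, best_rank.get(doc, rank))
    # Top-rank bonus: +0.05 for a #1 hit, +0.02 for #2-3.
    # Assumption: the bonus is based on the best rank in any single list.
    for doc, rank in best_rank.items():
        if rank == 1:
            scores[doc] += 0.05
        elif rank <= 3:
            scores[doc] += 0.02
    return sorted(scores, key=scores.get, reverse=True)


def blend(rrf_rank, rrf_score, rerank_score):
    """Position-aware blend: trust RRF more near the top, the reranker more below."""
    if rrf_rank <= 3:
        rrf_weight = 0.75
    elif rrf_rank <= 10:
        rrf_weight = 0.60
    else:
        rrf_weight = 0.40
    return rrf_weight * rrf_score + (1 - rrf_weight) * rerank_score


# The original query's list is weighted 2x, each expansion 1x.
original = ["a", "b", "c"]
expansion = ["b", "a", "d"]
fused_order = rrf_fuse([original, expansion], weights=[2.0, 1.0])
```

Weighting the top ranks towards RRF keeps strong keyword and vector agreement from being overturned by a single reranker score, while lower ranks lean on the reranker to reorder the long tail.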

 

 Documents are chunked into ~900-token pieces with 15% overlap, using smart boundary detection that prefers markdown headings. Embeddings and LLM responses are cached in SQLite ( ~/.cache/qmd/index.sqlite ) with content-hash keying, so moving/renaming files doesn't require re-embedding. 
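A minimal sketch of overlapping chunks plus content-hash caching, under stated assumptions: a plain list stands in for real tokens, heading-aware boundary detection is left out, SHA-256 is an arbitrary choice of hash, and `EmbeddingCache` is a hypothetical in-memory stand-in for the SQLite cache.

```python
import hashlib

CHUNK_TOKENS = 900  # approximate chunk size from the text
OVERLAP = 0.15      # fraction of each chunk shared with the next one


def chunk_tokens(tokens, size=CHUNK_TOKENS, overlap=OVERLAP):
    """Split a token list into windows of `size` that overlap by `overlap`."""
    step = max(1, round(size * (1 - overlap)))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end
    return chunks


def content_key(text):
    """Key a chunk by its content, not its path, so renames keep the cache."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class EmbeddingCache:
    """Hypothetical in-memory stand-in for the SQLite embedding cache."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._store = {}
        self.misses = 0

    def get(self, text):
        key = content_key(text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]


chunks = chunk_tokens(list(range(2000)))
cache = EmbeddingCache(lambda text: [float(len(text))])
cache.get("same chunk text")  # computed once
cache.get("same chunk text")  # served from cache, even if the file moved
```

Because the key depends only on chunk content, a renamed or moved file produces the same keys and needs no new embedding calls.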

 

 Local Models 

 QMD runs three GGUF models locally via node-llama-cpp . All models auto-download to ~/.cache/qmd/models/ on first use: 

 

 

 

 | Purpose | Model | Size | Source |
 |---|---|---|---|
 | Embedding | embeddinggemma-300M (Google) | ~300MB | HuggingFace |
 | Re-ranking | Qwen3-Reranker-0.6B-Q8_0 | ~600MB | HuggingFace |
 | Query Expansion | qmd-query-expansion-1.7B (Tobi's fine-tune) | ~1.7GB | HuggingFace |

 

 

 

 On Apple Silicon, QMD auto-detects Metal GPU acceleration at startup. Total VRAM/memory footprint is ~2.5GB — negligible on a 128GB Mac Studio. 

 Swapping the Embedding Model 

 The embedding model can be overridden via environment variable: 

 # Use Qwen3-Embedding for better multilingual (CJK) support

export QMD_EMBED_MODEL="hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf"

# After changing model, re-embed all collections:

qmd embed -f 

 Note: This still loads a local GGUF file — it doesn't point at a remote API. Vectors are not cross-compatible between models, so you must re-index after switching. 

 

 Can I Point Models to a Remote Machine? 

 Short answer: No — QMD is designed to be fully local. 

 QMD runs all inference through node-llama-cpp in-process. There is no built-in configuration to route embedding, reranking, or query expansion to a remote API endpoint. The project's tagline is literally "Tracking current sota approaches while being all local." 

 There is a QMD_OPENAI_BASE_URL environment variable referenced in some third-party integration guides (specifically the ehc-io/qmd fork), but this is not part of Tobi's original QMD and applies to a different use case. 

 Workarounds if you need remote inference: 

 

 The models are tiny (~2.5GB total). On a 128GB Mac Studio they're negligible — just run them locally alongside your main LLM 

 If you absolutely need to offload: QMD exposes an MCP server ( qmd mcp ) with HTTP transport mode. You could run QMD on a remote machine and connect to it as an MCP service. But the models themselves still run local to wherever QMD is installed 

 Fork and modify — QMD is MIT licensed and the embedding/reranking code is in src/ . You could swap the node-llama-cpp calls for HTTP calls to a remote endpoint, but this is custom development 

 

 

 OpenClaw Integration 

 Three integration paths, from simplest to most flexible: 

 1. Built-in Memory Backend (Recommended) 

 Set memory.backend = "qmd" in ~/.openclaw/openclaw.json : 

{
  "memory": {
    "backend": "qmd",
    "citations": "auto",
    "qmd": {
      "includeDefaultMemory": true,
      "command": "qmd",
      "searchMode": "search",
      "update": {
        "interval": "5m",
        "debounceMs": 15000,
        "onBoot": true,
        "waitForBootSync": false
      },
      "limits": {
        "maxResults": 6,
        "timeoutMs": 4000
      },
      "scope": {
        "default": "deny",
        "rules": [
          { "action": "allow", "match": { "chatType": "direct" } }
        ]
      }
    }
  }
}

 OpenClaw automatically creates collections (e.g. memory-root for memory/**/*.md ) and indexes on boot. 
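To make the scope block in the config above concrete, here is one way such rules could be evaluated. This sketch assumes first-match-wins semantics with a fallback to the default action. That is a plausible reading of the config, not confirmed OpenClaw behaviour, and `is_allowed` is a hypothetical helper.

```python
def is_allowed(scope, context):
    """Return True if memory search is allowed for this chat context.

    Assumption: rules are checked in order, the first rule whose `match`
    keys all equal the context wins, and `default` applies otherwise.
    """
    for rule in scope.get("rules", []):
        match = rule.get("match", {})
        if all(context.get(key) == value for key, value in match.items()):
            return rule.get("action") == "allow"
    return scope.get("default", "deny") == "allow"


scope = {
    "default": "deny",
    "rules": [{"action": "allow", "match": {"chatType": "direct"}}],
}
direct_ok = is_allowed(scope, {"chatType": "direct"})  # direct chats match the rule
group_ok = is_allowed(scope, {"chatType": "group"})    # falls through to "deny"
```

Under this reading, the config shown keeps memory search out of group chats and enables it only in direct conversations.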

 2. MCP Server 

 Run qmd mcp to expose QMD as an MCP tool. Supports stdio and HTTP transport. HTTP daemon mode keeps models warm in VRAM between queries, reducing latency from ~16s to ~10s. 

 3. openclaw-engram Plugin 

 A community plugin ( github.com/joshuaswarren/openclaw-engram ) that uses QMD as the backend for a persistent long-term memory system with LLM-powered extraction. 

 

 Search Modes 

 

 

 

 | Command | Method | Speed | Best For |
 |---|---|---|---|
 | qmd search | BM25 keyword only | ~50-200ms | You know the exact terms |
 | qmd vsearch | Vector similarity only | ~500-1000ms | Semantic matches without reranking |
 | qmd query | Full hybrid pipeline | ~3-10s | Highest quality results (recommended) |
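For scripts that call QMD, the trade-off in the table can be wrapped in a small helper. This is a sketch under assumptions: the 10-second threshold is simply the upper latency bound quoted for `qmd query`, placing `--json` after the query text is assumed to work, and the JSON shape is not documented here, so it is returned unparsed beyond `json.loads`.

```python
import json
import subprocess

MODES = {"search", "vsearch", "query"}


def pick_mode(exact_terms_known, latency_budget_s):
    """Choose a search command from the speed/quality trade-off in the table."""
    if exact_terms_known:
        return "search"  # BM25 only, fastest
    # Assumption: the full pipeline needs up to ~10s, so fall back to
    # vector-only search when the caller cannot wait that long.
    return "query" if latency_budget_s >= 10 else "vsearch"


def build_command(mode, text):
    """Build the argv list for a qmd call with JSON output."""
    if mode not in MODES:
        raise ValueError(f"unknown qmd mode: {mode}")
    return ["qmd", mode, text, "--json"]


def run_qmd(mode, text, timeout_s=30.0):
    """Run qmd and parse its JSON output (requires qmd on PATH)."""
    proc = subprocess.run(
        build_command(mode, text),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return json.loads(proc.stdout)


command = build_command(pick_mode(False, 15), "camera setup decision")
```

Passing the arguments as a list avoids shell quoting problems when the query contains quotes or negation operators.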

 

 

 

 

 Key Features 

 

 Query documents — Structured multi-line queries with typed lines ( lex: , vec: , hyde: ) combining keyword precision with semantic recall 

 Intent parameter — Optional --intent flag disambiguates queries across the entire pipeline. "performance" means different things in different contexts 

 Quoted phrases and negation — "C++ performance" -sports -athlete works in lex search 

 Content-hash keying — Moving/renaming files doesn't require re-embedding 

 LLM response caching — Query expansion and rerank scores cached in SQLite 

 Agentic output formats — --json , --files , --md , --full for structured agent consumption 

 Collection contexts — Hierarchical semantic metadata (e.g. "Personal notes and meeting logs") improves relevance 

 Separate indexes — qmd --index work search "quarterly reports" keeps knowledge bases isolated 

 

 

 Storage 

 

 Index: ~/.cache/qmd/index.sqlite (SQLite with FTS5 + sqlite-vec) 

 Models: ~/.cache/qmd/models/ (~2.5GB) 

 Config: ~/.config/qmd/index.yml (collection definitions, respects XDG_CONFIG_HOME) 

 

 

 Quick Reference 

 # Install

npm install -g @tobilu/qmd

# Create a collection

qmd collection add ~/.openclaw/workspace --name workspace --mask "**/*.md"

# Generate embeddings

qmd embed

# Search

qmd query "what did I decide about the camera setup"

# Status

qmd status

# Re-index (with git pull for remote repos)

qmd update --pull 

 

 Version: v1.1.6 (current as of March 2026)
 License: MIT
 GitHub: github.com/tobi/qmd
 npm: @tobilu/qmd
 Runtime: Node.js 22+ or Bun
 Dependencies: node-llama-cpp, sqlite-vec, better-sqlite3

Summary
OpenClaw Memory Enhancement Stack 

 Four tools that address different layers of OpenClaw's memory problem. Each solves a different failure mode — they're complementary, not competing. 

 

 

 

 | Tool | What It Does | Runs On | Status |
 |---|---|---|---|
 | QMD | Hybrid search (BM25 + vector + LLM rerank) over markdown files | Linux agent box (CPU) or Mac (Metal) | Stable (v1.1.6) |
 | memory-lancedb-pro | Vector memory plugin with decay, hybrid retrieval, scope isolation | Linux agent box, embeddings on Mac | Stable (npm package) |
 | OpenViking | Context database with virtual filesystem and tiered token loading | Linux agent box, VLM calls on Mac | ⚠️ Hold: LiteLLM supply chain attack |
 | MetaClaw | Continual learning proxy; the agent gets smarter over time | Linux agent box, forwards inference to Mac | Beta (v0.4.0, very new) |

 

 

 

 

 QMD — Search That Actually Works 

 Created by: Tobi Lütke (Shopify CEO)
 GitHub: github.com/tobi/qmd
 License: MIT
 Install: npm install -g @tobilu/qmd

 What it solves: OpenClaw's built-in memory_search uses pure vector embeddings that routinely return wrong results. QMD replaces it with a three-stage hybrid pipeline. 

 How it works: 

 

 1. A local 1.7B LLM expands your query into two alternative formulations 

 2. All three variants run through BM25 keyword search AND vector cosine similarity in parallel 

 3. Results merge via Reciprocal Rank Fusion (original query weighted 2×) 

 4. Top 30 candidates scored by a local reranker model 

 5. Final ranking blends RRF and reranker scores with position-aware weighting 

 

 Local models (~2.5GB total, auto-downloaded): 

 

 embeddinggemma-300M (embedding) 

 Qwen3-Reranker-0.6B (reranking) 

 qmd-query-expansion-1.7B (query expansion — Tobi's custom fine-tune) 

 

 Remote inference: Not supported natively. Models run via node-llama-cpp in-process. On a 128GB Mac Studio their footprint is negligible. On the Linux agent box they run fine on CPU (~10-15s per query vs ~3-5s on Metal). 

 OpenClaw integration: Set memory.backend = "qmd" in openclaw.json. OpenClaw creates collections and indexes automatically on boot. 

 Key commands: 

 qmd query "what did I decide about the camera setup" # hybrid search (best quality)

qmd search "frigate RTSP" # keyword only (fastest)

qmd vsearch "home automation preferences" # vector only

qmd status # index info

qmd update --pull # re-index (git pull first)

qmd embed -f # force re-embed all 

 

 memory-lancedb-pro — Long-Term Memory With Decay 

 Created by: CortexReach (community)
 GitHub: github.com/CortexReach/memory-lancedb-pro
 License: MIT
 Install: npm i memory-lancedb-pro

 What it solves: OpenClaw's default memory has no decay (everything stays equally weighted forever), no hybrid search (vector only), and no deduplication, and MEMORY.md dumps its entire contents into every session, wasting tokens. 

 How it works: 

 

 Auto-capture: Extracts preferences, facts, decisions, and entities from conversations automatically (up to 3 per turn) 

 Auto-recall: Injects relevant memories into context before agent responds (up to 3 entries) 

 Smart extraction: LLM-powered 6-category classification (Profile, Preferences, Entities, Events, Cases, Patterns) with L0/L1/L2 metadata 

 Hybrid retrieval: Vector search + BM25 keyword search fused with RRF, then cross-encoder reranking 

 Weibull time decay: Memories that aren't accessed gradually fade from active retrieval 

 Three-tier system: Core → Working → Peripheral with automatic promotion/demotion 

 Multi-scope isolation: global, agent:<id>, project:<id>, user:<id>, custom:<name> 
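The time-decay idea can be illustrated with the Weibull survival function, exp(-(t/scale)^shape). This is a sketch only: the scale and shape values, and the choice to multiply relevance by retention, are hypothetical and not taken from the plugin.

```python
import math


def weibull_retention(age_days, scale_days=30.0, shape=1.5):
    """Weibull survival function: fraction of weight kept after `age_days`.

    scale_days and shape are illustrative values, not plugin defaults.
    """
    if age_days <= 0:
        return 1.0
    return math.exp(-((age_days / scale_days) ** shape))


def decayed_score(relevance, days_since_access, scale_days=30.0, shape=1.5):
    """Down-weight a retrieval score by how long the memory went unused."""
    return relevance * weibull_retention(days_since_access, scale_days, shape)


fresh = decayed_score(0.9, days_since_access=1)
stale = decayed_score(0.9, days_since_access=90)
```

With a shape above 1, memories hold most of their weight at first and then fall away faster, which fits the description of unused memories fading gradually from active retrieval.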

 

 Remote inference: Yes. Embedding uses any OpenAI-compatible /v1/embeddings endpoint: point baseURL at Ollama/LM Studio on the Mac. Reranking uses Jina, SiliconFlow, or any compatible reranker API. 

 Key config (openclaw.json): 

{
  "plugins": {
    "slots": { "memory": "memory-lancedb-pro" },
    "entries": {
      "memory-lancedb-pro": {
        "enabled": true,
        "config": {
          "embedding": {
            "model": "nomic-embed-text",
            "baseURL": "http://<mac-headscale-ip>:11434/v1",
            "apiKey": "not-needed"
          },
          "autoCapture": true,
          "autoRecall": true,
          "smartExtraction": { "enabled": true },
          "retrieval": { "mode": "hybrid", "vectorWeight": 0.7, "bm25Weight": 0.3 }
        }
      }
    }
  }
}

 Key commands: 

 openclaw memory-pro list --scope global --limit 20 # list memories

openclaw memory-pro search "query" --scope global # search memories

openclaw memory-pro stats --scope global # count, categories, age

openclaw memory-pro export --output memories.json # backup

openclaw memory-pro import memories.json --dry-run # import (test first)

openclaw memory-pro reembed # after changing embedding model

openclaw memory-pro delete <id> # delete specific memory

openclaw memory-pro delete-bulk --before "2026-01-01" # bulk delete

openclaw memory-pro migrate check --source /path/to/old # migrate from built-in plugin 

 After config changes: openclaw config validate then openclaw gateway restart . After editing plugin .ts files: rm -rf /tmp/jiti/ then restart (jiti caches stale compiled code). 

 

 OpenViking — Context Database With Virtual Filesystem 

 Created by: ByteDance / Volcengine Viking Team
 GitHub: github.com/volcengine/OpenViking
 License: Apache 2.0
 Stars: ~17,900

 ⚠️ DO NOT INSTALL RIGHT NOW. OpenViking has a dependency on litellm>=1.0.0 . LiteLLM was hit by a supply chain attack on March 24, 2026 (TeamPCP backdoored versions 1.82.7-1.82.8). The entire litellm package is currently quarantined on PyPI — fresh installs of anything depending on it will fail. Wait for the quarantine to lift, then install with "provider": "openai" pointing at your local Ollama endpoint to bypass LiteLLM entirely. 

 What it solves: Replaces flat vector storage with a hierarchical virtual filesystem. Instead of dumping everything into embeddings and hoping retrieval works, OpenViking organises context into structured directories accessible via viking:// URIs. 

 How it works: 

 

 Virtual filesystem: Three root directories — viking://resources/ (documents, repos), viking://user/ (preferences, habits), viking://agent/ (skills, task memories) 

 Unix-like navigation: ls , find , read , tree , grep against agent context 

 L0/L1/L2 tiered loading: L0 = ~100 tokens (abstract), L1 = ~2,000 tokens (overview), L2 = full content on demand. Claims 91-95% token cost reduction vs traditional RAG 

 Automatic memory self-iteration: At session end, extracts 6 memory categories and updates the appropriate directories 

 

 Remote inference: Yes — configure "provider": "openai" with "api_base" pointing at the Mac's Ollama. This bypasses the LiteLLM dependency entirely for your use case. 

 Benchmark results with OpenClaw: 

 

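 Task completion: 52.08% vs 35.65% for native OpenClaw (reported as a 43% improvement; the two figures imply 16.4 percentage points, about 46% relative) 

 Input tokens: 4.3M vs 24.6M native (reported as a 91% reduction; the two figures imply about 82.5%) 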

 

 When to install: After LiteLLM quarantine lifts. Clone repo, strip litellm>=1.0.0 from pyproject.toml if needed, use openai provider only. 

 

 MetaClaw — The Agent That Learns From Its Mistakes 

 Created by: AIMING Lab, UNC Chapel Hill
 GitHub: github.com/aiming-lab/MetaClaw
 License: Apache 2.0
 Paper: arXiv 2603.17187 (ranked #1 on HuggingFace Daily Papers, March 18 2026)
 Stars: ~2,700

 What it solves: The other three tools store and retrieve memories. MetaClaw is the only one that makes the agent smarter over time: it learns from failure patterns and generates corrective behavioural skills. 

 How it works: MetaClaw sits as an OpenAI-compatible proxy between OpenClaw and your LLM. It intercepts every interaction. 

 

 Skill-driven fast adaptation (gradient-free): Analyses failure trajectories via an LLM evolver and synthesises new behavioural skills that take effect immediately. No GPU needed, no downtime. If your agent repeatedly makes the same mistake, MetaClaw generates a corrective skill. 

 Opportunistic policy optimisation (gradient-based, optional): Cloud LoRA fine-tuning via RL, triggered only during user-inactive windows. An Opportunistic Meta-Learning Scheduler monitors sleep hours, keyboard inactivity, and calendar to find training windows. Can be skipped entirely. 

 

 Results: Skill-driven adaptation improved accuracy by up to 32% relative. Full pipeline advanced Kimi-K2.5 from 21.4% to 40.6% accuracy (approaching GPT-5.2's 41.1%). 

 Remote inference: Yes — MetaClaw IS a proxy. Run it on the Linux box, point its upstream at the Mac's LM Studio/Ollama endpoint. All inference happens on the Mac. MetaClaw just intercepts and learns. 

 For your setup: Use skills-only mode (gradient-free). No GPU needed on the Linux box. Since v0.3.3, MetaClaw has a native OpenClaw plugin. Since v0.4.0 (March 25), it includes a "Contexture layer" for cross-session memory. 

 Caveat: Very new (16 days old, 7 releases). Academically rigorous but least battle-tested of the four tools. Worth running but expect rough edges. 

 

 Install Order 

 

 1. QMD: npm install -g @tobilu/qmd, set memory.backend = "qmd". Immediate improvement to search/recall. 

 2. memory-lancedb-pro: npm i memory-lancedb-pro, configure embedding endpoint to Mac. Fixes memory bloat and adds decay. 

 3. MetaClaw: Install as OpenClaw plugin, configure as proxy. Adds learning over time. 

 4. OpenViking: Wait for LiteLLM situation to resolve. Then install with openai provider pointing at Mac. 

 

 Test each layer before adding the next. If something breaks, you know which component caused it. 

 

 Architecture 

 ┌─────────────────────────┐ ┌─────────────────────────┐

│ Linux Machine │ │ Mac Studio 128GB │

│ │ │ │

│ OpenClaw Gateway │──HTTP──▶│ LM Studio / Ollama │

│ QMD (search index) │ │ - Primary LLM │

│ memory-lancedb-pro │──HTTP──▶│ - Embedding model │

│ MetaClaw (proxy) │──HTTP──▶│ (nomic-embed-text) │

│ OpenViking (future) │──HTTP──▶│ │

│ Workspace files │ │ Headscale connected │

│ Headscale connected │ │ │

└─────────────────────────┘ └─────────────────────────┘ 

 All heavy inference on the Mac. All files, indexing, gateway, and agent logic on the Linux box.

OpenViking
OpenViking — Context Database for AI Agents 

 What it is: OpenViking is an open-source context database from ByteDance's Volcengine team that replaces flat vector storage with a hierarchical virtual filesystem. Instead of dumping everything into embeddings and hoping retrieval works, it organises all agent context into structured directories accessible via viking:// URIs — like a Unix filesystem for your agent's brain. 

 Why it matters: In benchmarks with OpenClaw, OpenViking improved task completion from 35.65% to 52.08% (reported as a 43% improvement; the two figures imply about 46% relative) while cutting input tokens from 24.6M to 4.3M (reported as a 91% reduction; the two figures imply about 82.5%). The tiered loading system means your agent only pulls in what it needs, when it needs it. 

 GitHub: github.com/volcengine/OpenViking
 Stars: ~17,900
 License: Apache 2.0
 Language: Python (core) + Rust (CLI) + C++ (vector extensions) + Go (AGFS server)
 Requires: Python 3.10+

 

 How It Works 

 Virtual Filesystem 

 All agent context lives in three root directories: 

 

 viking://resources/ — documents, repos, web pages, project files 

 viking://user/ — preferences, habits, personal context 

 viking://agent/ — skills, task memories, instructions 

 

 You interact with context using Unix-like commands: ls , find , read , tree , grep . The agent navigates its own memory the same way you'd navigate a filesystem. 

 L0/L1/L2 Tiered Loading 

 This is the key innovation that solves token bloat. Instead of loading full documents into context, OpenViking processes everything into three tiers: 

 

 L0 (Abstract): ~100 tokens — one-sentence summary for quick identification 

 L1 (Overview): ~2,000 tokens — structured overview with key details 

 L2 (Full Content): Complete document, loaded on demand only 

 

 The agent starts with L0 summaries, drills into L1 when something looks relevant, and only loads L2 when it genuinely needs the full content. Claims 91-95% token cost reduction vs traditional RAG approaches. 
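The drill-down behaviour described above can be sketched as a budgeted loader. Everything here is illustrative: `Entry`, `load_context`, the whitespace token count and the two callbacks are hypothetical stand-ins, not OpenViking's API.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    uri: str
    l0: str  # abstract, ~100 tokens
    l1: str  # overview, ~2,000 tokens
    l2: str  # full content


def count_tokens(text):
    """Crude whitespace token count, standing in for a real tokenizer."""
    return len(text.split())


def load_context(entries, is_relevant, needs_full, budget):
    """Pick the shallowest sufficient tier per entry within a token budget."""
    loaded = []
    used = 0
    for entry in entries:
        tier, text = "L0", entry.l0
        if is_relevant(entry):
            tier, text = "L1", entry.l1
            if needs_full(entry):
                tier, text = "L2", entry.l2
        cost = count_tokens(text)
        if used + cost > budget:
            # Deeper tier does not fit: fall back to the abstract.
            tier, text = "L0", entry.l0
            cost = count_tokens(text)
            if used + cost > budget:
                break
        loaded.append((entry.uri, tier, text))
        used += cost
    return loaded


entries = [
    Entry("viking://resources/a", "a sum", "a over view text", "w " * 50),
    Entry("viking://resources/b", "b sum", "b over view text", "w " * 50),
    Entry("viking://resources/c", "c sum", "c over view text", "w " * 50),
]
relevant = lambda e: e.uri.endswith(("a", "c"))
full = lambda e: e.uri.endswith("a")
tiers_roomy = [tier for _, tier, _ in load_context(entries, relevant, full, 60)]
tiers_tight = [tier for _, tier, _ in load_context(entries, relevant, full, 10)]
```

In this sketch the savings come from most entries staying at L0, while full documents enter the context only when both the relevance check and the budget allow it.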

 Automatic Memory Self-Iteration 

 At session end, OpenViking automatically extracts six categories of memory and updates the appropriate directories: 

 

 Profile — factual information about the user 

 Preferences — how the user likes things done 

 Entities — people, projects, tools referenced 

 Events — decisions made, things that happened 

 Cases — problems solved, approaches taken 

 Patterns — recurring behaviours and workflows 

 

 This is significantly more structured than OpenClaw's default "hope the LLM remembers to write to MEMORY.md" approach. 

 

 Architecture For Our Setup 

 OpenViking runs on the Linux agent box. All LLM inference (VLM and embedding) routes to the Mac Studio over Headscale. 

 ┌─────────────────────────┐ ┌─────────────────────────┐

│ Linux Machine │ │ Mac Studio 128GB │

│ │ │ │

│ OpenViking Server │──HTTP──▶│ Ollama │

│ (port 1933) │ │ - VLM model │

│ │ │ - Embedding model │

│ OpenClaw Gateway │ │ (nomic-embed-text) │

│ (port 18789) │ │ │

│ + openclaw-openviking │ │ │

│ plugin │ │ │

└─────────────────────────┘ └─────────────────────────┘ 

 

 VLM Providers 

 OpenViking supports three VLM providers. For our local setup, use openai to bypass the LiteLLM dependency entirely: 

 

 

 

 | Provider | Description | Use For Our Setup? |
 |---|---|---|
 | volcengine | ByteDance's Doubao models (cloud) | No (cloud only) |
 | openai | Any OpenAI-compatible API endpoint | Yes (point at Ollama on Mac) |
 | litellm | Multi-provider routing via LiteLLM | No (compromised, avoid) |

 

 

 

 

 Installation 

 Prerequisites 

 # Install uv (Python package manager used by OpenViking)

curl -LsSf https://astral.sh/uv/install.sh | sh

# Install the Rust CLI

curl -fsSL https://raw.githubusercontent.com/volcengine/OpenViking/main/crates/ov_cli/install.sh | bash 

 Install OpenViking (with LiteLLM workaround) 

 # Clone the repo

git clone https://github.com/volcengine/OpenViking.git

cd OpenViking

# Create virtual environment

uv venv --python 3.11

source .venv/bin/activate

# IMPORTANT: Edit pyproject.toml to remove or comment out litellm dependency

# Find the line with litellm>=1.0.0 and remove it

nano pyproject.toml

# Install

uv pip install -e . 

 Install VikingBot (agent framework, optional) 

 uv pip install -e ".[bot]" 

 Install OpenClaw Plugin 

 # The official OpenClaw integration plugin

openclaw plugins install openclaw-openviking-plugin 

 

 Configuration 

 Create the config file at ~/.openviking/ov.conf : 

{
  "storage": {
    "workspace": "/home/conor/openviking_workspace"
  },
  "log": {
    "level": "INFO",
    "output": "stdout"
  },
  "embedding": {
    "dense": {
      "api_base": "http://100.64.0.9:11434/v1",
      "api_key": "not-needed",
      "provider": "openai",
      "dimension": 768,
      "model": "nomic-embed-text"
    },
    "max_concurrent": 10
  },
  "vlm": {
    "api_base": "http://100.64.0.9:11434/v1",
    "api_key": "not-needed",
    "provider": "openai",
    "model": "qwen3-coder-next",
    "max_concurrent": 100
  }
}

 Notes: 

 

 100.64.0.9 is the Mac Studio's Headscale IP 

 provider: "openai" works with any OpenAI-compatible endpoint including Ollama 

 Embedding dimension must match the model — 768 for nomic-embed-text, 1024 for mxbai-embed-large 

 The VLM model handles the intelligent context processing (summarisation, extraction, classification) 
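A small pre-flight check can catch the mistakes these notes warn about before the server starts. This is a hypothetical helper, not part of OpenViking: it knows only the two model dimensions listed above and flags any section still using the litellm provider.

```python
import json

# Dimensions taken from the notes above; extend for other models.
KNOWN_DIMENSIONS = {"nomic-embed-text": 768, "mxbai-embed-large": 1024}


def check_ov_conf(conf):
    """Return a list of human-readable problems found in an ov.conf dict."""
    problems = []
    dense = conf.get("embedding", {}).get("dense", {})
    expected = KNOWN_DIMENSIONS.get(dense.get("model"))
    if expected is not None and dense.get("dimension") != expected:
        problems.append(
            f"embedding.dense.dimension should be {expected} for {dense.get('model')}"
        )
    sections = (("embedding.dense", dense), ("vlm", conf.get("vlm", {})))
    for name, block in sections:
        if block.get("provider") == "litellm":
            problems.append(f"{name}.provider is litellm; use openai instead")
        api_base = str(block.get("api_base", ""))
        if not api_base.startswith(("http://", "https://")):
            problems.append(f"{name}.api_base is missing or not an HTTP URL")
    return problems


def check_ov_conf_file(path):
    """Load the JSON config from disk and check it."""
    with open(path, encoding="utf-8") as handle:
        return check_ov_conf(json.load(handle))


good = {
    "embedding": {"dense": {"api_base": "http://100.64.0.9:11434/v1",
                            "provider": "openai", "dimension": 768,
                            "model": "nomic-embed-text"}},
    "vlm": {"api_base": "http://100.64.0.9:11434/v1", "provider": "openai",
            "model": "qwen3-coder-next"},
}
problems = check_ov_conf(good)  # empty list when the config is consistent
```

Running `check_ov_conf_file` against ~/.openviking/ov.conf after each edit is one way to catch a dimension mismatch before any data is indexed with it.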

 

 

 Running 

 # Start the OpenViking server

openviking-server

# Default address: 127.0.0.1:1933

# Start with VikingBot enabled (optional)

openviking-server --with-bot 

 

 Key Commands (ov CLI) 

 Adding Resources 

 # Add a GitHub repo

ov add-resource https://github.com/volcengine/OpenViking

# Add a local directory

ov add-resource /home/conor/.openclaw/workspace

# Wait for processing to complete (otherwise runs async)

ov add-resource /path/to/docs --wait 

 Browsing Context 

 # List root directories

ov ls viking://resources/

# Tree view (2 levels deep)

ov tree viking://resources/my-project -L 2

# Read a specific file at L1 (overview)

ov read viking://resources/my-project/README.md

# Check server status

ov status 

 Searching 

 # Semantic search across all context

ov find "what is the camera VLAN setup"

# Grep for exact terms within a scope

ov grep "RTSP" --uri viking://resources/home-automation 

 Interactive Chat (with VikingBot) 

 # Start interactive chat session

ov chat 

 

 OpenClaw Integration 

 Once the openclaw-openviking-plugin is installed, OpenViking becomes available as a context source for your agent. The plugin connects to the OpenViking server running on the same machine and provides the agent with access to the virtual filesystem commands. 

 The OpenViking server must be running before the OpenClaw gateway starts. Consider adding it as a systemd service: 

 # Example systemd service file

# /etc/systemd/user/openviking.service

[Unit]

Description=OpenViking Context Database

Before=openclaw.service

[Service]

ExecStart=/home/conor/OpenViking/.venv/bin/openviking-server

WorkingDirectory=/home/conor/OpenViking

Restart=always

RestartSec=5

[Install]

WantedBy=default.target 

 # Enable and start

systemctl --user enable openviking

systemctl --user start openviking 
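Unit ordering only controls start order, so a start script may still want to confirm the server is accepting connections before the gateway launches. A minimal sketch, assuming the default address 127.0.0.1:1933. The helper name is made up, and a successful TCP connect shows only that the port is open, not that indexing has finished.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=1933, timeout_s=30.0, interval_s=0.5):
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)  # not listening yet; retry shortly
    return False
```

A wrapper script could call `wait_for_port()` and exit non-zero on `False`, so that the gateway is never started against a missing context server.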

 

 How OpenViking Differs From The Other Memory Tools 

 

 

 

 | Feature | OpenClaw Default | memory-lancedb-pro | QMD | OpenViking |
 |---|---|---|---|---|
 | Storage model | Flat markdown files | Vector DB + BM25 | SQLite + vector index | Virtual filesystem |
 | Token efficiency | Loads everything | Top-N retrieval | Top-N retrieval | L0/L1/L2 tiered loading |
 | Context organisation | None | Scopes (global/agent/project) | Collections | Hierarchical directories |
 | Auto-categorisation | No | 6 categories | No | 6 categories + directory structure |
 | Agent can browse | Read specific files | Search only | Search only | ls, tree, find, grep, read |
 | Multi-resource support | Workspace only | Conversation memories | Markdown files | Git repos, URLs, docs, local dirs |

 

 

 

 OpenViking and memory-lancedb-pro are complementary, not competing. memory-lancedb-pro handles conversation-level memory (what you said, preferences extracted from chat). OpenViking handles resource-level context (your codebase, documentation, project files, knowledge bases). QMD provides the search layer across markdown files. Together they cover different retrieval needs. 

 

 Breaking Changes Warning 

 From the release notes: after upgrading OpenViking, datasets/indexes generated by previous versions are not compatible. You must rebuild: 

 # Stop the server

systemctl --user stop openviking

# Clear workspace (this deletes all indexed data — resources will need re-adding)

rm -rf /home/conor/openviking_workspace

# Restart

systemctl --user start openviking

# Re-add your resources

ov add-resource /path/to/your/stuff --wait 

 

 Key Files and Paths 

 

 Config: ~/.openviking/ov.conf (override with OPENVIKING_CONFIG_FILE env var) 

 Workspace: Configured in ov.conf storage.workspace — stores all indexed data 

 CLI binary: ov (Rust, installed via install script) 

 Server: openviking-server (Python, from the pip install) 

 Default port: 1933 

 

 

 Version: v0.2.1 (current as of March 2026)
 Status: Alpha (performance and consistency not fully optimised)
 GitHub: github.com/volcengine/OpenViking
 Docs: openviking.ai

LanceDB Pro
memory-lancedb-pro 

 Enhanced LanceDB memory plugin for OpenClaw — community reference guide 

 Overview 

 memory-lancedb-pro is a community-developed, production-grade long-term memory plugin for OpenClaw. It replaces the built-in memory-lancedb plugin with a significantly more capable retrieval pipeline, designed for agents that need persistent, high-quality memory across sessions without manual tagging or configuration overhead. 

 The core problem it solves: standard OpenClaw agents have no memory between sessions. Every conversation starts from zero. memory-lancedb-pro automatically captures what matters from each session and retrieves relevant context in future ones. 

 Primary upstream repo: CortexReach/memory-lancedb-pro . Several community forks exist (win4r, McBorisson, fryeggs, kvc0769) with varying additions such as Volcengine multimodal embeddings or unified Claude Code/Claude Desktop support. 

 OpenClaw 2026.3+ compatibility: The CortexReach repo has been updated to use before_prompt_build hooks, replacing the deprecated before_agent_start hook. If you are on 2026.3.24 or later, use it. Run openclaw doctor --fix after upgrading. 

 

 Feature Comparison 

 

 

 

 | Feature | Built-in memory-lancedb | memory-lancedb-pro |
 |---|---|---|
 | Vector search | ✓ | ✓ |
 | BM25 full-text search | ✗ | ✓ |
 | Hybrid fusion (Vector + BM25) | ✗ | ✓ configurable weights |
 | Cross-encoder reranking | ✗ | ✓ Jina, SiliconFlow, Pinecone, etc. |
 | Recency / time decay scoring | ✗ | ✓ |
 | MMR diversity filtering | ✗ | ✓ |
 | Multi-scope isolation | ✗ | ✓ global / agent / project / user |
 | Smart LLM extraction | ✗ | ✓ optional, uses any OpenAI-compatible LLM |
 | Management CLI | ✗ | ✓ list / search / stats / delete / export / import |
 | Auto-capture on session end | ✓ basic | ✓ with deduplication, up to 3 per turn |
 | Auto-recall before prompt | ✓ basic | ✓ adaptive (skips trivial/short queries) |
 | Noise filtering | ✗ | ✓ |
 | Migration tool from built-in plugin | n/a | ✓ |

 

 

 

 

 Retrieval Pipeline 

 Queries pass through a multi-stage pipeline before results are injected into the agent prompt: 

 

 1. Embed query: using the configured OpenAI-compatible embedding provider 

 2. Parallel search: vector ANN search (cosine distance) + BM25 full-text search run simultaneously 

 3. Hybrid fusion: vector score used as base; BM25 hits receive a configurable weighted boost 

 4. Rerank: optional cross-encoder reranking via external API (60% cross-encoder score + 40% fused score) 

 5. Lifecycle decay scoring: recency boost, time decay, importance weight, length normalisation 

 6. Filter: hard minimum score, noise filter, MMR diversity deduplication 

 7. Inject: surviving memories injected as <relevant-memories> context block 

 

 If the reranker API fails, the pipeline degrades gracefully to cosine similarity reranking. 
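The fusion, rerank and fallback stages can be sketched as below. This is one plausible reading rather than the plugin's code: the additive boost, the cap at 1.0, the handling of BM25-only hits and all function names are assumptions.

```python
def fuse(vector_scores, bm25_scores, bm25_weight=0.3):
    """Vector score is the base; a BM25 hit adds a weighted boost (capped at 1)."""
    docs = set(vector_scores) | set(bm25_scores)
    return {
        doc: min(1.0, vector_scores.get(doc, 0.0)
                 + bm25_weight * bm25_scores.get(doc, 0.0))
        for doc in docs
    }


def rerank(fused, cosine_scores, rerank_fn):
    """Blend 60% cross-encoder score with 40% fused score.

    If the reranker call fails, fall back to cosine similarity scores.
    """
    try:
        cross = rerank_fn(sorted(fused))
    except Exception:
        cross = cosine_scores  # graceful degradation
    return {doc: 0.6 * cross.get(doc, 0.0) + 0.4 * score
            for doc, score in fused.items()}


def select(scores, hard_min):
    """Drop anything under the hard minimum and return ids, best first."""
    kept = [doc for doc, score in scores.items() if score >= hard_min]
    return sorted(kept, key=scores.get, reverse=True)


vector = {"a": 0.8, "b": 0.5}
bm25 = {"b": 1.0, "c": 0.9}
fused = fuse(vector, bm25)


def failing_reranker(doc_ids):
    raise RuntimeError("reranker API unavailable")


ok = select(rerank(fused, vector, lambda ids: {"a": 0.9, "b": 0.2, "c": 0.1}), 0.4)
degraded = select(rerank(fused, vector, failing_reranker), 0.4)
```

In both the normal and degraded paths the output has the same shape, so the later decay and filter stages do not need to know whether the external reranker was reachable.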

 

 Installation 

 1. Clone into your OpenClaw workspace 

 cd ~/.openclaw/workspace

git clone https://github.com/CortexReach/memory-lancedb-pro.git plugins/memory-lancedb-pro

cd plugins/memory-lancedb-pro

npm install 

 Common mistake: Cloning the repo somewhere other than your workspace and then using a relative path in plugins.load.paths . Relative paths are resolved from the workspace root. Use an absolute path if cloning elsewhere. 

 2. Disable the built-in memory plugin 

 Only one memory plugin can be active at a time. If you previously used memory-lancedb , disable it before enabling this plugin. 

 3. Add to openclaw.json 

{
  "plugins": {
    "load": {
      "paths": ["plugins/memory-lancedb-pro"]
    },
    "entries": {
      "memory-lancedb-pro": {
        "enabled": true,
        "config": {
          "embedding": {
            "apiKey": "${JINA_API_KEY}",
            "model": "jina-embeddings-v5-text-small",
            "baseURL": "https://api.jina.ai/v1",
            "dimensions": 1024,
            "taskQuery": "retrieval.query",
            "taskPassage": "retrieval.passage",
            "normalized": true
          }
        }
      }
    },
    "slots": {
      "memory": "memory-lancedb-pro"
    }
  }
}

 Config changes require a gateway restart. With config watch enabled (default), this happens automatically. 

 

 Key Configuration Options 

 

 

 

 | Option | Default | Notes |
 |---|---|---|
 | autoCapture | true | Capture memories at session end |
 | autoRecall | true | Inject memories before prompt build |
 | smartExtraction | true | Use LLM to classify memories instead of regex |
 | extractMinMessages | 3 | Minimum messages before extraction runs |
 | captureAssistant | true | Set false to only capture user messages |
 | retrieval.mode | hybrid | vector, bm25, or hybrid |
 | retrieval.vectorWeight | 0.7 | Weight for vector scores in hybrid fusion |
 | retrieval.bm25Weight | 0.3 | Weight for BM25 scores in hybrid fusion |
 | rerank.enabled | false | Enable cross-encoder reranking |
 | rerank.candidatePoolSize | 12 | Candidates passed to reranker |
 | rerank.minScore | 0.6 | Soft minimum score post-rerank |
 | rerank.hardMinScore | 0.62 | Hard cutoff; anything below this is always dropped |
 | sessionMemory.enabled | true | Store session summaries on /new |
 | autoRecall.minPromptLength | 15 (EN) / 6 (CJK) | Skip recall for very short queries |
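 As a rough illustration of how retrieval.vectorWeight, retrieval.bm25Weight, and a hard score cutoff could interact, the sketch below fuses two score maps with a weighted sum and then drops low scorers. It assumes both score sets are already normalized to the 0 to 1 range and that a missing score counts as 0. The plugin's real fusion and cutoff logic may differ, and the function names are hypothetical.

```python
def fuse_scores(vector_scores, bm25_scores, vector_weight=0.7, bm25_weight=0.3):
    """Weighted-sum fusion of two {memory_id: score} maps.
    Assumption: a memory missing from one map scores 0.0 there."""
    ids = set(vector_scores) | set(bm25_scores)
    return {
        i: vector_weight * vector_scores.get(i, 0.0)
           + bm25_weight * bm25_scores.get(i, 0.0)
        for i in ids
    }


def apply_hard_cutoff(scores, hard_min_score=0.62):
    """Drop everything below the hard cutoff, best score first."""
    kept = [(i, s) for i, s in scores.items() if s >= hard_min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up scores for three memories.
vector = {"a": 0.9, "b": 0.5}
bm25 = {"a": 0.6, "b": 0.4, "c": 1.0}

fused = fuse_scores(vector, bm25)
print(apply_hard_cutoff(fused))  # only "a" clears the 0.62 cutoff
```

 With the default 0.7 / 0.3 split, a memory found only by BM25 can reach at most 0.3, so keyword-only hits need vector support to pass a cutoff this high.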

 

 

 

 

 Management CLI 

 The plugin ships with a CLI for direct memory management: 

 openclaw memory-pro list             # list stored memories
 openclaw memory-pro search <query>   # semantic/keyword search
 openclaw memory-pro stats            # storage stats
 openclaw memory-pro delete <id>      # delete a specific memory
 openclaw memory-pro export           # export all memories
 openclaw memory-pro import <file>    # import memories

 

 Agent Tool Definitions 

 When loaded, the plugin registers these tools for the agent to use directly: 

 

 memory_recall — retrieve relevant memories for a query 

 memory_store — explicitly store a memory 

 memory_forget — delete a memory by ID or query 

 memory_update — update an existing memory 

 

 Additional management operations are available through the CLI commands above. 

 

 Multi-Scope Isolation 

 Memories can be scoped to control access between agents and users: 

 

 global — shared across all agents 

 agent:<id> — isolated to a specific agent 

 project:<id> — shared within a project 

 user:<id> — per-user isolation (useful for multi-user bots) 

 custom:<name> — arbitrary named scope 
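 The scope strings above can be thought of as simple tags on each memory. The sketch below shows one way an access check could work, assuming a memory is visible only when its scope is in the caller's set of scopes. The function names and the memory dictionary shape are hypothetical, not the plugin's API.

```python
def accessible_scopes(agent_id, project_id=None, user_id=None, custom=()):
    """Build the list of scopes a caller may read.
    Assumption: every caller can read 'global' plus its own agent scope."""
    scopes = ["global", f"agent:{agent_id}"]
    if project_id:
        scopes.append(f"project:{project_id}")
    if user_id:
        scopes.append(f"user:{user_id}")
    scopes.extend(f"custom:{name}" for name in custom)
    return scopes


def visible(memories, scopes):
    """Keep only memories whose scope the caller is allowed to read."""
    allowed = set(scopes)
    return [m for m in memories if m["scope"] in allowed]


memories = [
    {"id": 1, "scope": "global", "text": "Team uses UTC timestamps"},
    {"id": 2, "scope": "agent:writer", "text": "Prefers short summaries"},
    {"id": 3, "scope": "user:42", "text": "Allergic to peanuts"},
]

# The 'coder' agent serving user 42 sees the global and user memories,
# but not the one scoped to the 'writer' agent.
print(visible(memories, accessible_scopes("coder", user_id="42")))
```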

 

 

 Telegram Setup 

 If running OpenClaw with Telegram, the easiest way to configure the plugin is via the bot directly. Send the following to your main bot: 

 Help me connect this memory plugin with the most user-friendly configuration:
 https://github.com/CortexReach/memory-lancedb-pro

 Requirements:
 1. Set it as the only active memory plugin
 2. Use Jina for embedding and reranker
 3. Use gpt-4o-mini for the smart-extraction LLM
 ... (continue with your preferences) 

 

 Important Notes 

 jiti cache: After modifying any .ts file in the plugin, you must clear the jiti cache before restarting the gateway, or OpenClaw will load stale compiled code: rm -rf /tmp/jiti/ && openclaw gateway restart 

 Memory quality guidelines: Never store raw conversation summaries, large blobs, or duplicates. Prefer structured, atomic facts with keywords. On any tool failure or repeated error, call memory_recall with relevant keywords before retrying — the fix may already be stored. 
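 One way to apply the quality guidelines mechanically is a pre-store filter. The sketch below rejects empty, oversized, or duplicate candidates. The 500-character limit and the function names are hypothetical choices for illustration, and exact-match deduplication is a simplification (a real system might compare embeddings instead).

```python
import re

MAX_CHARS = 500  # hypothetical limit for an "atomic" fact


def normalize(text):
    """Collapse whitespace and lowercase so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def should_store(candidate, existing, max_chars=MAX_CHARS):
    """Return (ok, reason) for a candidate memory."""
    text = normalize(candidate)
    if not text:
        return False, "empty"
    if len(text) > max_chars:
        return False, "too large"
    if text in {normalize(e) for e in existing}:
        return False, "duplicate"
    return True, "ok"


stored = ["Deploys run from the release branch."]
print(should_store("deploys run from the  release branch.", stored))
print(should_store("CI cache key: node-modules-v3", stored))
```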

 Spaced repetition: Frequently recalled memories decay more slowly, similar to spaced-repetition learning systems. 
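 The source does not give the decay formula, so the following is only a toy model of the idea: exponential decay whose half-life grows with each recall. The 30-day base half-life and the linear growth rule are assumptions made for illustration, not the plugin's actual parameters.

```python
import math


def retention(age_days, recall_count, base_half_life_days=30.0):
    """Toy decay model: the half-life grows linearly with recall count,
    so frequently recalled memories fade more slowly."""
    half_life = base_half_life_days * (1 + recall_count)
    return math.exp(-math.log(2) * age_days / half_life)


# Same age, different recall histories.
for recalls in (0, 1, 3):
    print(recalls, round(retention(30, recalls), 3))
```

 In this model a never-recalled memory drops to half strength after 30 days, while one recalled three times retains about 84% over the same period.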

 

 Notable Community Forks 

 

 

 

 | Fork | Notable additions |
 |---|---|
 | CortexReach/memory-lancedb-pro | Primary upstream. Updated for OpenClaw 2026.3+ hook architecture. |
 | win4r/memory-lancedb-pro | Widely referenced in docs; standard feature set. |
 | fryeggs/memory-lancedb-pro | Unified edition: extends to Claude Code, Codex CLI, and Claude Desktop via shared LanceDB backend. |
 | kvc0769/memory-lancedb-pro | Adds Volcengine multimodal embedding support. |
 | McBorisson/memory-lancedb-pro | Uses RRF fusion (vs. weighted boost in other forks); includes JSONL distillation pipeline. |
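 For readers comparing the fusion strategies, here is a minimal sketch of Reciprocal Rank Fusion in its standard form: each result list contributes 1 / (k + rank) per document, so only rank positions matter, not raw scores. The value k=60 is the conventional default; whether the McBorisson fork uses this value, or adds any weighting on top, is not stated in the source.

```python
from collections import defaultdict


def rrf(rankings, k=60):
    """Reciprocal Rank Fusion over several ranked lists of document ids.
    Each list adds 1 / (k + rank) to a document's score (rank starts at 1)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Made-up result lists from a vector search and a BM25 search.
vector_hits = ["m1", "m2", "m3"]
bm25_hits = ["m2", "m4", "m1"]

for doc_id, score in rrf([vector_hits, bm25_hits]):
    print(doc_id, round(score, 5))
```

 Unlike a weighted sum, RRF needs no score normalization, which is useful when BM25 and cosine similarity live on very different scales.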

 

 

 

 

 Generated March 2026. Sources: CortexReach/memory-lancedb-pro, openclaw/openclaw docs, LanceDB blog.

Design language
Design language summary 

   Vibe. Dark, minimal, slightly luminous. Information-dense but airy. Color is reserved: accent appears on actions and active states; semantic colors only for status. Most surfaces are low-contrast neutral; the eye is drawn by the one accent. 

   Palette. Near-black background #0b0d12, two elevation tiers above it (#14171d panel, #1a1e26 panel-2). Borders are translucent white (7% default, 12% strong). Text is warm-white #e6eef6, muted #8a93a4, super-muted #5e6675. Accent is a blue-violet #4f8cff with #6ea0ff for hover and a 15% soft variant for active backgrounds. A violet #8b5cf6 shows up only inside accent gradients (logos). Status: green #34d399, amber #fbbf24, red #f87171. 
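   The palette can be captured as a small token table. The sketch below is one possible encoding: the hex values come from the paragraph above, while the token names and the helper function are hypothetical. The helper derives the translucent variants (7% and 12% white borders, 15% accent-soft) from base colors.

```python
def hex_to_rgba(hex_color, alpha):
    """Convert '#rrggbb' plus an alpha in [0, 1] to a CSS rgba() string."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({r}, {g}, {b}, {alpha})"


# Token names are illustrative; the values are from the palette description.
TOKENS = {
    "bg": "#0b0d12",
    "panel": "#14171d",
    "panel-2": "#1a1e26",
    "text": "#e6eef6",
    "text-muted": "#8a93a4",
    "text-super-muted": "#5e6675",
    "accent": "#4f8cff",
    "accent-hover": "#6ea0ff",
    "accent-gradient-violet": "#8b5cf6",
    "status-green": "#34d399",
    "status-amber": "#fbbf24",
    "status-red": "#f87171",
}

# Derived translucent tokens.
TOKENS["border"] = hex_to_rgba("#ffffff", 0.07)
TOKENS["border-strong"] = hex_to_rgba("#ffffff", 0.12)
TOKENS["accent-soft"] = hex_to_rgba(TOKENS["accent"], 0.15)

for name, value in TOKENS.items():
    print(f"--{name}: {value};")
```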

   Type. Inter, system fallback. Base 14.5px / 1.5. Headings get tight tracking (-0.01em to -0.02em). Small labels are uppercase, ~0.78rem, 0.06–0.08em letter-spacing, in the muted tone. Numbers always use tabular figures. Antialiased. 

   Spacing & shape. Spacing scale of 4/8/12/16/24/32. Radii: 12px panels, 8px controls/inputs, 6px chips, 999px for bars. Two shadow tiers: subtle 0 2px 8px /35% for cards, lifted 0 8px 24px /45% for floating things. Active accents may add a tinted glow (rgba(79,140,255,0.35)). 

   Layout. Sticky 240px sidebar + fluid main. On mobile the sidebar collapses to a horizontal top nav. Each page opens with a page-header (title + small meta on the right, divider underneath). Content lives inside card-panel blocks, never floating directly on the bg. 

   Components.
   - Stat card: tiny uppercase label with icon → big tabular number → muted sub-line. Border lightens on hover.
   - Sidebar nav-item: icon + label, transparent base, accent-soft background when active. Destructive items shift to red on hover.
   - Segmented control: pill group inside a panel-2 shell; active pill is solid accent with glow.
   - Table: transparent rows, uppercase muted column headers, tabular nums right-aligned, 2% white row-hover.
   - Bar (disk): 6px gradient fill, threshold-based color shift at 75% / 90%.
   - Chip/folder pill: panel-2 with border, small inline icon, 6px radius. 
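   The disk bar's threshold behavior could be sketched as follows. The summary only states that the color shifts at 75% and 90%; mapping those bands to the green, amber, and red status colors, and treating the thresholds as inclusive, are assumptions made here.

```python
# Status colors from the palette; their use for the bar is an assumption.
GREEN, AMBER, RED = "#34d399", "#fbbf24", "#f87171"


def bar_color(used_percent):
    """Pick the bar color for a disk-usage percentage.
    Assumption: thresholds are inclusive (exactly 75 is already amber)."""
    if used_percent >= 90:
        return RED
    if used_percent >= 75:
        return AMBER
    return GREEN


for pct in (40, 75, 89.9, 90, 100):
    print(pct, bar_color(pct))
```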

   Iconography. Bootstrap Icons (bi-*), inline with labels at slightly reduced opacity. 

   Motion. 0.15s ease for hover/active transitions; nothing splashy. Async sections fade to 50% opacity (is-loading); all loading spinners share one accent-colored style. 

   Charts. Plotly with transparent bg, Inter font, 5% white gridlines, legend below the plot, custom dark hover label. Series colors come from a fixed 10-stop palette (blue → violet → green → amber → coral …) — never raw Plotly defaults. 
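   A layout dictionary along these lines would express the chart rules in Plotly terms. This is a sketch, not the project's actual theme: the function name, the legend offset, and the choice of panel-2 for the hover label background are assumptions. The series colorway is deliberately left out because the summary does not list the ten palette stops.

```python
def dark_layout():
    """Build a Plotly layout dict matching the chart rules in the summary.
    The legend offset (-0.2) and hover label colors are illustrative guesses."""
    grid = "rgba(255, 255, 255, 0.05)"  # 5% white gridlines
    axis = {"gridcolor": grid, "zerolinecolor": grid}
    return {
        "paper_bgcolor": "rgba(0, 0, 0, 0)",  # transparent background
        "plot_bgcolor": "rgba(0, 0, 0, 0)",
        "font": {"family": "Inter, system-ui, sans-serif", "color": "#e6eef6"},
        "xaxis": dict(axis),
        "yaxis": dict(axis),
        # Horizontal legend placed below the plot area.
        "legend": {"orientation": "h", "x": 0, "y": -0.2},
        "hoverlabel": {
            "bgcolor": "#1a1e26",
            "bordercolor": "rgba(255, 255, 255, 0.12)",
            "font": {"family": "Inter", "color": "#e6eef6"},
        },
        # "colorway": [...]  # fill in with the project's fixed 10-stop palette
    }


layout = dark_layout()
print(layout["legend"], layout["xaxis"]["gridcolor"])
```

   With Plotly installed, the dictionary can be passed as the layout argument when constructing a figure, for example plotly.graph_objects.Figure(layout=dark_layout()).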

   Don'ts. No hard borders. No heavy drop shadows. No mixing accent colors for non-action UI. No raw Bootstrap-default buttons or tables. No solid color blocks for headers — keep them transparent over the panel.