The Knowledge Base
Your AI Agent Needs
Built-in MCP server, vector search, RAG, and hybrid retrieval. Plugs into Claude, ChatGPT, Cursor, Windsurf, and any MCP client — single ~29MB binary, zero config
Why MDDB? #
One binary. Every protocol. AI-native from day one.
MDDB is an AI-native embedded document database written in Go, using BoltDB for storage. It accepts Markdown, plain text, HTML, PDF, DOCX, ODT, RTF, LaTeX, YAML, and Wikipedia XML dump documents, auto-converting everything to Markdown. All documents are stored with full revision history.
Core capabilities
- Built-in MCP server with 67 tools — connects to Claude, Cursor, Windsurf, ChatGPT, Ollama, DeepSeek, Manus via stdio and HTTP transport
- Geospatial search: R-tree and geohash indexes, radius and bounding-box queries, composable with full-text and vector search, optional postcode lookup
- Transport options: TCP host:port or Unix Domain Socket (
unix:/path.sock) for zero-network local deployments (PHP-FPM sidecars, cron, same-host panel) - TLS and mutual TLS (mTLS): user-supplied server cert/key, optional client CA bundle for certificate-based client authentication (MDDB_TLS_CLIENT_CA)
- Vector/semantic search with 7 index algorithms: Flat (exact), HNSW, IVF, PQ, OPQ, SQ, BQ — supports OpenAI, Ollama, Cohere, Voyage AI embeddings + per-collection int8/int4 quantization (4-8x compression)
- Full-text search with 7 modes (simple, boolean, phrase, wildcard, proximity, range, fuzzy), TF-IDF, BM25, BM25F, PMISparse scoring, multi-language stemming (18 languages), typo tolerance, metadata pre-filtering
- Hybrid search combining sparse (BM25) and dense (vector) with alpha blending or Reciprocal Rank Fusion
- Zero-shot document classification using embedding cosine similarity
- Custom MCP tools defined in YAML for domain-specific AI workflows
- RAG pipeline: auto-embed on ingest, semantic retrieval, MCP exposure to LLMs
- Memory RAG: conversational memory with session management, semantic recall, and summarization
- Triple protocol: HTTP/JSON REST API, gRPC/Protobuf, GraphQL
- Multi-format upload: .md, .txt, .html, .pdf, .docx, .odt, .rtf, .tex, .yaml with auto-conversion to Markdown
- Wikipedia XML dump import: stream multi-GB MediaWiki dumps with wikitext-to-Markdown conversion, namespace filtering, batch processing
- URL import: fetch and store documents from any URL
- Document TTL with auto-expiry and background cleanup
- Full revision history for every document
- JWT authentication and RBAC authorization (opt-in)
- Leader-follower replication with binlog streaming
- Automation: triggers, crons, webhooks, sentiment analysis, template variables
- Multi-language support: same document key, multiple locales
- Prometheus metrics and Grafana dashboard support
- ~29MB single binary, embedded BoltDB, zero external dependencies
- Docker image: ~29MB Alpine-based with health checks
- Aggregations: metadata facets (value counts) and date histograms with optional pre-filtering
- Per-collection storage backends: BoltDB (default), in-memory (ephemeral), S3/MinIO — configurable via API and web panel
- Web admin panel (React-based) with REST/GraphQL API toggle
- PHP and Python client libraries
67 Built-in MCP Tools
MCP 2025-11-25 compliant. Tool annotations for auto-approve, 5 built-in prompts, completion, structured output, logging. Three transports: stdio, Streamable HTTP, SSE.
Hybrid Search Engine
Full-text with 7 search modes (simple, boolean, phrase, wildcard, proximity, range, fuzzy) + vector search with 7 index types. Multi-language stemming for 18 languages. Combine with alpha blending or Reciprocal Rank Fusion. Metadata pre-filtering, typo tolerance.
Triple Protocol Access
REST for easy debugging, gRPC for high-throughput pipelines, GraphQL for flexible frontend queries. All endpoints, one server.
Multi-Format Pipeline
Upload .md, .txt, .html, .pdf, .docx, .odt, .rtf, .tex, .yaml, import from any URL, or stream Wikipedia XML dumps (.xml.bz2). Everything auto-converts to Markdown. Background embedding worker indexes documents as they arrive.
🎨 Web Admin Panel
Modern React-based UI for managing documents, users, and search with REST/GraphQL API toggle
Quick Start #
# Run MDDB server
docker run -d \ -p 11023:11023 \ -p 11024:11024 \ -v mddb-data:/data \ tradik/mddb:latest # Test the API
curl http://localhost:11023/health# Clone repository
git clone https://github.com/tradik/mddb.git
cd mddb # Start all services (MDDB + Panel + MCP)
make docker-up # Services available:
# - MDDB HTTP: http://localhost:11023
# - MDDB gRPC: localhost:11024
# - Web Panel: http://localhost:3000
# - MCP Server: http://localhost:9000# Download binary
wget https://github.com/tradik/mddb/releases/download/v2.9.11/mddbd-v2.9.11-linux-amd64.tar.gz
tar -xzf mddbd-v2.9.11-linux-amd64.tar.gz # Run server
./mddbd # Server starts on http://localhost:11023# Clone repository
git clone https://github.com/tradik/mddb.git
cd mddb # Build
make build # Run
./bin/mddbdDownload #
Latest Release
Released: April 11, 2026
Package Managers
What's New in v2.9.11
- Unix Domain Socket transport — HTTP and gRPC servers now accept
MDDB_HTTP_ADDR=unix:/tmp/mddb.sock/MDDB_GRPC_ADDR=unix:/tmp/mddb-grpc.sock. Zero-network local transport with0600filesystem perms. Python and PHP clients gain matchingunix:scheme support — stdlib-only on Python, libcurlCURLOPT_UNIX_SOCKET_PATHon PHP. - Mutual TLS (mTLS) — new
MDDB_TLS_CLIENT_CA+MDDB_TLS_CLIENT_AUTH=require|requestconfigure client-cert verification on the HTTPS listener.tls.Config.MinVersionpinned to TLS 1.2. - MCP tool count audit — discovered the long-claimed "72 tools" was a documentation-only number that never matched code. Audited
mcpBuiltinTools()vs the dispatch switch and surfaced one orphan tool (aggregate) that had a working handler but no declaration, so MCP clients could not see or call it. Declared it. Authoritative count is now 67 tools. - Landing page Mermaid fix — replication diagram was rendering "Syntax error in text" on mermaid 10.9.5. Switched to
<div class="mermaid">, upgraded to mermaid@11, explicitmermaid.run(). - Geosearch docs 404 fix —
docs/GEOSEARCH.mdwas missing SSG frontmatter, so/docs/geosearch/never built. Frontmatter added. - Landing page split into includes — the monolithic
index.html(~1500 lines) is now composed ofindex-section-*.htmlfiles via Go template{{template ...}}. Easier to edit, easier to review.
Documentation #
Getting Started
- Quick StartGet up and running in 5 minutes
- API DocumentationInteractiveSwagger UI for all endpoints
- OpenAPI SpecMachine-readable API specification
- ExamplesCode examples and patterns
- Docker GuideContainer deployment
- ArchitectureSystem design and internals
Search & AI
- Search AlgorithmsNewTF-IDF, BM25, BM25F, Flat, HNSW, IVF, PQ, OPQ, SQ, BQ
- PMISparseNewBM25 + PPMI query expansion (Tradik Limited)
- GeosearchNewR-tree + geohash radius/bbox queries
- RAG PipelineNewWordPress → MDDB → LLM guide
- LLM ConnectionsNewClaude, ChatGPT, Ollama, DeepSeek, Manus, Bielik.ai
- MCP Server ConfigNewAPI keys, rate limits, logging, access modes
- Custom MCP ToolsNewYAML-defined website-specific AI tools
- IntegrationsNewDocling, Langflow, OpenSearch pipelines
Security & Operations
- Authentication & RBACNewJWT tokens, API keys, role-based access
- Auth Quick Start5-minute authentication setup
- ReplicationNewLeader-follower, Docker Compose, monitoring
- Telemetry & MonitoringNewPrometheus metrics, Grafana dashboards
- Health ChecksDocker & Kubernetes monitoring
- License AuditDependency license compliance (Trivy)
Protocols & Advanced
- gRPC GuideHigh-performance protocol
- Schema ValidationEnforce metadata structure per collection
- AutomationsNewTriggers, crons, webhooks, sentiment
- Temporal TrackingNewEvent history, hot-docs leaderboard, activity histograms
- Spell CorrectionNewSymSpell FTS corrections, custom dictionaries
- WordPress AI AgentNewBuild a chatbot for your WP site
- Chat WidgetNewEmbeddable AI chatbot powered by MDDB
Code Examples #
# Add document with TTL (expires in 1 hour)
curl -X POST http://localhost:11023/v1/add \ -H "Content-Type: application/json" \ -d '{ "collection": "blog", "key": "hello-world", "lang": "en_US", "ttl": 3600, "meta": { "category": ["tutorial"], "author": ["John Doe"] }, "contentMd": "# Hello World\n\nWelcome to MDDB!" }' # Import directly from URL
curl -X POST http://localhost:11023/v1/import-url \ -d '{"collection":"docs", "url":"https://example.com/guide.md", "lang":"en_US"}'# Semantic search - flat (exact, default)
curl -X POST http://localhost:11023/v1/vector-search \ -H "Content-Type: application/json" \ -d '{ "collection": "kb", "query": "how do I cancel my subscription?", "topK": 5, "threshold": 0.7, "includeContent": true }' # HNSW - fast approximate nearest neighbor
curl -X POST http://localhost:11023/v1/vector-search \ -d '{"collection":"kb", "query":"refund", "topK":5, "algorithm":"hnsw"}' # IVF - clustered search for large collections
curl -X POST http://localhost:11023/v1/vector-search \ -d '{"collection":"kb", "query":"refund", "topK":5, "algorithm":"ivf"}' # PQ - compressed, memory-efficient search
curl -X POST http://localhost:11023/v1/vector-search \ -d '{"collection":"kb", "query":"refund", "topK":5, "algorithm":"pq"}'# Simple search with BM25 scoring
curl -X POST http://localhost:11023/v1/fts \ -H "Content-Type: application/json" \ -d '{ "collection": "blog", "query": "markdown database tutorial", "limit": 10, "algorithm": "bm25" }' # Boolean search (AND, OR, NOT, +required, -excluded)
curl -X POST http://localhost:11023/v1/fts \ -H "Content-Type: application/json" \ -d '{ "collection": "blog", "query": "rust AND performance NOT garbage", "mode": "boolean" }' # Phrase search — exact word sequence
curl -X POST http://localhost:11023/v1/fts \ -H "Content-Type: application/json" \ -d '{ "collection": "blog", "query": "\"machine learning\"", "mode": "phrase" }' # Proximity search — terms within 5 words
curl -X POST http://localhost:11023/v1/fts \ -H "Content-Type: application/json" \ -d '{ "collection": "blog", "query": "\"database performance\"~5", "mode": "proximity", "distance": 5 }'curl -X POST http://localhost:11023/v1/search \ -H "Content-Type: application/json" \ -d '{ "collection": "blog", "filterMeta": { "category": ["tutorial"], "status": ["published"] }, "sort": "updatedAt", "limit": 10 }'from mddb import MDDB # TCP connection
db = MDDB.connect('localhost:11023', 'write').collection('kb') # Unix Domain Socket (MDDB 2.9.11+, zero-network local transport)
# db = MDDB.connect('unix:/tmp/mddb.sock', 'write').collection('kb') # Add document
db.add('faq', 'en_US', {'category': ['billing']}, '# Billing FAQ\n\nCancel in Settings.') # Vector search (RAG pipeline step 1)
results = db.vector_search('how to cancel?', top_k=3, include_content=True) # Full-text search
results = db.fts_search('billing refund', limit=10) # Webhooks
db.register_webhook('https://app.com/hook', ['doc.added'], 'kb') # Import from URL
db.import_url('https://example.com/docs.md', 'en_US') # TTL
db.set_ttl('temp-key', 'en', 3600) # expires in 1h<?php
require_once 'mddb.php'; // TCP connection
$db = mddb::connect('localhost:11023', 'write'); // Unix Domain Socket (MDDB 2.9.11+)
// $db = mddb::connect('unix:/tmp/mddb.sock', 'write'); // Add document
$db->collection('blog')->add('hello', 'en_US', ['category' => ['tutorial']], '# Hello'); // Vector search
$results = $db->collection('kb')->vectorSearch('cancel subscription', 5, 0.7, true); // Full-text search
$results = $db->collection('blog')->ftsSearch('database tutorial', 10); // Webhooks
$db->registerWebhook('https://app.com/hook', ['doc.added', 'doc.updated']); // Import from URL
$db->collection('docs')->importUrl('https://example.com/post.md', 'en_US'); // TTL
$db->collection('cache')->setTtl('temp', 'en', 3600);# Docker (Easiest) - for Windsurf / Claude Desktop
# Configure ~/.windsurf/mcp.json:
{ "mcpServers": { "mddb": { "command": "docker", "args": [ "run", "-i", "--rm", "--network", "host", "-e", "MDDB_GRPC_ADDRESS=localhost:11024", "-e", "MDDB_REST_BASE_URL=http://localhost:11023", "-e", "MDDB_MCP_STDIO=true", "tradik/mddb:latest" ] } }
} # MCP provides 67 built-in tools including:
# - add_document, search_documents, semantic_search
# - full_text_search, hybrid_search, cross_search
# - classify_document, find_duplicates
# - geo_search, geo_within, geo_encode, geo_decode, geo_stats
# - automation, webhooks, schemas, backups
# - collection config, revisions, aggregate, and more🔄 Leader-Follower Replication #
Scale reads horizontally with binlog-based streaming replication. A single leader handles writes and streams transactions in real-time to read-only followers via gRPC.
