MDDB API Documentation

Note: The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Overview

MDDB is a lightweight markdown database server built with Go and BoltDB. It provides a RESTful API for storing, retrieving, and managing markdown documents with metadata.

Base URL: http://localhost:11023

API Version: v1

Configuration

The server can be configured using environment variables:

| Variable | Default | Description |
|---|---|---|
| MDDB_ADDR | :11023 | Server address and port |
| MDDB_MODE | wr | Access mode: read, write, or wr (read+write). Also: --mode flag, database.mode in YAML |
| MDDB_PATH | mddb.db | Path to the BoltDB database file. Also: --db flag, database.path in YAML |
| MDDB_EMBEDDING_PROVIDER | none | Embedding provider: openai, ollama, voyage, or none |
| MDDB_EMBEDDING_API_KEY | | API key for OpenAI or Voyage AI |
| MDDB_EMBEDDING_API_URL | (per provider) | API base URL (see Vector Search) |
| MDDB_EMBEDDING_MODEL | (per provider) | Embedding model name |
| MDDB_EMBEDDING_DIMENSIONS | (per provider) | Vector dimensions |
| MDDB_FTS_STEMMING | true | Enable stemming for FTS |
| MDDB_FTS_DEFAULT_LANG | en | Default language for FTS stemming and stop words (18 languages supported) |
| MDDB_FTS_SYNONYMS | true | Enable synonym expansion for FTS |
| MDDB_COMPRESSION_ENABLED | true | Enable adaptive compression (Snappy/Zstd) |
| MDDB_COMPRESSION_SMALL_THRESHOLD | 1024 | Snappy compression threshold (bytes) |
| MDDB_COMPRESSION_MEDIUM_THRESHOLD | 10240 | Zstd compression threshold (bytes) |

Access Modes

  • read: Read-only mode. Write operations will return 403 Forbidden
  • write: Write-only mode (not commonly used)
  • wr: Read and write mode (recommended for most use cases)

Endpoints

POST /v1/add

Add or update a markdown document in a collection.

Request Body:

{
  "collection": "blog",
  "key": "homepage",
  "lang": "en_GB",
  "meta": {
    "category": ["blog", "featured"],
    "author": ["John Doe"],
    "tags": ["golang", "database"]
  },
  "contentMd": "# Welcome\n\nThis is the homepage content."
}

Response:

{
  "id": "blog|homepage|en_gb",
  "key": "homepage",
  "lang": "en_GB",
  "meta": {
    "category": ["blog", "featured"],
    "author": ["John Doe"],
    "tags": ["golang", "database"]
  },
  "contentMd": "# Welcome\n\nThis is the homepage content.",
  "addedAt": 1699296000,
  "updatedAt": 1699296000
}

Features:

  • Creates a new document or updates an existing one
  • Automatically generates a deterministic ID based on collection, key, and lang
  • Maintains revision history
  • Updates metadata indices
  • Tracks addedAt (first creation) and updatedAt (last modification) timestamps
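
The ID format is not spelled out in this section, but the example responses for /v1/add, /v1/get, and /v1/search suggest a pattern of collection, key, and lower-cased lang joined by a pipe character. A minimal sketch under that assumption (make_doc_id is a hypothetical helper, not part of MDDB):

```python
def make_doc_id(collection: str, key: str, lang: str) -> str:
    """Join collection, key, and lower-cased lang with '|'.

    Assumption: this mirrors the IDs shown in the example responses;
    the server's actual algorithm is not documented here.
    """
    return f"{collection}|{key}|{lang.lower()}"


print(make_doc_id("blog", "homepage", "en_GB"))  # blog|homepage|en_gb
```

Because the ID is deterministic, posting the same collection, key, and lang again updates the existing document instead of creating a second one.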

cURL Example:

curl -X POST http://localhost:11023/v1/add \
  -H 'Content-Type: application/json' \
  -d '{
    "collection": "blog",
    "key": "homepage",
    "lang": "en_GB",
    "meta": { "category": ["blog"] },
    "contentMd": "# Welcome to my blog"
  }'

POST /v1/add-batch

Add multiple documents to a collection in a single request. Uses the optimized batch processor for high throughput. Fires all post-commit hooks (embedding, FTS indexing, webhooks, TTL, automation triggers).

Request Body:

{
  "collection": "blog",
  "documents": [
    {
      "key": "post1",
      "lang": "en",
      "contentMd": "# Post 1\n\nFirst post content.",
      "meta": {
        "category": ["blog"],
        "author": ["John Doe"]
      },
      "saveRevision": true
    },
    {
      "key": "post2",
      "lang": "en",
      "contentMd": "# Post 2\n\nSecond post content.",
      "meta": {
        "category": ["tutorial"]
      }
    }
  ]
}

Parameters:

  • collection (required): Collection name
  • documents (required): Array of documents to add
    • key (required): Document key
    • lang (required): Language code
    • contentMd (required): Markdown content
    • meta (optional): Metadata key-value pairs
    • saveRevision (optional): Whether to save a revision for this document

Response:

{ "added": 1, "updated": 1, "failed": 0, "errors": []
}

cURL Example:

curl -X POST http://localhost:11023/v1/add-batch \
  -H 'Content-Type: application/json' \
  -d '{
    "collection": "blog",
    "documents": [
      {"key": "p1", "lang": "en", "contentMd": "# Hello"},
      {"key": "p2", "lang": "en", "contentMd": "# World"}
    ]
  }'
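
For larger imports, a client may want to split its documents into several /v1/add-batch requests. The following is a minimal sketch using only the Python standard library; the helper names and the batch size of 2 are illustrative assumptions, and no maximum batch size is documented here. The requests are built but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11023"  # default base URL from this document


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_add_batch_request(collection, documents):
    """Build (but do not send) a POST /v1/add-batch request."""
    for doc in documents:
        # key, lang, and contentMd are the required per-document fields.
        missing = [f for f in ("key", "lang", "contentMd") if not doc.get(f)]
        if missing:
            raise ValueError(f"document is missing required fields: {missing}")
    body = json.dumps({"collection": collection, "documents": documents})
    return urllib.request.Request(
        BASE_URL + "/v1/add-batch",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


docs = [{"key": f"p{i}", "lang": "en", "contentMd": f"# Post {i}"} for i in range(5)]
batch_requests = [build_add_batch_request("blog", b) for b in chunked(docs, 2)]
```

Each request can then be sent with urllib.request.urlopen against a running server, and the added, updated, and failed counters of the responses summed by the caller.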

POST /v1/ingest

Bulk ingest endpoint with advanced features for scraping pipelines and data import workflows. Supports URL key derivation, YAML frontmatter extraction, content deduplication, auto-metadata injection, and collection auto-configuration.

Request Body:

{
  "collection": "imported",
  "documents": [
    {
      "url": "https://example.com/page1",
      "lang": "en",
      "contentMd": "# Page 1\n\nContent here.",
      "scraper": "my-crawler",
      "scrapedAt": 1709500000,
      "ttl": 86400
    },
    {
      "url": "https://example.com/page2",
      "lang": "en",
      "contentMd": "---\ntitle: Page 2\ncategory: tutorial\n---\n# Page 2\n\nMore content.",
      "extractFrontmatter": true
    }
  ],
  "options": {
    "skipDuplicates": true,
    "autoConfigureCollection": true
  }
}

Parameters:

  • collection (required): Collection name
  • documents (required): Array of documents to ingest
    • url (optional): Source URL - used for key derivation and auto-injected as source_url metadata
    • key (optional): Document key - if empty, derived from URL
    • lang (required): Language code
    • contentMd (required): Markdown content
    • meta (optional): Metadata key-value pairs
    • extractFrontmatter (optional): Parse YAML frontmatter from content and merge into metadata
    • scrapedAt (optional): Unix timestamp of when the content was scraped - auto-injected as scraped_at metadata
    • scraper (optional): Scraper identifier - auto-injected as scraper metadata
    • ttl (optional): Time-to-live in seconds
  • options (optional): Ingest options
    • skipDuplicates (optional): Skip documents whose content hasn't changed (CRC32 hash comparison)
    • skipEmbeddings (optional): Skip embedding generation for this batch
    • skipFts (optional): Skip FTS indexing for this batch
    • skipWebhooks (optional): Skip webhook firing for this batch
    • autoConfigureCollection (optional): Auto-configure collection as "scraping" type if it doesn't exist
    • saveRevision (optional): Save revision history for all documents in this batch

Response:

{ "added": 2, "updated": 0, "skipped": 0, "failed": 0, "errors": [], "collection": "imported", "durationMs": 45
}

Features:

  • URL key derivation: If key is empty, a deterministic key is derived from the URL path
  • Frontmatter extraction: When extractFrontmatter is true, YAML frontmatter is parsed from content and merged into metadata (request metadata takes priority over frontmatter)
  • Auto-metadata injection: source_url, scraped_at, and scraper fields are auto-injected into document metadata
  • Content deduplication: With skipDuplicates, existing documents with identical content (CRC32 hash) are skipped
  • Collection auto-configuration: With autoConfigureCollection, the collection is created with type "scraping" if it doesn't exist
  • Selective hook control: Skip embeddings, FTS, or webhooks per batch via options
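
Two of the features above, CRC32-based deduplication and the metadata merge priority, can be illustrated with a short sketch. This is not MDDB's implementation: hashing the UTF-8 bytes of the content is an assumption, and the function names are hypothetical.

```python
import zlib


def content_crc32(content_md: str) -> int:
    """CRC32 of the UTF-8 encoded markdown content (encoding is an assumption)."""
    return zlib.crc32(content_md.encode("utf-8"))


def is_duplicate(existing_md: str, incoming_md: str) -> bool:
    """True when both contents hash to the same CRC32 value."""
    return content_crc32(existing_md) == content_crc32(incoming_md)


def merge_meta(frontmatter: dict, request_meta: dict) -> dict:
    """Merge metadata; keys from the request override frontmatter keys."""
    merged = dict(frontmatter)
    merged.update(request_meta)
    return merged


front = {"title": ["Page 2"], "category": ["tutorial"]}
print(merge_meta(front, {"category": ["news"]}))
```

CRC32 is a fast checksum rather than a cryptographic hash, so two different documents can in rare cases share a value; that trade-off is inherent to any CRC32 comparison.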

cURL Example:

curl -X POST http://localhost:11023/v1/ingest \
  -H 'Content-Type: application/json' \
  -d '{
    "collection": "imported",
    "documents": [
      {"url": "https://example.com/page1", "lang": "en", "contentMd": "# Hello", "scraper": "my-crawler"},
      {"url": "https://example.com/page2", "lang": "en", "contentMd": "# World", "extractFrontmatter": true}
    ],
    "options": {"autoConfigureCollection": true, "skipDuplicates": true}
  }'

POST /v1/upload

Upload files via multipart/form-data. Files are auto-converted to Markdown and stored as documents. Supports single and batch upload.

Content-Type: multipart/form-data

Form Fields:

  • file or files[] (required): One or more files to upload. Supported formats: .md, .txt, .html, .htm, .pdf, .docx, .odt, .rtf, .yaml, .yml, .log, .lex, .tex, .latex
  • collection (required): Target collection name
  • lang (required): Document language code (e.g. en_US, pl_PL)
  • key (optional): Document key - if empty, derived from filename (lowercase, spaces → hyphens, extension stripped)
  • meta (optional): JSON-encoded metadata map, e.g. {"category":["docs"]}
  • ttl (optional): Time-to-live in seconds (0 = no expiry)
  • maxSize (optional): Per-file size limit in bytes (default: 10MB, max: 100MB)
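
The filename-to-key rule for the key field (lowercase, spaces replaced by hyphens, extension stripped) can be sketched as follows. The function name is hypothetical, and how the server treats other special characters is not documented here.

```python
import os


def derive_key(filename: str) -> str:
    """Lowercase the name, strip the extension, replace spaces with hyphens."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    return stem.lower().replace(" ", "-")


print(derive_key("Report 2026 Q1.pdf"))  # report-2026-q1
```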

Format Conversion:

| Format | Extension | Conversion |
|---|---|---|
| Markdown | .md | Stored as-is, frontmatter extracted |
| Plain text | .txt | Stored as-is, frontmatter extracted |
| HTML | .html, .htm | Converted to Markdown (headings, links, lists, bold/italic preserved) |
| PDF | .pdf | Text extracted (text-based PDFs only; scanned/image PDFs not supported, use Docling) |
| DOCX | .docx | Text extracted with headings and list structure preserved |
| ODT | .odt | OpenDocument text extracted with headings preserved |
| RTF | .rtf | Rich Text Format: text extracted, formatting stripped |
| LaTeX | .tex, .latex | Converted to Markdown (sections, formatting, environments, math preserved) |
| YAML | .yaml, .yml | Wrapped in code block for structured data |
| Log | .log | Wrapped in code block |
| LEX | .lex | Wrapped in code block |

Auto-injected Metadata:

  • upload_format: Original file format (e.g. pdf, html, docx)
  • upload_filename: Original filename
  • upload_converted: "true" if file was converted from non-markdown format

Single File Response:

{
  "key": "report-2026-q1",
  "format": "pdf",
  "converted": true,
  "document": {
    "id": "doc|docs|report-2026-q1",
    "key": "report-2026-q1",
    "lang": "en_US",
    "meta": {
      "upload_format": ["pdf"],
      "upload_filename": ["report-2026-q1.pdf"],
      "upload_converted": ["true"]
    },
    "contentMd": "# Q1 2026 Report\n\nExtracted text content...",
    "addedAt": 1710000000,
    "updatedAt": 1710000000
  }
}

Batch Response (multiple files):

{
  "added": 3,
  "updated": 0,
  "failed": 0,
  "errors": [],
  "results": [
    {"key": "doc1", "format": "pdf", "converted": true, "document": {...}},
    {"key": "doc2", "format": "html", "converted": true, "document": {...}},
    {"key": "doc3", "format": "txt", "converted": false, "document": {...}}
  ]
}

cURL Examples:

curl -X POST http://localhost:11023/v1/upload \
  -F "file=@<path-to-file>" \
  -F "collection=docs" \
  -F "lang=en_US"

curl -X POST http://localhost:11023/v1/upload \
  -F "file=@<path-to-file>" \
  -F "collection=docs" \
  -F "key=user-manual" \
  -F "lang=en_US" \
  -F 'meta={"category":["documentation"],"type":["manual"]}'

curl -X POST http://localhost:11023/v1/upload \
  -F "files[]=@<path-to-file-1>" \
  -F "files[]=@<path-to-file-2>" \
  -F "files[]=@<path-to-file-3>" \
  -F "collection=docs" \
  -F "lang=en_US"

curl -X POST http://localhost:11023/v1/upload \
  -F "file=@<path-to-file>" \
  -F "collection=docs" \
  -F "lang=en_US" \
  -F "maxSize=52428800"

MCP Tool: upload_file - accepts base64-encoded file content with filename for format detection.


POST /v1/get

Retrieve a specific document by collection, key, and language.

Request Body:

{ "collection": "blog", "key": "homepage", "lang": "en_GB", "env": { "year": "2024", "siteName": "My Blog" }
}

Response:

{ "id": "blog|homepage|en_gb", "key": "homepage", "lang": "en_GB", "meta": { "category": ["blog"] }, "contentMd": "# Welcome to My Blog in 2024", "addedAt": 1699296000, "updatedAt": 1699296000
}

Features:

  • Retrieves the latest version of a document
  • Supports templating via env parameter
  • Template variables in content are replaced: %%varName%% → value from env

Template Example:

If your content contains:

# Welcome to %%siteName%% in %%year%%

And you provide:

{
  "env": {
    "year": "2024",
    "siteName": "My Blog"
  }
}

The response will contain:

# Welcome to My Blog in 2024

cURL Example:

curl -X POST http://localhost:11023/v1/get \
  -H 'Content-Type: application/json' \
  -d '{
    "collection": "blog",
    "key": "homepage",
    "lang": "en_GB",
    "env": {"year": "2024"}
  }'
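
The %%varName%% substitution performed by /v1/get can be approximated client-side, for example to preview content before storing it. This is a sketch rather than the server's code; in particular, leaving unknown variables untouched is an assumption, since the behavior for names missing from env is not documented here.

```python
import re

# Matches %%name%% where name consists of letters, digits, or underscores.
_PLACEHOLDER = re.compile(r"%%(\w+)%%")


def render_template(content_md: str, env: dict) -> str:
    """Replace each %%name%% with env[name]; unknown names are left as-is."""
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), content_md)


env = {"year": "2024", "siteName": "My Blog"}
print(render_template("# Welcome to %%siteName%% in %%year%%", env))
# # Welcome to My Blog in 2024
```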

POST /v1/search

Search for documents in a collection with optional metadata filtering and sorting.

Request Body:

{
  "collection": "blog",
  "filterMeta": {
    "category": ["blog", "tutorial"],
    "author": ["John Doe"]
  },
  "sort": "updatedAt",
  "asc": false,
  "limit": 10,
  "offset": 0
}

Parameters:

  • collection (required): Collection name
  • filterMeta (optional): Metadata filters (AND between keys, OR between values)
  • sort (optional): Sort field - addedAt, updatedAt, or key
  • asc (optional): Sort order - true for ascending, false for descending
  • limit (optional): Maximum number of results (default: 50)
  • offset (optional): Number of results to skip (default: 0)

Response:

[
  {
    "id": "blog|post1|en_gb",
    "key": "post1",
    "lang": "en_GB",
    "meta": {
      "category": ["blog"],
      "author": ["John Doe"]
    },
    "contentMd": "# Post 1",
    "addedAt": 1699296000,
    "updatedAt": 1699296100
  },
  {
    "id": "blog|post2|en_gb",
    "key": "post2",
    "lang": "en_GB",
    "meta": {
      "category": ["tutorial"],
      "author": ["John Doe"]
    },
    "contentMd": "# Post 2",
    "addedAt": 1699295000,
    "updatedAt": 1699296200
  }
]

Filtering Logic:

  • Multiple values for the same key are combined with OR
  • Multiple keys are combined with AND
  • Example: {"category": ["blog", "tutorial"], "author": ["John"]} means:
    • (category = "blog" OR category = "tutorial") AND (author = "John")
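
The AND/OR rule can be expressed in a few lines. The sketch below assumes exact string equality between filter values and metadata values; matches_filter is a hypothetical name used only for illustration.

```python
def matches_filter(meta: dict, filter_meta: dict) -> bool:
    """AND across keys, OR across the values listed for each key."""
    return all(
        any(value in meta.get(key, []) for value in wanted)
        for key, wanted in filter_meta.items()
    )


flt = {"category": ["blog", "tutorial"], "author": ["John"]}
print(matches_filter({"category": ["tutorial"], "author": ["John"]}, flt))  # True
print(matches_filter({"category": ["news"], "author": ["John"]}, flt))      # False
```

In this sketch an empty filter matches every document, because there are no keys left to fail the AND.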

cURL Example:

curl -X POST http://localhost:11023/v1/search \
  -H 'Content-Type: application/json' \
  -d '{
    "collection": "blog",
    "filterMeta": {"category": ["blog"]},
    "sort": "addedAt",
    "asc": true,
    "limit": 10
  }'

POST /v1/vector-search

Perform semantic (vector) search using natural language queries. Documents are automatically embedded when added (if an embedding provider is configured). The search finds documents by meaning, not just exact metadata matches.

Request Body:

{
  "collection": "docs",
  "query": "how to authenticate users",
  "topK": 5,
  "threshold": 0.3,
  "filterMeta": {
    "category": ["tutorial"]
  },
  "includeContent": true
}

Parameters:

  • collection (required): Collection name
  • query (required*): Natural language search query (will be embedded server-side)
  • queryVector (optional*): Pre-computed embedding vector (use instead of query)
  • topK (optional): Maximum results to return (default: 5)
  • threshold (optional): Minimum similarity score 0.0-1.0 (default: 0.0)
  • filterMeta (optional): Metadata pre-filter (same logic as /v1/search)
  • includeContent (optional): Include contentMd in results (default: false)

* Either query or queryVector is required.

Response:

{
  "results": [
    {
      "document": {
        "id": "docs|auth-guide|en_us",
        "key": "auth-guide",
        "lang": "en_US",
        "meta": {"category": ["tutorial"]},
        "contentMd": "# Authentication Guide\n...",
        "addedAt": 1709136000,
        "updatedAt": 1709136000
      },
      "score": 0.89,
      "rank": 1
    },
    {
      "document": {
        "id": "docs|login-flow|en_us",
        "key": "login-flow",
        "lang": "en_US",
        "meta": {"category": ["tutorial"]},
        "contentMd": "# Login Flow\n...",
        "addedAt": 1709135000,
        "updatedAt": 1709135000
      },
      "score": 0.74,
      "rank": 2
    }
  ],
  "total": 2,
  "model": "text-embedding-3-small",
  "dimensions": 1536
}

Response Fields:

  • results: Array of matched documents with similarity scores
    • document: Full document object
    • score: Cosine similarity score (0.0-1.0, higher = more similar)
    • rank: Position in results (1-based)
  • total: Number of results returned
  • model: Embedding model used
  • dimensions: Vector dimensionality

How It Works:

  1. When a document is added via /v1/add, its content is automatically embedded in the background
  2. The query text is embedded using the same model
  3. Cosine similarity is computed between the query vector and all document vectors
  4. Results are ranked by similarity score
  5. If filterMeta is provided, only documents matching the metadata filter are searched (hybrid search)
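
Steps 3 and 4 amount to a brute-force cosine-similarity scan. The sketch below shows the idea in plain Python with toy two-dimensional vectors; the function names are hypothetical, and the server's actual indexing and data layout are not described in this section.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # avoid division by zero for empty or all-zero vectors
    return dot / (norm_a * norm_b)


def vector_search(query_vec, doc_vectors, top_k=5, threshold=0.0):
    """Score every document, drop low scores, sort, and assign 1-based ranks."""
    scored = [(key, cosine_similarity(query_vec, vec)) for key, vec in doc_vectors.items()]
    scored = [(key, score) for key, score in scored if score >= threshold]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [
        {"key": key, "score": score, "rank": rank}
        for rank, (key, score) in enumerate(scored[:top_k], start=1)
    ]


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(vector_search([1.0, 0.0], vectors, top_k=2, threshold=0.3))
```

Applying a metadata filter before the scan simply shrinks doc_vectors, which is why pre-filtering reduces search time.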

cURL Example:

curl -X POST http://localhost:11023/v1/vector-search \
  -H 'Content-Type: application/json' \
  -d '{
    "collection": "docs",
    "query": "how to authenticate users",
    "topK": 5,
    "includeContent": true
  }'

POST /v1/vector-reindex

Re-embed all documents in a collection. Useful after changing the embedding provider/model, or for initial indexing of existing documents.

Request Body:

{ "collection": "docs", "force": false
}

Parameters:

  • collection (required): Collection name
  • force (optional): If true, re-embed all documents regardless of content changes. If false, skip documents whose content hasn't changed (default: false)

Response:

{ "embedded": 42, "skipped": 8, "failed": 0, "errors": []
}

cURL Example:

curl -X POST http://localhost:11023/v1/vector-reindex \
  -H 'Content-Type: application/json' \
  -d '{"collection": "docs", "force": false}'

GET /v1/vector-stats

Get embedding/vector search statistics.

Response:

{
  "enabled": true,
  "provider": "text-embedding-3-small",
  "model": "text-embedding-3-small",
  "dimensions": 1536,
  "index_ready": true,
  "collections": {
    "docs": {
      "total_documents": 50,
      "embedded_documents": 48
    },
    "blog": {
      "total_documents": 120,
      "embedded_documents": 120
    }
  }
}

cURL Example:

curl http://localhost:11023/v1/vector-stats

Vector Search Configuration

Embedding Providers

| Provider | MDDB_EMBEDDING_PROVIDER | Default Model | Default Dimensions | API Key Required |
|---|---|---|---|---|
| OpenAI | openai | text-embedding-3-small | 1536 | Yes |
| Voyage AI (Anthropic) | voyage | voyage-3 | 1024 | Yes |
| Ollama (local) | ollama | nomic-embed-text | 768 | No |
| Disabled | none or empty | - | - | - |

Provider-Specific Configuration

OpenAI:

MDDB_EMBEDDING_PROVIDER=openai
MDDB_EMBEDDING_API_KEY=sk-...
MDDB_EMBEDDING_API_URL=https://api.openai.com/v1 # default
MDDB_EMBEDDING_MODEL=text-embedding-3-small # default
MDDB_EMBEDDING_DIMENSIONS=1536 # default

Voyage AI (Anthropic):

MDDB_EMBEDDING_PROVIDER=voyage
MDDB_EMBEDDING_API_KEY=pa-...
MDDB_EMBEDDING_API_URL=https://api.voyageai.com/v1 # default
MDDB_EMBEDDING_MODEL=voyage-3 # default
MDDB_EMBEDDING_DIMENSIONS=1024 # default

Ollama (local, no API key needed):

MDDB_EMBEDDING_PROVIDER=ollama
MDDB_EMBEDDING_API_URL=http://localhost:11434 # default
MDDB_EMBEDDING_MODEL=nomic-embed-text # default
MDDB_EMBEDDING_DIMENSIONS=768 # default

Performance Benchmarks (Apple M2)

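| Documents | Dimensions | Search Latency | Throughput |
|---|---|---|---|
| 1,000 | 768 | ~0.9 ms | ~1,064 qps |
| 1,000 | 1,536 | ~1.8 ms | ~544 qps |
| 5,000 | 768 | ~4.8 ms | ~210 qps |
| 10,000 | 768 | ~9.7 ms | ~104 qps |
| 10,000 | 1,536 | ~19 ms | ~52 qps |
| 50,000 | 768 | ~50 ms | ~20 qps |
| 50,000 | 1,536 | ~96 ms | ~10 qps |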

Metadata pre-filtering significantly reduces search time (e.g., filtering to 10% of 10K docs: ~1.1 ms vs ~9.7 ms).


POST /v1/fts

Perform full-text search across document content. Supports multiple search modes: simple, boolean, phrase, wildcard, proximity, and range filtering. Uses TF-IDF, BM25, BM25F, or PMISparse scoring with optional stemming, synonyms, and typo tolerance.

Request Body:

{
  "collection": "blog",
  "query": "markdown database tutorial",
  "limit": 10,
  "algorithm": "bm25f",
  "fuzzy": 1,
  "mode": "auto",
  "disableStem": false,
  "disableSynonyms": false,
  "fieldWeights": {
    "content": 1.0,
    "meta.title": 3.0,
    "meta.tags": 2.0
  },
  "rangeMeta": [
    {"field": "addedAt", "gte": "2024-01-01", "lte": "2024-12-31"}
  ]
}

Parameters:

  • collection (required): Collection name
  • query (required): Search query text
  • limit (optional): Maximum results (default: 50)
  • algorithm (optional): "tfidf" (default), "bm25", "bm25f", or "pmisparse" - used for simple mode
  • mode (optional): Search mode - "auto" (default), "simple", "boolean", "phrase", "wildcard", "proximity"
  • distance (optional): Proximity distance in words (default: 5) - only used with mode=proximity
  • fuzzy (optional): Typo tolerance - 0 (off, default), 1 (1 edit), 2 (2 edits) - used for simple mode
  • lang (optional): Language code for query tokenization (e.g., "pl", "de", "fr"). Uses language-specific stemmer and stop words. Falls back to server default if omitted (default: "en", configurable via MDDB_FTS_DEFAULT_LANG)
  • disableStem (optional): Disable stemming for this query (default: false)
  • disableSynonyms (optional): Disable synonym expansion for this query (default: false)
  • fieldWeights (optional, BM25F only): Map of field name to weight. Defaults: content=1.0, meta.title=3.0, meta.tags=2.0, meta.category=2.0, meta.description=1.5
  • filterMeta (optional): Metadata pre-filter - {"key": ["value1", "value2"]}
  • rangeMeta (optional): Array of range filters on metadata or timestamps

Search Modes:

  • simple: Standard full-text search with TF-IDF/BM25/BM25F/PMISparse scoring
  • boolean: Boolean operators - rust AND performance, rust OR golang, NOT java, +required -excluded
  • phrase: Exact phrase matching - "machine learning algorithms" (consecutive terms)
  • wildcard: Pattern matching - prog* (any suffix), te?t (single char)
  • proximity: Terms within N words - "rust systems" with distance: 5
  • auto: Auto-detects mode from query syntax (default)

Range Filter Object:

  • field (required): Metadata key name, or "addedAt" / "updatedAt" for timestamps
  • gte (optional): Greater than or equal (supports unix timestamps, ISO dates, numeric strings)
  • lte (optional): Less than or equal
  • gt (optional): Greater than (strict)
  • lt (optional): Less than (strict)
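
The four bounds combine as a conjunction: a value passes only if it satisfies every bound that is present. The sketch below covers numeric strings only, which is a simplification; how the server parses ISO dates and Unix timestamps is not covered, and in_range is a hypothetical helper.

```python
def in_range(value: str, flt: dict) -> bool:
    """Check a numeric string against the optional gte/lte/gt/lt bounds."""
    v = float(value)
    if "gte" in flt and not v >= float(flt["gte"]):
        return False
    if "lte" in flt and not v <= float(flt["lte"]):
        return False
    if "gt" in flt and not v > float(flt["gt"]):
        return False
    if "lt" in flt and not v < float(flt["lt"]):
        return False
    return True


price = {"field": "price", "gte": "10", "lte": "100"}
print(in_range("50", price))  # True
print(in_range("5", price))   # False
```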

Response:

{
  "results": [
    {
      "document": {
        "id": "blog|post1|en_gb",
        "key": "post1",
        "lang": "en_GB",
        "meta": {"category": ["tutorial"]},
        "contentMd": "# Markdown Database Tutorial..."
      },
      "score": 2.3456,
      "matchedTerms": ["markdown", "databas", "tutori"]
    }
  ],
  "total": 1,
  "algorithm": "bm25",
  "mode": "simple",
  "lang": "en",
  "stemmingActive": true,
  "synonymsActive": true
}

cURL Examples:

curl -X POST http://localhost:11023/v1/fts \
  -H 'Content-Type: application/json' \
  -d '{"collection":"blog","query":"markdown database","algorithm":"bm25","limit":10}'

curl -X POST http://localhost:11023/v1/fts \
  -H 'Content-Type: application/json' \
  -d '{"collection":"blog","query":"rust AND performance NOT java","mode":"boolean"}'

curl -X POST http://localhost:11023/v1/fts \
  -H 'Content-Type: application/json' \
  -d '{"collection":"blog","query":"\"machine learning\"","mode":"phrase"}'

curl -X POST http://localhost:11023/v1/fts \
  -H 'Content-Type: application/json' \
  -d '{"collection":"blog","query":"prog*","mode":"wildcard"}'

curl -X POST http://localhost:11023/v1/fts \
  -H 'Content-Type: application/json' \
  -d '{"collection":"blog","query":"rust systems","mode":"proximity","distance":5}'

curl -X POST http://localhost:11023/v1/fts \
  -H 'Content-Type: application/json' \
  -d '{"collection":"shop","query":"widget","rangeMeta":[{"field":"price","gte":"10","lte":"100"}]}'

curl -X POST http://localhost:11023/v1/fts \
  -H 'Content-Type: application/json' \
  -d '{"collection":"articles","query":"programowanie wydajne","lang":"pl","algorithm":"bm25"}'

POST /v1/fts-reindex

Reindex all documents in a collection using their stored lang field for language-aware FTS processing.

Query Parameters:

  • collection (required): Collection name to reindex

cURL Example:

curl -X POST "http://localhost:11023/v1/fts-reindex?collection=articles"

Response:

{ "reindexed": 150, "collection": "articles"
}

GET /v1/fts-languages

Returns all supported languages for multi-language FTS.

cURL Example:

curl http://localhost:11023/v1/fts-languages

Response:

{ "languages": [ {"code": "ar", "name": "Arabic"}, {"code": "da", "name": "Danish"}, {"code": "de", "name": "German"}, {"code": "en", "name": "English"} ], "defaultLang": "en"
}

POST /v1/synonyms

Add or update synonyms for a term in a collection.

Request Body:

{ "collection": "docs", "term": "big", "synonyms": ["large", "huge", "enormous"]
}

Response:

{ "status": "ok"
}

cURL Example:

curl -X POST http://localhost:11023/v1/synonyms \
  -H 'Content-Type: application/json' \
  -d '{"collection":"docs","term":"big","synonyms":["large","huge","enormous"]}'

GET /v1/synonyms

List all synonyms for a collection.

Query Parameters:

  • collection (required): Collection name

Response:

{ "collection": "docs", "synonyms": { "big": ["large", "huge", "enormous"], "fast": ["quick", "rapid", "swift"] }
}

cURL Example:

curl "http://localhost:11023/v1/synonyms?collection=docs"

DELETE /v1/synonyms

Delete all synonyms for a term in a collection.

Request Body:

{ "collection": "docs", "term": "big"
}

Response:

{ "status": "ok"
}

cURL Example:

curl -X DELETE http://localhost:11023/v1/synonyms \
  -H 'Content-Type: application/json' \
  -d '{"collection":"docs","term":"big"}'

POST /v1/export

Export documents from a collection in NDJSON or ZIP format.

Request Body:

{ "collection": "blog", "filterMeta": { "category": ["blog"] }, "format": "ndjson"
}

Parameters:

  • collection (required): Collection name
  • filterMeta (optional): Metadata filters (same as search)
  • format (required): Export format - ndjson or zip

Response (NDJSON):

{"id":"blog|post1|en_gb","key":"post1","lang":"en_GB","meta":{"category":["blog"]},"contentMd":"# Post 1","addedAt":1699296000,"updatedAt":1699296100}
{"id":"blog|post2|en_gb","key":"post2","lang":"en_GB","meta":{"category":["blog"]},"contentMd":"# Post 2","addedAt":1699295000,"updatedAt":1699296200}

Response (ZIP): Binary ZIP file containing markdown files named as {key}.{lang}.md
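
An NDJSON export holds one JSON document per line, so it can be consumed line by line without loading a single large JSON array. A minimal parsing sketch (parse_ndjson is a hypothetical helper; the sample text is abbreviated from the response above):

```python
import json


def parse_ndjson(text: str) -> list:
    """Decode one JSON document per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


sample = (
    '{"id":"blog|post1|en_gb","key":"post1","lang":"en_GB"}\n'
    '{"id":"blog|post2|en_gb","key":"post2","lang":"en_GB"}\n'
)
for doc in parse_ndjson(sample):
    print(doc["key"], doc["lang"])
```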

cURL Examples:

NDJSON export:

curl -X POST http://localhost:11023/v1/export \
  -H 'Content-Type: application/json' \
  -d '{
    "collection": "blog",
    "filterMeta": {"category": ["blog"]},
    "format": "ndjson"
  }' > export.ndjson

ZIP export:

curl -X POST http://localhost:11023/v1/export \
  -H 'Content-Type: application/json' \
  -d '{
    "collection": "blog",
    "format": "zip"
  }' > export.zip

GET /v1/backup

Create a backup of the database file.

Query Parameters:

  • to (optional): Backup file name (default: backup-{timestamp}.db)

Response:

{ "backup": "backup-1699296000.db"
}

cURL Example:

curl "http://localhost:11023/v1/backup?to=backup-$(date +%s).db"

Notes:

  • Creates a copy of the entire BoltDB database file
  • Backup is created in the same directory as the database
  • Does not interrupt server operations

POST /v1/restore

Restore the database from a backup file.

Request Body:

{ "from": "backup-1699296000.db"
}

Response:

{ "restored": "backup-1699296000.db"
}

cURL Example:

curl -X POST http://localhost:11023/v1/restore \
  -H 'Content-Type: application/json' \
  -d '{"from": "backup-1699296000.db"}'

⚠️ Warning:

  • This operation replaces the current database
  • The server briefly closes and reopens the database connection
  • All current data will be replaced with the backup

POST /v1/truncate

Truncate revision history and optionally clear cache.

Request Body:

{ "collection": "blog", "keepRevs": 3, "dropCache": true
}

Parameters:

  • collection (required): Collection name
  • keepRevs (required): Number of recent revisions to keep per document (0 = delete all history)
  • dropCache (optional): Whether to drop cache (placeholder for future use)

Response:

{ "status": "truncated"
}

cURL Example:

curl -X POST http://localhost:11023/v1/truncate \
  -H 'Content-Type: application/json' \
  -d '{
    "collection": "blog",
    "keepRevs": 3,
    "dropCache": true
  }'

Use Cases:

  • Reduce database size by removing old revisions
  • Keep only recent history for auditing
  • Clean up after bulk imports
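
The effect of keepRevs on a single document's history can be pictured as keeping the tail of an oldest-first list. This is only an illustration of the documented semantics (keep the N most recent revisions, 0 deletes all history); the storage layout and the function name are assumptions.

```python
def truncate_revisions(revisions: list, keep_revs: int) -> list:
    """Keep the `keep_revs` most recent revisions of an oldest-first list."""
    if keep_revs < 0:
        raise ValueError("keep_revs must be >= 0")
    if keep_revs == 0:
        return []  # revisions[-0:] would return the whole list, so handle 0 explicitly
    return revisions[-keep_revs:]


print(truncate_revisions(["r1", "r2", "r3", "r4", "r5"], 3))  # ['r3', 'r4', 'r5']
```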

GET /v1/stats

Get server and database statistics.

Request: No body required (GET request)

Response:

{
  "databasePath": "mddb.db",
  "databaseSize": 16384,
  "mode": "wr",
  "collections": [
    {
      "name": "blog",
      "documentCount": 42,
      "revisionCount": 156,
      "metaIndexCount": 84
    }
  ],
  "totalDocuments": 42,
  "totalRevisions": 156,
  "totalMetaIndices": 84,
  "uptime": ""
}

Response Fields:

  • databasePath: Path to the database file
  • databaseSize: Database file size in bytes
  • mode: Access mode (read, write, wr)
  • collections: Array of collection statistics
    • name: Collection name
    • documentCount: Number of documents in collection
    • revisionCount: Number of revisions in collection
    • metaIndexCount: Number of metadata indices in collection
  • totalDocuments: Total documents across all collections
  • totalRevisions: Total revisions across all collections
  • totalMetaIndices: Total metadata indices across all collections

cURL Example:

curl http://localhost:11023/v1/stats

CLI Example:

mddb-cli stats

Use Cases:

  • Monitor database growth
  • Check collection sizes before operations
  • Verify indexing status
  • Performance monitoring and capacity planning

POST /v1/schema/set

Set or update the validation schema for a collection. Schema validation is opt-in per collection. See the Schema Validation Guide for full details on supported rules.

Request Body:

{
  "collection": "blog",
  "schema": {
    "required": ["category", "author"],
    "properties": {
      "category": {
        "type": "string",
        "enum": ["blog", "tutorial", "news"]
      },
      "author": {
        "type": "string"
      },
      "tags": {
        "type": "string",
        "minItems": 1,
        "maxItems": 5
      }
    }
  }
}

Response:

{ "status": "ok"
}

cURL Example:

curl -X POST http://localhost:11023/v1/schema/set \
  -H 'Content-Type: application/json' \
  -d '{
    "collection": "blog",
    "schema": {
      "required": ["category"],
      "properties": {
        "category": { "type": "string", "enum": ["blog", "tutorial"] }
      }
    }
  }'

POST /v1/schema/get

Retrieve the current validation schema for a collection.

Request Body:

{ "collection": "blog"
}

Response (schema exists):

{
  "collection": "blog",
  "schema": {
    "required": ["category", "author"],
    "properties": {
      "category": {
        "type": "string",
        "enum": ["blog", "tutorial", "news"]
      },
      "author": {
        "type": "string"
      },
      "tags": {
        "type": "string",
        "minItems": 1,
        "maxItems": 5
      }
    }
  }
}

Response (no schema):

{ "collection": "blog", "schema": null
}

cURL Example:

curl -X POST http://localhost:11023/v1/schema/get \
  -H 'Content-Type: application/json' \
  -d '{"collection": "blog"}'

POST /v1/schema/delete

Delete the validation schema for a collection, disabling validation. Existing documents are not affected.

Request Body:

{ "collection": "blog"
}

Response:

{ "status": "ok"
}

cURL Example:

curl -X POST http://localhost:11023/v1/schema/delete \
  -H 'Content-Type: application/json' \
  -d '{"collection": "blog"}'

POST /v1/schema/list

List all collections that have a validation schema defined.

Request Body: Empty or {}.

Response:

{
  "schemas": [
    {
      "collection": "blog",
      "schema": {
        "required": ["category", "author"],
        "properties": {
          "category": {
            "type": "string",
            "enum": ["blog", "tutorial", "news"]
          },
          "author": {
            "type": "string"
          }
        }
      }
    },
    {
      "collection": "products",
      "schema": {
        "required": ["price", "sku"],
        "properties": {
          "price": {
            "type": "number"
          },
          "sku": {
            "type": "string",
            "pattern": "^SKU-[0-9]+$"
          }
        }
      }
    }
  ]
}

cURL Example:

curl -X POST http://localhost:11023/v1/schema/list \
  -H 'Content-Type: application/json' \
  -d '{}'

POST /v1/validate

Validate a document's metadata against the collection schema without persisting anything. Useful for dry-run checks.

Request Body:

{ "collection": "blog", "meta": { "category": ["blog"], "author": ["Jane Doe"], "tags": ["golang", "tutorial"] }
}

Response (valid):

{ "valid": true, "errors": []
}

Response (invalid):

{ "valid": false, "errors": [ "value \"pending\" for key \"status\" is not in allowed enum values [draft, published, archived]", "key \"tags\" has 6 values, exceeds maxItems 5" ]
}
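
To make the validation rules concrete, here is a rough client-side sketch that checks only the rules appearing in this section's examples (required, enum, minItems, maxItems, pattern). It is not the server's validator: type checks are omitted, and the wording of the required, minItems, and pattern messages is invented for illustration. Only the enum and maxItems messages follow the example response above.

```python
import re


def validate_meta(meta: dict, schema: dict) -> list:
    """Return a list of error strings; an empty list means the metadata is valid."""
    errors = []
    for key in schema.get("required", []):
        if not meta.get(key):
            errors.append(f'required key "{key}" is missing')
    for key, rules in schema.get("properties", {}).items():
        values = meta.get(key)
        if values is None:
            continue  # optional key that was not supplied
        if "minItems" in rules and len(values) < rules["minItems"]:
            errors.append(f'key "{key}" has {len(values)} values, below minItems {rules["minItems"]}')
        if "maxItems" in rules and len(values) > rules["maxItems"]:
            errors.append(f'key "{key}" has {len(values)} values, exceeds maxItems {rules["maxItems"]}')
        for value in values:
            if "enum" in rules and value not in rules["enum"]:
                allowed = ", ".join(rules["enum"])
                errors.append(f'value "{value}" for key "{key}" is not in allowed enum values [{allowed}]')
            if "pattern" in rules and not re.search(rules["pattern"], value):
                errors.append(f'value "{value}" for key "{key}" does not match pattern {rules["pattern"]}')
    return errors


schema = {
    "required": ["category"],
    "properties": {
        "status": {"enum": ["draft", "published", "archived"]},
        "tags": {"maxItems": 5},
    },
}
meta = {"category": ["blog"], "status": ["pending"], "tags": ["a", "b", "c", "d", "e", "f"]}
for err in validate_meta(meta, schema):
    print(err)
```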

cURL Example:

curl -X POST http://localhost:11023/v1/validate \
  -H 'Content-Type: application/json' \
  -d '{
    "collection": "blog",
    "meta": {
      "category": ["blog"],
      "author": ["Jane Doe"]
    }
  }'

POST /v1/auth/login

Authenticate with username and password to receive a JWT token. The token must be included in the Authorization header for subsequent authenticated requests.

Request Body:

{ "username": "admin", "password": "secret"
}

Response:

{ "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...", "expiresAt": 1709481200
}

cURL Example:

curl -X POST http://localhost:11023/v1/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"username":"admin","password":"secret"}'

Error Responses:

  • 401 Unauthorized - Invalid credentials
  • 400 Bad Request - Invalid request format

POST /v1/auth/api-key

Create a new API key for programmatic access. Requires JWT authentication via Authorization header.

Authentication: JWT token required

Request Body:

{ "description": "CI/CD pipeline", "expiresAt": 0
}

Parameters:

  • description (string, optional): Human-readable label for the API key
  • expiresAt (int64, optional): Unix timestamp when key expires (0 = never expires)

Response:

{ "key": "mddb_live_abc123def456...", "description": "CI/CD pipeline", "createdAt": 1709394600, "expiresAt": 0
}

cURL Example:

TOKEN=$(curl -s -X POST http://localhost:11023/v1/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"username":"admin","password":"secret"}' | jq -r .token)

curl -X POST http://localhost:11023/v1/auth/api-key \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"description":"Production deployment","expiresAt":0}'

Important Notes:

  • The full API key is only shown once in the response
  • Save the key securely - it cannot be retrieved again
  • API keys are hashed with SHA256 before storage
  • Use the key in subsequent requests via the X-API-Key header
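
As a rough illustration of why a stored key cannot be shown again, the sketch below hashes a key with SHA-256 and later verifies a presented key against the stored digest. The function names and the hex encoding of the digest are assumptions made for this example, not MDDB's actual storage format.

```python
import hashlib
import hmac


def hash_api_key(api_key: str) -> str:
    """Return the hex-encoded SHA-256 digest of an API key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def key_matches(presented_key: str, stored_hash: str) -> bool:
    """Compare digests in constant time to avoid leaking timing information."""
    return hmac.compare_digest(hash_api_key(presented_key), stored_hash)


# Made-up key, for illustration only.
stored = hash_api_key("mddb_live_example")
```

Because only the digest is kept, the original key cannot be recovered from storage; a lost key has to be replaced with a new one.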

Error Responses:

  • 401 Unauthorized - Missing or invalid JWT token
  • 400 Bad Request - Invalid request format
  • 500 Internal Server Error - Failed to create API key

GET /v1/auth/api-keys

List all API keys for the authenticated user. Returns metadata about each key (not the actual key values).

Authentication: JWT token required

Response:

{ "keys": [ { "keyHash": "abc123def456...", "description": "Production deployment", "createdAt": 1709394600, "expiresAt": 0 }, { "keyHash": "xyz789ghi012...", "description": "Development testing", "createdAt": 1709395200, "expiresAt": 1740931200 } ]
}

Response Fields:

  • keyHash (string): SHA256 hash of the API key (use this to delete the key)
  • description (string): Key description
  • createdAt (int64): Unix timestamp of creation
  • expiresAt (int64): Unix timestamp of expiry (0 = never expires)

cURL Example:

curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:11023/v1/auth/api-keys

Error Responses:

  • 401 Unauthorized - Missing or invalid JWT token
  • 500 Internal Server Error - Failed to retrieve API keys

DELETE /v1/auth/api-keys/:keyHash

Delete an API key by its hash. Users can only delete their own API keys.

Authentication: JWT token required

URL Parameters:

  • keyHash (string, required): The SHA256 hash of the API key (from GET /v1/auth/api-keys)

Response:

{ "status": "deleted"
}

cURL Example:

curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:11023/v1/auth/api-keys

curl -X DELETE -H "Authorization: Bearer $TOKEN" \
  http://localhost:11023/v1/auth/api-keys/abc123def456...

Error Responses:

  • 401 Unauthorized - Missing or invalid JWT token
  • 403 Forbidden - Attempting to delete another user's API key
  • 404 Not Found - API key not found
  • 400 Bad Request - Missing keyHash parameter

Using API Keys

Once you have an API key, use it to authenticate requests instead of JWT tokens:

With HTTP Header:

curl -H "X-API-Key: mddb_live_abc123def456..." \
  http://localhost:11023/v1/search \
  -H 'Content-Type: application/json' \
  -d '{"collection":"blog","filterMeta":{"status":["published"]}}'

With CLI:

mddb-cli --api-key mddb_live_abc123def456... search blog -f "status=published"

API Key vs JWT Token:

  • JWT Tokens: Short-lived (default 24h), obtained via login, ideal for interactive sessions
  • API Keys: Long-lived or permanent, ideal for automation, CI/CD, and third-party integrations
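
The two schemes differ only in the header they send. The sketch below builds, without sending, a JSON POST request carrying either an X-API-Key header or a Bearer token. The helper name `build_request` is hypothetical, and the base URL is the documented default.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11023"


def build_request(path, payload, api_key=None, jwt_token=None):
    """Build (but do not send) a JSON POST request with one auth scheme."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key
    elif jwt_token:
        headers["Authorization"] = f"Bearer {jwt_token}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path, data=body, headers=headers, method="POST"
    )


req = build_request(
    "/v1/search",
    {"collection": "blog", "filterMeta": {"status": ["published"]}},
    api_key="mddb_live_abc123def456...",
)
# urllib.request.urlopen(req) would send it to a running server.
```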

POST /v1/classify

Zero-shot document classification using embedding similarity. Ranks candidate labels by their semantic similarity to a document or text.

Request Body:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| collection | string | No* | Collection name (for doc reference) |
| key | string | No* | Document key (for doc reference) |
| lang | string | No | Language code (default: "en") |
| text | string | No* | Raw text to classify |
| labels | string[] | Yes | Candidate labels (max 100) |
| topK | int | No | Return top K labels (0 = all) |
| multi | bool | No | Return all labels above threshold |
| threshold | float | No | Minimum similarity score (default: 0.0) |

*Provide either text OR collection+key (with optional lang).

Example Request:

curl -X POST http://localhost:11023/v1/classify \
  -d '{
    "text": "Go is a statically typed, compiled language designed at Google",
    "labels": ["programming", "cooking", "sports", "music"]
  }'

Example Response:

{ "results": [ {"label": "programming", "score": 0.87}, {"label": "music", "score": 0.21}, {"label": "sports", "score": 0.18}, {"label": "cooking", "score": 0.12} ], "model": "text-embedding-3-small", "dimensions": 1536
}

Notes:

  • Requires an embedding provider to be configured
  • For document references, reuses existing embedding from vector store if available
  • Labels are embedded in a single batch API call for efficiency
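
The ranking step can be sketched as follows, assuming cosine similarity between the text embedding and each label embedding. Toy two-dimensional vectors stand in for real embeddings, the function names are hypothetical, and the interaction with the `multi` flag is not modelled.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_labels(text_vec, label_vecs, top_k=0, threshold=0.0):
    """Score every label against the text vector and sort best-first."""
    scored = [
        {"label": label, "score": cosine(text_vec, vec)}
        for label, vec in label_vecs.items()
    ]
    scored = [s for s in scored if s["score"] >= threshold]
    scored.sort(key=lambda s: s["score"], reverse=True)
    # top_k == 0 means "return all labels", mirroring the topK field.
    return scored[:top_k] if top_k > 0 else scored


# Toy 2-D vectors stand in for real embeddings.
labels = {"programming": [1.0, 0.0], "cooking": [0.0, 1.0], "music": [1.0, 1.0]}
results = rank_labels([0.9, 0.1], labels, top_k=2)
```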

PATCH /v1/update

Update a document's metadata, content, or TTL independently, without re-sending the entire document.

Request Body:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| collection | string | Yes | Collection name |
| key | string | Yes | Document key |
| lang | string | Yes | Language code |
| meta | object | No | New metadata (replaces all). Use {} to clear |
| contentMd | string | No | New content (replaces existing) |
| ttl | int | No | New TTL in seconds (0 = remove) |

Example:

curl -X PATCH http://localhost:11023/v1/update \
  -d '{"collection":"blog","key":"p1","lang":"en","meta":{"tag":["go","updated"]}}'

curl -X PATCH http://localhost:11023/v1/update \
  -d '{"collection":"blog","key":"p1","lang":"en","contentMd":"# Updated content"}'

GET /v1/doc-meta

Get a document's metadata without its content. This is a lightweight read.

Query Parameters:

| Parameter | Required | Description |
| --- | --- | --- |
| collection | Yes | Collection name |
| key | Yes | Document key |
| lang | No | Language code (default: "en") |

Example:

curl "http://localhost:11023/v1/doc-meta?collection=blog&key=p1&lang=en"

POST /v1/delete

Delete a document from a collection.

Request Body:

{ "collection": "blog", "key": "homepage", "lang": "en"
}

Response:

{ "status": "deleted", "collection": "blog", "key": "homepage", "lang": "en"
}

POST /v1/delete-batch

Delete multiple documents in a single request.

Request Body:

{ "collection": "blog", "documents": [ { "key": "post-1", "lang": "en" }, { "key": "post-2", "lang": "en" } ]
}

Response:

{ "deleted": 2, "not_found": 0, "failed": 0, "errors": null
}

POST /v1/delete-collection

Delete all documents in a collection.

Request Body:

{ "collection": "blog"
}

Response:

{ "status": "ok", "collection": "blog"
}

POST /v1/hybrid-search

Hybrid search combining full-text (sparse) and vector (dense) results using alpha blending or reciprocal rank fusion (RRF).

Request Body:

{ "collection": "blog", "query": "how to deploy", "topK": 10, "algorithm": "bm25", "vectorAlgorithm": "flat", "alpha": 0.5, "strategy": "alpha", "rrfK": 60, "fuzzy": 0, "threshold": 0.0, "distanceMetric": "cosine", "filterMeta": { "category": ["tutorial"] }, "includeContent": false, "disableStem": false, "disableSynonyms": false
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| collection | string | | Required. Collection name |
| query | string | | Required. Search query |
| topK | integer | 10 | Max results |
| algorithm | string | "bm25" | FTS algorithm: bm25, bm25f |
| vectorAlgorithm | string | "flat" | Vector algorithm: flat, hnsw, ivf, pq, sq |
| alpha | number | 0.5 | Weight blending (0=FTS only, 1=vector only) |
| strategy | string | "alpha" | Fusion strategy: alpha or rrf |
| rrfK | integer | 60 | RRF parameter k |
| fuzzy | integer | 0 | Typo tolerance: 0, 1, or 2 |
| threshold | number | 0.0 | Min vector similarity 0–1 |
| distanceMetric | string | "cosine" | cosine, dot_product, euclidean |
| filterMeta | object | | Metadata key-value filter |
| includeContent | boolean | false | Include full content |
| disableStem | boolean | false | Disable stemming |
| disableSynonyms | boolean | false | Disable synonym expansion |

Response:

{ "results": [ { "document": { "id": "...", "key": "...", "lang": "...", "meta": {} }, "combinedScore": 0.85, "ftsScore": 0.7, "vectorScore": 0.95, "matchedTerms": ["deploy"], "rank": 1 } ], "total": 1, "strategy": "alpha", "alpha": 0.5, "ftsAlgorithm": "bm25", "vectorAlgorithm": "flat", "distanceMetric": "cosine", "searchStats": { "durationMs": 12 }
}
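
The two fusion strategies can be sketched as below. This assumes a plain linear blend for `alpha` and the textbook reciprocal rank fusion formula, score = sum of 1 / (k + rank) over the input rankings. The server may normalize scores before blending, so its combinedScore need not match this arithmetic exactly. Function names are hypothetical.

```python
def alpha_blend(fts_score, vector_score, alpha=0.5):
    """Linear blend: alpha=0 keeps only the FTS score, alpha=1 only the vector score."""
    return (1 - alpha) * fts_score + alpha * vector_score


def rrf(rankings, k=60):
    """Reciprocal rank fusion over several best-first lists of document IDs."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


fts_ranking = ["a", "b", "c"]      # best-first IDs from full-text search
vector_ranking = ["b", "c", "a"]   # best-first IDs from vector search
fused = rrf([fts_ranking, vector_ranking], k=60)
```

RRF uses only rank positions, which makes it a reasonable choice when FTS and vector scores live on different scales; alpha blending gives direct control over the balance when the scores are comparable.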

POST /v1/cross-search

Vector search across multiple collections using a text query, pre-computed vector, or another document's embedding.

Request Body:

{ "query": "machine learning basics", "targetCollections": ["articles", "tutorials"], "topK": 10, "threshold": 0.5, "algorithm": "flat", "distanceMetric": "cosine", "includeContent": false
}

Alternative source modes (use one):

  • query (string): text to embed
  • sourceCollection + sourceDocID: use an existing document's embedding
  • queryVector (array of numbers): pre-computed vector

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| targetCollections | string[] | all | Collections to search |
| topK | integer | 10 | Max results |
| threshold | number | 0.0 | Min similarity |
| algorithm | string | "flat" | Vector algorithm |
| distanceMetric | string | "cosine" | Distance metric |
| filterMeta | object | | Metadata filter |
| includeContent | boolean | false | Include content |

Response:

{ "results": [ { "collection": "tutorials", "document": { "key": "ml-intro", "lang": "en", "meta": {} }, "score": 0.92, "rank": 1 } ], "total": 1, "targetCollections": ["articles", "tutorials"], "algorithm": "flat", "distanceMetric": "cosine", "searchStats": { "durationMs": 8, "collectionsSearched": 2 }
}

POST /v1/find-duplicates

Detect exact and similar documents in a collection using content hashing and vector embeddings.

Request Body:

{ "collection": "blog", "mode": "both", "threshold": 0.9, "maxDocs": 5000, "distanceMetric": "cosine", "includeContent": false
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| collection | string | | Required. Collection name |
| mode | string | "both" | exact, similar, or both |
| threshold | number | 0.9 | Similarity threshold 0–1 |
| maxDocs | integer | 5000 | Max documents to process |
| distanceMetric | string | "cosine" | Distance metric |
| includeContent | boolean | false | Include document content |

Response:

{ "collection": "blog", "mode": "both", "threshold": 0.9, "distanceMetric": "cosine", "totalDocuments": 150, "totalEmbedded": 148, "exactGroups": [ { "groupId": 1, "type": "exact", "documents": [ { "docId": "blog|p1|en", "key": "p1", "contentHash": "abc123" }, { "docId": "blog|p2|en", "key": "p2", "contentHash": "abc123" } ] } ], "similarGroups": [], "exactDuplicates": 2, "similarPairs": 0
}
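
Exact-duplicate detection by content hashing can be pictured with the sketch below. It assumes SHA-256 over the raw markdown. The hash function MDDB actually uses for contentHash is not stated here, and the function name is hypothetical.

```python
import hashlib
from collections import defaultdict


def exact_duplicate_groups(docs):
    """Group document keys whose content hashes are identical."""
    by_hash = defaultdict(list)
    for key, content in docs.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        by_hash[digest].append(key)
    # Only hashes shared by two or more documents form a duplicate group.
    return [sorted(keys) for keys in by_hash.values() if len(keys) > 1]


docs = {"p1": "# Hello", "p2": "# Hello", "p3": "# Different"}
groups = exact_duplicate_groups(docs)
```

Hashing finds only byte-identical content; near-duplicates need the embedding comparison that the similar mode performs against the threshold.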

POST /v1/aggregate

Compute metadata facets and date histograms for a collection. Supports optional metadata pre-filtering.

Request Body:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| collection | string | Yes | Collection name |
| filterMeta | object | No | Metadata pre-filter (same as /v1/search) |
| facets | array | No | Facet aggregation requests |
| facets[].field | string | Yes | Metadata key to aggregate (e.g. "category") |
| facets[].orderBy | string | No | "count" (default, descending) or "value" (alphabetical) |
| histograms | array | No | Date histogram requests |
| histograms[].field | string | Yes | "addedAt" or "updatedAt" |
| histograms[].interval | string | No | "day", "week", "month" (default), "year" |
| maxFacetSize | int | No | Max values per facet (default: 50) |

Example:

curl -X POST http://localhost:11023/v1/aggregate \
  -H 'Content-Type: application/json' \
  -d '{
    "collection": "blog",
    "facets": [
      {"field": "category"},
      {"field": "author", "orderBy": "value"}
    ],
    "histograms": [
      {"field": "addedAt", "interval": "month"}
    ]
  }'

Example with metadata pre-filter:

curl -X POST http://localhost:11023/v1/aggregate \
  -H 'Content-Type: application/json' \
  -d '{
    "collection": "blog",
    "filterMeta": {"status": ["published"]},
    "facets": [{"field": "tags"}]
  }'

Response:

{ "collection": "blog", "totalDocs": 42, "facets": { "category": [ {"value": "tutorial", "count": 15}, {"value": "news", "count": 12}, {"value": "release", "count": 8} ], "author": [ {"value": "Alice", "count": 20}, {"value": "Bob", "count": 22} ] }, "histograms": { "addedAt": [ {"key": "2026-01", "from": 1767225600, "to": 1769904000, "count": 10}, {"key": "2026-02", "from": 1769904000, "to": 1772323200, "count": 18}, {"key": "2026-03", "from": 1772323200, "to": 1775001600, "count": 14} ] }, "durationMs": 3
}
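
A facet is a per-value document count for one metadata key. The sketch below reproduces the two orderBy modes on in-memory metadata. Counting each document once per value and breaking count ties alphabetically are assumptions of this example, and the function name is hypothetical.

```python
from collections import Counter


def facet(docs, field, order_by="count", max_size=50):
    """Count how many documents carry each value of one metadata key."""
    counts = Counter()
    for meta in docs:
        # set() keeps a repeated value from counting one document twice.
        for value in set(meta.get(field, [])):
            counts[value] += 1
    if order_by == "value":
        items = sorted(counts.items())
    else:
        items = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"value": v, "count": c} for v, c in items[:max_size]]


docs = [
    {"category": ["tutorial"], "author": ["Bob"]},
    {"category": ["tutorial"], "author": ["Alice"]},
    {"category": ["news"], "author": ["Bob"]},
]
by_count = facet(docs, "category")
by_value = facet(docs, "author", order_by="value")
```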

GET /v1/collection-config

Get configuration for a specific collection.

Query Parameters:

| Parameter | Required | Description |
| --- | --- | --- |
| collection | Yes | Collection name |

Example:

curl "http://localhost:11023/v1/collection-config?collection=blog"

Response:

{ "collection": "blog", "config": { "type": "default", "description": "Blog posts", "icon": "", "color": "", "customMeta": {} }, "configured": true
}

Also supports PUT to set config and DELETE to remove config for a collection.

PUT /v1/collection-config

Set or update collection configuration including storage backend.

Request Body:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| collection | string | Yes | Collection name |
| type | string | No | Collection type (default, website, images, audio, documents) |
| description | string | No | Collection description |
| icon | string | No | Emoji icon |
| color | string | No | Hex color code |
| customMeta | object | No | Custom key-value metadata |
| storageBackend | string | No | Storage backend: boltdb (default), memory, s3 |
| storageConfig | object | No | Backend-specific settings (required for s3) |

storageConfig fields (for S3):

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| endpoint | string | Yes | S3 endpoint (e.g. s3.amazonaws.com, minio:9000) |
| bucket | string | Yes | S3 bucket name |
| region | string | No | AWS region (e.g. us-east-1) |
| accessKey | string | No | Access key |
| secretKey | string | No | Secret key |
| prefix | string | No | Key prefix within bucket (e.g. mddb/) |
| useTLS | bool | No | Use HTTPS (default: false) |

Example - In-Memory backend:

curl -X PUT http://localhost:11023/v1/collection-config \
  -H 'Content-Type: application/json' \
  -d '{
    "collection": "scratch",
    "type": "default",
    "storageBackend": "memory"
  }'

Example - S3 backend:

curl -X PUT http://localhost:11023/v1/collection-config \
  -H 'Content-Type: application/json' \
  -d '{
    "collection": "archive",
    "type": "documents",
    "storageBackend": "s3",
    "storageConfig": {
      "endpoint": "s3.amazonaws.com",
      "bucket": "my-mddb-archive",
      "region": "us-east-1",
      "accessKey": "AKIAIOSFODNN7EXAMPLE",
      "secretKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "prefix": "mddb/",
      "useTLS": true
    }
  }'

Response:

{ "status": "ok", "collection": "archive"
}

Note: The memory backend is ephemeral; all data is lost on server restart. The s3 backend requires endpoint and bucket in storageConfig. The default boltdb backend uses the embedded database.


GET /v1/collection-configs

List all collection configurations.

Example:

curl http://localhost:11023/v1/collection-configs

Response:

{ "configs": [ { "collection": "blog", "config": { "type": "default", "description": "Blog posts" } } ], "total": 1
}

GET /v1/embedding-configs

List all configured embedding models.

Example:

curl http://localhost:11023/v1/embedding-configs

Response:

{ "configs": [ { "id": "cfg_abc123", "name": "OpenAI Ada", "provider": "openai", "model": "text-embedding-3-small", "dimensions": 1536, "apiKey": "sk-...", "apiUrl": "", "isDefault": true, "createdAt": 1709100000 } ]
}

Also supports POST to create a new embedding config.


GET/PUT/DELETE /v1/embedding-configs/:id

Manage a specific embedding configuration.

  • GET returns the config object
  • PUT updates the config (fields: name, provider, model, dimensions, apiKey, apiUrl, isDefault)
  • DELETE removes the config (returns 204 No Content)

Example:

curl http://localhost:11023/v1/embedding-configs/cfg_abc123

POST /v1/embedding-configs/set-default

Set a specific embedding configuration as default.

Request Body:

{ "id": "cfg_abc123"
}

Response:

{ "message": "default config updated"
}

GET/POST/DELETE /v1/stopwords

Manage FTS stop words for a collection.

GET - List stop words:

curl "http://localhost:11023/v1/stopwords?collection=blog"

Response:

{ "collection": "blog", "entries": [ { "word": "the", "isDefault": true }, { "word": "myword", "isDefault": false } ], "total": 2, "defaults": 1, "custom": 1
}

POST - Add stop words:

{ "collection": "blog", "words": ["myword", "another"]
}

DELETE - Remove stop words:

{ "collection": "blog", "words": ["myword"]
}

GET/POST /v1/webhooks

List or register webhooks.

GET - List all webhooks:

curl http://localhost:11023/v1/webhooks

POST - Register a webhook:

{ "url": "https://example.com/hook", "events": ["doc.added", "doc.updated", "doc.deleted"], "collection": "blog"
}

Response (webhook object):

{ "id": "wh_abc123", "url": "https://example.com/hook", "events": ["doc.added", "doc.updated", "doc.deleted"], "collection": "blog", "createdAt": 1709100000
}

POST /v1/webhooks/delete

Delete a webhook by ID.

Request Body:

{ "id": "wh_abc123"
}

Response:

{ "status": "deleted", "id": "wh_abc123"
}

POST /v1/revisions

List document revision history.

Request Body:

{ "collection": "blog", "key": "homepage", "lang": "en"
}

Response:

{ "collection": "blog", "key": "homepage", "lang": "en", "revisions": [ { "timestamp": 1709200000, "updatedAt": 1709200000, "contentMd": "# Old content", "meta": { "author": ["Jane"] } } ], "total": 1
}

POST /v1/revisions/restore

Restore a document to a previous revision.

Request Body:

{ "collection": "blog", "key": "homepage", "lang": "en", "timestamp": 1709200000
}

Response: Restored document object.


GET/POST /v1/automation

List or create automation rules (triggers, crons, webhooks).

GET - List rules:

curl http://localhost:11023/v1/automation

Response:

{ "rules": [ { "id": "auto_abc", "name": "Alert on new docs", "type": "trigger", "searchType": "fts", "query": "urgent", "threshold": 0.8, "webhookUrl": "https://example.com/alert" } ], "total": 1
}

POST - Create a rule:

{ "name": "Daily report", "type": "cron", "schedule": "0 9 * * *", "searchType": "vector", "query": "status report", "webhookUrl": "https://example.com/report"
}

Response: Created rule object (201 Created).


GET/PUT/DELETE /v1/automation/:id

Manage a specific automation rule.

  • GET returns the rule object
  • PUT updates the rule
  • DELETE removes the rule

POST /v1/automation/:id/test - Test a trigger rule:

{ "trigger": { "id": "auto_abc", "name": "...", "searchType": "fts", "query": "urgent" }, "matches": [...], "total": 3
}

GET /v1/automation-logs

Get automation execution logs with pagination.

Query Parameters:

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| limit | No | 50 | Max results per page |
| cursor | No | | Pagination cursor |
| ruleId | No | | Filter by rule ID |
| status | No | | Filter by status |

Example:

curl "http://localhost:11023/v1/automation-logs?limit=10&ruleId=auto_abc"

Response:

{ "logs": [...], "total": 25, "nextCursor": "...", "hasMore": true
}
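
Cursor pagination is typically consumed in a loop that feeds each nextCursor back into the next request until hasMore is false. The sketch below shows that loop against a stand-in fetch function, so no server is needed. The `id` field on log entries and both function names are hypothetical.

```python
def iter_logs(fetch_page, limit=50):
    """Yield log entries page by page until the server reports no more."""
    cursor = None
    while True:
        page = fetch_page(limit=limit, cursor=cursor)
        yield from page["logs"]
        if not page.get("hasMore"):
            break
        cursor = page["nextCursor"]


# Stand-in for an HTTP call: two canned pages keyed by cursor.
PAGES = {
    None: {"logs": [{"id": 1}, {"id": 2}], "nextCursor": "c1", "hasMore": True},
    "c1": {"logs": [{"id": 3}], "nextCursor": "", "hasMore": False},
}


def fake_fetch(limit, cursor):
    return PAGES[cursor]


all_logs = list(iter_logs(fake_fetch, limit=2))
```

In real use, `fetch_page` would issue GET /v1/automation-logs with the limit and cursor query parameters and return the decoded JSON body.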

POST /v1/import-url

Import a markdown document from a URL. Automatically extracts YAML frontmatter.

Request Body:

{ "collection": "articles", "url": "https://example.com/post.md", "key": "imported-post", "lang": "en", "meta": { "source": ["web"] }, "ttl": 86400
}

| Field | Type | Description |
| --- | --- | --- |
| collection | string | Required. Target collection |
| url | string | Required. URL to fetch |
| key | string | Document key (derived from URL path if empty) |
| lang | string | Required. Language code |
| meta | object | Metadata (merged with frontmatter) |
| ttl | integer | Time-to-live in seconds |

Response: Saved document object.


POST /v1/import-wiki

Import Wikipedia (MediaWiki) XML dumps. Supports plain .xml files and bzip2-compressed .xml.bz2 files. The XML is streamed, so the entire file is never loaded into memory.

Multipart Form Upload:

curl -X POST http://localhost:11023/v1/import-wiki \
  -F "[email protected]" \
  -F "collection=wikipedia" \
  -F "lang=en" \
  -F "skipRedirects=true" \
  -F "skipFts=true"

Raw Stream (octet-stream):

curl -X POST "http://localhost:11023/v1/import-wiki?collection=wikipedia&lang=en&skipRedirects=true&skipFts=true" \
  -H "Content-Type: application/x-bzip2" \
  --data-binary @enwiki-20260101-pages-articles.xml.bz2

| Field | Type | Description |
| --- | --- | --- |
| collection | string | Required. Target collection |
| lang | string | Required. Language code (e.g. en, de, pl) |
| namespaces | string | Comma-separated namespace IDs to import (default: 0 = articles only) |
| skipRedirects | bool | Skip redirect pages (default: false) |
| skipFts | bool | Skip FTS indexing during import for speed (default: false). Run /v1/fts-reindex after. |
| maxPages | int | Maximum pages to import (default: unlimited) |
| batchSize | int | Pages per batch commit (default: 500) |

Response:

{ "imported": 1234567, "skipped": 456789, "failed": 0, "collection": "wikipedia", "durationMs": 3600000
}

Metadata stored per document: source=wikipedia, wiki_id, wiki_title, wiki_ns, wiki_rev_id, wiki_timestamp, wiki_contributor, wiki_redirect (if applicable).
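
To illustrate what streaming a dump means in practice, the sketch below walks a bzip2-compressed MediaWiki export one page at a time with the standard library's incremental XML parser. It is not MDDB's importer: the function names are hypothetical, and the tiny sample omits the XML namespace and most fields that real dumps carry.

```python
import bz2
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace: '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream, skip_redirects=True, namespaces=("0",)):
    """Yield (title, text) per page without holding the whole dump in memory."""
    for _, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        fields = {local(child.tag): child for child in elem}
        title = fields["title"].text
        ns = fields["ns"].text
        is_redirect = "redirect" in fields
        text_elem = next((e for e in elem.iter() if local(e.tag) == "text"), None)
        text = text_elem.text if text_elem is not None else ""
        elem.clear()  # drop the finished page's content before parsing the next one
        if ns in namespaces and not (skip_redirects and is_redirect):
            yield title, text or ""


sample = b"""<mediawiki>
<page><title>Go</title><ns>0</ns><revision><text>Go is a language.</text></revision></page>
<page><title>Golang</title><ns>0</ns><redirect title="Go"/><revision><text>#REDIRECT [[Go]]</text></revision></page>
<page><title>Talk:Go</title><ns>1</ns><revision><text>Discussion</text></revision></page>
</mediawiki>"""

# Compress the sample in memory, then read it back as a decompressing stream.
compressed = io.BytesIO(bz2.compress(sample))
pages = list(iter_pages(bz2.open(compressed, "rb")))
```

Only the article in namespace 0 that is not a redirect survives, which mirrors the combination of the default namespaces value and skipRedirects=true.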


POST /v1/set-ttl

Set or remove document time-to-live.

Request Body:

{ "collection": "blog", "key": "temp-post", "lang": "en", "ttl": 3600
}

| Field | Type | Description |
| --- | --- | --- |
| collection | string | Required. Collection name |
| key | string | Required. Document key |
| lang | string | Required. Language code |
| ttl | integer | Required. Seconds until expiry; 0 to remove TTL |

Response: Updated document object with expiresAt field.


GET /v1/meta-keys

List all unique metadata keys and their distinct values for a collection.

Query Parameters:

| Parameter | Required | Description |
| --- | --- | --- |
| collection | Yes | Collection name |

Example:

curl "http://localhost:11023/v1/meta-keys?collection=blog"

Response:

{ "meta": { "author": ["John", "Jane"], "category": ["blog", "tutorial"], "tags": ["golang", "database"] }
}

GET /v1/checksum

Get CRC32 checksum of a collection for integrity verification.

Query Parameters:

| Parameter | Required | Description |
| --- | --- | --- |
| collection | Yes | Collection name |

Example:

curl "http://localhost:11023/v1/checksum?collection=blog"

Response:

{ "collection": "blog", "checksum": "a1b2c3d4", "documentCount": 42
}

GET /v1/system/info

Returns system information including OS, memory, CPU, and network details.

Example:

curl http://localhost:11023/v1/system/info

Response:

{ "hostname": "server-1", "os": "linux", "arch": "amd64", "numCPU": 4, "goVersion": "go1.26.2", "version": "2.9.11", "uptimeSeconds": 3600, "memoryTotal": 134217728, "memoryUsed": 67108864, "numGoroutines": 12, "cpuUsagePercent": 15.3
}

GET /v1/config

Returns server configuration overview.

Example:

curl http://localhost:11023/v1/config

Response:

{ "version": "2.9.11", "databasePath": "mddb.db", "mode": "wr", "protocols": { "http": { "enabled": true, "addr": ":11023" }, "grpc": { "enabled": true, "addr": ":11024" }, "mcp": { "enabled": true, "addr": ":11025" } }, "authEnabled": false, "metricsEnabled": true, "vectorConfig": { "enabled": true, "provider": "openai", "model": "text-embedding-3-small", "dimensions": 1536 }, "automationsEnabled": true, "searchStatsEnabled": true
}

GET /v1/endpoints

Returns list of all available endpoints across HTTP, gRPC, and MCP protocols.

Example:

curl http://localhost:11023/v1/endpoints

Response:

{ "http": [ { "method": "POST", "path": "/v1/add", "description": "Add document", "requiresAuth": true } ], "grpc": [ { "name": "AddDocument", "description": "Add a document" } ], "mcp": [ { "name": "add_document", "description": "Add a document" } ]
}

GET /health

Health check endpoint. Also available at /v1/health.

Response:

{ "status": "healthy", "mode": "wr"
}

Returns 503 with "status": "unhealthy" if the database is not accessible.


POST /v1/auth/register

Register a new user. Requires admin privileges.

Request Body:

{ "username": "newuser", "password": "secret123"
}

Response:

{ "username": "newuser", "createdAt": 1709100000
}

GET /v1/auth/me

Get current authenticated user information.

Response:

{ "username": "admin", "admin": true, "createdAt": 1709000000
}

GET/POST /v1/auth/permissions

Get or set user permissions on collections.

GET - Query parameter username:

curl "http://localhost:11023/v1/auth/permissions?username=john"

POST - Set permission:

{ "username": "john", "collection": "blog", "read": true, "write": true, "admin": false
}

Response (POST): { "status": "ok" }


GET /v1/auth/users

List all users. Requires admin privileges.

Response:

{ "users": [ { "username": "admin", "createdAt": 1709000000, "disabled": false, "admin": true, "groups": ["admins"] } ]
}

DELETE /v1/auth/users/:username

Delete a user account. Requires admin privileges.

Example:

curl -X DELETE http://localhost:11023/v1/auth/users/john

Response: { "status": "deleted" }


GET/POST /v1/auth/groups

List all groups (GET) or create a new group (POST). Requires admin privileges.

POST - Create group:

{ "name": "editors", "description": "Content editors", "members": ["john", "jane"]
}

Response (POST): Created group object (201 Created).


GET/PUT/DELETE /v1/auth/groups/:name

Manage a specific group. Requires admin privileges.

  • GET returns the group object
  • PUT updates description and members
  • DELETE removes the group

GET/POST /v1/auth/group-permissions

Get or set permissions for a group on collections.

GET - Query parameter group:

curl "http://localhost:11023/v1/auth/group-permissions?group=editors"

POST - Set group permission:

{ "group": "editors", "collection": "blog", "read": true, "write": true, "admin": false
}

Response (POST): { "status": "permission set" }


Data Models

Document

{
  "id": string,        // Auto-generated: "collection|key|lang"
  "key": string,       // Document key (e.g., "homepage")
  "lang": string,      // Language code (e.g., "en_GB")
  "meta": {            // Metadata (multi-value)
    "key1": ["value1", "value2"],
    "key2": ["value3"]
  },
  "contentMd": string, // Markdown content
  "addedAt": int64,    // Unix timestamp (first creation)
  "updatedAt": int64   // Unix timestamp (last update)
}

Metadata

  • Metadata is stored as map[string][]string (key → array of values)
  • Each metadata key can have multiple values
  • Metadata is automatically indexed for fast searching
  • Common metadata keys: category, author, tags, status, etc.

Error Handling

Error Response Format

{ "error": "error message description"
}

HTTP Status Codes

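| Code | Description |
| --- | --- |
| 200 | Success |
| 201 | Created - Resource created (e.g. automation rules, groups) |
| 204 | No Content - Resource removed (e.g. embedding configs) |
| 400 | Bad Request - Invalid JSON or missing required fields |
| 401 | Unauthorized - Missing or invalid credentials |
| 403 | Forbidden - Write operation in read-only mode |
| 404 | Not Found - Document doesn't exist |
| 500 | Internal Server Error |
| 503 | Service Unavailable - Database not accessible (health check) |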

Common Errors

Missing required fields:

{ "error": "missing fields"
}

Document not found:

{ "error": "not found"
}

Read-only mode:

{ "error": "read-only mode"
}

Best Practices

1. Document Keys

  • Use descriptive, URL-friendly keys
  • Keep keys consistent within a collection
  • Example: homepage, about-us, blog-post-1

2. Language Codes

  • Use standard language codes (ISO 639-1 + ISO 3166-1)
  • Examples: en_US, en_GB, pl_PL, de_DE

3. Metadata

  • Keep metadata keys consistent across documents
  • Use arrays even for single values (for consistency)
  • Index frequently queried fields
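
One way to follow the arrays-everywhere advice is to normalize metadata on the client before sending it. The helper below is a hypothetical sketch: it wraps scalars in a list and converts every value to a string, matching the map[string][]string shape described under Data Models. Dropping None values is a choice made for this example.

```python
def normalize_meta(raw):
    """Coerce every metadata value to a list of strings."""
    meta = {}
    for key, value in raw.items():
        if value is None:
            continue  # drop empty entries rather than storing ["None"]
        values = value if isinstance(value, (list, tuple)) else [value]
        meta[key] = [str(v) for v in values]
    return meta


meta = normalize_meta(
    {"author": "Jane Doe", "tags": ["golang", "database"], "year": 2026}
)
```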

4. Collections

  • Group related documents in collections
  • Use collections like database tables
  • Examples: blog, pages, products, docs

5. Revisions

  • Regularly truncate old revisions to save space
  • Keep enough history for your audit requirements
  • Consider keeping 5-10 recent revisions

6. Backups

  • Schedule regular backups
  • Store backups in a different location
  • Test restore procedures periodically

Performance Tips

  1. Indexing: Metadata is automatically indexed - use it for filtering
  2. Pagination: Always use limit and offset for large result sets
  3. Batch Operations: Use export/import for bulk operations
  4. Revisions: Truncate old revisions regularly to keep database size manageable
  5. Read Mode: Use read-only mode for read-heavy workloads with separate write instances

Memory RAG Endpoints

Conversational memory system for RAG applications. Store, search, and recall conversation history with semantic search.

POST /v1/memory/session

Create a new memory/conversation session.

Request:

{ "userId": "user-1", "scenario": "customer_support", "title": "Session about search API", "meta": {"department": "engineering"}, "ttl": 2592000
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| userId | string | Yes | User identifier |
| scenario | string | No | Session context/scenario |
| title | string | No | Human-readable title (auto-generated if empty) |
| meta | object | No | Additional metadata key-value pairs |
| ttl | int | No | TTL in seconds (default: 30 days) |

Response:

{ "sessionId": "a1b2c3d4e5f6...", "userId": "user-1", "scenario": "customer_support", "title": "Session about search API", "createdAt": 1743400000, "expiresAt": 1745992000
}
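
The timestamps in this example are consistent with expiresAt = createdAt + ttl, which the short sketch below checks. The function name is hypothetical, and treating an omitted ttl as the 30-day default follows the field table; how the server handles ttl = 0 for sessions is not covered here.

```python
DEFAULT_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days, the documented default


def session_expiry(created_at, ttl=None):
    """Return the Unix timestamp at which a session would expire."""
    effective_ttl = DEFAULT_TTL_SECONDS if ttl is None else ttl
    return created_at + effective_ttl


expires_at = session_expiry(1743400000, ttl=2592000)
```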

POST /v1/memory/message

Add a message to an existing session. Messages are automatically embedded for semantic recall.

Request:

{ "sessionId": "a1b2c3d4e5f6...", "role": "user", "content": "How does vector search work?", "meta": {"topic": "search", "source": "docs"}
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| sessionId | string | Yes | Session ID from /v1/memory/session |
| role | string | Yes | user, assistant, system, or tool |
| content | string | Yes | Message content (markdown supported) |
| meta | object | No | Extra metadata (topic, source, tool_call, etc.) |

Response:

{ "messageId": "memory_messages|...", "sessionId": "a1b2c3d4e5f6...", "role": "user", "createdAt": 1743400100, "embedded": true
}

POST /v1/memory/recall

Semantically recall relevant messages from past conversations using hybrid search (vector + keyword).

Request:

{ "query": "How does vector search work?", "userId": "user-1", "sessionId": "", "role": "assistant", "topK": 10, "threshold": 0.5, "strategy": "hybrid", "alpha": 0.5, "includeContent": true, "filterMeta": {}
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Natural language recall query |
| userId | string | No | Filter to sessions belonging to this user |
| sessionId | string | No | Filter to a specific session |
| role | string | No | Filter by message role |
| topK | int | No | Number of results (default: 10) |
| threshold | float | No | Min similarity score 0-1 |
| strategy | string | No | hybrid (default), semantic, keyword |
| alpha | float | No | Weight 0-1 (0=keyword, 1=semantic) |
| includeContent | bool | No | Include full message content |
| filterMeta | object | No | Additional metadata filters |

Response:

{ "results": [ { "document": {"id": "...", "key": "...", "meta": {...}, "contentMd": "..."}, "score": 0.87, "rank": 1, "sessionId": "a1b2c3d4e5f6...", "role": "assistant", "matchStrategy": "hybrid" } ], "total": 5, "strategy": "hybrid", "query": "How does vector search work?"
}

POST /v1/memory/summarize

Generate and store a summary of a session's conversation.

Request:

{ "sessionId": "a1b2c3d4e5f6...", "userId": "user-1"
}

Response:

{ "summaryId": "memory_summaries|...", "sessionId": "a1b2c3d4e5f6...", "summary": "# Session Summary: a1b2c3d4\n\nMessages: 5\n\n## Conversation\n\n...", "createdAt": 1743401000, "messages": 5
}

POST /v1/memory/sessions

List memory sessions with optional filtering.

Request:

{ "userId": "user-1", "scenario": "customer_support", "limit": 50, "offset": 0, "sort": "createdAt", "asc": false
}

Response:

{ "sessions": [ { "sessionId": "a1b2c3d4e5f6...", "userId": "user-1", "scenario": "customer_support", "title": "Session about search API", "createdAt": 1743400000, "updatedAt": 1743401000, "expiresAt": 1745992000, "messageCount": 12 } ], "total": 3
}

POST /v1/memory/history

Get the full message history for a session, ordered chronologically.

Request:

{ "sessionId": "a1b2c3d4e5f6...", "limit": 100, "offset": 0
}

Response:

{ "messages": [ {"id": "...", "key": "...", "meta": {"role": ["user"], "sessionId": ["..."]}, "contentMd": "How does vector search work?", "addedAt": 1743400100}, {"id": "...", "key": "...", "meta": {"role": ["assistant"], "sessionId": ["..."]}, "contentMd": "Vector search uses embeddings...", "addedAt": 1743400110} ], "total": 2
}