MDDB Telemetry & Monitoring

MDDB exposes a Prometheus-compatible /metrics endpoint that provides real-time telemetry about the server, database, and Go runtime. No external dependencies are required -- metrics are rendered in the standard Prometheus text exposition format and can be scraped by any compatible tool.

Quick Start

Metrics are enabled by default. Verify by hitting the endpoint:

curl http://localhost:11023/metrics

Output (excerpt):

mddb_uptime_seconds 3621.4
mddb_http_requests_total{method="POST",path="/v1/add",status="200"} 1523
mddb_http_requests_total{method="POST",path="/v1/search",status="200"} 892
mddb_documents_total{collection="blog"} 150
mddb_documents_total{collection="docs"} 42
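The exposition format is line-oriented, so it is easy to consume without a Prometheus client library. A minimal sketch of a parser for simple sample lines like those above — illustrative only: it skips `# HELP`/`# TYPE` comment lines and does not handle label-value escaping or commas inside label values:

```python
import re

# One sample line: metric name, optional {label="value",...} block, value.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)

def parse_metrics(text):
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and HELP/TYPE comment lines
        m = LINE_RE.match(line)
        if not m:
            continue
        labels = {}
        if m.group("labels"):
            for pair in m.group("labels").split(","):
                key, _, val = pair.partition("=")
                labels[key.strip()] = val.strip().strip('"')
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples

print(parse_metrics('mddb_documents_total{collection="blog"} 150'))
# [('mddb_documents_total', {'collection': 'blog'}, 150.0)]
```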

Configuration

| Environment Variable | Default | Description |
|---|---|---|
| MDDB_METRICS | true | Set to false to disable the /metrics endpoint |

MDDB_METRICS=false ./mddbd
MDDB_METRICS=true ./mddbd

When disabled, GET /metrics returns 404 and no request tracking middleware is active (zero overhead).

Available Metrics

Server Info

| Metric | Type | Labels | Description |
|---|---|---|---|
| mddb_info | gauge | mode | Server metadata (always 1) |
| mddb_uptime_seconds | gauge | -- | Seconds since server start |

HTTP Requests

| Metric | Type | Labels | Description |
|---|---|---|---|
| mddb_http_requests_total | counter | method, path, status | Total HTTP requests |
| mddb_http_request_duration_seconds | histogram | method, path | Request latency distribution |

Histogram buckets: 1ms, 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s.
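histogram_quantile() estimates a quantile by linear interpolation within these cumulative buckets. A simplified sketch of that interpolation — the real function additionally handles the implicit +Inf bucket and operates on rate()-ed bucket counters rather than raw counts:

```python
def quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: sorted list of (upper_bound_seconds, cumulative_count).
    """
    total = buckets[-1][1]
    rank = q * total          # position of the target observation
    prev_bound, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:     # target falls inside this bucket
            in_bucket = count - prev_count
            frac = (rank - prev_count) / in_bucket if in_bucket else 0.0
            return prev_bound + (upper - prev_bound) * frac
        prev_bound, prev_count = upper, count
    return buckets[-1][0]

# 100 requests: 60 completed under 10ms, 90 under 25ms, all under 50ms.
buckets = [(0.01, 60), (0.025, 90), (0.05, 100)]
print(quantile(0.5, buckets))   # median falls inside the first bucket
```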

Database

| Metric | Type | Labels | Description |
|---|---|---|---|
| mddb_database_size_bytes | gauge | -- | Database file size |
| mddb_documents_total | gauge | collection | Documents per collection |
| mddb_revisions_total | gauge | collection | Revisions per collection |
| mddb_meta_indices_total | gauge | collection | Metadata index entries per collection |

Database metrics are cached for 15 seconds to avoid excessive BoltDB scans.
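This caching behaviour can be sketched as a small TTL cache. The names below are illustrative, not MDDB's actual internals, and the clock is injected so the expiry logic is easy to demonstrate:

```python
import time

class TTLCache:
    """Serve a cached value, re-running the loader once the TTL expires."""

    def __init__(self, ttl, loader, clock=time.monotonic):
        self.ttl, self.loader, self.clock = ttl, loader, clock
        self.value, self.expires = None, 0.0

    def get(self):
        now = self.clock()
        if now >= self.expires:            # stale: rescan and re-stamp
            self.value = self.loader()
            self.expires = now + self.ttl
        return self.value

# The expensive "BoltDB scan" here is simulated by counting loader calls.
calls = []
cache = TTLCache(15, loader=lambda: calls.append(1) or len(calls))
cache.get(); cache.get()                   # second call served from cache
print(len(calls))  # 1
```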

Vector Search

| Metric | Type | Labels | Description |
|---|---|---|---|
| mddb_vector_embeddings_total | gauge | collection | Embedded documents per collection |
| mddb_vector_index_ready | gauge | -- | 1 if index is loaded, 0 if loading |
| mddb_embedding_provider_configured | gauge | -- | 1 if embedding provider is set |
| mddb_embedding_queue_size | gauge | -- | Pending embedding jobs in queue |

Webhooks & Schema

| Metric | Type | Labels | Description |
|---|---|---|---|
| mddb_webhooks_total | gauge | -- | Registered webhooks |
| mddb_schemas_total | gauge | -- | Registered metadata schemas |

Go Runtime

| Metric | Type | Description |
|---|---|---|
| go_goroutines | gauge | Active goroutines |
| go_memstats_alloc_bytes | gauge | Allocated heap memory |
| go_memstats_sys_bytes | gauge | Total memory obtained from the OS |
| go_memstats_heap_inuse_bytes | gauge | In-use heap spans |
| go_gc_completed_total | counter | Completed GC cycles |
| go_gc_pause_seconds_total | counter | Total GC pause time |

Prometheus Setup

prometheus.yml

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'mddb'
    static_configs:
      - targets: ['localhost:11023']
    metrics_path: /metrics
    scrape_interval: 15s

Docker Compose

services:
  mddb:
    image: ghcr.io/tradik/mddbd:latest
    ports:
      - "11023:11023"
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

Grafana Dashboard

Useful PromQL Queries

Request rate (requests/second):

rate(mddb_http_requests_total[5m])

Request rate per endpoint:

sum by (path) (rate(mddb_http_requests_total[5m]))

Error rate (4xx + 5xx):

sum(rate(mddb_http_requests_total{status=~"4..|5.."}[5m])) /
sum(rate(mddb_http_requests_total[5m]))
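rate() estimates the per-second increase of a counter over the window, so the ratio above is "error requests per second divided by all requests per second". A simplified sketch of the rate calculation using two counter snapshots — real rate() additionally compensates for counter resets and extrapolates at the window edges:

```python
def simple_rate(count_start, count_end, window_seconds):
    """Per-second increase of a monotonic counter over a window."""
    return (count_end - count_start) / window_seconds

# 1523 -> 1673 total requests over a 5-minute (300s) window:
print(simple_rate(1523, 1673, 300))  # 0.5 requests/second
```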

P99 latency per endpoint:

histogram_quantile(0.99,
  sum by (path, le) (rate(mddb_http_request_duration_seconds_bucket[5m]))
)

P50 (median) latency:

histogram_quantile(0.50,
  sum by (le) (rate(mddb_http_request_duration_seconds_bucket[5m]))
)

Database size over time:

mddb_database_size_bytes

Documents per collection:

mddb_documents_total

Embedding coverage (% of documents embedded):

mddb_vector_embeddings_total / mddb_documents_total * 100

Embedding queue backlog:

mddb_embedding_queue_size

Memory usage:

go_memstats_alloc_bytes / 1024 / 1024 # in MB

Goroutine count:

go_goroutines

Alerting Rules

groups:
  - name: mddb
    rules:
      # High error rate
      - alert: MDDBHighErrorRate
        expr: >
          sum(rate(mddb_http_requests_total{status=~"5.."}[5m]))
          /
          sum(rate(mddb_http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MDDB error rate > 5%"

      # Slow requests (P99 > 2s)
      - alert: MDDBSlowRequests
        expr: >
          histogram_quantile(0.99,
            sum by (le) (rate(mddb_http_request_duration_seconds_bucket[5m]))
          ) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MDDB P99 latency > 2s"

      # Vector index not ready
      - alert: MDDBVectorIndexNotReady
        expr: mddb_vector_index_ready == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "MDDB vector index not ready for 5+ minutes"

      # Embedding queue backlog
      - alert: MDDBEmbeddingQueueHigh
        expr: mddb_embedding_queue_size > 500
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "MDDB embedding queue backlog > 500 jobs"

      # Database size > 5GB
      - alert: MDDBDatabaseLarge
        expr: mddb_database_size_bytes > 5e9
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "MDDB database size exceeds 5GB"

      # Server down
      - alert: MDDBDown
        expr: up{job="mddb"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MDDB server is down"

Alternative Scrapers

Grafana Agent

metrics:
  configs:
    - name: mddb
      scrape_configs:
        - job_name: mddb
          static_configs:
            - targets: ['localhost:11023']
      remote_write:
        - url: https://prometheus-us-central1.grafana.net/api/prom/push

Datadog Agent

instances:
  - openmetrics_endpoint: http://localhost:11023/metrics
    namespace: mddb
    metrics:
      - mddb_*
      - go_*

Victoria Metrics

victoria-metrics -promscrape.config=prometheus.yml

cURL + jq (manual check)

watch -n 5 'curl -s http://localhost:11023/metrics | grep -E "^mddb_(uptime|documents|database_size|http_requests)"'

Kubernetes

ServiceMonitor (Prometheus Operator)

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mddb
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: mddb
  endpoints:
    - port: http
      path: /metrics
      interval: 15s

Pod annotations (auto-discovery)

apiVersion: v1
kind: Pod
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "11023"
    prometheus.io/path: "/metrics"

Existing Monitoring Endpoints

In addition to /metrics, MDDB provides JSON endpoints useful for monitoring:

| Endpoint | Description |
|---|---|
| GET /health | Health check (200 = healthy, 503 = unhealthy) |
| GET /v1/stats | Detailed database statistics (JSON) |
| GET /v1/vector-stats | Vector search subsystem status (JSON) |

These can be used with simpler monitoring tools that don't support Prometheus format, or for custom health check scripts:

livenessProbe:
  httpGet:
    path: /health
    port: 11023
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 11023
  initialDelaySeconds: 10
  periodSeconds: 5
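A custom script can check the same status-code contract directly. A minimal sketch — the URL is the example default from this document, and the `is_healthy`/`check` names are illustrative:

```python
import urllib.request

def is_healthy(status_code):
    """Map the /health status code to a boolean (200 = healthy)."""
    return status_code == 200   # 503, or anything else, counts as unhealthy

def check(url="http://localhost:11023/health"):
    """Probe the health endpoint; connection failures count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return is_healthy(resp.status)
    except OSError:
        return False

print(is_healthy(200), is_healthy(503))  # True False
```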

Architecture Notes

  • Zero dependencies: Metrics are rendered using Go's fmt and strings packages. No Prometheus client library required.
  • Low overhead: Request counting uses a mutex-protected map. The /metrics handler is only called during scrapes (typically every 15s).
  • DB stats caching: Database-level metrics (document counts, file size) are cached for 15 seconds to avoid repeated BoltDB cursor scans.
  • Histogram buckets: Fixed buckets from 1ms to 10s covering typical API latency ranges. Cumulative format compatible with histogram_quantile().
  • Path normalization: All /v1/* paths and /health are preserved as-is. Unknown paths collapse to /other to prevent label cardinality explosion.
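The path-normalization rule in the last bullet can be sketched as follows (illustrative, not MDDB's actual code):

```python
def normalize_path(path):
    """Keep known route prefixes as path labels; collapse the rest.

    Bounding the label set prevents arbitrary request paths (crawlers,
    typos) from creating unbounded time-series cardinality.
    """
    if path == "/health" or path.startswith("/v1/"):
        return path
    return "/other"

print(normalize_path("/v1/search"), normalize_path("/favicon.ico"))
# /v1/search /other
```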