# MDDB Telemetry & Monitoring
MDDB exposes a Prometheus-compatible /metrics endpoint that provides real-time telemetry about the server, database, and Go runtime. No external dependencies are required -- metrics are rendered in the standard Prometheus text exposition format and can be scraped by any compatible tool.
## Quick Start
Metrics are enabled by default. Verify by hitting the endpoint:
```bash
curl http://localhost:11023/metrics
```
Output (excerpt):
```text
mddb_uptime_seconds 3621.4
mddb_http_requests_total{method="POST",path="/v1/add",status="200"} 1523
mddb_http_requests_total{method="POST",path="/v1/search",status="200"} 892
mddb_documents_total{collection="blog"} 150
mddb_documents_total{collection="docs"} 42
```
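Each line follows the Prometheus text exposition format: a metric name, an optional `{label="value"}` set, and a sample value. A minimal Python sketch of parsing such lines (illustrative only, not part of MDDB):

```python
import re

# Parse one line of Prometheus text exposition format into
# (metric_name, labels_dict, value). Comments and blank lines yield None.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')

def parse_metric_line(line):
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    m = LINE_RE.match(line)
    if not m:
        return None
    name, raw_labels, value = m.groups()
    labels = {}
    if raw_labels:
        for key, val in re.findall(r'(\w+)="([^"]*)"', raw_labels):
            labels[key] = val
    return name, labels, float(value)

print(parse_metric_line('mddb_documents_total{collection="blog"} 150'))
# -> ('mddb_documents_total', {'collection': 'blog'}, 150.0)
```

This is enough for quick scripting against the endpoint; for anything production-grade, use a real Prometheus client parser.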
## Configuration
| Environment Variable | Default | Description |
|---|---|---|
| `MDDB_METRICS` | `true` | Set to `false` to disable the `/metrics` endpoint |
```bash
MDDB_METRICS=false ./mddbd   # metrics disabled
MDDB_METRICS=true ./mddbd    # metrics enabled (default)
```
When disabled, GET /metrics returns 404 and no request tracking middleware is active (zero overhead).
## Available Metrics

### Server Info
| Metric | Type | Labels | Description |
|---|---|---|---|
| `mddb_info` | gauge | `mode` | Server metadata (always 1) |
| `mddb_uptime_seconds` | gauge | -- | Seconds since server start |
### HTTP Requests
| Metric | Type | Labels | Description |
|---|---|---|---|
| `mddb_http_requests_total` | counter | `method`, `path`, `status` | Total HTTP requests |
| `mddb_http_request_duration_seconds` | histogram | `method`, `path` | Request latency distribution |
Histogram buckets: 1ms, 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s.
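Cumulative buckets like these are what `histogram_quantile()` interpolates over: it finds the first bucket whose cumulative count reaches the target rank and interpolates linearly within it. A Python sketch of that calculation, using hypothetical counts over a truncated version of the bucket layout above:

```python
# Approximate histogram_quantile(): buckets is a sorted list of
# (upper_bound_seconds, cumulative_count). Find the bucket that crosses
# the target rank and linearly interpolate within it.
def histogram_quantile(q, buckets):
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if count == prev_count:
                return bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Hypothetical cumulative request counts for the first few buckets:
buckets = [(0.001, 10), (0.005, 40), (0.010, 70), (0.025, 95), (0.050, 100)]
print(round(histogram_quantile(0.50, buckets), 4))  # -> 0.0067 (median in the 5-10ms bucket)
```

This mirrors the interpolation Prometheus performs, which is why fixed cumulative buckets are sufficient for quantile estimates.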
### Database
| Metric | Type | Labels | Description |
|---|---|---|---|
| `mddb_database_size_bytes` | gauge | -- | Database file size |
| `mddb_documents_total` | gauge | `collection` | Documents per collection |
| `mddb_revisions_total` | gauge | `collection` | Revisions per collection |
| `mddb_meta_indices_total` | gauge | `collection` | Metadata index entries per collection |
Database metrics are cached for 15 seconds to avoid excessive BoltDB scans.
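The caching behavior can be pictured as a simple TTL wrapper around the expensive stats scan (an illustrative Python sketch, not MDDB's actual Go implementation; names are hypothetical):

```python
import time

class TTLCache:
    """Cache one expensive computation for ttl seconds."""
    def __init__(self, compute, ttl=15.0, clock=time.monotonic):
        self.compute = compute      # e.g. a full BoltDB stats scan
        self.ttl = ttl
        self.clock = clock
        self._value = None
        self._expires = -1.0

    def get(self):
        now = self.clock()
        if now >= self._expires:    # stale or never computed
            self._value = self.compute()
            self._expires = now + self.ttl
        return self._value

calls = 0
def expensive_stats():
    global calls
    calls += 1
    return {"documents": 192}

stats = TTLCache(expensive_stats, ttl=15.0)
stats.get(); stats.get()   # second call within the TTL hits the cache
print(calls)  # -> 1
```

The practical consequence: scraping more often than every 15 seconds will not make database metrics fresher.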
### Vector Search
| Metric | Type | Labels | Description |
|---|---|---|---|
| `mddb_vector_embeddings_total` | gauge | `collection` | Embedded documents per collection |
| `mddb_vector_index_ready` | gauge | -- | 1 if index is loaded, 0 if loading |
| `mddb_embedding_provider_configured` | gauge | -- | 1 if embedding provider is set |
| `mddb_embedding_queue_size` | gauge | -- | Pending embedding jobs in queue |
### Webhooks & Schema
| Metric | Type | Labels | Description |
|---|---|---|---|
| `mddb_webhooks_total` | gauge | -- | Registered webhooks |
| `mddb_schemas_total` | gauge | -- | Registered metadata schemas |
### Go Runtime
| Metric | Type | Description |
|---|---|---|
| `go_goroutines` | gauge | Active goroutines |
| `go_memstats_alloc_bytes` | gauge | Allocated heap memory |
| `go_memstats_sys_bytes` | gauge | Total memory from OS |
| `go_memstats_heap_inuse_bytes` | gauge | In-use heap spans |
| `go_gc_completed_total` | counter | Completed GC cycles |
| `go_gc_pause_seconds_total` | counter | Total GC pause time |
## Prometheus Setup
`prometheus.yml`:

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'mddb'
    static_configs:
      - targets: ['localhost:11023']
    metrics_path: /metrics
    scrape_interval: 15s
```
### Docker Compose
```yaml
services:
  mddb:
    image: ghcr.io/tradik/mddbd:latest
    ports:
      - "11023:11023"
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
```
## Grafana Dashboard

### Useful PromQL Queries
Request rate (requests/second):

```promql
rate(mddb_http_requests_total[5m])
```

Request rate per endpoint:

```promql
sum by (path) (rate(mddb_http_requests_total[5m]))
```

Error rate (4xx + 5xx):

```promql
sum(rate(mddb_http_requests_total{status=~"4..|5.."}[5m]))
  /
sum(rate(mddb_http_requests_total[5m]))
```

P99 latency per endpoint:

```promql
histogram_quantile(0.99,
  sum by (path, le) (rate(mddb_http_request_duration_seconds_bucket[5m]))
)
```

P50 (median) latency:

```promql
histogram_quantile(0.50,
  sum by (le) (rate(mddb_http_request_duration_seconds_bucket[5m]))
)
```

Database size over time:

```promql
mddb_database_size_bytes
```

Documents per collection:

```promql
mddb_documents_total
```

Embedding coverage (% of documents embedded):

```promql
mddb_vector_embeddings_total / mddb_documents_total * 100
```

Embedding queue backlog:

```promql
mddb_embedding_queue_size
```

Memory usage (MB):

```promql
go_memstats_alloc_bytes / 1024 / 1024
```

Goroutine count:

```promql
go_goroutines
```
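For intuition, `rate(counter[window])` is roughly the per-second increase of a counter over the window, with counter resets handled. A back-of-the-envelope Python sketch over two hypothetical scrapes (Prometheus additionally extrapolates to the window boundaries, which this omits):

```python
# rate() over two scrapes of a counter: (timestamp_seconds, counter_value).
def simple_rate(earlier, later):
    (t0, v0), (t1, v1) = earlier, later
    if v1 < v0:          # counter reset (e.g. server restart): treat v0 as 0
        v0 = 0.0
    return (v1 - v0) / (t1 - t0)

# mddb_http_requests_total rising from 1523 to 1973 over 300 s:
print(simple_rate((0, 1523), (300, 1973)))  # -> 1.5 requests/second
```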
## Alerting Rules
```yaml
groups:
  - name: mddb
    rules:
      # High error rate
      - alert: MDDBHighErrorRate
        expr: >
          sum(rate(mddb_http_requests_total{status=~"5.."}[5m]))
          /
          sum(rate(mddb_http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MDDB error rate > 5%"

      # Slow requests (P99 > 2s)
      - alert: MDDBSlowRequests
        expr: >
          histogram_quantile(0.99,
            sum by (le) (rate(mddb_http_request_duration_seconds_bucket[5m]))
          ) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MDDB P99 latency > 2s"

      # Vector index not ready
      - alert: MDDBVectorIndexNotReady
        expr: mddb_vector_index_ready == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "MDDB vector index not ready for 5+ minutes"

      # Embedding queue backlog
      - alert: MDDBEmbeddingQueueHigh
        expr: mddb_embedding_queue_size > 500
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "MDDB embedding queue backlog > 500 jobs"

      # Database size > 5GB
      - alert: MDDBDatabaseLarge
        expr: mddb_database_size_bytes > 5e9
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "MDDB database size exceeds 5GB"

      # Server down
      - alert: MDDBDown
        expr: up{job="mddb"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MDDB server is down"
```
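Note the `for:` clauses: an alert only fires after its expression has been continuously true for the stated duration, passing through a pending state first. A minimal Python sketch of that state machine (illustrative, not Prometheus source):

```python
class Alert:
    """Track inactive/pending/firing state the way a 'for:' clause behaves."""
    def __init__(self, hold_seconds):
        self.hold = hold_seconds
        self.true_since = None   # when the expression first became true

    def evaluate(self, expr_true, now):
        if not expr_true:
            self.true_since = None   # any false evaluation resets the timer
            return "inactive"
        if self.true_since is None:
            self.true_since = now
        if now - self.true_since >= self.hold:
            return "firing"
        return "pending"

a = Alert(hold_seconds=300)          # for: 5m
print(a.evaluate(True, now=0))       # -> pending
print(a.evaluate(True, now=300))     # -> firing
print(a.evaluate(False, now=310))    # -> inactive
```

This is why short metric blips (for example, a brief embedding-queue spike) do not page anyone.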
## Alternative Scrapers
### Grafana Agent

```yaml
metrics:
  configs:
    - name: mddb
      scrape_configs:
        - job_name: mddb
          static_configs:
            - targets: ['localhost:11023']
      remote_write:
        - url: https://prometheus-us-central1.grafana.net/api/prom/push
```
### Datadog Agent

```yaml
instances:
  - openmetrics_endpoint: http://localhost:11023/metrics
    namespace: mddb
    metrics:
      - mddb_*
      - go_*
```
### VictoriaMetrics

```bash
victoria-metrics -promscrape.config=prometheus.yml
```
### cURL (manual check)

```bash
watch -n 5 'curl -s http://localhost:11023/metrics | grep -E "^mddb_(uptime|documents|database_size|http_requests)"'
```
## Kubernetes

### ServiceMonitor (Prometheus Operator)
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mddb
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: mddb
  endpoints:
    - port: http
      path: /metrics
      interval: 15s
```
### Pod annotations (auto-discovery)
```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "11023"
    prometheus.io/path: "/metrics"
```
## Existing Monitoring Endpoints
In addition to /metrics, MDDB provides JSON endpoints useful for monitoring:
| Endpoint | Description |
|---|---|
| `GET /health` | Health check (200 = healthy, 503 = unhealthy) |
| `GET /v1/stats` | Detailed database statistics (JSON) |
| `GET /v1/vector-stats` | Vector search subsystem status (JSON) |
These can be used with simpler monitoring tools that don't support Prometheus format, or for custom health check scripts:
```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 11023
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 11023
  initialDelaySeconds: 10
  periodSeconds: 5
```
## Architecture Notes
- **Zero dependencies**: Metrics are rendered using Go's `fmt` and `strings` packages. No Prometheus client library required.
- **Low overhead**: Request counting uses a mutex-protected map. The `/metrics` handler is only called during scrapes (typically every 15s).
- **DB stats caching**: Database-level metrics (document counts, file size) are cached for 15 seconds to avoid repeated BoltDB cursor scans.
- **Histogram buckets**: Fixed buckets from 1ms to 10s covering typical API latency ranges. Cumulative format compatible with `histogram_quantile()`.
- **Path normalization**: All `/v1/*` paths and `/health` are preserved as-is. Unknown paths collapse to `/other` to prevent label cardinality explosion.
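The path normalization rule can be sketched as (hypothetical Python mirroring the described behavior, not MDDB's Go code):

```python
# Keep /v1/* and /health as label values; collapse everything else to
# "/other" so the `path` label on mddb_http_requests_total stays
# low-cardinality even under scanner or bot traffic.
def normalize_path(path):
    if path == "/health" or path.startswith("/v1/"):
        return path
    return "/other"

print(normalize_path("/v1/search"))    # -> /v1/search
print(normalize_path("/favicon.ico"))  # -> /other
```

Without this, every unique probed URL would mint a new time series, which is a common way to blow up a Prometheus server's memory.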