Embedding Providers Guide

MDDB supports multiple embedding providers for vector search functionality. You can configure embeddings either through environment variables or via the Admin Panel UI.

Supported Providers

1. OpenAI

API URL: https://api.openai.com/v1
Authentication: API key required
Documentation: https://platform.openai.com/docs/guides/embeddings

Popular Models

| Model | Dimensions | Use Case | Cost |
|---|---|---|---|
| text-embedding-3-small | 1536 | Fast, cost-effective, general purpose | $ |
| text-embedding-3-large | 3072 | Highest quality, best performance | $$$ |
| text-embedding-ada-002 | 1536 | Legacy model (v2) | $$ |

Environment Variables

export MDDB_EMBEDDING_PROVIDER=openai
export MDDB_EMBEDDING_API_KEY=sk-...
export MDDB_EMBEDDING_MODEL=text-embedding-3-small
export MDDB_EMBEDDING_DIMENSIONS=1536

2. Cohere

API URL: https://api.cohere.ai/v1
Authentication: API key required
Documentation: https://docs.cohere.com/docs/embeddings

Popular Models

| Model | Dimensions | Use Case | Languages |
|---|---|---|---|
| embed-english-v3.0 | 1024 | English text | English only |
| embed-multilingual-v3.0 | 1024 | Multilingual support | 100+ languages |
| embed-english-light-v3.0 | 384 | Fast, smaller embeddings | English only |
| embed-multilingual-light-v3.0 | 384 | Fast, smaller, multilingual | 100+ languages |

Features

  • ✅ Best multilingual support
  • ✅ Optimized for semantic search
  • ✅ Built-in compression options

Environment Variables

export MDDB_EMBEDDING_PROVIDER=cohere
export MDDB_EMBEDDING_API_KEY=cohere_api_key...
export MDDB_EMBEDDING_MODEL=embed-english-v3.0
export MDDB_EMBEDDING_DIMENSIONS=1024

3. Voyage AI

API URL: https://api.voyageai.com/v1
Authentication: API key required
Documentation: https://docs.voyageai.com/

Popular Models

| Model | Dimensions | Use Case | Specialty |
|---|---|---|---|
| voyage-3 | 1024 | Latest, best quality | General purpose |
| voyage-large-2 | 1536 | High accuracy | Long documents |
| voyage-code-2 | 1536 | Code embeddings | Programming code |
| voyage-law-2 | 1024 | Legal documents | Legal text |

Features

  • ✅ Specialized in embeddings (not general LLM)
  • ✅ Very high quality
  • ✅ Domain-specific models (code, law)
  • ✅ Competitive pricing

Environment Variables

export MDDB_EMBEDDING_PROVIDER=voyage
export MDDB_EMBEDDING_API_KEY=pa-...
export MDDB_EMBEDDING_MODEL=voyage-3
export MDDB_EMBEDDING_DIMENSIONS=1024

4. Ollama (Local)

API URL: http://localhost:11434 (default)
Authentication: None (local server)
Documentation: https://ollama.ai/

Popular Models

| Model | Dimensions | Size | Quality |
|---|---|---|---|
| nomic-embed-text | 768 | ~275 MB | Good, fast |
| mxbai-embed-large | 1024 | ~670 MB | Better quality |
| all-minilm | 384 | ~45 MB | Small, very fast |
| snowflake-arctic-embed | 1024 | ~669 MB | High quality |

Features

  • ✅ Fully local, no API costs
  • ✅ Privacy-focused
  • ✅ Works offline
  • ✅ Multiple open-source models

Setup

  1. Install Ollama: https://ollama.ai/download
  2. Pull model: ollama pull nomic-embed-text
  3. Run server: ollama serve (usually auto-starts)

Environment Variables

export MDDB_EMBEDDING_PROVIDER=ollama
export MDDB_EMBEDDING_API_URL=http://localhost:11434
export MDDB_EMBEDDING_MODEL=nomic-embed-text
export MDDB_EMBEDDING_DIMENSIONS=768

Configuration Methods

Method 1: Environment Variables (Legacy)

Set environment variables before starting mddbd:

export MDDB_EMBEDDING_PROVIDER=openai
export MDDB_EMBEDDING_API_KEY=sk-...
export MDDB_EMBEDDING_MODEL=text-embedding-3-small
export MDDB_EMBEDDING_DIMENSIONS=1536
./mddbd
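
Before starting mddbd, it can help to confirm that the variables are complete. The following is a minimal sketch and not part of MDDB: `check_embedding_env` is a hypothetical helper, and the rule that every provider except Ollama needs an API key is taken from the provider sections of this guide.

```python
import os

# Providers in this guide that need an API key (Ollama runs locally without one).
KEYED_PROVIDERS = {"openai", "cohere", "voyage"}
REQUIRED = (
    "MDDB_EMBEDDING_PROVIDER",
    "MDDB_EMBEDDING_MODEL",
    "MDDB_EMBEDDING_DIMENSIONS",
)


def check_embedding_env(env):
    """Return a list of problems found in an environment mapping (empty if none)."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    provider = env.get("MDDB_EMBEDDING_PROVIDER", "")
    if provider in KEYED_PROVIDERS and not env.get("MDDB_EMBEDDING_API_KEY"):
        problems.append(f"MDDB_EMBEDDING_API_KEY is required for {provider}")
    dims = env.get("MDDB_EMBEDDING_DIMENSIONS", "")
    if dims and not (dims.isdigit() and int(dims) > 0):
        problems.append("MDDB_EMBEDDING_DIMENSIONS must be a positive integer")
    return problems


if __name__ == "__main__":
    # Check the real process environment and report anything missing.
    for problem in check_embedding_env(os.environ):
        print("config problem:", problem)
```

Passing the environment in as a mapping keeps the check easy to test against a plain dictionary.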

Method 2: Admin Panel (Recommended)

  1. Open mddb-panel: http://localhost:11024

  2. Navigate to Administration → Embedding Models

  3. Click Add Model

  4. Fill in configuration:

    • ID: Unique identifier (e.g., openai-small, cohere-multilingual)
    • Name: Display name (e.g., OpenAI Small, Cohere Multilingual)
    • Provider: Select from dropdown
    • Model: Model name
    • Dimensions: Vector dimensions
    • API Key: Your API key (for OpenAI, Cohere, Voyage)
    • API URL: Custom URL or leave empty for default
    • Set as default: Check to make this the active model
  5. Click Create

Import Current Config

If you're using environment variables and want to migrate to database config:

  1. Open Administration → Embedding Models
  2. If no configs exist, you'll see "Import Current Configuration"
  3. Click Import Current Config to save your env var config to the database

Comparison Matrix

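| Provider | Cost | Quality | Speed | Multilingual | Local | API Key Required |
|---|---|---|---|---|---|---|
| OpenAI | $$ | Excellent | Fast | Good | No | Yes |
| Cohere | $$ | Excellent | Fast | Best | No | Yes |
| Voyage | $$ | Excellent | Fast | Good | No | Yes |
| Ollama | Free | Good | Fastest | Fair | Yes | No |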

Choosing a Provider

Use OpenAI if:

  • ✅ You want the most popular, well-supported option
  • ✅ You're already using OpenAI for other services
  • ✅ You need reliable, high-quality embeddings
  • ✅ English is your primary language

Use Cohere if:

  • ✅ You need best-in-class multilingual support (100+ languages)
  • ✅ You're working with non-English content
  • ✅ You want embeddings optimized for semantic search
  • ✅ You need smaller models (light versions)

Use Voyage AI if:

  • ✅ You want specialized, domain-specific models (code, law)
  • ✅ You need the highest quality embeddings
  • ✅ You're working with technical or legal documents
  • ✅ You value a company focused solely on embeddings

Use Ollama if:

  • ✅ You want a free option with no API costs
  • ✅ Privacy is critical (data never leaves your server)
  • ✅ You need to work offline
  • ✅ You have sufficient local compute resources
  • ✅ You prefer open-source solutions

API Pricing (Approximate)

| Provider | Model | Price per 1M tokens |
|---|---|---|
| OpenAI | text-embedding-3-small | $0.02 |
| OpenAI | text-embedding-3-large | $0.13 |
| Cohere | embed-english-v3.0 | $0.10 |
| Cohere | embed-multilingual-v3.0 | $0.10 |
| Voyage | voyage-3 | $0.10 |
| Voyage | voyage-large-2 | $0.12 |
| Ollama | any model | Free |

Prices as of 2026-03. Check provider websites for current pricing.
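
To turn these per-token prices into a budget figure, multiply the corpus size in tokens by the price per million. The sketch below copies the approximate prices listed above; the corpus size (10,000 documents of about 500 tokens each) is a made-up example, and `estimate_cost_usd` is a hypothetical helper.

```python
# Prices per 1M tokens in USD, copied from the approximate pricing above.
PRICE_PER_MILLION_USD = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "embed-english-v3.0": 0.10,
    "embed-multilingual-v3.0": 0.10,
    "voyage-3": 0.10,
    "voyage-large-2": 0.12,
}


def estimate_cost_usd(model, total_tokens):
    """Estimate the one-off cost of embedding total_tokens with the given model."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_USD[model]


if __name__ == "__main__":
    # Hypothetical corpus: 10,000 documents of about 500 tokens each.
    tokens = 10_000 * 500
    for model in PRICE_PER_MILLION_USD:
        print(f"{model}: ${estimate_cost_usd(model, tokens):.2f}")
```

Re-embedding after a model or provider switch incurs this cost again, so it is worth running the estimate before migrating.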


Best Practices

1. Choose Consistent Dimensions

  • Once you embed documents with a specific dimension, stick with it
  • Changing dimensions requires re-embedding all documents
  • Higher dimensions generally give better quality but use more storage and slow down search

2. Monitor Costs

  • Track API usage via provider dashboards
  • Consider caching embeddings for frequently accessed documents
  • Use smaller models for development/testing
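
If you call a provider from your own scripts, caching can be as simple as keying stored vectors by a hash of the text. This is a minimal in-memory sketch under that assumption; `EmbeddingCache` and `embed_fn` are hypothetical names, and nothing here describes how MDDB itself stores embeddings.

```python
import hashlib


class EmbeddingCache:
    """In-memory cache keyed by model name and a SHA-256 hash of the text."""

    def __init__(self, embed_fn, model):
        self._embed_fn = embed_fn  # hypothetical callable: text -> list of floats
        self._model = model
        self._store = {}
        self.misses = 0

    def get(self, text):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        key = (self._model, digest)
        if key not in self._store:
            self.misses += 1  # only cache misses reach the paid API
            self._store[key] = self._embed_fn(text)
        return self._store[key]
```

Including the model name in the key avoids mixing vectors from different models, which would otherwise break the consistent-dimensions rule.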

3. Test Before Production

  • Compare quality across providers with your specific data
  • Measure search relevance for your use case
  • Benchmark performance (speed vs quality)

4. Security

  • Never commit API keys to git
  • Use environment variables or secure secret management
  • Rotate API keys regularly
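
In your own tooling, the same rules mean reading the key from the environment at runtime and never printing it in full. The helpers below are a hypothetical sketch; only the variable name `MDDB_EMBEDDING_API_KEY` comes from this guide.

```python
import os


def load_api_key(env=os.environ):
    """Read the key from the environment instead of hard-coding it in source."""
    key = env.get("MDDB_EMBEDDING_API_KEY", "")
    if not key:
        raise RuntimeError("MDDB_EMBEDDING_API_KEY is not set")
    return key


def mask_key(key, visible=4):
    """Return a log-safe form that shows only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Logging `mask_key(load_api_key())` is usually enough to confirm which key is in use without exposing it.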

5. Switching Providers

  • Database configs allow easy switching between models
  • Test new provider with subset of data first
  • Re-embed all documents when switching providers

Troubleshooting

"No active embedding configuration"

  • Set environment variables OR configure in Admin Panel
  • Ensure API key is valid and has credits
  • Check server logs for detailed error messages

"Dimensions mismatch"

  • All documents in a collection must use same dimensions
  • Clear existing embeddings before switching models
  • Consider creating new collection for different model
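
When debugging a mismatch outside MDDB, a quick length check over exported vectors shows which documents were embedded with a different model. The function below is a hypothetical sketch that assumes vectors are available as a mapping from document ID to a list of floats.

```python
def find_dimension_mismatches(vectors, expected_dimensions):
    """Return the IDs of vectors whose length differs from expected_dimensions."""
    return [
        doc_id
        for doc_id, vector in vectors.items()
        if len(vector) != expected_dimensions
    ]


# Toy data: one vector has 3 dimensions where 4 are expected.
vectors = {"doc-a": [0.1, 0.2, 0.3, 0.4], "doc-b": [0.1, 0.2, 0.3]}
print(find_dimension_mismatches(vectors, 4))  # prints ['doc-b']
```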

"API rate limit exceeded"

  • Slow down embedding worker (reduce batch size)
  • Upgrade API plan with provider
  • Consider switching to local Ollama
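
If you call a provider API from your own scripts, retrying with exponential backoff is a common complement to smaller batches. This sketch shows that client-side technique and is not a description of MDDB's embedding worker; `RateLimitError` is a placeholder for whatever exception your HTTP client raises on a rate-limit response.

```python
import random
import time


class RateLimitError(Exception):
    """Placeholder for the error your client raises when rate limited."""


def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on RateLimitError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads retries
```

The `sleep` parameter exists so the wait can be replaced in tests; in normal use the default `time.sleep` applies.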

"Ollama connection failed"

  • Ensure Ollama is running: ollama serve
  • Check API URL is correct (default: http://localhost:11434)
  • Verify model is pulled: ollama list
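
The same checks can be scripted. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models, and looks for the configured model name; `check_ollama` and `model_is_pulled` are hypothetical helpers, and the default URL and model are the ones used in this guide.

```python
import json
import urllib.request


def model_is_pulled(tags_payload, model):
    """Check a parsed /api/tags response for a model, ignoring the ':tag' suffix."""
    names = [entry.get("name", "") for entry in tags_payload.get("models", [])]
    return any(name == model or name.split(":")[0] == model for name in names)


def check_ollama(api_url="http://localhost:11434", model="nomic-embed-text", timeout=5):
    """Return (reachable, pulled) for an Ollama server."""
    try:
        with urllib.request.urlopen(f"{api_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection errors are OSError subclasses; bad JSON raises ValueError.
        return False, False
    return True, model_is_pulled(payload, model)
```

A result of `(True, False)` means the server is up but the model still needs `ollama pull`.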

Examples

OpenAI Configuration

{
  "id": "openai-small",
  "name": "OpenAI Small",
  "provider": "openai",
  "model": "text-embedding-3-small",
  "dimensions": 1536,
  "apiKey": "sk-...",
  "apiUrl": "https://api.openai.com/v1",
  "isDefault": true
}

Cohere Multilingual

{
  "id": "cohere-multi",
  "name": "Cohere Multilingual",
  "provider": "cohere",
  "model": "embed-multilingual-v3.0",
  "dimensions": 1024,
  "apiKey": "cohere_api_key...",
  "apiUrl": "https://api.cohere.ai/v1",
  "isDefault": true
}

Voyage for Code

{
  "id": "voyage-code",
  "name": "Voyage Code",
  "provider": "voyage",
  "model": "voyage-code-2",
  "dimensions": 1536,
  "apiKey": "pa-...",
  "apiUrl": "https://api.voyageai.com/v1",
  "isDefault": true
}

Ollama Local

{
  "id": "ollama-nomic",
  "name": "Ollama Nomic",
  "provider": "ollama",
  "model": "nomic-embed-text",
  "dimensions": 768,
  "apiKey": "",
  "apiUrl": "http://localhost:11434",
  "isDefault": true
}

Classification

All embedding providers support zero-shot classification via POST /v1/classify. This feature embeds the candidate labels and computes their similarity to your documents, so no training data is required. See Search Algorithms for details.
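
The idea behind similarity-based zero-shot classification can be sketched in a few lines. This is an illustration only: it assumes cosine similarity as the measure, uses toy 3-dimensional vectors in place of real embeddings, and does not show the request format of POST /v1/classify or MDDB's actual implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(doc_vector, label_vectors):
    """Return (best_label, scores) by similarity to each label embedding."""
    scores = {label: cosine(doc_vector, vec) for label, vec in label_vectors.items()}
    return max(scores, key=scores.get), scores


# Toy vectors stand in for the embeddings of two candidate labels and a document.
labels = {"billing": [1.0, 0.0, 0.0], "technical": [0.0, 1.0, 0.0]}
best, scores = classify([0.9, 0.1, 0.0], labels)
print(best)  # prints billing
```

Because labels and documents are embedded by the same model, adding a new label only requires one more embedding rather than retraining.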


Support

  • GitHub Issues: https://github.com/tradik/mddb/issues
  • Discussions: https://github.com/tradik/mddb/discussions
  • Documentation: https://github.com/tradik/mddb/docs

Last updated: 2026-03-02