Embedding Providers Guide

MDDB supports multiple embedding providers for vector search functionality. You can configure embeddings either through environment variables or via the Admin Panel UI.

Supported Providers

1. OpenAI

API URL: https://api.openai.com/v1
Authentication: API Key required
Documentation: https://platform.openai.com/docs/guides/embeddings

Popular Models

Model	Dimensions	Use Case	Cost
`text-embedding-3-small`	1536	Fast, cost-effective, general purpose	$
`text-embedding-3-large`	3072	Highest quality, best performance	$$$
`text-embedding-ada-002`	1536	Legacy model (v2)	$$

Environment Variables

export MDDB_EMBEDDING_PROVIDER=openai
export MDDB_EMBEDDING_API_KEY=sk-...
export MDDB_EMBEDDING_MODEL=text-embedding-3-small
export MDDB_EMBEDDING_DIMENSIONS=1536

2. Cohere

API URL: https://api.cohere.ai/v1
Authentication: API Key required
Documentation: https://docs.cohere.com/docs/embeddings

Popular Models

Model	Dimensions	Use Case	Languages
`embed-english-v3.0`	1024	English text	English only
`embed-multilingual-v3.0`	1024	Multilingual support	100+ languages
`embed-english-light-v3.0`	384	Fast, smaller embeddings	English only
`embed-multilingual-light-v3.0`	384	Fast, smaller, multilingual	100+ languages

Features

✅ Best multilingual support
✅ Semantic search optimized
✅ Built-in compression options

Environment Variables

export MDDB_EMBEDDING_PROVIDER=cohere
export MDDB_EMBEDDING_API_KEY=cohere_api_key...
export MDDB_EMBEDDING_MODEL=embed-english-v3.0
export MDDB_EMBEDDING_DIMENSIONS=1024

3. Voyage AI

API URL: https://api.voyageai.com/v1
Authentication: API Key required
Documentation: https://docs.voyageai.com/

Popular Models

Model	Dimensions	Use Case	Specialty
`voyage-3`	1024	Latest, best quality	General purpose
`voyage-large-2`	1536	High accuracy	Long documents
`voyage-code-2`	1536	Code embeddings	Programming code
`voyage-law-2`	1024	Legal documents	Legal text

Features

✅ Specialized in embeddings (not general LLM)
✅ Very high quality
✅ Domain-specific models (code, law)
✅ Competitive pricing

Environment Variables

export MDDB_EMBEDDING_PROVIDER=voyage
export MDDB_EMBEDDING_API_KEY=pa-...
export MDDB_EMBEDDING_MODEL=voyage-3
export MDDB_EMBEDDING_DIMENSIONS=1024

4. Ollama (Local)

API URL: http://localhost:11434 (default)
Authentication: None (local server)
Documentation: https://ollama.ai/

Popular Models

Model	Dimensions	Size	Quality
`nomic-embed-text`	768	~275MB	Good, fast
`mxbai-embed-large`	1024	~670MB	Better quality
`all-minilm`	384	~45MB	Small, very fast
`snowflake-arctic-embed`	1024	~669MB	High quality

Features

✅ Fully local, no API costs
✅ Privacy-focused
✅ Works offline
✅ Multiple open-source models

Setup

1. Install Ollama: https://ollama.ai/download
2. Pull model: ollama pull nomic-embed-text
3. Run server: ollama serve (usually auto-starts)

Environment Variables

export MDDB_EMBEDDING_PROVIDER=ollama
export MDDB_EMBEDDING_API_URL=http://localhost:11434
export MDDB_EMBEDDING_MODEL=nomic-embed-text
export MDDB_EMBEDDING_DIMENSIONS=768

Configuration Methods

Method 1: Environment Variables (Legacy)

Set environment variables before starting mddbd:

export MDDB_EMBEDDING_PROVIDER=openai
export MDDB_EMBEDDING_API_KEY=sk-...
export MDDB_EMBEDDING_MODEL=text-embedding-3-small
export MDDB_EMBEDDING_DIMENSIONS=1536

./mddbd
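
Before starting the server, it can help to check that the variables above are set and well-formed. The following is a minimal sketch, not part of MDDB: the `EmbeddingEnvConfig` class and `load_embedding_env` function are hypothetical names, and treating provider, model, and dimensions as required is an assumption about how mddbd reads them.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class EmbeddingEnvConfig:
    """Hypothetical container for the MDDB_EMBEDDING_* variables."""
    provider: str
    model: str
    dimensions: int
    api_key: Optional[str] = None
    api_url: Optional[str] = None


def load_embedding_env(env: Mapping[str, str] = os.environ) -> EmbeddingEnvConfig:
    """Read the MDDB_EMBEDDING_* variables and fail early on obvious mistakes."""
    required = (
        "MDDB_EMBEDDING_PROVIDER",
        "MDDB_EMBEDDING_MODEL",
        "MDDB_EMBEDDING_DIMENSIONS",
    )
    # Assumption: these three are treated as mandatory for this check.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"missing variables: {', '.join(missing)}")

    # int() raises ValueError if the value is not an integer.
    dimensions = int(env["MDDB_EMBEDDING_DIMENSIONS"])
    if dimensions <= 0:
        raise ValueError("MDDB_EMBEDDING_DIMENSIONS must be positive")

    return EmbeddingEnvConfig(
        provider=env["MDDB_EMBEDDING_PROVIDER"],
        model=env["MDDB_EMBEDDING_MODEL"],
        dimensions=dimensions,
        api_key=env.get("MDDB_EMBEDDING_API_KEY"),
        api_url=env.get("MDDB_EMBEDDING_API_URL"),
    )
```

Calling `load_embedding_env()` with no arguments reads the current process environment; passing a plain dictionary makes the check easy to test.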

Method 2: Admin Panel (Recommended)

1. Open mddb-panel: http://localhost:11024
2. Navigate to Administration → Embedding Models
3. Click Add Model
4. Fill in configuration:
   - ID: Unique identifier (e.g., openai-small, cohere-multilingual)
   - Name: Display name (e.g., OpenAI Small, Cohere Multilingual)
   - Provider: Select from dropdown
   - Model: Model name
   - Dimensions: Vector dimensions
   - API Key: Your API key (for OpenAI, Cohere, Voyage)
   - API URL: Custom URL or leave empty for default
   - Set as default: Check to make this the active model
5. Click Create

Import Current Config

If you're using environment variables and want to migrate to database config:

1. Open Administration → Embedding Models
2. If no configs exist, you'll see "Import Current Configuration"
3. Click Import Current Config to save your env var config to the database

Comparison Matrix

Provider	Cost	Quality	Speed	Multilingual	Local	API Key Required
OpenAI	$$	Excellent	Fast	Good	No	Yes
Cohere	$$	Excellent	Fast	Best	No	Yes
Voyage	$$	Excellent	Fast	Good	No	Yes
Ollama	Free	Good	Fastest	Fair	Yes	No

Choosing a Provider

Use OpenAI if:

✅ You want the most popular, well-supported option
✅ You're already using OpenAI for other services
✅ You need reliable, high-quality embeddings
✅ English is your primary language

Use Cohere if:

✅ You need best-in-class multilingual support (100+ languages)
✅ You're working with non-English content
✅ You want semantic search optimized embeddings
✅ You need smaller models (light versions)

Use Voyage AI if:

✅ You want specialized, domain-specific models (code, law)
✅ You need the highest quality embeddings
✅ You're working with technical or legal documents
✅ You value a company focused solely on embeddings

Use Ollama if:

✅ You want zero API costs
✅ Privacy is critical (data never leaves your server)
✅ You need to work offline
✅ You have sufficient local compute resources
✅ You prefer open-source solutions

API Pricing (Approximate)

Provider	Model	Price per 1M tokens
OpenAI	text-embedding-3-small	$0.02
OpenAI	text-embedding-3-large	$0.13
Cohere	embed-english-v3.0	$0.10
Cohere	embed-multilingual-v3.0	$0.10
Voyage	voyage-3	$0.10
Voyage	voyage-large-2	$0.12
Ollama	any model	FREE

Prices as of 2026-03. Check provider websites for current pricing.
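
To turn the table into a budget figure, multiply the total token count by the per-million price. The sketch below hard-codes the approximate prices listed above; the function name and the document and token counts in the usage note are illustrative assumptions, not measurements.

```python
# Approximate prices in USD per 1M tokens, copied from the table above.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "embed-english-v3.0": 0.10,
    "embed-multilingual-v3.0": 0.10,
    "voyage-3": 0.10,
    "voyage-large-2": 0.12,
}


def estimate_cost_usd(model: str, num_documents: int, avg_tokens_per_document: int) -> float:
    """Estimate the one-off cost of embedding a corpus with the given model."""
    total_tokens = num_documents * avg_tokens_per_document
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]
```

For a hypothetical corpus of 50,000 documents averaging 800 tokens each (40M tokens), this gives roughly $0.80 for `text-embedding-3-small` and $5.20 for `text-embedding-3-large`. Re-embedding after a model switch incurs the same cost again.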

Best Practices

1. Choose Consistent Dimensions

Once you embed documents with a specific dimension, stick with it
Changing dimensions requires re-embedding all documents
Higher dimensions generally improve quality but increase storage and slow search
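
The storage side of this trade-off is simple arithmetic. The sketch below assumes each vector component is stored as a 4-byte float, which is a common choice but an assumption about MDDB's storage format; it also ignores any index overhead.

```python
def index_size_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of the stored vectors, excluding index structures and metadata."""
    return num_vectors * dimensions * bytes_per_value


# Compare the dimensions used by the models in this guide for 100,000 documents.
for dims in (384, 768, 1024, 1536, 3072):
    mib = index_size_bytes(100_000, dims) / (1024 ** 2)
    print(f"{dims:>5} dims: {mib:8.1f} MiB")
```

Under these assumptions, 100,000 vectors at 1536 dimensions take about 586 MiB, and doubling the dimensions doubles the size.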

2. Monitor Costs

Track API usage via provider dashboards
Consider caching embeddings for frequently accessed documents
Use smaller models for development/testing
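
One way to avoid paying twice for the same text is a cache keyed by a hash of the content. This is a minimal in-memory sketch of that idea; `EmbeddingCache` and the `embed` callable are hypothetical, and whether MDDB already caches embeddings internally is not covered here.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """In-memory cache that calls the embedding function once per unique text."""

    def __init__(self, embed: Callable[[str], List[float]], model: str):
        self._embed = embed  # hypothetical function that calls the provider
        self._model = model
        self._store: Dict[str, List[float]] = {}
        self.misses = 0  # number of times the provider was actually called

    def _key(self, text: str) -> str:
        # Include the model name so vectors from different models never collide.
        return hashlib.sha256(f"{self._model}\n{text}".encode("utf-8")).hexdigest()

    def get(self, text: str) -> List[float]:
        key = self._key(text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed(text)
        return self._store[key]
```

A persistent version would swap the dictionary for a file or database table, keeping the same key scheme.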

3. Test Before Production

Compare quality across providers with your specific data
Measure search relevance for your use case
Benchmark performance (speed vs quality)
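
Search relevance can be quantified with a small labeled set of queries and their known relevant documents. The sketch below computes recall@k, one common metric among several; the function name and the document IDs in the usage note are illustrative.

```python
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    # Use a set so a document returned twice is only counted once.
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)
```

For example, if a query has three relevant documents and one of them appears in the top two results, `recall_at_k(["a", "b", "c"], {"a", "c", "z"}, 2)` is one third. Averaging this value over the same query set for each provider gives a like-for-like comparison.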

4. Security

Never commit API keys to git
Use environment variables or secure secret management
Rotate API keys regularly
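
When a script needs to confirm which key is in use, it is safer to log a masked form than the key itself. The helper below is an illustrative sketch; `mask_secret` is a hypothetical name and not part of MDDB.

```python
def mask_secret(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        # Too short to reveal anything safely: mask the whole value.
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

A typical use is `print(mask_secret(os.environ["MDDB_EMBEDDING_API_KEY"]))` after importing `os`, which shows only the last four characters.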

5. Switching Providers

Database configs allow easy switching between models
Test new provider with subset of data first
Re-embed all documents when switching providers or models, since vectors from different models are not comparable

Troubleshooting

"No active embedding configuration"

Set environment variables OR configure in Admin Panel
Ensure API key is valid and has credits
Check server logs for detailed error messages

"Dimensions mismatch"

All documents in a collection must use same dimensions
Clear existing embeddings before switching models
Consider creating new collection for different model
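
A mismatch is easiest to diagnose before vectors reach the database. The following sketch checks a batch of vectors against the configured dimension; `check_dimensions` is a hypothetical helper, not an MDDB function.

```python
from typing import Sequence


def check_dimensions(vectors: Sequence[Sequence[float]], expected: int) -> None:
    """Raise ValueError naming the first vector whose length differs from `expected`."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected:
            raise ValueError(
                f"vector {i} has {len(vec)} dimensions, expected {expected}"
            )
```

Running this against the output of a new model, with `expected` set to the collection's dimension, shows immediately whether existing embeddings must be cleared first.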

"API rate limit exceeded"

Slow down embedding worker (reduce batch size)
Upgrade API plan with provider
Consider switching to local Ollama
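
For client-side scripts that call a provider directly, smaller batches combined with exponential backoff usually keep requests under the limit. This is a generic sketch: `RateLimitError` is a hypothetical stand-in for whatever error your HTTP client raises on a rate-limit response, and the retry parameters are arbitrary defaults.

```python
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit error."""


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def call_with_backoff(
    fn: Callable[[], R],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> R:
    """Call `fn`, doubling the wait after each rate-limit error."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the final retry
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Each batch from `batched(texts, 16)` can then be sent through `call_with_backoff`, so a rate-limit error delays only that batch rather than failing the whole run.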

Ollama connection failed

Ensure Ollama is running: ollama serve
Check API URL is correct (default: http://localhost:11434)
Verify model is pulled: ollama list
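
The same checks can be scripted against Ollama's HTTP API, whose `GET /api/tags` endpoint lists the locally available models. The sketch below uses only the standard library; the function names are illustrative.

```python
import json
import urllib.request


def model_is_pulled(tags_response: dict, model: str) -> bool:
    """Return True if `model` appears in an Ollama /api/tags response."""
    names = [m.get("name", "") for m in tags_response.get("models", [])]
    # Ollama reports names with a tag suffix such as "nomic-embed-text:latest".
    return any(name == model or name.split(":", 1)[0] == model for name in names)


def fetch_tags(api_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """GET /api/tags; raises urllib.error.URLError if the server is unreachable."""
    with urllib.request.urlopen(f"{api_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `model_is_pulled(fetch_tags(), "nomic-embed-text")` confirms both that Ollama is reachable and that the model has been pulled.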

Examples

OpenAI Configuration

{
  "id": "openai-small",
  "name": "OpenAI Small",
  "provider": "openai",
  "model": "text-embedding-3-small",
  "dimensions": 1536,
  "apiKey": "sk-...",
  "apiUrl": "https://api.openai.com/v1",
  "isDefault": true
}

Cohere Multilingual

{
  "id": "cohere-multi",
  "name": "Cohere Multilingual",
  "provider": "cohere",
  "model": "embed-multilingual-v3.0",
  "dimensions": 1024,
  "apiKey": "cohere_api_key...",
  "apiUrl": "https://api.cohere.ai/v1",
  "isDefault": true
}

Voyage for Code

{
  "id": "voyage-code",
  "name": "Voyage Code",
  "provider": "voyage",
  "model": "voyage-code-2",
  "dimensions": 1536,
  "apiKey": "pa-...",
  "apiUrl": "https://api.voyageai.com/v1",
  "isDefault": true
}

Ollama Local

{
  "id": "ollama-nomic",
  "name": "Ollama Nomic",
  "provider": "ollama",
  "model": "nomic-embed-text",
  "dimensions": 768,
  "apiKey": "",
  "apiUrl": "http://localhost:11434",
  "isDefault": true
}
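
Configurations like the ones above can be sanity-checked before they are entered in the Admin Panel. The sketch below encodes only what this guide states (the four providers, and that OpenAI, Cohere, and Voyage need an API key); treating `id`, `name`, `provider`, `model`, and `dimensions` as required is an assumption, and `validate_config` is a hypothetical helper.

```python
import json
from typing import List

# Assumption: these fields are mandatory; MDDB's own validation may differ.
REQUIRED_FIELDS = ("id", "name", "provider", "model", "dimensions")
PROVIDERS_NEEDING_KEY = {"openai", "cohere", "voyage"}
KNOWN_PROVIDERS = PROVIDERS_NEEDING_KEY | {"ollama"}


def validate_config(raw: str) -> List[str]:
    """Return a list of problems found in a JSON config; empty means it looks fine."""
    cfg = json.loads(raw)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in cfg]

    provider = cfg.get("provider")
    if provider is not None and provider not in KNOWN_PROVIDERS:
        problems.append(f"unknown provider: {provider}")

    if "dimensions" in cfg:
        dims = cfg["dimensions"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(dims, int) or isinstance(dims, bool) or dims <= 0:
            problems.append("dimensions must be a positive integer")

    if provider in PROVIDERS_NEEDING_KEY and not cfg.get("apiKey"):
        problems.append(f"{provider} requires apiKey")

    return problems
```

The Ollama example above passes with an empty `apiKey`, while the same empty key on an OpenAI config is reported as a problem.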

Classification

All embedding providers support zero-shot classification via POST /v1/classify. The endpoint embeds the candidate labels and computes their similarity to your documents, so no training data is required. See Search Algorithms for details.
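
The idea behind embedding-based zero-shot classification can be illustrated with a few lines of vector arithmetic. This sketch uses cosine similarity and toy two-dimensional vectors; the similarity measure MDDB uses and the request format of /v1/classify are not specified here, so treat both as assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(
    document_vec: Sequence[float],
    label_vecs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Rank candidate labels by similarity to the document, best first."""
    scores = [
        (label, cosine_similarity(document_vec, vec))
        for label, vec in label_vecs.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

In practice the document and label vectors would come from the same embedding model, since vectors from different models are not comparable.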

Support

GitHub Issues: https://github.com/tradik/mddb/issues
Discussions: https://github.com/tradik/mddb/discussions
Documentation: https://github.com/tradik/mddb/docs

Last updated: 2026-03-02
