# Build a WordPress AI Agent
A step-by-step guide to building your own AI chatbot for a WordPress site. The entire stack runs locally on a single machine -- no cloud, no API fees.
## How It Works

```mermaid
graph LR
    WP["WordPress<br/>Your content"]
    EXP["wpexportjson<br/>Export to Markdown"]
    MDDB["mddbd<br/>Document database"]
    OLL["Ollama<br/>LLM + Embeddings"]
    UI["Open WebUI<br/>Chat interface"]

    WP -->|1. Export Markdown| EXP
    EXP -->|2. Import to DB| MDDB
    MDDB -->|3. Generate vectors| OLL
    UI -->|4. User asks| OLL
    OLL -->|5. Search context| MDDB
    MDDB -->|6. Return docs| OLL
    OLL -->|7. Answer with context| UI

    style WP fill:#21759b,color:#fff
    style EXP fill:#e67e22,color:#fff
    style MDDB fill:#4a9eff,color:#fff
    style OLL fill:#1a1a2e,color:#fff
    style UI fill:#1e8449,color:#fff
```
## Components
| Component | Role |
|---|---|
| WordPress | Your website with content (posts, pages, products) |
| wpexportjson | Exports WordPress content to Markdown via REST API |
| mddbd | Markdown database -- stores content + vector embeddings |
| Ollama | Local AI server -- runs LLM and embedding models |
| Open WebUI | Chat interface -- where you talk to the agent |
## Hardware Requirements
| Resource | Minimum | Recommended |
|---|---|---|
| RAM | 16 GB | 32 GB |
| CPU | 4 cores | 8+ cores |
| GPU | none (CPU mode) | NVIDIA 8GB+ VRAM (RTX 3060+) |
| Disk | 20 GB free | 50 GB SSD |
| OS | Linux / macOS / WSL2 | Linux (Ubuntu 22.04+) |
Why GPU? Ollama works on CPU but is ~10x slower. With an NVIDIA GPU (CUDA), responses generate in 1-3 seconds instead of 10-30.
CPU-only mode:

- Use smaller models: `llama3.2:3b` instead of `llama3.2:8b`
- The embedding model (`nomic-embed-text`) is lightweight and runs well on CPU
- Responses will be slower but fully functional
## Step 1: Docker Compose

Create `docker-compose.yml`:

```yaml
services:
  # --- Document database ---
  mddb:
    image: ghcr.io/tradik/mddbd:latest
    ports:
      - "11023:11023"   # HTTP API
      - "11024:11024"   # gRPC
    volumes:
      - mddb-data:/data
    environment:
      - MDDB_PATH=/data/mddb.db
      - MDDB_MODE=wr
      # Ollama as embedding provider
      - MDDB_EMBEDDING_PROVIDER=ollama
      - MDDB_EMBEDDING_API_URL=http://ollama:11434
      - MDDB_EMBEDDING_MODEL=nomic-embed-text
      - MDDB_EMBEDDING_DIMENSIONS=768
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:11023/health"]
      interval: 10s
      timeout: 3s
    depends_on:
      - ollama

  # --- Local AI (LLM + Embeddings) ---
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    # Uncomment the lines below if you have an NVIDIA GPU:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]

  # --- Chat interface ---
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    volumes:
      - openwebui-data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=false   # Disable login (dev mode)
    depends_on:
      - ollama

volumes:
  mddb-data:
  ollama-data:
  openwebui-data:
```
Start the stack:

```bash
docker compose up -d
```
## Step 2: Pull Ollama Models

Wait for Ollama to start (~30s), then:

```bash
docker exec -it ollama ollama pull nomic-embed-text
docker exec -it ollama ollama pull llama3.2:3b   # 2GB, fast, good for CPU
docker exec -it ollama ollama pull llama3.2:8b   # 5GB, better quality, needs GPU
```
Verify MDDB can see Ollama:

```bash
curl http://localhost:11023/v1/vector-stats
```
## Step 3: Export from WordPress
Use wpexportjson to export your WordPress content directly to Markdown files with YAML frontmatter.
### Install

```bash
# Go binary
go install github.com/tradik/wpexporter/cmd/wpexporter@latest

# or Docker image
docker pull ghcr.io/tradik/wpexporter:latest
```
### Basic export

```bash
# Binary
wpexportjson export --url https://your-site.com -f markdown --output ./wp-export

# or via Docker
docker run --rm -v $(pwd)/wp-export:/export ghcr.io/tradik/wpexporter:latest \
  wpexportjson export --url https://your-site.com -f markdown --output /export
```
This creates a directory with Markdown files:
```
wp-export/
├── posts/
│   ├── how-to-file-a-complaint.md
│   ├── welcome-to-our-site.md
│   └── ...
├── pages/
│   ├── about.md
│   ├── contact.md
│   └── ...
└── media/
    ├── 123_featured-image.webp
    └── ...
```
Each `.md` file includes YAML frontmatter with metadata:

```markdown
---
id: 123
title: "How to File a Complaint"
slug: "how-to-file-a-complaint"
date: "2025-03-15T10:30:00"
author: "John Doe"
categories:
  - Support
  - FAQ
tags:
  - complaints
  - customer-service
featured_image: "https://your-site.com/wp-content/uploads/complaint.jpg"
status: "publish"
---

If you need to file a complaint, follow these steps...
```
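The frontmatter is plain YAML between `---` markers. As a sketch, it can be split from the body in a few lines of Python -- this is a hypothetical helper, not part of the wpexporter toolkit, and this minimal version handles only the scalar fields shown, not the nested category/tag lists:

```python
def split_frontmatter(text: str) -> tuple[dict, str]:
    """Return (metadata, body) for a Markdown file with YAML frontmatter."""
    if not text.startswith("---"):
        return {}, text
    # maxsplit=2: first element is empty, second is the frontmatter,
    # third is the body (which may itself contain "---" safely)
    _, fm, body = text.split("---", 2)
    meta = {}
    for line in fm.strip().splitlines():
        # only top-level scalar fields; skip indented lines and list items
        if ":" in line and not line.startswith((" ", "-")):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    return meta, body.lstrip()
```

For anything beyond this (nested lists, multiline values), use a real YAML parser such as PyYAML instead.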
### Advanced export options

| Flag | Description |
|---|---|
| `--flat-html` | Convert page builder HTML (Bricks, Elementor, Divi) to clean Markdown |
| `--assisted-crawl` | Extract SEO metadata (title, description, og:tags, hreflangs) |
| `--download-media` | Download images and videos (default: true) |
| `--relevant-media-only` | Only download featured images and content images |
| `--brute-force` | Discover unlisted/hidden content by ID enumeration |
| `--path-filter=/blog/` | Export only specific sections of the site |
| `--no-posts` | Skip blog posts |
| `--no-pages` | Skip pages |
| `--no-products` | Skip WooCommerce products |
| `--resume` | Resume interrupted export from checkpoint |
| `--zip` | Create ZIP archive of export |
| `--crawl-content` | Crawl pages with empty content (page builder sites) |
| `--rate-limit 500` | Delay between requests in ms (prevent rate limiting) |
Example: page builder site with SEO data
```bash
wpexportjson export \
  --url https://your-site.com \
  -f markdown \
  --flat-html \
  --assisted-crawl \
  --crawl-content \
  --relevant-media-only \
  --output ./wp-export
```
Example: export only the blog section
```bash
wpexportjson export \
  --url https://your-site.com \
  -f markdown \
  --path-filter=/blog/ \
  --output ./wp-export
```
Example: resume a large interrupted export
```bash
wpexportjson export \
  --url https://large-site.com \
  -f markdown \
  --resume \
  --output ./wp-export
```
## Step 4: Import into MDDB
The Markdown files from Step 3 already have YAML frontmatter with metadata. Import them with a simple bash loop using curl -- no extra tools needed.
### Import with curl (recommended)
```bash
#!/bin/bash
# import-to-mddb.sh -- push a folder of Markdown files into an MDDB collection
MDDB_URL="${MDDB_URL:-http://localhost:11023}"
COLLECTION="${1:?Usage: $0 <collection> <folder> [type]}"
FOLDER="${2:?Usage: $0 <collection> <folder> [type]}"
TYPE="${3:-post}"

count=0
for file in "$FOLDER"/*.md; do
  [ -f "$file" ] || continue
  # derive a stable key from the filename: lowercase, dashes, no edge dashes
  key=$(basename "$file" .md | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g;s/--*/-/g;s/^-//;s/-$//')
  content=$(cat "$file")
  curl -s -X POST "$MDDB_URL/v1/add" \
    -H "Content-Type: application/json" \
    -d "$(jq -n \
      --arg col "$COLLECTION" \
      --arg key "$key" \
      --arg body "$content" \
      --arg type "$TYPE" \
      '{collection: $col, key: $key, lang: "en_US", contentMd: $body, meta: {type: [$type], source: ["wordpress"]}}' \
    )" > /dev/null
  count=$((count + 1))
  printf "\r  Imported %d files..." "$count"
done
echo -e "\n  Done: $count files imported to collection '$COLLECTION'"
```
Run it:
```bash
bash import-to-mddb.sh wordpress ./wp-export/posts post
bash import-to-mddb.sh wordpress ./wp-export/pages page
```
That's it. The script needs only curl and jq (both pre-installed on most systems).
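For reference, the key derivation in the bash loop (`tr` + `sed`) is equivalent to this small Python function -- a hypothetical illustration for clarity, not a shipped tool:

```python
import re


def mddb_key(filename: str) -> str:
    """Derive an MDDB document key from a Markdown filename:
    lowercase, non-alphanumerics collapsed to single dashes, edges trimmed."""
    stem = filename.rsplit("/", 1)[-1].removesuffix(".md")
    key = re.sub(r"[^a-z0-9]+", "-", stem.lower())
    return key.strip("-")
```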
Note: For large imports (1000+ files), metadata parsing, progress bars, and dry-run mode, use the full `load-md-folder.sh` script. It requires `mddb-cli` in PATH. See the Bulk Import Guide.
After import, MDDB automatically generates embeddings (vectors) in the background via Ollama. Check progress:
```bash
curl -s http://localhost:11023/v1/vector-stats | python3 -m json.tool
```

```json
{
  "enabled": true,
  "model": "nomic-embed-text",
  "dimensions": 768,
  "collections": {
    "wordpress": {
      "total_documents": 150,
      "embedded_documents": 142
    }
  }
}
```
When `embedded_documents == total_documents`, all content is ready for search.
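If you want to wait for embedding to finish from a script, the stats JSON is easy to check programmatically. A small sketch, assuming the response shape shown in the example output above (poll the endpoint and call this on each response until every collection is done):

```python
def embedding_progress(stats: dict) -> dict:
    """Map each collection name to (embedded, total, done?) from a
    /v1/vector-stats response."""
    progress = {}
    for name, coll in stats.get("collections", {}).items():
        total = coll.get("total_documents", 0)
        done = coll.get("embedded_documents", 0)
        progress[name] = (done, total, done >= total)
    return progress
```

Wire it to `requests.get(".../v1/vector-stats").json()` in a loop with a short `time.sleep()` between polls.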
## Step 5: Add a Tool in Open WebUI
This is the key step. Open http://localhost:3000 and:
- Click profile icon (bottom left) -> Admin Panel -> Tools
- Click "+" (Create Tool)
- Paste the following code:
"""
title: WordPress Search (MDDB)
description: Search WordPress content via MDDB vector database
author: you
version: 1.0.0
requirements: httpx
""" import httpx
import json
from pydantic import BaseModel, Field class Tools: class Valves(BaseModel): MDDB_URL: str = Field( default="http://mddb:11023", description="MDDB server URL", ) COLLECTION: str = Field( default="wordpress", description="MDDB collection name", ) def __init__(self): self.valves = self.Valves() async def search_wordpress( self, query: str, __event_emitter__=None, ) -> str: """ Semantic search across WordPress content. Use this tool when the user asks about website content. :param query: User question, e.g. "how do I file a complaint" :return: Matching articles with content """ if __event_emitter__: await __event_emitter__( { "type": "status", "data": { "description": "Searching WordPress content...", "done": False, }, } ) async with httpx.AsyncClient(timeout=15) as client: resp = await client.post( f"{self.valves.MDDB_URL}/v1/vector-search", json={ "collection": self.valves.COLLECTION, "query": query, "topK": 5, "threshold": 0.5, "includeContent": True, }, ) data = resp.json() results = data.get("results", []) if __event_emitter__: await __event_emitter__( { "type": "status", "data": { "description": f"Found {len(results)} articles", "done": True, }, } ) if not results: return "No matching articles found in the WordPress database." output = [] for r in results: doc = r.get("document", {}) score = r.get("score", 0) title = (doc.get("meta", {}).get("title") or [""])[0] content = doc.get("contentMd", "")[:2000] output.append( f"### {title} (relevance: {score:.0%})\n{content}" ) return "\n\n---\n\n".join(output) async def search_wordpress_by_category( self, query: str, category: str, __event_emitter__=None, ) -> str: """ Search WordPress content filtered by category. :param query: User question :param category: Category name, e.g. 
"tutorials", "news" :return: Matching articles from that category """ if __event_emitter__: await __event_emitter__( { "type": "status", "data": { "description": f"Searching in category '{category}'...", "done": False, }, } ) async with httpx.AsyncClient(timeout=15) as client: resp = await client.post( f"{self.valves.MDDB_URL}/v1/vector-search", json={ "collection": self.valves.COLLECTION, "query": query, "topK": 5, "threshold": 0.5, "includeContent": True, "filterMeta": {"categories": [category]}, }, ) data = resp.json() results = data.get("results", []) if __event_emitter__: await __event_emitter__( { "type": "status", "data": { "description": f"Found {len(results)} articles", "done": True, }, } ) if not results: return f"No articles found in category '{category}'." output = [] for r in results: doc = r.get("document", {}) score = r.get("score", 0) title = (doc.get("meta", {}).get("title") or [""])[0] content = doc.get("contentMd", "")[:2000] output.append( f"### {title} (relevance: {score:.0%})\n{content}" ) return "\n\n---\n\n".join(output) async def list_wordpress_articles( self, __event_emitter__=None, ) -> str: """ List recent articles from WordPress. Use when the user asks "what's on the site?" or "show me articles". :return: List of articles with titles """ if __event_emitter__: await __event_emitter__( { "type": "status", "data": { "description": "Fetching article list...", "done": False, }, } ) async with httpx.AsyncClient(timeout=15) as client: resp = await client.post( f"{self.valves.MDDB_URL}/v1/search", json={ "collection": self.valves.COLLECTION, "sort": "updatedAt", "asc": False, "limit": 20, }, ) data = resp.json() docs = data.get("docs", []) if __event_emitter__: await __event_emitter__( { "type": "status", "data": { "description": f"Fetched {len(docs)} articles", "done": True, }, } ) if not docs: return "No articles in the database." 
lines = [] for d in docs: title = (d.get("meta", {}).get("title") or ["(untitled)"])[0] date = (d.get("meta", {}).get("date") or [""])[0][:10] lines.append(f"- <strong>{title}</strong> ({date})") return "Articles on the site:\n\n" + "\n".join(lines)
- Click Save
- Go back to the chat and select a model (e.g. `llama3.2:3b`)
- Click the "+" icon next to the message field and enable "WordPress Search (MDDB)"
- Ask: "What articles do you have?" or "How do I file a complaint?"
## What Happens Under the Hood

```mermaid
sequenceDiagram
    participant U as You (chat)
    participant OW as Open WebUI
    participant LLM as Ollama (llama3.2)
    participant MDDB as mddbd
    participant EMB as Ollama (nomic-embed)

    U->>OW: "How do I file a complaint?"
    OW->>LLM: Question + available tools
    LLM->>OW: Call search_wordpress("file complaint")
    OW->>MDDB: POST /v1/vector-search {"query":"file complaint"}
    MDDB->>EMB: Embed("file complaint") -> vector [0.12, -0.34, ...]
    EMB-->>MDDB: query vector
    MDDB-->>OW: Top 5 similar articles
    OW->>LLM: Here is context from the site: [articles]
    LLM-->>OW: "To file a complaint, go to..."
    OW-->>U: Answer grounded in your site's content
```
## Step 6: Set the System Prompt

In Open WebUI chat settings, set the System Prompt:

```
You are a helpful assistant for [YOUR WEBSITE NAME].
You answer customer questions based on the website's content.

RULES:
- Always use the search_wordpress tool when the question relates to site content
- If you cannot find an answer in the content, say so honestly
- Cite article titles as sources
- Be helpful and specific
```
## Keeping Content Up to Date

### Option A: Cron job with wpexportjson

```cron
0 3 * * * wpexportjson export --url https://your-site.com -f markdown --output /home/user/wp-export && /home/user/load-md-folder.sh /home/user/wp-export/posts wordpress -r -m "type=post" && /home/user/load-md-folder.sh /home/user/wp-export/pages wordpress -r -m "type=page" >> /var/log/mddb-sync.log 2>&1
```
### Option B: WordPress webhooks

Use a WordPress plugin (e.g. WP Webhooks) to call MDDB directly when content changes:

```bash
curl -X POST http://localhost:11023/v1/import-url \
  -d '{
    "collection": "wordpress",
    "url": "https://your-site.com/wp-json/wp/v2/posts/123",
    "lang": "en_US"
  }'
```
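If you script the sync yourself instead of using a plugin, the webhook payload is just JSON. A tiny hypothetical helper can build it -- the endpoint and field names are taken from the curl example above:

```python
def import_url_payload(site: str, post_id: int, collection: str = "wordpress") -> dict:
    """Build the JSON body for MDDB's /v1/import-url endpoint
    for a single changed WordPress post."""
    return {
        "collection": collection,
        "url": f"{site}/wp-json/wp/v2/posts/{post_id}",
        "lang": "en_US",
    }
```

POST it to `http://localhost:11023/v1/import-url` with your HTTP client of choice whenever a post changes.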
## Advanced: wpmcp (MCP Server)
The wpexporter toolkit includes wpmcp -- an MCP server that lets AI assistants interact with WordPress directly.
### Available tools

| Tool | Description |
|---|---|
| `list_formats` | List all 14 available export formats |
| `get_site_info` | Get WordPress site information |
| `list_posts` | List posts with optional path filtering |
| `list_pages` | List pages from a site |
| `export_site` | Full site export to any format |
| `get_post` | Get a specific post by ID |
| `list_categories` | List all categories |
| `list_media` | List media files |
### Claude Desktop configuration

```json
{
  "mcpServers": {
    "wpexporter": {
      "command": "wpmcp",
      "args": ["serve"]
    }
  }
}
```
### Dual MCP setup (wpexporter + MDDB)

You can use both MCP servers together -- wpmcp for reading live WordPress data, and mddb-mcp for vector search over imported content:

```json
{
  "mcpServers": {
    "wpexporter": {
      "command": "wpmcp",
      "args": ["serve"]
    },
    "mddb": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--network", "host",
        "-e", "MDDB_MCP_STDIO=true",
        "-e", "MDDB_GRPC_ADDRESS=localhost:11024",
        "-e", "MDDB_REST_BASE_URL=http://localhost:11023",
        "tradik/mddb:latest"
      ]
    }
  }
}
```
## Directory Structure

```
your-server/
├── docker-compose.yml     # Full stack (mddb + ollama + open-webui)
├── load-md-folder.sh      # MDDB bulk importer (downloaded from GitHub)
├── wp-export/             # wpexportjson output
│   ├── posts/             # Markdown posts with frontmatter
│   ├── pages/             # Markdown pages with frontmatter
│   └── media/             # Downloaded images
└── data/                  # Docker volumes
    ├── mddb-data/         # Documents + vectors
    ├── ollama-data/       # AI models (~5-10GB)
    └── openwebui-data/    # Chat settings
```
Ports:
| Service | Port | URL |
|---|---|---|
| Open WebUI (chat) | 3000 | http://localhost:3000 |
| MDDB API | 11023 | http://localhost:11023 |
| MDDB gRPC | 11024 | -- |
| Ollama | 11434 | http://localhost:11434 |
## FAQ
### How much does it cost?
$0. Everything runs locally. The only cost is the server/computer.
### How long does embedding 1000 posts take?
- With GPU: ~5-10 minutes
- CPU only: ~30-60 minutes
- After the initial import, subsequent syncs only embed new/changed posts
### Can I use OpenAI instead of Ollama?

Yes -- change the environment variables in docker-compose:

```yaml
- MDDB_EMBEDDING_PROVIDER=openai
- MDDB_EMBEDDING_API_KEY=sk-xxx
- MDDB_EMBEDDING_MODEL=text-embedding-3-small
- MDDB_EMBEDDING_DIMENSIONS=1536
```
But you lose privacy and pay per token.
### My site uses a page builder (Elementor, Bricks, Divi). Will the export work?

Yes -- use the `--flat-html` and `--crawl-content` flags. wpexportjson has built-in conversion rules for Bricks, Elementor, Divi, Oxygen, and GenerateBlocks. You can also define custom rules in a `config.yaml` file. See the wpexporter documentation.
### My site has protected content. Can I export it?

Yes -- wpexportjson supports Basic Auth (`--auth-user` / `--auth-pass`) and Bearer tokens (`--auth-token`). For deeper access (drafts, private posts), use wpxmlrpc instead:

```bash
wpxmlrpc export --url https://your-site.com \
  --username admin --password mypassword \
  -f markdown --output ./wp-export
```
### The export was interrupted. Do I need to start over?

No -- use `--resume` to continue from where it stopped:

```bash
wpexportjson export --url https://your-site.com -f markdown --resume --output ./wp-export
```
### What about MCP?

MCP (Model Context Protocol) works with Claude Desktop and Windsurf/Cursor. Open WebUI has supported MCP since v0.6.31 (Streamable HTTP transport), but the Python Tool approach described above is simpler -- zero extra components.

If you want MCP with Open WebUI, use MCPO (proxy):

```bash
docker run -p 8000:8000 ghcr.io/open-webui/mcpo:main -- \
  docker exec -i mddb-mcp /app/mddb-mcp-stdio
```
Then in Open WebUI: Admin -> Tools -> Add Server -> MCP (MCPO).
### Can I export WooCommerce products?

Yes -- wpexportjson automatically detects WooCommerce and exports products. Use `--no-products` to skip them if not needed.
## Exposing MDDB to External AI Platforms
If you expose MDDB on a public URL (e.g. https://mddb.your-site.com), multiple
AI platforms can use it as a knowledge base. Each platform has a different
integration method.
### Claude (Desktop / Code / API)

Claude supports MCP natively. Point it at your mddb-mcp instance:

```json
{
  "mcpServers": {
    "mddb": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "MDDB_MCP_STDIO=true",
        "-e", "MDDB_REST_BASE_URL=https://mddb.your-site.com",
        "tradik/mddb:latest"
      ]
    }
  }
}
```
Claude gets all 23 tools (search, vector search, add, list collections, etc.).
### ChatGPT (Custom GPTs / Actions)

ChatGPT uses OpenAPI Actions. MDDB already ships `openapi.yaml` -- upload it as an Action in ChatGPT's GPT Builder:

- Go to ChatGPT GPT Builder
- Create a new GPT -> Configure -> Actions -> Import from URL
- Enter: `https://mddb.your-site.com/openapi.yaml`
- ChatGPT discovers all endpoints automatically
- Set the system prompt to use `vector-search` for content questions
No MCP needed -- ChatGPT calls the REST API directly.
### DeepSeek / Manus / Other Agents

Any agent that can make HTTP calls works with MDDB's REST API:

```bash
# Vector (semantic) search
curl -X POST https://mddb.your-site.com/v1/vector-search \
  -H "Content-Type: application/json" \
  -d '{"collection":"wordpress","query":"how to file a complaint","topK":5}'

# Metadata/keyword search
curl -X POST https://mddb.your-site.com/v1/search \
  -H "Content-Type: application/json" \
  -d '{"collection":"wordpress","query":"complaint","limit":10}'
```

For agents that support MCP over HTTP (Streamable HTTP transport), run mddb-mcp in HTTP mode:

```bash
docker run -p 8080:8080 \
  -e MDDB_REST_BASE_URL=https://mddb.your-site.com \
  -e MCP_TRANSPORT=http \
  -e MCP_PORT=8080 \
  tradik/mddb:latest
```
## YAML Custom Tools per Website (v2.3.3+)

When exposing MDDB externally, generic tools like `semantic_search(collection, query)` work -- but the LLM has to guess collection names and parameters. The custom tools feature in mddb-mcp lets you define website-specific tools in `config.yaml`:

```yaml
mcp:
  listenAddress: "0.0.0.0:9000"   # dedicated MCP port (or omit to embed on HTTP port)

mddb:
  restBaseUrl: "http://localhost:11023"

custom_tools:
  - name: search_faq
    description: "Search frequently asked questions"
    action: semantic_search        # underlying MDDB operation
    defaults:
      collection: faq
      topK: 3
      threshold: 0.6
    parameters:
      - name: query
        type: string
        required: true
        description: "User question about FAQs"

  - name: search_products
    description: "Find products in the catalog"
    action: semantic_search
    defaults:
      collection: products
      topK: 5
      threshold: 0.5
    parameters:
      - name: query
        type: string
        required: true
        description: "Product name or description"

  - name: latest_blog_posts
    description: "Get the most recent blog posts"
    action: search_documents
    defaults:
      collection: blog
      sort: updatedAt
      asc: false
      limit: 10
    # no parameters = zero-arg tool

  - name: search_docs
    description: "Search technical documentation"
    action: full_text_search
    defaults:
      collection: docs
      limit: 10
    parameters:
      - name: query
        type: string
        required: true
```
Supported actions (3 available):

- `semantic_search` -- vector/embedding similarity search
- `search_documents` -- metadata-based search with sorting
- `full_text_search` -- keyword search with TF scoring
Why this matters for external AI:
| Without custom tools | With custom tools |
|---|---|
| AI sees: `semantic_search(collection, query, topK, ...)` | AI sees: `search_faq(query)`, `search_products(query)` |
| AI must guess collection names | Collections are preconfigured |
| Same generic description for all | Domain-specific descriptions per tool |
| Works, but requires a good system prompt | Works out of the box |
How it works:

- Define custom tools in `config.yaml` under `custom_tools:`
- `mddb-mcp` validates the config at startup (no conflicts with the 23 built-in tools)
- Custom tools appear alongside built-in tools in `tools/list`
- When called, defaults are merged with user arguments and delegated to the underlying action
- Works on both transports: stdio (Claude Desktop) and HTTP
Call flow:

```
AI calls:                 search_faq(query: "reset password")
        |
        v
mddb-mcp merges defaults: {collection: faq, topK: 3, threshold: 0.6, query: "reset password"}
        |
        v
delegates to:             semantic_search(merged args)
        |
        v
returns:                  vector search results from "faq" collection
```
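The merge step itself can be sketched in a couple of lines. This assumes caller arguments take precedence over config defaults on conflict -- check the mddb-mcp docs for the authoritative rule:

```python
def merge_tool_call(defaults: dict, args: dict) -> dict:
    """Fill a custom tool's call with config defaults;
    caller-supplied arguments win on conflict (assumed precedence)."""
    return {**defaults, **args}
```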
Docker example:
```bash
docker run -i --rm --network host \
  -v ./config.yaml:/app/config.yaml \
  -e MDDB_MCP_STDIO=true \
  -e MDDB_MCP_CONFIG=/app/config.yaml \
  tradik/mddb:latest
```