WordPress Website Analyzer
Export content from a WordPress website into MDDB, then analyze it with Claude CLI to find broken links, missing meta tags, duplicate content, and other issues.
WordPress → wpexportjson → MDDB → Claude CLI (MCP) → analysis
This guide uses www.malacukierenka.pl as an example.
Requirements
- wpexportjson: exports WordPress content
- MDDB: document database (runs in Docker)
- Claude CLI: analyzes the content
Step 1: Install wpexportjson
go install github.com/tradik/wpexporter/cmd/wpexportjson@latest
Verify the installation:
wpexportjson --help
If you don't have Go installed, download a pre-built binary from the wpexporter releases page instead.
Step 2: Start MDDB
docker run -d \
  --name mddb \
  -p 11023:11023 \
  -v mddb-data:/data \
  tradik/mddb:latest
Verify:
curl http://localhost:11023/v1/health
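The container may need a few seconds before it answers, so a health check run immediately after docker run can fail. Below is a minimal sketch of a retry loop around the same endpoint. The wait_for_mddb name, the one-second interval, and the default of 30 attempts are illustrative choices, not part of MDDB.

```bash
#!/usr/bin/env bash
# Poll the MDDB health endpoint until it responds or the attempts run out.
# wait_for_mddb is a hypothetical helper; only the /v1/health URL comes from this guide.
wait_for_mddb() {
  local url="${1:-http://localhost:11023/v1/health}"
  local tries="${2:-30}"
  local i
  for ((i = 1; i <= tries; i++)); do
    # -f: fail on HTTP errors, -s: no progress bar, --max-time: avoid hanging
    if curl -fs --max-time 2 -o /dev/null "$url"; then
      echo "MDDB is up"
      return 0
    fi
    # Sleep only if another attempt follows
    if ((i < tries)); then
      sleep 1
    fi
  done
  echo "MDDB did not respond at $url after $tries attempts" >&2
  return 1
}
```

For example, wait_for_mddb || docker logs mddb prints the container logs if the server never comes up.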
Step 3: Export content from WordPress
3a. Export public content (no login required)
wpexportjson fetches content through the public WordPress REST API:
wpexportjson \
  --url https://www.malacukierenka.pl \
  --output ./export \
  --format markdown
This exports:
- Posts (blog articles)
- Static pages
- Categories and tags
- Authors
The .md files are saved to the ./export/ folder.
3b. Check what was downloaded
ls -la export/
ls export/posts/ | head -10
ls export/pages/ | head -10
Preview a file:
cat export/posts/$(ls export/posts/ | head -1) | head -30
Each file includes YAML frontmatter with metadata (title, date, categories, tags, author).
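Because every file carries this frontmatter, you can find gaps before loading anything. The sketch below prints the files whose frontmatter lacks a given top-level key. It assumes the frontmatter starts on the first line and is delimited by --- lines, which is the usual convention. Open one exported file to confirm that wpexportjson writes it this way. The missing_field helper is hypothetical.

```bash
#!/usr/bin/env bash
# Print every .md file in DIR whose YAML frontmatter has no top-level KEY.
# Assumes frontmatter opens on line 1 with '---' and closes at the next '---'.
missing_field() {
  local dir="$1" key="$2" f
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if ! awk -v key="$key" '
      NR == 1 && /^---[ \t]*$/ { in_fm = 1; next }        # opening delimiter
      in_fm && /^---[ \t]*$/   { exit }                   # closing delimiter: stop scanning
      in_fm && index($0, key ":") == 1 { found = 1; exit } # key at column 1 = top level
      END { exit !found }                                 # status 0 only if the key was seen
    ' "$f"; then
      echo "$f"
    fi
  done
}
```

For example, missing_field export/posts categories lists posts without categories, and missing_field export/posts tags does the same for tags. A key that is present but empty (categories: []) counts as present.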
Step 4: Load content into MDDB
4a. Load posts and pages
curl -sO https://raw.githubusercontent.com/tradik/mddb/main/scripts/load-md-folder.sh
chmod +x load-md-folder.sh
./load-md-folder.sh export/posts/ malacukierenka-posts --lang pl_PL --verbose
./load-md-folder.sh export/pages/ malacukierenka-pages --lang pl_PL --verbose
4b. Alternative: use the CLI
for md_file in export/posts/*.md; do
  key=$(basename "$md_file" .md)
  mddb-cli add malacukierenka-posts "$key" pl -f "$md_file" \
    -m "source=wordpress,site=malacukierenka.pl,type=post"
done

for md_file in export/pages/*.md; do
  key=$(basename "$md_file" .md)
  mddb-cli add malacukierenka-pages "$key" pl -f "$md_file" \
    -m "source=wordpress,site=malacukierenka.pl,type=page"
done
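The two loops differ only in folder, collection, and type value, so they can be folded into one function. The sketch below keeps the exact mddb-cli add arguments from the loops above. The load_folder name and the DRY_RUN switch are illustrative additions. With DRY_RUN=1, the function prints the commands instead of running them, which lets you check the generated keys before writing anything.

```bash
#!/usr/bin/env bash
# Load every .md file in DIR into COLLECTION with mddb-cli, tagged with TYPE.
# load_folder and DRY_RUN are illustrative; the mddb-cli arguments match the loops above.
# With DRY_RUN=1 the commands are printed instead of executed.
load_folder() {
  local dir="$1" collection="$2" type="$3" md_file key
  local -a cmd
  for md_file in "$dir"/*.md; do
    [ -e "$md_file" ] || continue          # folder has no .md files
    key=$(basename "$md_file" .md)         # file name without .md becomes the key
    cmd=(mddb-cli add "$collection" "$key" pl -f "$md_file"
         -m "source=wordpress,site=malacukierenka.pl,type=$type")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}
```

Usage: DRY_RUN=1 load_folder export/posts malacukierenka-posts post previews the post commands. Drop DRY_RUN=1 to load them, then repeat with export/pages malacukierenka-pages page.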
4c. Verify the result
curl -s http://localhost:11023/v1/stats | python3 -m json.tool
You should see malacukierenka-posts and malacukierenka-pages collections with documents.
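To check that nothing was skipped, compare the document counts from /v1/stats with the number of files on disk. The sketch below counts only the local files and makes no assumption about the structure of the stats JSON. Match its figures by eye against the json.tool output. The count_exported name is illustrative.

```bash
#!/usr/bin/env bash
# Print "<subfolder><TAB><number of .md files>" for each folder under ROOT (default: export).
count_exported() {
  local root="${1:-export}" dir f n
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    # Count with a glob rather than parsing ls output
    n=0
    for f in "$dir"*.md; do
      [ -e "$f" ] && n=$((n + 1))
    done
    printf '%s\t%d\n' "$(basename "$dir")" "$n"
  done
}
```

Run count_exported from the folder that contains export/.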
Step 5: Configure Claude CLI (MCP)
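Register MDDB as an MCP server. The path below is the Claude Desktop configuration file on macOS. The command overwrites that file, so if you already have MCP servers configured, add the mddb entry to the existing mcpServers object by hand instead. Claude CLI keeps its own MCP settings. Once the file exists, claude mcp add-from-claude-desktop can import the server into Claude CLI (on macOS and WSL).
Create the config file: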
cat > ~/Library/Application\ Support/Claude/claude_desktop_config.json << 'EOF'
{
  "mcpServers": {
    "mddb": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm", "--network", "host",
        "-e", "MDDB_MCP_STDIO=true",
        "-e", "MDDB_SERVER=http://localhost:11023",
        "tradik/mddb:latest"
      ]
    }
  }
}
EOF
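Claude CLI (the claude command launched in Step 6) can also read project-scoped MCP servers from a .mcp.json file in the directory where you start it. The sketch below writes the same server definition there. Unlike the command above, it refuses to overwrite an existing file. The write_mcp_json name is illustrative. Claude CLI asks you to approve project-scoped servers the first time it loads them.

```bash
#!/usr/bin/env bash
# Write a project-scoped .mcp.json containing the MDDB server definition.
# Refuses to overwrite an existing file so other servers are not lost.
write_mcp_json() {
  local target="${1:-.}/.mcp.json"
  if [ -e "$target" ]; then
    echo "$target already exists; add the mddb entry by hand" >&2
    return 1
  fi
  cat > "$target" << 'EOF'
{
  "mcpServers": {
    "mddb": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm", "--network", "host",
        "-e", "MDDB_MCP_STDIO=true",
        "-e", "MDDB_SERVER=http://localhost:11023",
        "tradik/mddb:latest"
      ]
    }
  }
}
EOF
}
```

Run write_mcp_json in the folder where you will start claude, then launch claude from that folder.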
Step 6: Analyze content
Launch Claude CLI:
claude
Link analysis
> Search all posts in the malacukierenka-posts collection. Find all external links (http/https). List those that might be outdated or broken.
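You can also pull a raw list of external links straight from the exported Markdown, either as a cross-check or to give Claude a concrete list. The sketch below uses a rough URL regex: it stops at whitespace, quotes, angle brackets, and parentheses, and it may keep trailing punctuation. Treat its output as a starting point. The optional status check sends one HEAD request per link. Both function names are illustrative.

```bash
#!/usr/bin/env bash
# List external http(s) links in the .md files of DIR, most frequent first.
# OWN_DOMAIN (optional) filters out links that point at the site itself.
list_external_links() {
  local dir="$1" own_domain="${2:-}"
  # -o: only the match, -h: no file names, -E: extended regex
  grep -ohE 'https?://[^[:space:]()<>"]+' "$dir"/*.md 2>/dev/null |
    { if [ -n "$own_domain" ]; then grep -vF "$own_domain"; else cat; fi; } |
    sort | uniq -c | sort -rn
}

# Read URLs from stdin and print "status<TAB>url"; 000 means no response.
# Uses HEAD requests (-I); some servers answer HEAD with 405, so recheck those with GET.
check_links() {
  local url code
  while read -r url; do
    code=$(curl -s -o /dev/null -I --max-time 10 -w '%{http_code}' "$url" || true)
    printf '%s\t%s\n' "$code" "$url"
  done
}
```

For example, list_external_links export/posts malacukierenka.pl | awk '{print $2}' | check_links | grep -v '^200' shows links that did not return 200. Redirects such as 301 are reported as-is rather than followed.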
SEO analysis
> Analyze posts from malacukierenka-posts for SEO:
> - Do posts have meta descriptions?
> - Are titles the right length (50-60 characters)?
> - Do images have alt text?
>
> List issues sorted by severity.
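Two of these checks can be approximated locally. Meta descriptions depend on which fields wpexportjson exports, so that check is left to Claude. The sketch below assumes a single-line title: key in the frontmatter, optionally quoted. It only detects Markdown images written as ![](...), so HTML img tags without alt are not covered. The seo_check name is illustrative.

```bash
#!/usr/bin/env bash
# Report titles outside MIN..MAX characters (default 50..60) and lines containing
# Markdown images with empty alt text.
# Run under a UTF-8 locale so ${#title} counts characters, not bytes
# (Polish letters such as "ł" are multibyte).
seo_check() {
  local dir="$1" min="${2:-50}" max="${3:-60}" f title len empty_alts
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue
    # First "title:" line, with the key and surrounding quotes removed
    title=$(sed -n 's/^title:[[:space:]]*//p' "$f" | head -n 1 | sed "s/^['\"]//; s/['\"]\$//")
    len=${#title}
    if ((len < min || len > max)); then
      printf 'TITLE\t%d chars\t%s\n' "$len" "$f"
    fi
    # grep -c exits 1 when nothing matches but still prints 0
    empty_alts=$(grep -c '!\[\]([^)]*)' "$f" || true)
    if ((empty_alts > 0)); then
      printf 'ALT\t%d line(s)\t%s\n' "$empty_alts" "$f"
    fi
  done
}
```

For example, LC_ALL=C.UTF-8 seo_check export/posts lists problem posts, one issue per line.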
Duplicate content
> Check the malacukierenka-posts collection. Find posts with very similar content or the same topic. List potential duplicates with similarity scores.
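Exact copies are cheap to find without Claude, which leaves the prompt above to judge reworded posts on the same topic. The sketch below removes the frontmatter, checksums each body with cksum, and prints files that share a checksum. It assumes the same leading --- frontmatter layout as the export. The find_exact_duplicates name is illustrative.

```bash
#!/usr/bin/env bash
# Print "checksum<TAB>path" for .md files in DIR whose body (frontmatter removed)
# is byte-for-byte identical to another file's body, grouped by checksum.
# Reworded near-duplicates are not detected.
find_exact_duplicates() {
  local dir="$1" f sum
  # Step 1: checksum each body. Step 2: sort so equal checksums are adjacent,
  # then keep only checksums that occur more than once.
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue
    sum=$(awk 'NR == 1 && /^---[ \t]*$/ { fm = 1; next }
               fm && /^---[ \t]*$/     { fm = 0; next }
               !fm' "$f" | cksum | cut -d ' ' -f 1)
    printf '%s\t%s\n' "$sum" "$f"
  done | sort | awk -F '\t' '
    { sum[NR] = $1; file[NR] = $2; count[$1]++ }
    END { for (i = 1; i <= NR; i++) if (count[sum[i]] > 1) print sum[i] "\t" file[i] }'
}
```

For example, find_exact_duplicates export/posts prints nothing when there are no exact copies.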
Category analysis
> Analyze all posts from malacukierenka-posts. What categories and tags are used? Are there posts without categories? Are there categories with only 1 post (possibly redundant)?
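A quick tally of how often each category is used makes the single-post categories easy to spot. This is a sketch under an explicit assumption: the list is written either as a YAML block (categories: followed by "- Name" lines) or inline (categories: [A, B]). Check one exported file to see which form wpexportjson actually produces. The category_counts name is illustrative.

```bash
#!/usr/bin/env bash
# Count how many posts use each value of a frontmatter list KEY (default: categories),
# least used first. Handles block lists ("- Name" lines) and inline lists ("[A, B]").
category_counts() {
  local dir="$1" key="${2:-categories}"
  awk -v key="$key" '
    function emit(v) {                       # trim spaces and double quotes, skip empties
      gsub(/^[ \t"]+|[ \t"]+$/, "", v)
      if (v != "") print v
    }
    FNR == 1 { fm = ($0 ~ /^---[ \t]*$/); in_list = 0; next }   # new file
    !fm { next }                                                # outside frontmatter
    /^---[ \t]*$/ { fm = 0; next }                              # frontmatter closed
    index($0, key ":") == 1 {
      rest = substr($0, length(key) + 2)
      in_list = 0
      if (rest ~ /\[/) {                     # inline list
        gsub(/\[/, "", rest); gsub(/]/, "", rest)
        n = split(rest, items, ",")
        for (i = 1; i <= n; i++) emit(items[i])
      } else if (rest ~ /[^ \t]/) {          # single scalar value
        emit(rest)
      } else {                               # block list on the following lines
        in_list = 1
      }
      next
    }
    in_list && /^[ \t]*-[ \t]/ { sub(/^[ \t]*-[ \t]*/, ""); emit($0); next }
    { in_list = 0 }                          # any other key ends the block list
  ' "$dir"/*.md | sort | uniq -c | sort -n
}
```

Lines starting with 1 are the candidate redundant categories. Run category_counts export/posts tags to tally tags the same way.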
Outdated content
> Find posts in malacukierenka-posts that:
> - Are older than 2 years
> - Reference dates, events, or prices
> - May need updating
>
> Provide specific recommendations on what to update.
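The age check in the first bullet can be run directly on the export, so Claude only has to review the older posts. The sketch assumes the frontmatter date: key holds an ISO 8601 date (YYYY-MM-DD, optionally followed by a time). Dates in that form compare correctly as plain strings. The posts_older_than name is illustrative.

```bash
#!/usr/bin/env bash
# Print "date<TAB>path" for posts whose frontmatter date is earlier than CUTOFF (YYYY-MM-DD),
# oldest first. Assumes ISO 8601 dates, which sort correctly as strings.
posts_older_than() {
  local dir="$1" cutoff="$2" f d
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue
    # First "date:" line, key and quotes removed, cut to YYYY-MM-DD
    d=$(sed -n 's/^date:[[:space:]]*//p' "$f" | head -n 1 | tr -d "\"'" | cut -c 1-10)
    if [ -n "$d" ] && [[ "$d" < "$cutoff" ]]; then
      printf '%s\t%s\n' "$d" "$f"
    fi
  done | sort
}
```

To compute a cutoff two years back, use date -d '2 years ago' +%F with GNU date (Linux) or date -v-2y +%F with BSD date (macOS). For example: posts_older_than export/posts "$(date -v-2y +%F)".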
Content quality
> Analyze the 10 latest posts from malacukierenka-posts. Rate each on:
> - Length (short/medium/long)
> - Readability
> - Formatting (headings, lists, images)
>
> Provide recommendations for improvement.
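The length rating can be computed locally, which gives Claude's assessment a fixed baseline. The sketch picks the N newest posts by their frontmatter date: key and counts the words in each body. It assumes ISO 8601 dates. The 300 and 1000 word thresholds are arbitrary examples, not a standard. The latest_post_lengths name is illustrative.

```bash
#!/usr/bin/env bash
# Print "date<TAB>N words<TAB>label<TAB>path" for the N newest posts in DIR (default 10).
# Labels: short < 300 words <= medium < 1000 words <= long (example thresholds).
latest_post_lengths() {
  local dir="$1" n="${2:-10}" f d words label
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue
    d=$(sed -n 's/^date:[[:space:]]*//p' "$f" | head -n 1 | tr -d "\"'")
    printf '%s\t%s\n' "$d" "$f"
  done | sort -r | head -n "$n" |
  while IFS=$'\t' read -r d f; do
    # Count words in the body only, skipping the leading frontmatter block
    words=$(awk 'NR == 1 && /^---[ \t]*$/ { fm = 1; next }
                 fm && /^---[ \t]*$/     { fm = 0; next }
                 !fm' "$f" | wc -w)
    words=$((words))   # strip the padding some wc versions add
    if ((words < 300)); then label=short
    elif ((words < 1000)); then label=medium
    else label=long
    fi
    printf '%s\t%d words\t%s\t%s\n' "${d:0:10}" "$words" "$label" "$f"
  done
}
```

For example, latest_post_lengths export/posts 10 lists the same ten posts the prompt asks about.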
Pages vs posts comparison
> Compare the content of pages (malacukierenka-pages) with posts (malacukierenka-posts). Are static pages up to date? Is information on pages consistent with post content?
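To rerun these analyses regularly, you can send the prompts non-interactively with Claude CLI's print mode (claude -p) and save each answer as a report. In print mode Claude cannot ask for tool permissions, so you may need to pre-approve the MDDB tools. Claude CLI has an --allowedTools option for this; check claude --help for the syntax your version expects. The run_analyses name and the CLAUDE_BIN override are illustrative. With CLAUDE_BIN=echo you can try the loop without calling Claude.

```bash
#!/usr/bin/env bash
# Run a set of analysis prompts through "claude -p" and save each answer to OUT_DIR/<name>.md.
# CLAUDE_BIN is a hypothetical override used only for dry runs (e.g. CLAUDE_BIN=echo).
run_analyses() {
  local out_dir="${1:-reports}" job name prompt
  local claude_bin="${CLAUDE_BIN:-claude}"
  mkdir -p "$out_dir"
  # "name|prompt" pairs, taken from or shortened from the prompts above
  local -a jobs=(
    "links|Search all posts in the malacukierenka-posts collection. Find all external links (http/https). List those that might be outdated or broken."
    "duplicates|Check the malacukierenka-posts collection. Find posts with very similar content or the same topic. List potential duplicates with similarity scores."
    "categories|Analyze all posts from malacukierenka-posts. What categories and tags are used? Are there posts without categories?"
  )
  for job in "${jobs[@]}"; do
    name=${job%%|*}     # text before the first |
    prompt=${job#*|}    # text after the first |
    echo "Running $name..." >&2
    "$claude_bin" -p "$prompt" > "$out_dir/$name.md"
  done
}
```

Add more name|prompt pairs to the jobs array to cover the other analyses.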
Step 7 (optional): Semantic search
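For better analysis, enable embeddings. The commands below assume Ollama is installed and serving on its default port, 11434, on the host. The container is recreated with the embedding settings. The mddb-data volume keeps the documents you already loaded.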
ollama pull nomic-embed-text
docker stop mddb && docker rm mddb

docker run -d \
  --name mddb \
  -p 11023:11023 \
  -v mddb-data:/data \
  -e MDDB_EMBEDDING_PROVIDER=ollama \
  -e MDDB_EMBEDDING_API_URL=http://host.docker.internal:11434 \
  -e MDDB_EMBEDDING_MODEL=nomic-embed-text \
  -e MDDB_EMBEDDING_DIMENSIONS=768 \
  --add-host=host.docker.internal:host-gateway \
  tradik/mddb:latest

curl -X POST "http://localhost:11023/v1/vector-reindex?collection=malacukierenka-posts"
curl -X POST "http://localhost:11023/v1/vector-reindex?collection=malacukierenka-pages"
Claude CLI can now find semantically related content, for example "posts about baking cakes", without requiring exact keyword matches.
Summary
| Step | Command | What it does |
|---|---|---|
| 1 | go install .../wpexportjson | Install the exporter |
| 2 | docker run tradik/mddb | Start the database |
| 3 | wpexportjson --url ... --format markdown | Export WP to Markdown |
| 4 | load-md-folder.sh export/ collection | Load into MDDB |
| 5 | claude_desktop_config.json | Configure MCP |
| 6 | claude | Analyze content |
| 7 | ollama pull nomic-embed-text | Add semantic search |
Other WordPress sites
The same process works with any WordPress site that exposes the public REST API. Change the URL in step 3 (and pick new collection names in step 4 so results from different sites don't mix):
wpexportjson --url https://your-site.com --output ./export --format markdown
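When you switch sites, you will probably also want new collection names so the documents stay separate. The sketch below derives a name prefix from the URL, following the malacukierenka-posts naming used in this guide. The site_prefix helper and the naming scheme are suggestions; neither wpexportjson nor MDDB requires them.

```bash
#!/usr/bin/env bash
# Derive a lowercase collection-name prefix from a site URL,
# e.g. https://www.malacukierenka.pl -> malacukierenka.
site_prefix() {
  local host="$1"
  host=${host#*://}   # drop the scheme
  host=${host%%/*}    # drop any path
  host=${host%%:*}    # drop any port
  host=${host#www.}   # drop a leading www.
  host=${host%.*}     # drop the last domain label (.pl, .com, ...)
  printf '%s\n' "${host//./-}" | tr '[:upper:]' '[:lower:]'
}
```

For example, prefix=$(site_prefix https://your-site.com) gives your-site. The step 4 commands would then use "$prefix-posts" and "$prefix-pages". Multi-part suffixes keep one extra label: blog.example.co.uk becomes blog-example-co.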
Example sites you can analyze:
- Company blog: communication consistency analysis
- WooCommerce store: product description analysis
- News portal: article quality analysis
- Portfolio: project presentation analysis