WordPress Website Analyzer

Export content from a WordPress website into MDDB, then analyze it with Claude CLI to find broken links, missing meta tags, duplicate content, and other issues.

WordPress → wpexportjson → MDDB → Claude CLI (MCP) → analysis

This guide uses www.malacukierenka.pl as an example.

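Requirements

  • Go, to install wpexportjson (or a pre-built binary instead)
  • Docker, to run MDDB
  • curl and python3, for the verification commands
  • Claude CLI
  • Ollama (optional, for semantic search in Step 7)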

Step 1: Install wpexportjson

go install github.com/tradik/wpexporter/cmd/wpexportjson@latest

Verify the installation:

wpexportjson --help

If you don't have Go, download a pre-built binary from the wpexporter releases page.


Step 2: Start MDDB

docker run -d \
  --name mddb \
  -p 11023:11023 \
  -v mddb-data:/data \
  tradik/mddb:latest

Verify:

curl http://localhost:11023/v1/health
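The container may need a moment before the health endpoint answers. A minimal sketch of a retry loop follows; wait_until is a hypothetical helper name, not part of MDDB, and it assumes only that the probe command exits with status 0 once the service is ready.

```shell
# Retry a probe command until it succeeds or the attempt budget runs out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# (wait_until is a hypothetical helper introduced for this guide.)
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0                      # probe succeeded
    fi
    # Sleep only when another attempt follows
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1                          # probe never succeeded
}
```

For example, `wait_until 30 1 curl -sf http://localhost:11023/v1/health` would poll once per second for up to 30 attempts, assuming the endpoint returns a non-error HTTP status when MDDB is up.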

Step 3: Export content from WordPress

3a. Export public content (no login required)

wpexportjson fetches content through the public WordPress REST API:

wpexportjson \
  --url https://www.malacukierenka.pl \
  --output ./export \
  --format markdown

This exports:

  • Posts (blog articles)
  • Static pages
  • Categories and tags
  • Authors

The .md files are saved to the ./export/ folder.

3b. Check what was downloaded

ls -la export/
ls export/posts/ | head -10
ls export/pages/ | head -10

Preview a file:

head -30 "export/posts/$(ls export/posts/ | head -1)"

Each file includes YAML frontmatter with metadata (title, date, categories, tags, author).
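To spot-check that metadata, a single frontmatter value can be pulled out with awk. This is a sketch under assumptions: the frontmatter is taken to be the block between the first two `---` lines, and the key is taken to be a one-line `key: value` scalar. frontmatter_field is a hypothetical helper name, and list-valued keys such as categories or tags are not handled.

```shell
# Print the value of one top-level frontmatter key from a Markdown file.
# Usage: frontmatter_field FILE KEY
# Assumes '---' delimiters and a single-line "key: value" entry.
frontmatter_field() {
  awk -v key="$2" '
    /^---[ \t\r]*$/ { if (++fences == 2) exit; next }   # stop at closing fence
    fences == 1 && index($0, key ":") == 1 {
      value = substr($0, length(key) + 2)                # text after "key:"
      gsub(/^[ \t]+|[ \t\r]+$/, "", value)               # trim whitespace
      gsub(/^"|"$/, "", value)                           # drop surrounding double quotes
      print value
      exit
    }
  ' "$1"
}

# Possible use: report exported posts that have no title.
# for f in export/posts/*.md; do
#   [ -n "$(frontmatter_field "$f" title)" ] || echo "no title: $f"
# done
```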


Step 4: Load content into MDDB

4a. Load posts and pages

curl -sO https://raw.githubusercontent.com/tradik/mddb/main/scripts/load-md-folder.sh
chmod +x load-md-folder.sh

./load-md-folder.sh export/posts/ malacukierenka-posts --lang pl_PL --verbose
./load-md-folder.sh export/pages/ malacukierenka-pages --lang pl_PL --verbose

4b. Alternative: use the CLI

for md_file in export/posts/*.md; do
  key=$(basename "$md_file" .md)
  mddb-cli add malacukierenka-posts "$key" pl -f "$md_file" \
    -m "source=wordpress,site=malacukierenka.pl,type=post"
done

for md_file in export/pages/*.md; do
  key=$(basename "$md_file" .md)
  mddb-cli add malacukierenka-pages "$key" pl -f "$md_file" \
    -m "source=wordpress,site=malacukierenka.pl,type=page"
done
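The two loops differ only in folder, collection, and type, so they could be folded into one function. The sketch below reuses the `mddb-cli add` invocation exactly as shown above. The function name load_folder and the MDDB_CLI override are assumptions introduced here, the latter so the loop can be dry-run with a stand-in command.

```shell
# Load every non-empty .md file in DIR into one MDDB collection.
# Usage: load_folder DIR COLLECTION LANG TYPE
# MDDB_CLI is a hypothetical override for the CLI binary (default: mddb-cli).
load_folder() {
  local dir="$1" collection="$2" lang="$3" type="$4"
  local cli="${MDDB_CLI:-mddb-cli}"
  local loaded=0 failed=0 md_file key
  for md_file in "$dir"/*.md; do
    [ -s "$md_file" ] || continue          # skip empty files and an unmatched glob
    key=$(basename "$md_file" .md)
    if "$cli" add "$collection" "$key" "$lang" -f "$md_file" \
         -m "source=wordpress,site=malacukierenka.pl,type=$type"; then
      loaded=$((loaded + 1))
    else
      failed=$((failed + 1))
      echo "failed: $md_file" >&2
    fi
  done
  echo "$collection: loaded=$loaded failed=$failed"
  [ "$failed" -eq 0 ]                      # non-zero exit if any file failed
}
```

With this in place, `load_folder export/posts malacukierenka-posts pl post` and `load_folder export/pages malacukierenka-pages pl page` would stand in for the two loops, and the summary line makes partial failures visible.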

4c. Verify the result

curl -s http://localhost:11023/v1/stats | python3 -m json.tool

You should see malacukierenka-posts and malacukierenka-pages collections with documents.


Step 5: Configure Claude CLI (MCP)

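Create the MCP config file. The path below is the macOS location of claude_desktop_config.json, and the command overwrites any existing file there: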

cat > ~/Library/Application\ Support/Claude/claude_desktop_config.json << 'EOF'
{
  "mcpServers": {
    "mddb": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--network", "host",
        "-e", "MDDB_MCP_STDIO=true",
        "-e", "MDDB_SERVER=http://localhost:11023",
        "tradik/mddb:latest"
      ]
    }
  }
}
EOF
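Because the `cat >` redirect replaces whatever is already in that file, it may be worth keeping a copy of any existing config first. A minimal sketch, where backup_file is a hypothetical helper and the `.bak-TIMESTAMP` suffix is an arbitrary choice:

```shell
# Copy FILE to FILE.bak-YYYYmmddHHMMSS if it exists and print the backup path.
# Does nothing (and succeeds) when FILE is missing.
backup_file() {
  local file="$1" backup
  [ -f "$file" ] || return 0
  backup="$file.bak-$(date +%Y%m%d%H%M%S)"
  cp -p "$file" "$backup" && echo "$backup"
}
```

Running `backup_file ~/Library/Application\ Support/Claude/claude_desktop_config.json` before the heredoc would preserve other MCP servers already configured there, which can then be merged back by hand.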

Step 6: Analyze content

Launch Claude CLI:

claude

Link analysis

> Search all posts in the malacukierenka-posts collection. Find all external links (http/https). List those that might be outdated or broken.
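As a cross-check on the model's answer, the external links can also be listed straight from the exported files. This sketch uses a deliberately simple regular expression, so a URL followed directly by sentence punctuation will keep that trailing character. extract_links is a hypothetical helper name.

```shell
# Print every distinct http(s) URL found in files under DIR, one per line.
# A URL ends at whitespace, ')', '>' or a quote character.
extract_links() {
  grep -rhoE "https?://[^[:space:])>\"']+" "$1" | LC_ALL=C sort -u
}

# Possible follow-up: report URLs that do not answer with a 2xx/3xx status.
# Some servers reject HEAD requests, so treat hits as candidates only.
# extract_links export/posts | while read -r url; do
#   code=$(curl -s -o /dev/null -m 10 -I -w '%{http_code}' "$url")
#   case "$code" in
#     2*|3*) ;;                       # reachable
#     *) echo "$code $url" ;;         # 4xx, 5xx, or 000 (no response)
#   esac
# done
```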

SEO analysis

> Analyze posts from malacukierenka-posts for SEO:
  - Do posts have meta descriptions?
  - Are titles the right length (50-60 characters)?
  - Do images have alt text?
  List issues sorted by severity.
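The title-length criterion is mechanical enough to verify locally. The sketch below assumes a one-line `title:` entry in the frontmatter between `---` lines; check_title_length is a hypothetical name. Files without a title are skipped silently, and some awk implementations count bytes rather than characters, so titles with Polish diacritics may be reported slightly longer than they are.

```shell
# Print "LENGTH FILE" for each .md file in DIR whose title is outside MIN..MAX.
# Usage: check_title_length MIN MAX DIR
check_title_length() {
  local min="$1" max="$2" dir="$3" f
  for f in "$dir"/*.md; do
    [ -f "$f" ] || continue
    awk -v min="$min" -v max="$max" '
      /^---[ \t\r]*$/ { if (++fences == 2) exit; next }  # stop at closing fence
      fences == 1 && /^title:/ {
        title = substr($0, 7)                             # text after "title:"
        gsub(/^[ \t]+|[ \t\r]+$/, "", title)              # trim whitespace
        gsub(/^"|"$/, "", title)                          # drop surrounding quotes
        n = length(title)
        if (n < min || n > max) print n, FILENAME
        exit
      }
    ' "$f"
  done
}
```

`check_title_length 50 60 export/posts` would list the posts to look at first, which can then be compared with the issues Claude reports.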

Duplicate content

> Check the malacukierenka-posts collection. Find posts with very similar content or the same topic. List potential duplicates with similarity scores.

Category analysis

> Analyze all posts from malacukierenka-posts. What categories and tags are used? Are there posts without categories? Are there categories with only 1 post (possibly redundant)?

Outdated content

> Find posts in malacukierenka-posts that:
  - Are older than 2 years
  - Reference dates, events, or prices
  - May need updating
  Provide specific recommendations on what to update.

Content quality

> Analyze the 10 latest posts from malacukierenka-posts. Rate each on:
  - Length (short/medium/long)
  - Readability
  - Formatting (headings, lists, images)
  Provide recommendations for improvement.

Pages vs posts comparison

> Compare the content of pages (malacukierenka-pages) with posts (malacukierenka-posts). Are static pages up to date? Is information on pages consistent with post content?

Step 7 (optional): Semantic search

For better analysis, enable embeddings:

ollama pull nomic-embed-text

docker stop mddb && docker rm mddb
docker run -d \
  --name mddb \
  -p 11023:11023 \
  -v mddb-data:/data \
  -e MDDB_EMBEDDING_PROVIDER=ollama \
  -e MDDB_EMBEDDING_API_URL=http://host.docker.internal:11434 \
  -e MDDB_EMBEDDING_MODEL=nomic-embed-text \
  -e MDDB_EMBEDDING_DIMENSIONS=768 \
  --add-host=host.docker.internal:host-gateway \
  tradik/mddb:latest

curl -X POST "http://localhost:11023/v1/vector-reindex?collection=malacukierenka-posts"
curl -X POST "http://localhost:11023/v1/vector-reindex?collection=malacukierenka-pages"
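With more than a couple of collections, the reindex calls can be looped. This sketch reuses the `/v1/vector-reindex` endpoint shown above. The function name and the MDDB_URL and DRY_RUN variables are assumptions introduced here, not MDDB settings.

```shell
# Trigger a vector reindex for each collection name passed as an argument.
# MDDB_URL (base URL) and DRY_RUN (print instead of calling) are hypothetical knobs.
reindex_collections() {
  local base="${MDDB_URL:-http://localhost:11023}" collection url
  for collection in "$@"; do
    url="$base/v1/vector-reindex?collection=$collection"
    if [ -n "${DRY_RUN:-}" ]; then
      echo "POST $url"                       # show what would be requested
    else
      curl -fsS -X POST "$url" || return 1   # stop at the first failed request
      echo
    fi
  done
}
```

For instance, `reindex_collections malacukierenka-posts malacukierenka-pages` issues both requests in order, while setting `DRY_RUN=1` first only prints them.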

Now Claude CLI can find semantically related content, e.g. "posts about baking cakes", without requiring exact keyword matches.


Summary

| Step | Command | What it does |
|------|---------|--------------|
| 1 | go install .../wpexportjson | Install the exporter |
| 2 | docker run tradik/mddb | Start the database |
| 3 | wpexportjson --url ... --format markdown | Export WP to Markdown |
| 4 | load-md-folder.sh export/ collection | Load into MDDB |
| 5 | claude_desktop_config.json | Configure MCP |
| 6 | claude | Analyze content |
| 7 | ollama pull nomic-embed-text | Add semantic search |

Other WordPress sites

The same process works with any WordPress site that has a public REST API. Change the URL in step 3:

wpexportjson --url https://your-site.com --output ./export --format markdown
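When several sites are analyzed side by side, deriving collection names from the URL keeps them apart. The naming rule below (drop the scheme, a leading `www.`, the path, and the last domain label) is inferred from the malacukierenka example rather than required by MDDB, and site_slug is a hypothetical helper. Multi-part suffixes such as `.co.uk` would leave `example.co`.

```shell
# Derive a collection prefix from a site URL.
# https://www.malacukierenka.pl -> malacukierenka
site_slug() {
  local host="${1#*://}"     # strip scheme
  host="${host%%/*}"         # strip path
  host="${host%%:*}"         # strip port
  host="${host#www.}"        # strip leading "www."
  echo "${host%.*}"          # strip the last domain label
}

# Possible use, with the commands from steps 3 and 4:
# site=https://your-site.com
# name=$(site_slug "$site")
# curl -sf "$site/wp-json/wp/v2/posts?per_page=1" >/dev/null \
#   || echo "REST API not publicly reachable"   # standard WordPress REST route
# wpexportjson --url "$site" --output "./export-$name" --format markdown
# ./load-md-folder.sh "export-$name/posts/" "$name-posts" --verbose
```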

Example sites you can analyze:

  • Company blog: communication consistency analysis
  • WooCommerce store: product description analysis
  • News portal: article quality analysis
  • Portfolio: project presentation analysis