Bulk Import Guide
Overview
The load-md-folder.sh script allows you to bulk import markdown files from a folder into MDDB. It's perfect for migrating existing documentation, importing blog posts, or loading large collections of markdown content.
Features
- Automatic Key Generation - Creates unique keys from filenames
- Frontmatter Support - Extracts YAML-style metadata from file headers
- Recursive Scanning - Process entire directory trees
- Progress Tracking - Real-time progress with statistics
- Dry Run Mode - Preview imports without making changes
- Error Handling - Graceful failure handling with detailed reporting
- Metadata Enrichment - Add custom metadata to all imported files
- Multi-language Support - Specify language code for all documents
Installation
The script is located in the scripts/ directory and requires:
- Bash shell
- mddb-cli command available in PATH
- Running MDDB server
Make the script executable:
chmod +x scripts/load-md-folder.sh
Basic Usage
Simple Import
Import all .md files from a folder:
./scripts/load-md-folder.sh ./docs blog
This will:
- Scan ./docs for .md files
- Import them into the blog collection
- Use default language en_US
- Generate keys from filenames
Recursive Import
Process all subfolders:
./scripts/load-md-folder.sh ./content articles --recursive
Or use the short form:
./scripts/load-md-folder.sh ./content articles -r
Custom Language
Specify a different language code:
./scripts/load-md-folder.sh ./docs-pl blog --lang pl_PL
Short form:
./scripts/load-md-folder.sh ./docs-pl blog -l pl_PL
Advanced Usage
Adding Metadata
Add custom metadata to all imported files:
./scripts/load-md-folder.sh ./posts blog \
  --meta "author=John Doe" \
  --meta "status=published" \
  --meta "category=tutorial"
Short form:
./scripts/load-md-folder.sh ./posts blog \
  -m "author=John Doe" \
  -m "status=published"
Dry Run
Preview what would be imported without making changes:
./scripts/load-md-folder.sh ./docs blog --dry-run
This shows:
- Which files would be imported
- Generated keys
- Extracted metadata
- Final metadata combination
Verbose Output
See detailed information during import:
./scripts/load-md-folder.sh ./docs blog --verbose
Shows:
- Each file being processed
- Generated key for each file
- Metadata for each file
- Success/failure status
Custom Server
Connect to a different MDDB server:
./scripts/load-md-folder.sh ./docs blog \
  --server http://production-server:11023
Or use the MDDB_SERVER environment variable:
MDDB_SERVER=http://production-server:11023 \
  ./scripts/load-md-folder.sh ./docs blog
Batch Size
Control progress update frequency:
./scripts/load-md-folder.sh ./docs blog --batch-size 50
Default is 10 files per progress update.
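
The batch size can be pictured as a counter that triggers a progress line every N files. The sketch below is illustrative only: the function name should_report is hypothetical, and reporting on the final file as well is an assumption, not something confirmed by the script.

```shell
# Hypothetical helper: decide whether a progress line is due.
# Arguments: files processed so far, total files, batch size (default 10).
should_report() {
  local count="$1" total="$2" batch="${3:-10}"
  # Report on every multiple of the batch size, and (assumed) on the last file.
  [ $((count % batch)) -eq 0 ] || [ "$count" -eq "$total" ]
}

# Example: with 25 files and a batch size of 10, updates fall on 10, 20 and 25.
i=1
while [ "$i" -le 25 ]; do
  if should_report "$i" 25 10; then
    echo "progress: $i/25"
  fi
  i=$((i + 1))
done
```

A larger batch size simply means fewer progress lines for the same number of files.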
Frontmatter Support
The script automatically extracts metadata from YAML-style frontmatter:
---
title: Getting Started
author: John Doe
tags: tutorial, beginner
category: documentation
date: 2024-01-15
---
Your content here...
This frontmatter will be converted to metadata:
title=Getting Started
author=John Doe
tags=tutorial, beginner
category=documentation
date=2024-01-15
Frontmatter Format
Supported format:
---
key: value
another_key: another value
tags: value1, value2
---
Requirements:
- Must start with --- on the first line
- Must end with --- on its own line
- Use key: value format
- Values can contain spaces (quotes optional)
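
The requirements above can be illustrated with a small parser. This is a minimal sketch, not the script's actual implementation: the function name extract_frontmatter is hypothetical, and it assumes flat key: value lines with optional double quotes around values.

```shell
# Hypothetical helper: print key=value pairs from a file's frontmatter.
extract_frontmatter() {
  awk '
    # The block must open with --- on line 1; otherwise there is nothing to parse.
    NR == 1 && $0 != "---" { exit }
    NR == 1 { next }
    # A closing --- on its own line ends the block.
    $0 == "---" { exit }
    {
      idx = index($0, ":")
      if (idx == 0) next
      key = substr($0, 1, idx - 1)
      val = substr($0, idx + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      # Quotes around the value are optional, so strip them if present.
      gsub(/^"|"$/, "", val)
      if (key != "") print key "=" val
    }
  ' "$1"
}
```

Calling extract_frontmatter on a file that does not begin with --- prints nothing, which matches the first requirement in the list.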
Key Generation
Keys are automatically generated from filenames:
| Filename | Generated Key |
|---|---|
| Getting Started.md | getting-started |
| API_Reference.md | api-reference |
| 2024-01-15-blog-post.md | 2024-01-15-blog-post |
| My Document (v2).md | my-document-v2 |
Rules:
- Convert to lowercase
- Replace spaces and special characters with hyphens
- Remove consecutive hyphens
- Trim leading/trailing hyphens
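
The rules above can be sketched as a short shell function. The name slugify is hypothetical and the real script may implement key generation differently; this version only reproduces the rows of the table.

```shell
# Hypothetical helper: derive a key from a markdown filename.
slugify() {
  local name
  # Drop any directory part and the .md extension.
  name=$(basename "$1" .md)
  printf '%s\n' "$name" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
  # The sed step replaces each run of non-alphanumerics with one hyphen
  # (which also collapses consecutive hyphens), then trims both ends.
}

slugify "My Document (v2).md"   # prints my-document-v2
```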
Metadata Combination
Metadata is combined from multiple sources:
Automatic metadata:
  - source=folder-import
  - filename=original-filename.md
Frontmatter metadata (extracted from file)
Custom metadata (from --meta flags)
Example, for a file tutorial.md with this frontmatter:
---
author: Jane
category: tutorial
---
Command:
./scripts/load-md-folder.sh ./docs blog -m "status=published"
Resulting metadata:
source=folder-import,filename=tutorial.md,author=Jane,category=tutorial,status=published
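
The ordering in this example (automatic, then frontmatter, then custom) can be sketched as a simple join. The function name combine_metadata is hypothetical, and the sketch does not cover what happens when the same key appears in more than one source, since the guide does not specify that.

```shell
# Hypothetical helper: build the comma-separated metadata string.
# Arguments: filename, then any number of key=value pairs
# (frontmatter pairs first, --meta pairs last).
combine_metadata() {
  local filename="$1"
  shift
  local combined="source=folder-import,filename=${filename}"
  local pair
  for pair in "$@"; do
    combined="${combined},${pair}"
  done
  printf '%s\n' "$combined"
}

combine_metadata tutorial.md "author=Jane" "category=tutorial" "status=published"
```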
Examples
Migrate Documentation
./scripts/load-md-folder.sh ./docs documentation \
  --recursive \
  --meta "version=2.0" \
  --meta "status=published" \
  --verbose
Import Blog Posts
./scripts/load-md-folder.sh ./blog-posts blog \
  --lang en_US \
  --meta "author=John Doe" \
  --meta "type=blog-post"
Multi-language Content
./scripts/load-md-folder.sh ./content/en articles -l en_US -r
./scripts/load-md-folder.sh ./content/pl articles -l pl_PL -r
./scripts/load-md-folder.sh ./content/de articles -l de_DE -r
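
When the number of languages grows, the repeated commands can be wrapped in a loop. This is a sketch under assumptions: the function name import_languages and the LOADER variable are hypothetical, and it assumes each folder is named after the part of the language code before the underscore (en_US maps to en), as in the example above.

```shell
# Hypothetical helper: import one folder per language code.
# Arguments: content root, collection, then one or more language codes.
import_languages() {
  local root="$1" collection="$2"
  shift 2
  # LOADER is an illustrative override; by default the script from this guide is used.
  local loader="${LOADER:-./scripts/load-md-folder.sh}"
  local lang
  for lang in "$@"; do
    # ${lang%%_*} strips everything from the first underscore: pl_PL -> pl.
    "$loader" "$root/${lang%%_*}" "$collection" -l "$lang" -r || return 1
  done
}
```

Usage: import_languages ./content articles en_US pl_PL de_DE. The loop stops at the first language whose import exits with a non-zero code.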
Preview Before Import
./scripts/load-md-folder.sh ./docs blog --dry-run
./scripts/load-md-folder.sh ./docs blog
Large Import with Progress
./scripts/load-md-folder.sh ./large-docs blog \
  --recursive \
  --batch-size 100 \
  --verbose
Output
Progress Display
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  MDDB Folder Loader
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Checking server connectivity...
✓ Server is running
Configuration:
  Folder:     ./docs
  Collection: blog
  Language:   en_US
  Server:     http://localhost:11023
  Recursive:  true
Scanning for markdown files...
Found 150 markdown file(s)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Loading Files
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Progress: [##########################                        ] 52% (78/150 files)
Summary
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Summary
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Results:
  Total files: 150
  Successful:  148
  Failed:      2
  Duration:    45s
  Throughput:  3.29 files/sec
⚠ Import completed with some failures
Error Handling
Common Errors
Server not running:
✗ Cannot connect to MDDB server at http://localhost:11023
Make sure the server is running
Folder not found:
Error: Folder does not exist: ./nonexistent
No markdown files:
No markdown files found in ./empty-folder
Failed Imports
If some files fail to import:
- The script continues processing remaining files
- Failed files are counted in the summary
- Exit code is 1 (failure) if any imports failed
- Exit code is 0 (success) if all imports succeeded
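
Because the exit code reflects whether any import failed, a calling script can branch on it. The wrapper below is a minimal sketch; the function name run_import is hypothetical, and it works with any command, not only the loader.

```shell
# Hypothetical wrapper: run a command and report on its exit code.
run_import() {
  local status
  if "$@"; then
    echo "import succeeded"
  else
    # Inside the else branch, $? still holds the exit code of the command.
    status=$?
    echo "import reported failures (exit code $status)" >&2
    return "$status"
  fi
}
```

Usage: run_import ./scripts/load-md-folder.sh ./docs blog || exit 1. In a pipeline step this stops the job as soon as an import reports failures.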
Environment Variables
| Variable | Description | Default |
|---|---|---|
| MDDB_SERVER | Server URL | http://localhost:11023 |
| MDDB_CLI | CLI command path | mddb-cli |
Example:
export MDDB_SERVER=http://production:11023
export MDDB_CLI=/usr/local/bin/mddb-cli
./scripts/load-md-folder.sh ./docs blog
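
The defaults in the table behave like standard shell parameter fallbacks. The sketch below shows that pattern; the function name resolve_config is hypothetical, and whether the script resolves its settings exactly this way is an assumption.

```shell
# Hypothetical helper: show which server and CLI would be used.
resolve_config() {
  # ${VAR:-default} yields the default when VAR is unset or empty.
  local server="${MDDB_SERVER:-http://localhost:11023}"
  local cli="${MDDB_CLI:-mddb-cli}"
  printf 'server=%s cli=%s\n' "$server" "$cli"
}

resolve_config
```

With neither variable set, this prints the two defaults from the table; exporting either variable changes only that value.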
Performance Tips
Batch Size: Increase for large imports to reduce output
./scripts/load-md-folder.sh ./docs blog -b 100
Disable Verbose: For faster imports
./scripts/load-md-folder.sh ./docs blog
Use Extreme Mode: Enable on server for better performance
MDDB_EXTREME=true mddbd
Local Server: Import to local server, then backup/restore to production
Troubleshooting
Script not executable
chmod +x scripts/load-md-folder.sh
CLI not found
Build and install the CLI:
make build-cli
make install-all
Or point the script at an existing binary:
MDDB_CLI=/path/to/mddb-cli ./scripts/load-md-folder.sh ./docs blog
Server connection refused
Check whether the server is reachable:
mddb-cli stats
Start the server with Docker, or run it directly:
make docker-up
make run
Frontmatter not parsed
Ensure frontmatter format:
- Starts with --- on line 1
- Ends with --- on its own line
- Uses key: value format
Integration with CI/CD
GitHub Actions
name: Import Documentation
on:
  push:
    paths:
      - 'docs/**/*.md'
jobs:
  import:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install MDDB CLI
        run: |
          wget https://github.com/tradik/mddb/releases/latest/download/mddb-cli-latest-linux-amd64.tar.gz
          tar xzf mddb-cli-latest-linux-amd64.tar.gz
          sudo mv mddb-cli /usr/local/bin/
      - name: Import Documentation
        env:
          MDDB_SERVER: ${{ secrets.MDDB_SERVER }}
        run: |
          ./scripts/load-md-folder.sh ./docs documentation -r -m "version=${{ github.sha }}"
GitLab CI
import-docs:
  stage: deploy
  script:
    - chmod +x scripts/load-md-folder.sh
    - ./scripts/load-md-folder.sh ./docs documentation -r
  only:
    - main
Best Practices
- Always dry run first on production data
- Use meaningful collection names that reflect content type
- Add version metadata for tracking changes
- Use recursive mode for organized folder structures
- Include frontmatter in markdown files for rich metadata
- Test with small batches before large imports
- Monitor server resources during large imports
- Backup database before major imports
Async Bulk Ingest with Job Tracking
For imports of tens of thousands of documents where you don't want the HTTP
request to block, use the async job API instead of /v1/add-batch.
Submit a job
curl -X POST http://localhost:11023/v1/bulk-ingest-job \
  -H 'Content-Type: application/json' \
  -d '{
    "collection": "articles",
    "documents": [ ... up to N documents ... ],
    "callbackUrl": "https://example.com/webhook"
  }'
Returns HTTP 202 with the job record:
{
  "id": "bulk_1776539903516445000_849876",
  "collection": "articles",
  "status": "pending",
  "total": 12500,
  "submittedAt": 1713432000
}
Poll for progress
curl http://localhost:11023/v1/bulk-ingest-job/bulk_1776539903516445000_849876
Response fields:
| Field | Description |
|---|---|
| status | pending, processing, completed, failed, or cancelled |
| total | Documents submitted |
| processed | Documents processed so far |
| added / updated / failed | Per-document outcome counters |
| errors | Up to 50 error messages captured during processing |
| startedAt / completedAt | Unix timestamps |
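
Polling can be scripted by repeating the GET request until the status reaches one of the terminal values in the table. The sketch below is illustrative: the function names and the 5-second default interval are assumptions, as is reusing MDDB_SERVER as the API base URL. The sed-based extraction is deliberately naive, so a proper JSON parser is a better choice for production use.

```shell
# Hypothetical helper: read a job record on stdin, print its status value.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if the status is one a job cannot leave.
is_terminal() {
  case "$1" in
    completed|failed|cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: poll a job until it finishes.
# Arguments: job id, polling interval in seconds (default 5, an assumption).
poll_job() {
  local base="${MDDB_SERVER:-http://localhost:11023}"
  local job_id="$1" interval="${2:-5}"
  local body status
  while :; do
    body=$(curl -fsS "$base/v1/bulk-ingest-job/$job_id") || return 1
    status=$(printf '%s\n' "$body" | extract_status)
    # Give up rather than loop forever if no status could be read.
    [ -n "$status" ] || return 1
    echo "status: $status"
    if is_terminal "$status"; then
      break
    fi
    sleep "$interval"
  done
  # Succeed only if the job completed.
  [ "$status" = "completed" ]
}
```

Usage: poll_job bulk_1776539903516445000_849876 10.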
List all jobs
curl 'http://localhost:11023/v1/bulk-ingest-jobs?collection=articles'
Cancel a pending job
curl -X DELETE http://localhost:11023/v1/bulk-ingest-job/bulk_1776539903516445000_849876
In-flight jobs (processing) cannot be cancelled; they run to completion.
How it works
- Single worker, FIFO: BoltDB writes are serialised, so the job worker runs one job at a time. Multiple submits queue up.
- Chunked commits: documents are processed in sub-batches of 500 so a long write transaction never blocks readers for more than a few seconds.
- In-memory payload, persistent status: the document list lives in the queue; if the server restarts mid-job the status record flips to failed.
- Callback webhook: set callbackUrl to receive a POST with the final job record when processing completes (X-MDDB-Event: bulk_ingest.completed).
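
To get a feel for the chunked commits, the number of sub-batches for a job is the document count divided by 500, rounded up. The helper below only does this arithmetic; the name chunk_count is hypothetical and it says nothing about how the server schedules the commits.

```shell
# Hypothetical helper: number of sub-batches for a job.
# Arguments: total documents, chunk size (default 500, from the list above).
chunk_count() {
  local total="$1" size="${2:-500}"
  # Integer ceiling division.
  echo $(( (total + size - 1) / size ))
}

chunk_count 12500   # prints 25
```

For the 12500-document job shown earlier, that is 25 separate write transactions.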
When to use async vs /v1/add-batch
| Use /v1/add-batch | Use /v1/bulk-ingest-job |
|---|---|
| Up to ~1000 docs per request | Thousands of docs, long-running |
| You need the response inline | You can poll or accept a callback |
| CLI one-shot imports | Pipeline-driven or UI-driven imports |
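
A pipeline can apply the first row of this table mechanically. The sketch below picks an endpoint from the document count; the function name choose_endpoint is hypothetical, and the 1000-document threshold is taken from the approximate figure in the table rather than from a hard server limit.

```shell
# Hypothetical helper: pick an endpoint from the document count.
# Arguments: document count, threshold (default 1000, approximate).
choose_endpoint() {
  local count="$1" threshold="${2:-1000}"
  if [ "$count" -le "$threshold" ]; then
    echo "/v1/add-batch"
  else
    echo "/v1/bulk-ingest-job"
  fi
}

choose_endpoint 12500   # prints /v1/bulk-ingest-job
```

The other two rows (inline response versus polling, one-shot versus pipeline-driven) are judgment calls that a count alone cannot capture.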