FTS Algorithm Benchmark

Auto-generated by test/search-benchmark.sh on 2026-03-05 01:13:39

Environment

| Parameter | Value |
|---|---|
| OS | Darwin 25.3.0 arm64 |
| Go | go1.26.0 |
| Documents | 10000 |
| Queries | 10 diverse queries |
| Iterations | 10 per query per algorithm |
| Runs | 10 (benchmark repeated 10x, results aggregated) |
| Total searches | 1000 per algorithm config |
| Warmup | 5 queries per run (discarded) |

Algorithms

| Algorithm | Description |
|---|---|
| tfidf | Classic TF-IDF term frequency scoring |
| bm25 | Okapi BM25 probabilistic ranking with length normalization |
| bm25f | BM25F field-weighted scoring (title, meta, content) |
| pmisparse | BM25 + PMI query expansion (invented by Tradik Limited) |
| +fuzzy | Levenshtein distance 1 fuzzy matching variant |
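For reference, the per-term BM25 score that the bm25 config is built on has this shape. This is a minimal sketch using the common Okapi defaults k1=1.2, b=0.75; the exact constants and IDF variant MDDB uses are assumptions, not confirmed here:

```go
package main

import (
	"fmt"
	"math"
)

// bm25Term scores one query term against one document.
//   tf    - term frequency in the document
//   df    - number of documents containing the term
//   n     - total number of documents in the index
//   dl    - this document's length in tokens
//   avgdl - average document length across the index
func bm25Term(tf, df, n, dl, avgdl float64) float64 {
	const k1, b = 1.2, 0.75 // common Okapi defaults (assumed)
	idf := math.Log(1 + (n-df+0.5)/(df+0.5))
	lenNorm := 1 - b + b*dl/avgdl // penalizes longer-than-average docs
	return idf * tf * (k1 + 1) / (tf + k1*lenNorm)
}

func main() {
	// Same term frequency, but the shorter document scores higher:
	short := bm25Term(3, 100, 10000, 50, 200)
	long := bm25Term(3, 100, 10000, 800, 200)
	fmt.Printf("short doc: %.3f, long doc: %.3f\n", short, long)
}
```

bm25f applies the same scoring per field with per-field weights (title, meta, content) before combining, which is where its extra lookups come from.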

Latency Results

| Algorithm | Avg (ms) | P50 (ms) | P95 (ms) | P99 (ms) | Min (ms) | Max (ms) | QPS |
|---|---|---|---|---|---|---|---|
| tfidf | 11.44 | 11.09 | 12.98 | 17.60 | 9.84 | 29.02 | 87 |
| tfidf+fuzzy | 24.44 | 23.71 | 27.34 | 35.54 | 22.39 | 102.60 | 40 |
| bm25 | 13.40 | 12.07 | 16.13 | 35.68 | 10.04 | 224.20 | 74 |
| bm25+fuzzy | 26.31 | 24.87 | 30.44 | 40.56 | 23.03 | 254.57 | 38 |
| bm25f | 13.08 | 12.59 | 16.24 | 18.88 | 10.71 | 26.76 | 76 |
| bm25f+fuzzy | 27.09 | 25.94 | 29.72 | 37.40 | 24.31 | 230.38 | 36 |
| pmisparse | 18.43 | 16.74 | 29.92 | 34.30 | 13.19 | 76.31 | 54 |
| pmisparse+fuzzy | 30.57 | 28.87 | 41.22 | 43.98 | 25.84 | 59.31 | 32 |

Average Latency Comparison

```mermaid
xychart-beta
    title "Average Search Latency (ms) - lower is better"
    x-axis ["tfidf", "tfidf+f", "bm25", "bm25+f", "bm25f", "bm25f+f", "pmisparse", "pmisparse+f"]
    y-axis "Latency (ms)"
    bar [11.44, 24.44, 13.40, 26.31, 13.08, 27.09, 18.43, 30.57]
```

Throughput Comparison

```mermaid
xychart-beta
    title "Search Throughput (queries/sec) - higher is better"
    x-axis ["tfidf", "tfidf+f", "bm25", "bm25+f", "bm25f", "bm25f+f", "pmisparse", "pmisparse+f"]
    y-axis "QPS"
    bar [87, 40, 74, 38, 76, 36, 54, 32]
```

Result Counts per Query

Shows how many documents each algorithm returns (limit=10) to verify they all find relevant results.

| Query | tfidf | bm25 | bm25f | pmisparse |
|---|---|---|---|---|
| kubernetes deployment cluster | 10 | 10 | 10 | 10 |
| neural network training | 10 | 10 | 10 | 10 |
| database query optimization | 10 | 10 | 10 | 10 |
| machine learning model | 10 | 10 | 10 | 10 |
| security authentication token | 10 | 10 | 10 | 10 |
| cloud infrastructure scaling | 10 | 10 | 10 | 10 |
| data pipeline processing | 10 | 10 | 10 | 10 |
| api gateway middleware | 10 | 10 | 10 | 10 |
| distributed consensus protocol | 10 | 10 | 10 | 10 |
| search algorithm ranking | 10 | 10 | 10 | 10 |

Notes

  • tfidf: Fastest for simple keyword matching. No length normalization.
  • bm25: Slightly more compute than tfidf due to document length normalization. Best general-purpose algorithm.
  • bm25f: Adds field-level weighting. Slower due to separate field index lookups.
  • pmisparse: First search triggers lazy PMI matrix training (not included in benchmark). Subsequent searches include PMI expansion overhead.
  • fuzzy: Adds Levenshtein distance computation. Expected ~2-3x slower than exact matching.
  • All benchmarks run on a warm server with FTS indices already built during document insertion.
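Because the fuzzy variants only need to answer "is the edit distance at most 1?", a single linear scan is enough; no full O(n·m) dynamic-programming table is required. A sketch of that check (illustrative only, not MDDB's actual matcher):

```go
package main

import "fmt"

// withinOneEdit reports whether a and b are at Levenshtein distance <= 1
// (one substitution, insertion, or deletion). A linear scan suffices for
// distance 1; a transposition counts as two edits and is rejected.
func withinOneEdit(a, b string) bool {
	if len(a) > len(b) {
		a, b = b, a // ensure a is the shorter string
	}
	if len(b)-len(a) > 1 {
		return false
	}
	i, j, edits := 0, 0, 0
	for i < len(a) && j < len(b) {
		if a[i] == b[j] {
			i++
			j++
			continue
		}
		edits++
		if edits > 1 {
			return false
		}
		if len(a) == len(b) {
			i++ // equal lengths: treat as a substitution
		}
		j++ // otherwise: an insertion into the shorter string
	}
	// Any trailing character left in b is one more edit.
	return edits+(len(b)-j) <= 1
}

func main() {
	fmt.Println(withinOneEdit("cluster", "custer")) // one deletion → true
	fmt.Println(withinOneEdit("cluster", "clutser")) // transposition → false
}
```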

MDDB Insert Benchmark

Benchmark tool for measuring MDDB document insertion throughput. It inserts documents in configurable batches, records timing per batch, and generates an HTML report with an SVG chart.

Prerequisites

  • MDDB server running (default http://localhost:7890)
  • Go 1.26+

Build

```sh
cd tools/bench
go build -o mddb-bench .
```

Usage

```sh
# Start the MDDB server
cd services/mddbd
go run . -db /tmp/bench.db

# In another terminal, run the benchmark
cd tools/bench
./mddb-bench                  # all defaults
./mddb-bench -total 5000 -batch 50 -collection mybench -output results.html
./mddb-bench -total 1000 -cleanup
```

Flags

| Flag | Default | Description |
|---|---|---|
| -url | http://localhost:7890 | MDDB server URL |
| -collection | bench | Collection to insert into |
| -total | 10000 | Total documents to insert |
| -batch | 100 | Batch size for timing measurements |
| -output | bench_report.html | HTML report output path |
| -cleanup | false | Delete collection after benchmark |

What It Measures

Each document is a simulated blog post with:

  • Random title (3-6 words)
  • Random tags (1-3 from a pool of 20)
  • Random author
  • 2-5 paragraphs of lorem ipsum (~500-2000 characters)

Documents are inserted one-by-one via POST /v1/add. Every batch of N documents is timed and throughput is calculated.

Results (2026-03-06)

Environment: Darwin 25.3.0 arm64, Go 1.26.0, sequential POST /v1/add (one doc at a time)

Summary

| Metric | Value |
|---|---|
| Total documents | 10,000 |
| Total time | 5m 34s |
| Avg throughput | 30 docs/sec |
| Min batch | 11 docs/sec |
| Max batch | 49 docs/sec |
| Batch size | 100 |

Throughput per Batch

| Docs | docs/sec | Cum. avg | Notes |
|---|---|---|---|
| 100 | 49 | 49 | Cold start, fastest batch |
| 500 | 38 | 41 | |
| 1,000 | 37 | 39 | Stable ~37-39 range |
| 2,000 | 36 | 38 | |
| 2,500 | 22 | 35 | Degradation begins (FTS index growth) |
| 3,000 | 25 | 33 | |
| 4,000 | 11 | 29 | Worst batch - likely BoltDB compaction |
| 5,000 | 24 | 27 | |
| 6,000 | 37 | 27 | Recovery after compaction |
| 7,000 | 37 | 28 | Stabilized ~35-38 |
| 8,000 | 33 | 29 | |
| 9,000 | 35 | 29 | |
| 10,000 | 37 | 30 | Final average: 30 docs/sec |

Throughput Chart

```mermaid
xychart-beta
    title "Insert Throughput (docs/sec per 100-doc batch)"
    x-axis ["1K", "2K", "3K", "4K", "5K", "6K", "7K", "8K", "9K", "10K"]
    y-axis "docs/sec" 0 --> 55
    bar [37, 36, 25, 11, 24, 37, 37, 33, 35, 37]
    line [39, 38, 33, 29, 27, 27, 28, 29, 29, 30]
```

Observations

  • 0-2K docs: Stable ~37-49 docs/sec. BoltDB is small, FTS index fits comfortably.
  • 2K-5K docs: Throughput drops to 11-25 docs/sec. FTS token index grows, BoltDB page splits and fsync become expensive.
  • 5K-10K docs: Recovery to ~33-38 docs/sec. BoltDB has compacted and stabilized at a larger page count.
  • Batch 40 dip (4000 docs): 9.3s for 100 docs (11 docs/sec) - classic BoltDB B+ tree rebalancing spike.
  • Each insert includes: JSON decode, BoltDB write, FTS tokenization + index update, revision tracking, checksum computation.
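Note that the cumulative average is time-weighted (total docs over total elapsed time), not an arithmetic mean of per-batch rates, which is why a single 11 docs/sec batch drags the running average down so sharply. A sketch of the computation (the function name is illustrative, not from the tool):

```go
package main

import "fmt"

// cumulativeRate returns overall docs/sec after a series of batches:
// total documents divided by total elapsed seconds. For equal-size
// batches this is the harmonic mean of the per-batch rates, so slow
// batches weigh more than they would in an arithmetic mean.
func cumulativeRate(batchDocs []int, batchSecs []float64) float64 {
	totalDocs, totalSecs := 0, 0.0
	for i := range batchDocs {
		totalDocs += batchDocs[i]
		totalSecs += batchSecs[i]
	}
	return float64(totalDocs) / totalSecs
}

func main() {
	// Two 100-doc batches at 50 and 10 docs/sec (2s and 10s elapsed):
	fmt.Printf("%.1f docs/sec\n", cumulativeRate([]int{100, 100}, []float64{2, 10}))
	// → 16.7 docs/sec, well below the arithmetic mean of 30.
}
```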

How to Run

```sh
cd tools/bench
go build -o mddb-bench .
./mddb-bench -url http://localhost:7890 -total 10000 -batch 100 -output bench_report.html -cleanup
```


HTML Report

The tool also generates a self-contained HTML report with an interactive SVG bar chart, cumulative average trend line, and detailed per-batch table. Open it in any browser - no external dependencies.