Geosearch

MDDB 2.9.11 ships a full geospatial search subsystem: radius and bounding-box queries, two pluggable index algorithms, composition with full-text and vector search, and a Leaflet-backed panel UI. This document describes the data model, the HTTP/MCP/gRPC surfaces, and the operational trade-offs.

Scale envelope: tested up to 100 000 points per collection with sub-millisecond R-tree queries. Fine for "venues/posts near X" workloads. Not a replacement for PostGIS on multi-million-point datasets.

Data model

Coordinates attach to documents via reserved metadata keys. The index extracts them in priority order at write time:

  1. Explicit lat/lng: geo_lat + geo_lng as float64 strings (decimal degrees, WGS84). Canonical and fastest.
  2. Geohash: geo_hash (1–12 character geohash string). MDDB decodes it to the centroid of the cell. Useful when your upstream system already has a geohash.
  3. Postcode: geo_postcode + geo_country, resolved through an opt-in, in-memory postcode → (lat, lng) lookup loaded from one CSV file per country. Silent no-op if the lookup is not populated.

For example:

---
title: "Joe's Coffee"
geo_lat: "52.5200"
geo_lng: "13.4050"
---
Great espresso in the heart of Berlin.

Or, equivalently (up to the resolution of a 7-character geohash cell):

---
title: "Joe's Coffee"
geo_hash: "u33dc0c"
---

Or, with postcode fallback (requires geo-reindex with loadPostcodes):

---
title: "Joe's Coffee"
geo_postcode: "10115"
geo_country: "DE"
---

Reserved keys: geo_lat, geo_lng, geo_hash, geo_postcode, geo_country. Do not use these for unrelated metadata or the index will pick them up.
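The priority order above can be sketched in Go. This is a minimal illustration, not MDDB's extractor: geoPoint, hashDecoder and extractPoint are hypothetical names, the geohash decoder is passed in as a function, and falling through to the next source when a value does not parse (or is out of range) is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// geoPoint is a hypothetical stand-in for the index's point type.
type geoPoint struct{ Lat, Lng float64 }

// hashDecoder returns the centroid of a geohash cell (ok=false if invalid).
type hashDecoder func(hash string) (lat, lng float64, ok bool)

// extractPoint applies the documented priority order:
//  1. geo_lat + geo_lng
//  2. geo_hash (decoded to the cell centroid)
//  3. geo_postcode + geo_country via the optional lookup table
//
// postcodes is keyed by country, then postcode; a nil or empty table makes
// step 3 a silent no-op. Falling through on unparsable or out-of-range
// values is an assumption of this sketch.
func extractPoint(meta map[string]string, decode hashDecoder, postcodes map[string]map[string]geoPoint) (geoPoint, bool) {
	latS, hasLat := meta["geo_lat"]
	lngS, hasLng := meta["geo_lng"]
	if hasLat && hasLng {
		lat, errLat := strconv.ParseFloat(latS, 64)
		lng, errLng := strconv.ParseFloat(lngS, 64)
		if errLat == nil && errLng == nil && lat >= -90 && lat <= 90 && lng >= -180 && lng <= 180 {
			return geoPoint{lat, lng}, true
		}
	}
	if h := meta["geo_hash"]; decode != nil && len(h) >= 1 && len(h) <= 12 {
		if lat, lng, ok := decode(h); ok {
			return geoPoint{lat, lng}, true
		}
	}
	if pc, cc := meta["geo_postcode"], meta["geo_country"]; pc != "" && cc != "" {
		// Indexing a nil map is safe in Go and simply reports ok=false.
		if p, ok := postcodes[cc][pc]; ok {
			return p, true
		}
	}
	return geoPoint{}, false
}

func main() {
	// Illustrative lookup table only; MDDB ships no postcode data.
	table := map[string]map[string]geoPoint{"DE": {"10115": {52.52, 13.405}}}
	p, ok := extractPoint(map[string]string{"geo_postcode": "10115", "geo_country": "DE"}, nil, table)
	fmt.Println(p, ok)
}
```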

Index algorithms

Two algorithms share the same geo BoltDB bucket. They are two in-memory views of the same persisted data, rebuilt independently at startup and kept in sync by the write hooks in main.go. Pick between them at query time via the algorithm field on /v1/geo-search.

rtree (default)

Implementation: tidwall/rtree, a pure-Go R-tree with [2]float64 bounding boxes. Each point is stored as a zero-area bbox keyed by docID. The in-memory structure mirrors the vector index pattern: an RWMutex-protected per-collection tree plus a secondary map[docID]geoPoint so we can delete by docID without scanning.

Strong for: radius queries, bounding-box queries, moderate update frequency. Haversine scoring computes correct distances near the poles and across the anti-meridian, but the index itself does not wrap around ±180° longitude (see Limitations).

geohash

Implementation: geo_hash.go + geohash_index.go. Points are encoded at a fixed precision (geohashIndexPrecision = 8, a cell of roughly 38 m × 19 m at the equator) and kept in a sorted slice per collection. Queries walk the precision down until the cell is larger than the search radius, binary-search the slice for that prefix range, and haversine-filter the candidates.
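The precision walk and prefix scan can be sketched with sort.Search. entry, cellSizeMeters, choosePrecision and prefixRange are hypothetical names, and cell sizes use a spherical-Earth approximation. The sketch probes only the centre cell's prefix, as the paragraph describes; many geohash proximity searches also probe the eight neighbouring cells, since a point just across a cell edge can be closer than the radius.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

const (
	indexPrecision  = 8         // mirrors geohashIndexPrecision = 8
	metersPerDegree = 111195.08 // one degree of great-circle arc (spherical Earth)
)

// entry is a hypothetical element of a collection's sorted slice.
type entry struct {
	Hash  string // geohash at indexPrecision
	DocID string
}

// cellSizeMeters approximates a cell's height and width at latitude lat.
// Longitude takes the first bit, so it gets the extra bit when 5*p is odd.
func cellSizeMeters(precision int, lat float64) (height, width float64) {
	bits := 5 * precision
	lngBits, latBits := (bits+1)/2, bits/2
	height = 180 / math.Pow(2, float64(latBits)) * metersPerDegree
	width = 360 / math.Pow(2, float64(lngBits)) * metersPerDegree * math.Cos(lat*math.Pi/180)
	return height, width
}

// choosePrecision walks down from indexPrecision until the shorter side of
// a cell is at least radiusMeters (stopping at precision 1).
func choosePrecision(radiusMeters, lat float64) int {
	p := indexPrecision
	for ; p > 1; p-- {
		h, w := cellSizeMeters(p, lat)
		if math.Min(h, w) >= radiusMeters {
			break
		}
	}
	return p
}

// prefixRange binary-searches idx (sorted by Hash) for the contiguous run
// of entries whose hash starts with prefix.
func prefixRange(idx []entry, prefix string) []entry {
	lo := sort.Search(len(idx), func(i int) bool { return idx[i].Hash >= prefix })
	n := sort.Search(len(idx)-lo, func(i int) bool {
		return !strings.HasPrefix(idx[lo+i].Hash, prefix)
	})
	return idx[lo : lo+n]
}

func main() {
	idx := []entry{{"u33dc1j2", "d"}, {"u33d8s7b", "a"}, {"u33dc0cp", "b"}, {"u33dc0zz", "c"}}
	sort.Slice(idx, func(i, j int) bool { return idx[i].Hash < idx[j].Hash })
	center := "u33dc0cp" // query centre encoded at indexPrecision
	p := choosePrecision(5000, 52.52)
	for _, e := range prefixRange(idx, center[:p]) {
		fmt.Println(p, e.DocID) // candidates still need a haversine check
	}
}
```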

Strong for: BoltDB-native workloads that want to compose with prefix scans on the same hash. Slightly slower than the R-tree for bbox queries (falls back to a linear scan). Useful as a sanity check against the R-tree results on the same data.

The encoding is the canonical 32-char alphabet 0123456789bcdefghjkmnpqrstuvwxyz, compatible with geohash.org and most client libraries.
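For reference, a minimal encoder/decoder for this alphabet: longitude takes the first bit, then bits alternate between longitude and latitude, five bits per character. It is a generic geohash implementation rather than a copy of geo_hash.go, and the function names are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// base32 is the canonical geohash alphabet (no a, i, l, o).
const base32 = "0123456789bcdefghjkmnpqrstuvwxyz"

// cell is the bounding box of a geohash.
type cell struct {
	MinLat, MaxLat, MinLng, MaxLng float64
}

// Center returns the centroid, which is what a geo_hash value resolves to.
func (c cell) Center() (lat, lng float64) {
	return (c.MinLat + c.MaxLat) / 2, (c.MinLng + c.MaxLng) / 2
}

// halve keeps the upper or lower half of the interval [*lo, *hi].
func halve(lo, hi *float64, upper bool) {
	mid := (*lo + *hi) / 2
	if upper {
		*lo = mid
	} else {
		*hi = mid
	}
}

// encode bisects longitude and latitude alternately (longitude first) and
// emits one base32 character per five bits.
func encode(lat, lng float64, precision int) string {
	c := cell{-90, 90, -180, 180}
	var sb strings.Builder
	even, bits, ch := true, 0, 0
	for sb.Len() < precision {
		var upper bool
		if even {
			upper = lng >= (c.MinLng+c.MaxLng)/2
			halve(&c.MinLng, &c.MaxLng, upper)
		} else {
			upper = lat >= (c.MinLat+c.MaxLat)/2
			halve(&c.MinLat, &c.MaxLat, upper)
		}
		ch <<= 1
		if upper {
			ch |= 1
		}
		even = !even
		bits++
		if bits == 5 {
			sb.WriteByte(base32[ch])
			bits, ch = 0, 0
		}
	}
	return sb.String()
}

// decode replays the bisection to recover the cell. It accepts only
// lowercase hashes of 1 to 12 characters.
func decode(hash string) (cell, bool) {
	c := cell{-90, 90, -180, 180}
	if len(hash) == 0 || len(hash) > 12 {
		return c, false
	}
	even := true
	for i := 0; i < len(hash); i++ {
		v := strings.IndexByte(base32, hash[i])
		if v < 0 {
			return c, false
		}
		for b := 4; b >= 0; b-- {
			upper := v>>b&1 == 1
			if even {
				halve(&c.MinLng, &c.MaxLng, upper)
			} else {
				halve(&c.MinLat, &c.MaxLat, upper)
			}
			even = !even
		}
	}
	return c, true
}

func main() {
	h := encode(52.52, 13.405, 8)
	c, ok := decode(h)
	lat, lng := c.Center()
	fmt.Println(h, ok, lat, lng)
}
```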

Endpoints

All endpoints accept JSON and return JSON. Write endpoints are gated by the usual read-only mode middleware.

POST /v1/geo-search

Radius search. Returns results sorted by ascending distance.

{
  "collection": "venues",
  "lat": 52.52,
  "lng": 13.405,
  "radiusMeters": 5000,
  "topK": 10,
  "algorithm": "rtree",
  "filterMeta": {"category": ["coffee"]}
}

Response:

{
  "results": [
    {
      "document": {"id": "...", "key": "joes-coffee", "meta": {...}},
      "distanceMeters": 342.7,
      "rank": 1
    }
  ],
  "total": 1,
  "radiusMeters": 5000,
  "algorithm": "rtree"
}

POST /v1/geo-within

Axis-aligned bbox search. No ordering is applied.

{
  "collection": "venues",
  "minLat": 52.5,
  "maxLat": 52.6,
  "minLng": 13.3,
  "maxLng": 13.5
}

POST /v1/geo-reindex

Force-rebuild both in-memory indexes from the persisted geo bucket, optionally loading one or more postcode CSVs first. Write-gated.

{
  "collection": "venues",
  "loadPostcodes": [
    {"country": "PL", "csvPath": "/var/lib/mddb/postcodes/pl.csv"},
    {"country": "GB", "csvPath": "/var/lib/mddb/postcodes/gb.csv"}
  ]
}

CSV format: postcode,lat,lng (three columns, no header, UTF-8). MDDB never ships postcode datasets; operators provide their own.

GET /v1/geo-stats

Per-collection point counts + loaded postcode dataset sizes.

POST /v1/geo-encode · POST /v1/geo-decode

Ad-hoc conversion helpers. Useful for building UIs or debugging.

// geo-encode
{"lat": 52.52, "lng": 13.405, "precision": 8}
// → {"geohash": "u33dc0cp", "precision": 8}

// geo-decode
{"geohash": "u33dc0cp"}
// → {"lat": 52.52005..., "lng": 13.40486..., "minLat": ..., "maxLat": ..., "minLng": ..., "maxLng": ...}

Composition with FTS and vector search

POST /v1/hybrid-search accepts an optional geo field that spatially pre-filters the FTS + vector candidate set before rank fusion. This is the easiest way to express a query like "coffee shops within 5 km of me, ranked by semantic relevance".

{
  "collection": "venues",
  "query": "coffee",
  "geo": {"lat": 52.52, "lng": 13.405, "radiusMeters": 5000},
  "strategy": "alpha",
  "alpha": 0.6
}

Each result item gains a distanceMeters field in the composed response.

GraphQL

GraphQL is not a supported protocol for geosearch in 2.9.11. The project's GraphQL subsystem is currently a pre-existing stub (every query resolver panics with "not implemented"), and wiring it up is tracked separately. Until that follow-up PR lands, use REST, gRPC, or MCP for geo queries.

MCP tools

All geo endpoints are exposed to LLM clients via MCP. Tool names: geo_search, geo_within, geo_stats, geo_encode, geo_decode. All are annotated readOnlyHint: true and work in read-only mode.

Panel UI

The panel ships a "Geo Search" tab with a Leaflet + OpenStreetMap map. Click the map to set the query center, drag the slider to change the radius, pick the algorithm and hit Search. Results are drawn as pins and listed to the right; clicking a pin opens the document in the shared viewer.

No map-provider key is needed; OpenStreetMap tiles are used directly with their public attribution. If you need a different tile source (Mapbox, Stamen, Carto), edit the tileLayer URL in services/mddb-panel/src/components/GeoPanel.jsx.

Operational notes

  • Startup latency: both indexes load asynchronously from the geo bucket. Queries return HTTP 503 until IsReady() flips. Startup time is roughly linear in the number of points; 100 000 points take ~250 ms on a modern laptop.
  • Replication: the geo bucket participates in the standard Binlog replication stream. Follower nodes receive geo upserts and deletes automatically; no extra wiring needed.
  • Memory: each point costs ~80 bytes in the R-tree plus ~40 bytes in the geohash slice, so 100 000 points ≈ 12 MB RSS for both indexes combined.
  • Benchmark: go test -bench BenchmarkGeoIndex -benchmem. See services/mddbd/geo_index_test.go for the harness.

Limitations

  • Anti-meridian crossing is not supported. Queries that would cross ±180° longitude should be split into two halves by the caller.
  • 3D / altitude is not supported. MDDB is strictly 2D.
  • No automatic postcode downloads: MDDB does not ship or fetch any postcode datasets. Bring your own CSV.
  • Scale ceiling: beyond ~500 000 points per collection, the in-memory R-tree starts to dominate process RSS. For bigger datasets, use PostGIS or a dedicated spatial DB.
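For bounding boxes, the caller-side split takes a few lines. The sketch assumes a crossing box is expressed with MinLng > MaxLng (for example 175 to -178); BBox and splitAntimeridian are hypothetical names. Send each returned box to /v1/geo-within and concatenate the two result lists; since geo-within applies no ordering, no re-sort is needed.

```go
package main

import "fmt"

// BBox uses the same field names as the /v1/geo-within body (collection omitted).
type BBox struct {
	MinLat float64 `json:"minLat"`
	MaxLat float64 `json:"maxLat"`
	MinLng float64 `json:"minLng"`
	MaxLng float64 `json:"maxLng"`
}

// splitAntimeridian returns one box, or two when the box crosses ±180°.
// Crossing is signalled by MinLng > MaxLng (an assumed convention); each
// returned box stays within [-180, 180].
func splitAntimeridian(b BBox) []BBox {
	if b.MinLng <= b.MaxLng {
		return []BBox{b}
	}
	toDateLine := b // from MinLng east to +180
	toDateLine.MaxLng = 180
	fromDateLine := b // from -180 east to MaxLng
	fromDateLine.MinLng = -180
	return []BBox{toDateLine, fromDateLine}
}

func main() {
	for _, part := range splitAntimeridian(BBox{MinLat: -20, MaxLat: -15, MinLng: 175, MaxLng: -178}) {
		fmt.Printf("%+v\n", part)
	}
}
```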