Smart Search v0.12.0 -- multi-stage hybrid retrieval with cross-encoder reranking and MMR diversity
Raw query from MCP tool, REST API, CLI, or desktop Quick Search. Normalized: leading/trailing punctuation stripped, internal punctuation preserved.
Splits into two paths. FTS5 path: removes stopwords, OR-joins multi-term queries, preserves quoted phrases. Embedding path: whitespace normalization only -- the embedding model handles semantic context.
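The FTS5-path preprocessing described above can be sketched as follows. This is a hypothetical mirror of the behavior, not the app's actual code; the stopword list is illustrative.

```python
import re

# Illustrative stopword list; the real one is presumably larger.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "for"}

def build_fts5_query(raw: str) -> str:
    # Pull out quoted phrases first so they survive untouched.
    phrases = re.findall(r'"[^"]+"', raw)
    remainder = re.sub(r'"[^"]+"', " ", raw)
    # Drop stopwords, then OR-join phrases and remaining terms.
    terms = [t for t in remainder.split() if t.lower() not in STOPWORDS]
    return " OR ".join(phrases + terms)

print(build_fts5_query('the "error handling" of async retries'))
# → '"error handling" OR async OR retries'
```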
Lexical retrieval via SQLite FTS5. Porter stemming handles inflections. BM25 ranks by term frequency, inverse document frequency, and length normalization.
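A minimal FTS5 sketch of this stage, assuming the bundled SQLite was compiled with FTS5 (true for standard CPython builds). The table and column names are illustrative, not the app's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Porter tokenizer stems both indexed text and query terms,
# so "retry", "retries", and "retrying" all collapse to one stem.
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(body, tokenize='porter')")
conn.executemany(
    "INSERT INTO chunks(body) VALUES (?)",
    [("retrying failed requests",),
     ("request retries with backoff",),
     ("logging basics",)],
)
# FTS5's bm25() returns smaller (more negative) values for better
# matches, so ORDER BY ascending puts the best result first.
rows = conn.execute(
    "SELECT body, bm25(chunks) FROM chunks WHERE chunks MATCH ? "
    "ORDER BY bm25(chunks)",
    ("retry OR backoff",),
).fetchall()
for body, score in rows:
    print(f"{score:.3f}  {body}")
```

The chunk matching both query terms ranks first; the logging chunk matches neither term and is excluded entirely.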
Dense retrieval via LanceDB. Query embedded by snowflake-arctic-embed-m-v2.0 (256-dim, int8 ONNX), compared to indexed chunk embeddings via cosine similarity.
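A toy stand-in for the nearest-neighbour search: rank chunks by cosine similarity to the query vector. The 4-dim float vectors here are purely illustrative (the real index holds 256-dim int8 embeddings in LanceDB).

```python
import math

def cosine(a, b):
    # Dot product divided by the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

query = [1.0, 0.0, 1.0, 0.0]
chunks = {
    "doc-a": [1.0, 0.1, 0.9, 0.0],  # nearly parallel to the query
    "doc-b": [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query
}
ranked = sorted(chunks, key=lambda cid: cosine(query, chunks[cid]), reverse=True)
print(ranked)  # → ['doc-a', 'doc-b']
```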
Reciprocal Rank Fusion (RRF) merges both ranked lists without requiring score calibration. Documents appearing in both lists receive contributions from both, naturally boosting consensus results. Final scores are normalized to 0-1.
Jointly scores each (query, chunk) pair through a cross-encoder transformer. Unlike the bi-encoder (stage 2b) which embeds query and document independently, the cross-encoder sees both texts together with full attention, catching subtle relevance signals. Reranks the top-20 fusion results.
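The rerank step has a simple shape, sketched below. In the real pipeline `score_pair` is the TinyBERT cross-encoder attending over the concatenated (query, chunk) pair; here it is a trivial word-overlap stand-in so the control flow is visible without model weights.

```python
def score_pair(query: str, chunk: str) -> float:
    # Stand-in scorer: fraction of query words present in the chunk.
    # The real system runs a cross-encoder forward pass instead.
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def rerank(query: str, chunks: list[str], top_k: int = 20) -> list[str]:
    # Score only the top-k fusion results, then sort by the new score.
    scored = [(score_pair(query, c), c) for c in chunks[:top_k]]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored]

print(rerank("retry backoff", ["logging basics", "retry with backoff", "retry once"]))
# → ['retry with backoff', 'retry once', 'logging basics']
```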
Maximum Marginal Relevance eliminates redundant results. Greedily selects the next result that maximizes relevance while penalizing similarity to already-selected results. Ensures 10 results cover 10 topics, not 3.
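The greedy selection above can be sketched as follows: each step picks the candidate maximizing `lam * relevance - (1 - lam) * max_similarity_to_selected`. The similarity function and the lambda weight (0.7) are illustrative assumptions, not the app's tuned values.

```python
def mmr(candidates, relevance, sim, k, lam=0.7):
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        # Relevance minus a penalty for resembling anything already chosen.
        best = max(
            pool,
            key=lambda c: lam * relevance[c]
            - (1 - lam) * max((sim(c, s) for s in selected), default=0.0),
        )
        selected.append(best)
        pool.remove(best)
    return selected

relevance = {"a1": 0.95, "a2": 0.94, "b1": 0.80}
# a1 and a2 are near-duplicates; b1 covers a different topic.
pairs = {frozenset(("a1", "a2")): 0.99}
sim = lambda x, y: pairs.get(frozenset((x, y)), 0.1)
print(mmr(["a1", "a2", "b1"], relevance, sim, k=2))  # → ['a1', 'b1']
```

Despite a2's higher raw relevance, the redundancy penalty lets the off-topic but novel b1 take the second slot.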
Top-N results with rank, normalized score (0-1), chunk text, source path, and section path. Delivered via MCP tools, REST API, CLI, or desktop Quick Search.
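An illustrative shape for one returned result; the field names and values here are assumptions based on the fields listed above, not the actual wire format.

```python
# Hypothetical result record; keys mirror the fields named in the text.
result = {
    "rank": 1,
    "score": 0.87,  # normalized to 0-1
    "chunk": "To retry a failed request, ...",
    "source_path": "docs/networking.md",
    "section_path": "Networking > Retries",
}
print(result["rank"], result["source_path"])
```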
| Component | RAM (active) | RAM (idle) | Latency | On Disk |
|---|---|---|---|---|
| Bi-encoder (snowflake) | ~400 MB | 0 (unloads) | ~50ms | 297 MB |
| Cross-encoder (TinyBERT) | ~50 MB | 0 (unloads) | 30-60ms | ~15 MB |
| FTS5 index | Shared with SQLite | Shared with SQLite | <5ms | ~ corpus size |
| RRF + MMR | negligible | 0 | <2ms | 0 |
| Total (search) | ~650 MB peak | ~200 MB | ~100-150ms | ~312 MB |