
Scoring Pipeline

When you call search() or retrieve(), the query goes through a multi-stage pipeline that combines vector similarity, temporal decay, associative expansion, and retrieval boosting to produce a ranked list of memories.

Stage 1: Vector search

The query text is embedded, and the hot store is searched for the top candidates by cosine similarity:

```python
candidates = await self.adapter.search_similar(
    query_embedding, top_k=top_k * 3,
    include_superseded=deep_recall,
)
```

The search requests 3x the desired result count to ensure enough candidates survive scoring and filtering.

Stage 1b: Lexical/BM25 search (v6, when hybrid enabled)


When hybrid_search / hybridSearch is enabled, a parallel BM25 lexical search runs against the adapter’s search_lexical / searchLexical method:

```python
lexical_candidates = await self.adapter.search_lexical(
    query_text, top_k=k_sparse,
)
```

Results are unioned with the dense candidates from Stage 1. Duplicate memories (found by both paths) are kept once. This ensures exact keyword matches surface even when they aren’t the closest vectors.
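A minimal sketch of the union step, assuming candidates are modelled as dicts with an "id" field (the real merge may carry scores and metadata differently):

```python
def union_candidates(dense, lexical):
    """Union dense and lexical candidate lists, keeping each memory once.

    Dense results come first, so a memory found by both paths keeps its
    dense entry. Candidate shape ({"id": ...}) is illustrative.
    """
    seen = set()
    merged = []
    for cand in list(dense) + list(lexical):
        if cand["id"] not in seen:
            seen.add(cand["id"])
            merged.append(cand)
    return merged
```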

Stage 2: Score by sim * R^alpha

Each candidate is scored with the R^alpha formula:

combined_score = sim(m, q) * R(m)^alpha

where alpha = 0.3 (configurable). Raising the retention score R(m) to a fractional power softens the decay penalty, so faded but still-relevant memories can surface.
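In code the formula is a one-liner (the function name is illustrative). The softening effect is easiest to see with numbers: a heavily decayed memory with R = 0.1 keeps roughly half its similarity score, since 0.1^0.3 ≈ 0.50, rather than a tenth:

```python
def combined_score(similarity: float, retention: float, alpha: float = 0.3) -> float:
    """Stage 2 formula: sim(m, q) * R(m)^alpha."""
    return similarity * retention ** alpha
```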

For superseded memories in deep recall mode, an additional penalty is applied:

```python
combined_score *= deep_recall_penalty  # default: 0.5
```

Candidates are sorted by combined score, and the top-k are selected as direct results.

Stage 2b: Validity filtering

Candidates with semantic type plan or transient_state whose validity window has expired are removed from the candidate pool, unless include_expired_transients=True or deep recall is active with include_expired_in_deep_recall=True. See Deep Recall — Validity-based filtering for details.
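A sketch of the filter, assuming dict-shaped candidates with illustrative "semantic_type" and optional "valid_until" fields; include_expired stands in for both escape hatches:

```python
from datetime import datetime, timezone

TRANSIENT_TYPES = {"plan", "transient_state"}

def filter_expired(candidates, now, include_expired=False):
    """Drop plan/transient_state candidates whose validity window has ended.

    Field names are illustrative assumptions; candidates without a
    valid_until are always kept.
    """
    if include_expired:
        return list(candidates)
    return [
        c for c in candidates
        if not (
            c.get("semantic_type") in TRANSIENT_TYPES
            and c.get("valid_until") is not None
            and c["valid_until"] < now
        )
    ]
```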

Stage 2c: LLM reranking (v6, when enabled)

When rerank_enabled / rerankEnabled is True and an LLM provider is available, the top k_rerank candidates (default 10) from Stage 2/2b are sent to the LLM for relevance reranking. The LLM re-scores each candidate against the original query, and candidates are re-sorted by the LLM-assigned scores. This can significantly improve precision for ambiguous or nuanced queries, at the cost of additional latency and token usage. The model used defaults to the configured extraction model unless rerank_model / rerankModel is set explicitly.

Stage 3: Associative expansion

For each direct result, the engine looks up its association graph:

```python
for result in direct_results:
    associated = self.get_associated_memories(result.memory, now)
    for assoc_mem, assoc_weight in associated:
        # Score: sim * R^alpha * association_weight
        ...
```

Associated memories are scored by their own similarity and retention, multiplied by the association weight. Only associations with weight >= 0.3 (after decay) are activated.
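A sketch of the associative scoring rule described above; the function name and the None-for-inactive convention are assumptions:

```python
def associative_score(similarity, retention, assoc_weight,
                      alpha=0.3, min_weight=0.3):
    """Stage 3 scoring sketch: sim * R^alpha * association_weight.

    Returns None when the decayed link weight falls below the 0.3
    activation threshold, i.e. the association is not followed at all.
    """
    if assoc_weight < min_weight:
        return None
    return similarity * retention ** alpha * assoc_weight
```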

This stage can pull in memories from cold storage — if a cold memory is linked to a direct result, it’s scored and included.

Stage 3b: Graph expansion + bridge discovery (v6)


After associative expansion, the engine performs a second hop through the association graph looking for bridge memories — nodes that connect otherwise unrelated clusters. Bridge memories are scored and included if they exceed a minimum relevance threshold, enabling cross-topic recall.
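One way to sketch the second hop under stated assumptions — the section doesn't specify the exact connectivity criterion or relevance threshold, so here a "bridge" is any node two hops out that is reachable from at least two distinct direct results:

```python
def find_bridges(graph, direct_ids):
    """Two-hop bridge-discovery sketch.

    graph maps memory id -> {neighbour_id: weight}. The >= 2 sources
    criterion is an illustrative assumption, not the library's rule.
    """
    reachable_from = {}  # candidate bridge id -> direct ids that reach it
    for d in direct_ids:
        for n1 in graph.get(d, {}):
            for n2 in graph.get(n1, {}):
                if n2 in direct_ids or n2 == d:
                    continue
                reachable_from.setdefault(n2, set()).add(d)
    return sorted(m for m, srcs in reachable_from.items() if len(srcs) >= 2)
```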

Stage 4: Retrieval boosts

All memories in the result set receive retrieval boosts:

  • Direct results: stability += 0.1 * spaced_rep_factor
  • Associative results: stability += 0.03 * spaced_rep_factor

Both types also get: access_count += 1, last_accessed_at = now, session ID added.

Cold memories that were retrieved via associations are migrated back to hot storage.

Stage 5: Core promotion checks

Every memory in the result set is checked against the core promotion criteria:

```python
for result in all_results:
    self.check_core_promotion(result.memory)
```

If a memory has access_count >= 10, stability >= 0.85, and sessions >= 3, it's promoted to core.
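The criteria reduce to a three-way conjunction; a minimal sketch (function name is illustrative):

```python
def qualifies_for_core(access_count: int, stability: float, session_count: int) -> bool:
    """Core promotion criteria: all three thresholds must hold simultaneously."""
    return access_count >= 10 and stability >= 0.85 and session_count >= 3
```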

Stage 6: Association strengthening

All pairs of direct results have their association links strengthened:

```python
for i in range(len(direct_mems)):
    for j in range(i + 1, len(direct_mems)):
        self.strengthen_association(direct_mems[i], direct_mems[j], now)
```

This creates new associations (or strengthens existing ones) between memories that co-occur in search results. Over time, this builds a rich associative graph that reflects query patterns.
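The update rule itself isn't shown in this section; a plausible sketch with an assumed fixed increment and a 1.0 cap:

```python
def strengthen_association(weights, a_id, b_id, delta=0.1, cap=1.0):
    """Create or strengthen a symmetric association link.

    weights maps an unordered id pair to a weight; the increment (delta)
    and the cap are assumptions, not the library's documented rule.
    """
    key = tuple(sorted((a_id, b_id)))
    weights[key] = min(cap, weights.get(key, 0.0) + delta)
    return weights[key]
```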

Final merge and return

Direct and associative results are merged, deduplicated by memory ID, sorted by combined score, and truncated to top-k:

```python
all_results = direct_results + associative_results
all_results.sort(key=lambda x: x.combined_score, reverse=True)
return all_results[:top_k]
```

What the caller receives (v6 breaking change)


In v6, search() returns a SearchResponse wrapper instead of a bare list. This is a breaking change from v5.

```python
@dataclass
class SearchResponse:
    results: list[SearchResult]   # ranked results (same as before)
    trace: SearchTrace            # pipeline instrumentation

@dataclass
class SearchResult:
    memory: Memory
    relevance_score: float
    retention_score: float
    combined_score: float
    is_associative: bool
    via_deep_recall: bool

@dataclass
class SearchTrace:
    stages: list[StageTrace]
    total_ms: float
    total_tokens: int   # sum of all prompt + completion tokens across stages

@dataclass
class StageTrace:
    name: str                # e.g. "vector_search", "bm25_search", "validity_filter", "rerank"
    candidate_count: int     # candidates entering this stage
    survivor_count: int      # candidates leaving this stage
    duration_ms: float
    prompt_tokens: int       # LLM prompt tokens used (0 for non-LLM stages)
    completion_tokens: int   # LLM completion tokens used (0 for non-LLM stages)
```

The SearchTrace attached to every response provides full visibility into the pipeline. Each StageTrace records the stage name, how many candidates entered and survived it, wall-clock time, and (for LLM stages) token usage. This is useful for debugging relevance issues, tuning top_k multipliers, and understanding why specific memories did or did not appear.

```python
response = await mem.search("coffee preferences")
for stage in response.trace.stages:
    print(f"{stage.name}: {stage.candidate_count} -> {stage.survivor_count} ({stage.duration_ms:.1f}ms)")
```
The full pipeline, end to end:

```
Query text
    |
    v
Embed query
    |
    v
Stage 1:  Vector search (hot store, top_k * 3)
Stage 1b: BM25 lexical search (if hybrid enabled)
    |
    v
Stage 2:  Score by sim * R^alpha (sort, take top_k)
Stage 2b: Validity filtering (expired plan/transient exclusion)
Stage 2c: LLM reranking (if rerank_enabled)
    |
    v
Stage 3:  Expand via associations (cold store too)
Stage 3b: Graph expansion + bridge discovery
    |
    v
Stage 4: Apply direct/associative boosts
    |
    v
Stage 5: Check core promotions
    |
    v
Stage 6: Strengthen co-retrieved associations
    |
    v
Merge, sort, return SearchResponse (results + trace)
```

Every search actively reshapes the memory system: boosting stabilities, creating/strengthening associations, promoting cores, and reactivating cold memories. The system gets smarter with every query.