# Scoring Pipeline
## Overview

When you call `search()` or `retrieve()`, the query goes through a multi-stage pipeline that combines vector similarity, temporal decay, associative expansion, and retrieval boosting to produce a ranked list of memories.
## The six stages

### Stage 1: Vector similarity search

The query text is embedded, and the hot store is searched for the top candidates by cosine similarity:

```python
candidates = await self.adapter.search_similar(
    query_embedding,
    top_k=top_k * 3,
    include_superseded=deep_recall,
)
```

The search requests 3x the desired result count to ensure enough candidates survive scoring and filtering.
### Stage 1b: Lexical/BM25 search (v6, when hybrid enabled)

When `hybrid_search` / `hybridSearch` is enabled, a parallel BM25 lexical search runs against the adapter's `search_lexical` / `searchLexical` method:

```python
lexical_candidates = await self.adapter.search_lexical(
    query_text,
    top_k=k_sparse,
)
```

Results are unioned with the dense candidates from Stage 1. Duplicate memories (found by both paths) are kept once. This ensures exact keyword matches surface even when they aren't the closest vectors.
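The union step can be sketched as follows. This is a hedged sketch: the candidate shape and the `memory_id` field are assumptions for illustration, not the adapter's real types.

```python
# Illustrative sketch of the Stage 1 + 1b union: dense (vector) hits take
# precedence, and lexical hits are appended only when the memory was not
# already found by the dense path, so duplicates are kept once.
def merge_candidates(dense: list[dict], lexical: list[dict]) -> list[dict]:
    seen: set[str] = set()
    merged: list[dict] = []
    for cand in list(dense) + list(lexical):
        mem_id = cand["memory_id"]
        if mem_id not in seen:
            seen.add(mem_id)
            merged.append(cand)
    return merged

dense = [{"memory_id": "a"}, {"memory_id": "b"}]
lexical = [{"memory_id": "b"}, {"memory_id": "c"}]  # "b" found by both paths
print([c["memory_id"] for c in merge_candidates(dense, lexical)])
# -> ['a', 'b', 'c']
```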
### Stage 2: Retention-weighted scoring

Each candidate is scored with the R^alpha formula:

```
combined_score = sim(m, q) * R(m)^alpha
```

where `alpha = 0.3` (configurable). This softens the decay penalty so faded but relevant memories can still surface.
For superseded memories in deep recall mode, an additional penalty is applied:
```python
combined_score *= deep_recall_penalty  # default: 0.5
```

Candidates are sorted by combined score, and the top-k are selected as direct results.
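Stage 2 can be sketched as a single function. This is a minimal sketch: `combined_score`, `ALPHA`, and `DEEP_RECALL_PENALTY` are illustrative names, not the engine's actual internals.

```python
# Sketch of retention-weighted scoring: similarity softened by retention,
# with the deep-recall penalty applied to superseded memories.
ALPHA = 0.3
DEEP_RECALL_PENALTY = 0.5

def combined_score(similarity: float, retention: float,
                   superseded: bool = False, deep_recall: bool = False) -> float:
    score = similarity * (retention ** ALPHA)
    if deep_recall and superseded:
        score *= DEEP_RECALL_PENALTY
    return score

# A heavily faded memory (R = 0.2) keeps most of its similarity score,
# instead of being crushed by a linear decay (0.9 * 0.2 = 0.18):
print(round(combined_score(0.9, 0.2), 3))  # -> 0.555
```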
### Stage 2b: Validity filtering (v6)

Candidates with semantic type `plan` or `transient_state` whose validity window has expired are removed from the candidate pool, unless `include_expired_transients=True` or deep recall is active with `include_expired_in_deep_recall=True`. See Deep Recall — Validity-based filtering for details.
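The filter can be sketched like this, assuming each candidate exposes a semantic type and an optional validity deadline (the field names here are invented for illustration):

```python
# Sketch of Stage 2b: drop expired plan/transient_state candidates unless
# the caller explicitly asks for expired transients.
from datetime import datetime, timezone

TRANSIENT_TYPES = {"plan", "transient_state"}

def filter_valid(candidates: list[dict], now: datetime,
                 include_expired: bool = False) -> list[dict]:
    kept = []
    for cand in candidates:
        expired = (
            cand["semantic_type"] in TRANSIENT_TYPES
            and cand["valid_until"] is not None
            and cand["valid_until"] < now
        )
        if not expired or include_expired:
            kept.append(cand)
    return kept

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
cands = [
    {"semantic_type": "fact", "valid_until": None},
    {"semantic_type": "plan", "valid_until": datetime(2025, 1, 1, tzinfo=timezone.utc)},
]
print(len(filter_valid(cands, now)))  # -> 1 (the expired plan is dropped)
```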
### Stage 2c: LLM reranking (optional)

When `rerank_enabled` / `rerankEnabled` is `True` and an LLM provider is available, the top `k_rerank` candidates (default 10) from Stage 2/2b are sent to the LLM for relevance reranking. The LLM re-scores each candidate against the original query, and candidates are re-sorted by the LLM-assigned scores. This can significantly improve precision for ambiguous or nuanced queries, at the cost of additional latency and token usage. The model used defaults to the configured extraction model unless `rerank_model` / `rerankModel` is set explicitly.
### Stage 3: Associative expansion

For each direct result, the engine looks up its association graph:

```python
for result in direct_results:
    associated = self.get_associated_memories(result.memory, now)
    for assoc_mem, assoc_weight in associated:
        # Score: sim * R^alpha * association_weight
        ...
```

Associated memories are scored by their own similarity and retention, multiplied by the association weight. Only associations with weight >= 0.3 (after decay) are activated.
This stage can pull in memories from cold storage — if a cold memory is linked to a direct result, it’s scored and included.
### Stage 3b: Graph expansion + bridge discovery (v6)

After associative expansion, the engine performs a second hop through the association graph looking for bridge memories: nodes that connect otherwise unrelated clusters. Bridge memories are scored and included if they exceed a minimum relevance threshold, enabling cross-topic recall.
### Stage 4: Retrieval boosting

All memories in the result set receive retrieval boosts:

- Direct results: `stability += 0.1 * spaced_rep_factor`
- Associative results: `stability += 0.03 * spaced_rep_factor`

Both types also get: `access_count += 1`, `last_accessed_at = now`, session ID added.
Cold memories that were retrieved via associations are migrated back to hot storage.
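A minimal sketch of the boost, with invented field names; the real engine also records `last_accessed_at` and the current session ID:

```python
# Sketch of Stage 4: direct results get a larger stability boost than
# associative results, scaled by the spaced-repetition factor.
DIRECT_BOOST = 0.1
ASSOCIATIVE_BOOST = 0.03

def apply_retrieval_boost(memory: dict, spaced_rep_factor: float,
                          is_associative: bool) -> dict:
    boost = ASSOCIATIVE_BOOST if is_associative else DIRECT_BOOST
    memory["stability"] += boost * spaced_rep_factor
    memory["access_count"] += 1
    return memory

mem = {"stability": 0.5, "access_count": 4}
apply_retrieval_boost(mem, spaced_rep_factor=1.0, is_associative=False)
print(mem)  # stability boosted by 0.1, access_count incremented
```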
### Stage 5: Core promotion check

Every memory in the result set is checked against the core promotion criteria:

```python
for result in all_results:
    self.check_core_promotion(result.memory)
```

If a memory has `access_count >= 10`, `stability >= 0.85`, and sessions `>= 3`, it's promoted to core.
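The three thresholds reduce to a simple predicate (the function name here is invented for illustration):

```python
# Stage 5 criteria from the text: frequently accessed, highly stable,
# and seen across multiple sessions.
def qualifies_for_core(access_count: int, stability: float,
                       session_count: int) -> bool:
    return access_count >= 10 and stability >= 0.85 and session_count >= 3

print(qualifies_for_core(12, 0.9, 4))  # -> True
print(qualifies_for_core(12, 0.7, 4))  # -> False: stability below 0.85
```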
### Stage 6: Co-retrieval strengthening

All pairs of direct results have their association links strengthened:

```python
for i in range(len(direct_mems)):
    for j in range(i + 1, len(direct_mems)):
        self.strengthen_association(direct_mems[i], direct_mems[j], now)
```

This creates new associations (or strengthens existing ones) between memories that co-occur in search results. Over time, this builds a rich associative graph that reflects query patterns.
### Final merge and return

Direct and associative results are merged, deduplicated by memory ID, sorted by combined score, and truncated to top-k:

```python
all_results = direct_results + associative_results
all_results.sort(key=lambda x: x.combined_score, reverse=True)
return all_results[:top_k]
```

## What the caller receives (v6 breaking change)
In v6, `search()` returns a `SearchResponse` wrapper instead of a bare list. This is a breaking change from v5.
```python
@dataclass
class SearchResponse:
    results: list[SearchResult]  # ranked results (same as before)
    trace: SearchTrace           # pipeline instrumentation

@dataclass
class SearchResult:
    memory: Memory
    relevance_score: float
    retention_score: float
    combined_score: float
    is_associative: bool
    via_deep_recall: bool

@dataclass
class SearchTrace:
    stages: list[StageTrace]
    total_ms: float
    total_tokens: int  # sum of all prompt + completion tokens across stages

@dataclass
class StageTrace:
    name: str             # e.g. "vector_search", "bm25_search", "validity_filter", "rerank"
    candidate_count: int  # candidates entering this stage
    survivor_count: int   # candidates leaving this stage
    duration_ms: float
    prompt_tokens: int      # LLM prompt tokens used (0 for non-LLM stages)
    completion_tokens: int  # LLM completion tokens used (0 for non-LLM stages)
```

```typescript
interface SearchResponse {
  results: ScoredMemory[];
  trace: SearchTrace;
}

interface ScoredMemory extends Memory {
  relevanceScore: number;
  finalScore: number;
}

interface SearchTrace {
  stages: StageTrace[];
  totalMs: number;
  totalTokens: number; // sum of all prompt + completion tokens across stages
}

interface StageTrace {
  name: string;
  candidateCount: number;
  survivorCount: number;
  durationMs: number;
  promptTokens: number;     // LLM prompt tokens used (0 for non-LLM stages)
  completionTokens: number; // LLM completion tokens used (0 for non-LLM stages)
}
```

## Trace / instrumentation
The `SearchTrace` attached to every response provides full visibility into the pipeline. Each `StageTrace` records the stage name, how many candidates entered and survived, and wall-clock time. This is useful for debugging relevance issues, tuning top_k multipliers, and understanding why specific memories did or did not appear.

```python
response = await mem.search("coffee preferences")
for stage in response.trace.stages:
    print(f"{stage.name}: {stage.candidate_count} -> {stage.survivor_count} ({stage.duration_ms:.1f}ms)")
```

## Pipeline summary
```
Query text
    |
    v
Embed query
    |
    v
Stage 1:  Vector search (hot store, top_k * 3)
Stage 1b: BM25 lexical search (if hybrid enabled)
    |
    v
Stage 2:  Score by sim * R^alpha (sort, take top_k)
Stage 2b: Validity filtering (expired plan/transient exclusion)
Stage 2c: LLM reranking (if rerank_enabled)
    |
    v
Stage 3:  Expand via associations (cold store too)
Stage 3b: Graph expansion + bridge discovery
    |
    v
Stage 4: Apply direct/associative boosts
    |
    v
Stage 5: Check core promotions
    |
    v
Stage 6: Strengthen co-retrieved associations
    |
    v
Merge, sort, return SearchResponse (results + trace)
```

Every search actively reshapes the memory system: boosting stabilities, creating/strengthening associations, promoting cores, and reactivating cold memories. The system gets smarter with every query.