
Retrieval Scoring

When you search for memories, each candidate is scored by combining semantic similarity with temporal retention:

score(m, q) = sim(m, q) * R(m)^alpha

Where:

  • sim(m, q) = cosine similarity between the memory’s embedding and the query embedding
  • R(m) = current retention of the memory (from the decay model)
  • alpha = retention exponent (default: 0.3)

Using raw retention (alpha = 1.0) would aggressively suppress faded memories. A memory at 20% retention would have its relevance score cut by 80%. This is too harsh — a highly relevant but slightly faded memory should still surface.

The exponent alpha = 0.3 softens the decay penalty:

| Retention R | R^1.0 (raw) | R^0.3 (softened) |
|-------------|-------------|------------------|
| 1.00        | 1.00        | 1.00             |
| 0.80        | 0.80        | 0.93             |
| 0.60        | 0.60        | 0.86             |
| 0.40        | 0.40        | 0.76             |
| 0.20        | 0.20        | 0.62             |
| 0.10        | 0.10        | 0.50             |
| 0.02        | 0.02        | 0.31             |

With alpha = 0.3, a memory at 20% retention still retains 62% of its similarity score. This allows old but highly relevant memories to outrank recent but marginally relevant ones.

Consider two memories being scored against a query:

Memory A — Recent, moderate relevance:

  • Cosine similarity: 0.6
  • Retention: 0.95
  • Score = 0.6 * 0.95^0.3 = 0.6 * 0.985 = 0.591

Memory B — Old, high relevance:

  • Cosine similarity: 0.9
  • Retention: 0.25
  • Score = 0.9 * 0.25^0.3 = 0.9 * 0.660 = 0.594

Memory B wins despite being much older because its semantic relevance is strong enough to overcome the decay penalty. With raw retention (alpha=1.0), Memory B would score only 0.225 — a massive loss of signal.
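The comparison can be checked with a few lines of arithmetic:

```python
def score(sim, retention, alpha):
    # score(m, q) = sim(m, q) * R(m) ** alpha
    return sim * retention ** alpha

# Memory A: recent, moderate relevance.
a = score(0.6, 0.95, 0.3)      # ≈ 0.591
# Memory B: old, high relevance.
b = score(0.9, 0.25, 0.3)      # ≈ 0.594 — narrowly wins
# Same memory B scored with raw retention (alpha = 1.0).
b_raw = score(0.9, 0.25, 1.0)  # 0.225 — buried
```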

In v6, retrieval can combine dense (cosine similarity) search with BM25 lexical search for better coverage. Dense search excels at semantic similarity but can miss exact keyword matches; BM25 catches those.

When hybrid search is enabled, both a dense vector search and a BM25 lexical search are performed. The candidate sets are unioned — any memory found by either method enters the scoring pipeline. Scores from both sources are combined using reciprocal rank fusion.
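A minimal sketch of reciprocal rank fusion over the two candidate lists (the smoothing constant `k=60` is the commonly used default, assumed here rather than taken from the library):

```python
def reciprocal_rank_fusion(dense_ids, sparse_ids, k=60):
    """Fuse two ranked candidate lists with reciprocal rank fusion.

    Each memory id receives 1 / (k + rank) from every list it appears in;
    ids found by only one method still enter the fused set (union).
    """
    scores = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, mem_id in enumerate(ranking, start=1):
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" appears in both lists, so it accumulates credit and ranks first.
fused = reciprocal_rank_fusion(["a", "b", "c"], ["b", "d"])
```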

| Python        | TypeScript   | Description                                                   |
|---------------|--------------|---------------------------------------------------------------|
| hybrid_search | hybridSearch | Enable hybrid dense + BM25 retrieval (default: False)         |
| k_sparse      | kSparse      | Number of candidates from BM25 search (default: same as top_k) |

```python
config = CognitiveMemoryConfig(
    hybrid_search=True,
    k_sparse=20,
)
```

BM25 search only activates when the storage adapter implements the search_lexical (Python) / searchLexical (TypeScript) method. If the adapter does not implement it, hybrid search silently falls back to dense-only retrieval.
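The fallback behavior can be sketched as a capability check. This is a hypothetical illustration, not the library's actual implementation; the `adapter.search` dense-search method and the memory objects are assumptions:

```python
def retrieve_candidates(adapter, query, top_k, k_sparse, hybrid_search):
    # Dense vector search always runs (assumed adapter method).
    candidates = adapter.search(query, top_k)
    # BM25 only activates when the adapter implements search_lexical;
    # otherwise hybrid search silently degrades to dense-only retrieval.
    if hybrid_search and hasattr(adapter, "search_lexical"):
        sparse = adapter.search_lexical(query, k_sparse)
        # Union the candidate sets: anything found by either method
        # enters the scoring pipeline exactly once.
        seen = {m.id for m in candidates}
        candidates = candidates + [m for m in sparse if m.id not in seen]
    return candidates
```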

After hybrid retrieval scoring, an optional LLM reranking step can further refine results. When rerank_enabled is True and an LLM is available, the top k_rerank candidates (default 10) are sent to the LLM, which re-scores them for relevance against the original query. This is especially useful for ambiguous queries where vector similarity alone may not capture intent. See the Scoring Pipeline for details on where this fits in the pipeline.

When deep recall is enabled, superseded memories (originals that were consolidated into summaries) can appear in results. These receive an additional penalty:

score_deep = score * deep_recall_penalty

The default deep_recall_penalty is 0.5, meaning superseded memories need to be 2x as relevant to rank alongside active memories.
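Continuing the earlier worked example: if Memory B (score 0.594) had been superseded by a consolidation summary, its deep-recall score would drop below Memory A's 0.591.

```python
DEEP_RECALL_PENALTY = 0.5  # default

def deep_score(score):
    # Superseded memories are scored normally, then penalized.
    return score * DEEP_RECALL_PENALTY

# Memory B from the example above, if superseded:
# 0.594 * 0.5 = 0.297 — it now loses to an active memory at 0.591.
```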

```python
from cognitive_memory import CognitiveMemoryConfig

config = CognitiveMemoryConfig(
    retrieval_score_exponent=0.3,  # alpha
    deep_recall_penalty=0.5,
)
```
  • Lower alpha (0.1-0.2): Nearly ignores decay. Old memories rank as high as recent ones. Useful when your agent needs total recall regardless of age.
  • Default alpha (0.3): Balanced. Slightly favors recent memories but doesn’t suppress old relevant ones.
  • Higher alpha (0.5-1.0): Strongly favors recent memories. Old memories are effectively buried unless they’re extremely relevant. Useful for fast-paced domains where old information is usually stale.
```python
def score_memory(self, memory, relevance, now):
    # Combine semantic relevance with decayed retention,
    # softened by the configurable alpha exponent.
    retention = self.compute_retention(memory, now)
    alpha = self.config.retrieval_score_exponent
    return relevance * (retention ** alpha)
```

Note: The TypeScript SDK currently uses alpha = 1.0 (raw retention). The Python SDK uses the configurable alpha exponent.