# Retrieval Scoring

## The scoring formula

When you search for memories, each candidate is scored by combining semantic similarity with temporal retention:
```
score(m, q) = sim(m, q) * R(m)^alpha
```

Where:
- sim(m, q) = cosine similarity between the memory’s embedding and the query embedding
- R(m) = current retention of the memory (from the decay model)
- alpha = retention exponent (default: 0.3)
## Why R^alpha instead of just R?

Using raw retention (alpha = 1.0) would aggressively suppress faded memories. A memory at 20% retention would have its relevance score cut by 80%. This is too harsh — a highly relevant but slightly faded memory should still surface.
The exponent alpha = 0.3 softens the decay penalty:
| Retention R | R^1.0 (raw) | R^0.3 (softened) |
|---|---|---|
| 1.00 | 1.00 | 1.00 |
| 0.80 | 0.80 | 0.93 |
| 0.60 | 0.60 | 0.86 |
| 0.40 | 0.40 | 0.76 |
| 0.20 | 0.20 | 0.62 |
| 0.10 | 0.10 | 0.50 |
| 0.02 | 0.02 | 0.31 |
With alpha = 0.3, a memory at 20% retention still retains 62% of its similarity score. This allows old but highly relevant memories to outrank recent but marginally relevant ones.
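The softened values in the table above can be reproduced directly; this is a quick check, not library code, and `ALPHA` here is simply the documented default of 0.3:

```python
# Reproduce the R^alpha comparison table for alpha = 0.3.
ALPHA = 0.3

for r in [1.00, 0.80, 0.60, 0.40, 0.20, 0.10, 0.02]:
    print(f"R={r:.2f}  raw={r:.2f}  softened={r ** ALPHA:.2f}")
```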
## Practical effect

Consider two memories being scored against a query:
Memory A — Recent, moderate relevance:
- Cosine similarity: 0.6
- Retention: 0.95
- Score = 0.6 * 0.95^0.3 = 0.6 * 0.985 = 0.591
Memory B — Old, high relevance:
- Cosine similarity: 0.9
- Retention: 0.25
- Score = 0.9 * 0.25^0.3 = 0.9 * 0.660 = 0.594
Memory B wins despite being much older because its semantic relevance is strong enough to overcome the decay penalty. With raw retention (alpha=1.0), Memory B would score only 0.225 — a massive loss of signal.
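The two-memory comparison can be verified with a few lines of Python; `score` here is a throwaway helper mirroring the formula, not the SDK's API:

```python
def score(sim: float, retention: float, alpha: float = 0.3) -> float:
    # score(m, q) = sim(m, q) * R(m)^alpha
    return sim * retention ** alpha

# Memory A: recent, moderate relevance
a = score(0.6, 0.95)          # ~0.591
# Memory B: old, high relevance
b = score(0.9, 0.25)          # ~0.594 -- B edges out A
# With raw retention (alpha = 1.0), B collapses to 0.225
b_raw = score(0.9, 0.25, alpha=1.0)
print(a, b, b_raw)
```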
## Hybrid retrieval

In v6, retrieval can combine dense (cosine similarity) search with BM25 lexical search for better coverage. Dense search excels at semantic similarity but can miss exact keyword matches; BM25 catches those.
### How it works

When hybrid search is enabled, both a dense vector search and a BM25 lexical search are performed. The candidate sets are unioned — any memory found by either method enters the scoring pipeline. Scores from both sources are combined using reciprocal rank fusion.
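Reciprocal rank fusion itself can be sketched as follows. This is a generic illustration rather than the library's internal code, and the constant `k = 60` is the conventional RRF smoothing term, not a documented setting:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Combine best-first ranked lists of memory IDs into one fused score per ID.

    An ID found by only one method still receives a score, so the
    candidate sets are effectively unioned.
    """
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, mem_id in enumerate(ranking, start=1):
            fused[mem_id] = fused.get(mem_id, 0.0) + 1.0 / (k + rank)
    return fused

dense = ["m1", "m2", "m3"]   # from vector search
sparse = ["m3", "m4"]        # from BM25
scores = reciprocal_rank_fusion([dense, sparse])
# m3 appears in both lists, so it accumulates contributions from both ranks.
```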
### Configuration

| Python | TypeScript | Description |
|---|---|---|
| hybrid_search | hybridSearch | Enable hybrid dense + BM25 retrieval (default: False) |
| k_sparse | kSparse | Number of candidates from BM25 search (default: same as top_k) |
```python
config = CognitiveMemoryConfig(
    hybrid_search=True,
    k_sparse=20,
)
```

### Adapter requirement

BM25 search only activates when the storage adapter implements the search_lexical (Python) / searchLexical (TypeScript) method. If the adapter does not implement it, hybrid search silently falls back to dense-only retrieval.
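The silent fallback amounts to a capability check on the adapter. A minimal sketch, assuming only the documented method name search_lexical (the surrounding helper is hypothetical):

```python
def lexical_candidates(adapter, query: str, k_sparse: int) -> list:
    """Return BM25 candidates, or an empty list if the adapter lacks support.

    BM25 only runs if the adapter implements search_lexical; otherwise
    hybrid search silently degrades to dense-only retrieval.
    """
    search = getattr(adapter, "search_lexical", None)
    if not callable(search):
        return []
    return search(query, k_sparse)
```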
## LLM reranking (optional)

After hybrid retrieval scoring, an optional LLM reranking step can further refine results. When rerank_enabled is True and an LLM is available, the top k_rerank candidates (default 10) are sent to the LLM, which re-scores them for relevance against the original query. This is especially useful for ambiguous queries where vector similarity alone may not capture intent. See the Scoring Pipeline for details on where this fits in the pipeline.
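The reranking step can be sketched generically. The scoring callback here is a stand-in for whatever LLM call the SDK makes, and the function itself is illustrative, not the library's implementation:

```python
def rerank(candidates: list[dict], query: str, llm_score, k_rerank: int = 10) -> list[dict]:
    """Re-order the top k_rerank candidates by an LLM relevance score.

    llm_score(query, text) -> float is a stand-in for the actual LLM call;
    candidates beyond k_rerank keep their original order.
    """
    head, tail = candidates[:k_rerank], candidates[k_rerank:]
    head = sorted(head, key=lambda c: llm_score(query, c["text"]), reverse=True)
    return head + tail
```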
## Deep recall penalty

When deep recall is enabled, superseded memories (originals that were consolidated into summaries) can appear in results. These receive an additional penalty:

```
score_deep = score * deep_recall_penalty
```

The default deep_recall_penalty is 0.5, meaning superseded memories need to be 2x as relevant to rank alongside active memories.
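The 2x claim follows directly from the multiplication; a quick illustration using the documented default (the helper is hypothetical):

```python
DEEP_RECALL_PENALTY = 0.5  # documented default

def deep_score(base_score: float) -> float:
    # Applied on top of the normal sim * R^alpha score for superseded memories.
    return base_score * DEEP_RECALL_PENALTY

# With penalty 0.5, a superseded memory needs twice the base score
# to tie an active memory scoring 0.4:
print(deep_score(0.8), "vs active", 0.4)
```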
## Configuration

```python
from cognitive_memory import CognitiveMemoryConfig

config = CognitiveMemoryConfig(
    retrieval_score_exponent=0.3,  # alpha
    deep_recall_penalty=0.5,
)
```

## Tuning alpha
Section titled “Tuning alpha”- Lower alpha (0.1-0.2): Nearly ignores decay. Old memories rank as high as recent ones. Useful when your agent needs total recall regardless of age.
- Default alpha (0.3): Balanced. Slightly favors recent memories but doesn’t suppress old relevant ones.
- Higher alpha (0.5-1.0): Strongly favors recent memories. Old memories are effectively buried unless they’re extremely relevant. Useful for fast-paced domains where old information is usually stale.
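The trade-off between these settings can be seen numerically for a memory at 20% retention; this just evaluates R^alpha at the boundaries listed above:

```python
# How much similarity score a memory at R = 0.2 keeps, per alpha setting.
R = 0.2
for alpha in [0.1, 0.3, 0.5, 1.0]:
    print(f"alpha={alpha}: keeps {R ** alpha:.0%} of its similarity score")
```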
## Implementation

```python
def score_memory(self, memory, relevance, now):
    retention = self.compute_retention(memory, now)
    alpha = self.config.retrieval_score_exponent
    return relevance * (retention ** alpha)
```

```typescript
const retention = this.calculateRetentionFor({ ... });
const finalScore = relevanceScore * retention;
```

Note: The TypeScript SDK currently uses alpha = 1.0 (raw retention). The Python SDK uses the configurable alpha exponent.