
Configuration Guide

Every parameter has a sensible default. You don’t need to change anything to get started. This guide helps you tune settings when you have a specific use case or want to optimize performance.

Pick the recipe closest to your use case, then fine-tune individual parameters as needed.

Long-running conversations across many sessions. Needs to remember user preferences, past events, and facts.

```python
config = CognitiveMemoryConfig(
    extraction_mode="semantic",
    retrieval_score_exponent=0.3,
    direct_boost=0.1,
    core_access_threshold=10,
    core_session_threshold=3,
)
# Search with: top_k=20, deep_recall=False
```

This is the default configuration. Semantic extraction captures structured facts across sessions. Core promotion ensures frequently-referenced information (name, preferences) becomes permanent.

Exact recall of what was said matters more than structured reasoning. Users may ask “what did X say about Y?”

```python
config = CognitiveMemoryConfig(
    extraction_mode="raw",
    retrieval_score_exponent=0.1,        # don't penalize old turns
    run_maintenance_during_ingestion=False,  # no consolidation
)
# Search with: top_k=50, deep_recall=False
```

Raw mode stores every turn verbatim. Low alpha keeps old turns retrievable. Disable maintenance to prevent consolidation from merging turns.

Need structured facts AND exact quotes. Willing to pay the storage and LLM cost.

```python
config = CognitiveMemoryConfig(
    extraction_mode="hybrid",
    retrieval_score_exponent=0.2,
    direct_boost=0.1,
)
# Search with: top_k=30, deep_recall=True
```

Hybrid stores both extracted facts and raw turns. Deep recall ensures superseded originals can still surface when needed. Good for applications where users ask both “what is the user’s job?” and “what exactly did they say on March 12?”

Long-term, high-fidelity recall. Nothing should be forgotten.

```python
config = CognitiveMemoryConfig(
    extraction_mode="semantic",
    retrieval_score_exponent=0.1,            # nearly ignore decay
    core_access_threshold=5,                 # promote to core faster
    core_stability_threshold=0.7,
    cold_storage_ttl_days=365,               # keep cold memories longer
    consolidation_retention_threshold=0.10,  # only consolidate very faded
    custom_extraction_instructions="Focus on medical conditions, medications, allergies, and treatment history.",
)
# Search with: top_k=60, deep_recall=True
```

Low alpha means old memories rank nearly as high as recent ones. Aggressive core promotion protects important information. Long cold storage TTL and conservative consolidation prevent data loss.

Information changes frequently. Old data should fade in favor of updates.

```python
config = CognitiveMemoryConfig(
    extraction_mode="semantic",
    retrieval_score_exponent=0.7,            # strong recency bias
    direct_boost=0.05,                       # less stability reinforcement
    cold_migration_days=3,                   # move to cold faster
    cold_storage_ttl_days=30,                # purge cold data quickly
    consolidation_retention_threshold=0.30,  # consolidate earlier
)
# Search with: top_k=10, deep_recall=False
```

High alpha suppresses old memories. Fast cold migration and short TTL keep the active set small and current.

Migrating historical data. Speed matters, not real-time behavior.

```python
config = CognitiveMemoryConfig(
    run_maintenance_during_ingestion=False,
)
mem = CognitiveMemory(config=config)
for conv in conversations:
    await mem.extract_and_store(conv, session_id=..., run_tick=False)

# Run maintenance once at the end
await mem.tick()
```

Disabling maintenance during ingestion avoids O(n) cold-migration checks after every batch.


extraction_mode controls how conversations become memories. See Ingestion Pipeline for a detailed comparison.

| Mode | LLM cost | Storage | Best for |
|---|---|---|---|
| `"semantic"` | Yes | Low | Most applications |
| `"raw"` | No | Medium | Exact recall, low latency |
| `"hybrid"` | Yes | High | Both structured and verbatim recall |

extraction_model sets the LLM used for fact extraction (semantic and hybrid modes only).

| Model | Quality | Cost | Speed |
|---|---|---|---|
| gpt-4o-mini | Good | Low | Fast |
| gpt-4o | Better | 10x | Slower |
| gpt-4.1-mini | Good | Low | Fast |

Upgrade to gpt-4o if extraction quality is the bottleneck (e.g., missing important facts, poor categorization). For most applications, gpt-4o-mini is sufficient.

custom_extraction_instructions prepends domain-specific guidance to the extraction prompt.

```python
# Medical assistant
config = CognitiveMemoryConfig(
    custom_extraction_instructions="Focus on medical conditions, medications, dosages, allergies, and treatment outcomes. Classify all medical facts as core."
)

# E-commerce
config = CognitiveMemoryConfig(
    custom_extraction_instructions="Track product preferences, sizes, brands, past purchases, and return history."
)
```

Use this when the default extraction misses domain-specific information or assigns wrong importance/categories.

retrieval_score_exponent controls how much memory decay affects search ranking. The scoring formula is `score = similarity * retention^alpha`.

| Value | Behavior | Use case |
|---|---|---|
| 0.1 | Nearly ignores decay. Old and new memories rank equally. | Medical records, legal, archival |
| 0.3 | Default. Balanced recency bias. | Personal assistants, chatbots |
| 0.5 | Moderate recency bias. Recent memories preferred. | News, evolving topics |
| 0.7-1.0 | Strong recency bias. Old memories effectively suppressed. | Real-time data, trending topics |

This is the single most impactful parameter for retrieval quality. Tune this first.
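To see what alpha does in practice, here is the documented formula applied directly. The numbers are illustrative, not from the library:

```python
# score = similarity * retention^alpha, as defined above.
def score(similarity: float, retention: float, alpha: float) -> float:
    return similarity * retention ** alpha

# A strongly matching but faded memory vs. a weaker but fresh one:
score(0.90, 0.2, alpha=0.3)   # ~0.55: the faded memory loses
score(0.75, 0.9, alpha=0.3)   # ~0.73: the fresh memory wins
score(0.90, 0.2, alpha=0.1)   # ~0.77: at low alpha, the faded memory wins again
```

At alpha=0.3 the decay penalty flips the ranking; at alpha=0.1 it barely matters, which is why the archival recipes above set it to 0.1.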

top_k is not a config parameter — it's passed to search(). But it's the most impactful retrieval knob after retrieval_score_exponent.

| Value | Effect | Use case |
|---|---|---|
| 5-10 | Precise, may miss relevant memories | Simple Q&A, low latency |
| 20-40 | Balanced recall and precision | Most applications |
| 50-60 | Broad recall, catches weak matches | With re-ranking, complex queries |

Our benchmarks showed +14pp accuracy improvement going from k=10 to k=60. If you’re missing relevant memories, increase k first.

```python
results = await mem.search(query, deep_recall=True)
```

When enabled, superseded memories (originals that were consolidated into summaries) are included in search results with a penalty (deep_recall_penalty, default 0.5x score).

Enable when: Users need specific dates, names, or numbers that may have been lost during consolidation.

Disable when: You want clean, deduplicated results and don’t need historical granularity.
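The penalty is straightforward to picture. A hedged sketch of the ranking math, reusing the scoring formula from the previous section (the library's exact implementation may differ):

```python
# Sketch: superseded memories compete at half score under deep recall,
# per the documented deep_recall_penalty default of 0.5x.
def ranked_score(similarity, retention, alpha=0.3,
                 superseded=False, penalty=0.5):
    base = similarity * retention ** alpha
    return base * penalty if superseded else base
```

A superseded original thus only surfaces when its similarity is high enough to overcome the 0.5x handicap against live memories.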

The spaced-repetition boost parameters control how much retrieving a memory strengthens it.

| Parameter | Default | Effect of increasing |
|---|---|---|
| `direct_boost` | 0.1 | Memories stabilize faster with fewer retrievals |
| `associative_boost` | 0.03 | Linked memories strengthen more from co-retrieval |
| `max_spaced_rep_multiplier` | 2.0 | Larger bonus for accessing long-idle memories |
| `spaced_rep_interval_days` | 7.0 | Full bonus kicks in sooner after last access |

If memories fade too fast: Increase direct_boost to 0.15-0.20.

If memories never fade: Decrease direct_boost to 0.05 and lower max_spaced_rep_multiplier to 1.5.

Core memories have a high retention floor (0.60) and are never cold-migrated. For a memory to be promoted to core, three conditions must ALL be met:

| Parameter | Default | Effect of lowering |
|---|---|---|
| `core_access_threshold` | 10 | Fewer retrievals needed for promotion |
| `core_stability_threshold` | 0.85 | Lower stability bar for promotion |
| `core_session_threshold` | 3 | Fewer distinct sessions needed |

Too few core promotions? Lower all three (e.g., 5, 0.7, 2).

Too many core promotions? Raise core_access_threshold to 15-20 and core_session_threshold to 5+. Over-promoting makes the system “sticky” — old information never fades.
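The compound requirement is a direct conjunction of the three thresholds above. A sketch (parameter names from the table; the exact internal check is an assumption):

```python
# All three gates must pass; improving one metric alone never promotes.
def eligible_for_core(access_count, stability, session_count,
                      access_threshold=10, stability_threshold=0.85,
                      session_threshold=3):
    return (access_count >= access_threshold
            and stability >= stability_threshold
            and session_count >= session_threshold)

eligible_for_core(12, 0.90, 4)   # True
eligible_for_core(12, 0.90, 2)   # False: too few distinct sessions
eligible_for_core(9, 0.99, 5)    # False: too few retrievals
```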

| Parameter | Default | Description |
|---|---|---|
| `faint_threshold` | 0.15 | Retention below this = "faint" memory |

Base decay rates are category-level constants:

| Category | Decay rate (days) | Floor | Behavior |
|---|---|---|---|
| Episodic | 45 | 0.02 | Fades relatively fast |
| Semantic | 120 | 0.02 | Fades slowly |
| Procedural | ∞ | 0.02 | Never decays |
| Core | 120 | 0.60 | Fades slowly, high floor |

You can’t change base decay rates via config — they’re set by memory category. To make memories last longer, increase direct_boost or use custom_extraction_instructions to bias toward higher importance scores (stability = 0.1 + importance * 0.3).
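For intuition about the table, here is one plausible retention curve: exponential decay toward the category floor. The library's actual curve is not specified in this guide, so treat this as a sketch of why the category constants matter, not as its implementation:

```python
import math

# Assumed form: retention decays exponentially toward the category floor.
def retention(days_since_access, decay_days, floor=0.02):
    if math.isinf(decay_days):   # procedural: never decays
        return 1.0
    return floor + (1 - floor) * math.exp(-days_since_access / decay_days)

retention(45, decay_days=45)    # episodic after 45 idle days: ~0.38
retention(45, decay_days=120)   # semantic after 45 idle days: ~0.69
```

Under this form, an episodic memory drops below the 0.15 faint_threshold after roughly two decay constants, while a semantic one takes several months.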

The association parameters control how memories link to each other and activate during retrieval.

| Parameter | Default | Effect of changing |
|---|---|---|
| `association_strengthen_amount` | 0.1 | Higher = associations build faster per co-retrieval |
| `association_retrieval_threshold` | 0.3 | Lower = more associations activate (more multi-hop, more noise) |
| `association_decay_constant_days` | 90 | Higher = associations persist longer without reinforcement |

For multi-hop reasoning: Lower association_retrieval_threshold to 0.2 and increase association_decay_constant_days to 120-180.

For precision (less noise): Raise association_retrieval_threshold to 0.4.
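One way to picture how the threshold and the decay constant interact (assumed mechanics, not the library's actual code):

```python
import math

# Sketch: an association fires during retrieval only if its strength,
# decayed since last reinforcement, still clears the threshold.
def association_active(strength, days_since_reinforced,
                       threshold=0.3, decay_constant_days=90):
    decayed = strength * math.exp(-days_since_reinforced / decay_constant_days)
    return decayed >= threshold

association_active(0.5, days_since_reinforced=30)   # True  (~0.36 >= 0.3)
association_active(0.5, days_since_reinforced=60)   # False (~0.26 < 0.3)
```

This shows why the two tuning directions above pair naturally: lowering the threshold and raising the decay constant both keep mid-strength links alive for multi-hop traversal.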

The consolidation parameters control when and how fading memories are merged into summaries.

| Parameter | Default | Effect of changing |
|---|---|---|
| `consolidation_retention_threshold` | 0.20 | Higher = consolidate earlier (less storage, but lose detail sooner) |
| `consolidation_group_size` | 5 | Lower = more frequent consolidation events |
| `consolidation_similarity_threshold` | 0.70 | Lower = larger, more diverse groups |

For archival applications: Lower threshold to 0.10 (only consolidate very faded memories).

For high-throughput applications: Raise threshold to 0.30 and lower group size to 3.
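A minimal sketch of candidate selection under these settings (assumed mechanics: memories below the retention threshold become candidates, which are then grouped by similarity up to the group size):

```python
# Only faded memories are eligible for consolidation; raising the
# threshold pulls fresher memories into the candidate pool.
def consolidation_candidates(memories, retention_threshold=0.20):
    return [m for m in memories if m["retention"] < retention_threshold]

mems = [
    {"id": 1, "retention": 0.15},
    {"id": 2, "retention": 0.45},
    {"id": 3, "retention": 0.08},
]
consolidation_candidates(mems)                            # ids 1 and 3
consolidation_candidates(mems, retention_threshold=0.10)  # id 3 only
```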

The cold-storage parameters control when memories leave hot storage and when cold memories expire.

| Parameter | Default | Effect of changing |
|---|---|---|
| `cold_migration_days` | 7 | Lower = memories leave hot store faster (less search noise, but may lose multi-hop paths) |
| `cold_storage_ttl_days` | 180 | Higher = cold memories survive longer before becoming stubs |

For long-term applications: Set cold_storage_ttl_days to 365+.

For ephemeral applications: Set cold_migration_days to 3 and cold_storage_ttl_days to 30.
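The lifecycle can be sketched as a tier decision. This is assumed mechanics: in particular, whether the TTL counts from last access (as here) or from the migration event is not specified in this guide:

```python
# Hot -> cold after cold_migration_days idle; cold -> stub after the TTL.
def storage_tier(days_idle, cold_migration_days=7,
                 cold_storage_ttl_days=180, is_core=False):
    if is_core or days_idle < cold_migration_days:
        return "hot"    # core memories are never cold-migrated
    if days_idle < cold_storage_ttl_days:
        return "cold"
    return "stub"       # cold memories expire into stubs after the TTL

storage_tier(3)                   # "hot"
storage_tier(30)                  # "cold"
storage_tier(400)                 # "stub"
storage_tier(400, is_core=True)   # "hot"
```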

| Parameter | Default | Description |
|---|---|---|
| `embedding_model` | `"text-embedding-3-small"` | OpenAI embedding model |
| `embedding_dimensions` | 1536 | Vector dimensions |

Higher dimensions (3072 with text-embedding-3-large) give better similarity discrimination at the cost of storage and search speed. For most applications, 1536 dimensions with text-embedding-3-small is the right trade-off.


The best way to measure tuning impact:

  1. Build a test set of 50-100 question-answer pairs from your domain
  2. Ingest the relevant conversations
  3. Run searches for each question
  4. Score answers against ground truth
  5. Track overall accuracy and multi-hop accuracy separately

Avoid tuning on individual examples — optimize for aggregate metrics across your test set.
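The steps above can be sketched as a small harness. `mem.search` comes from this guide, but the `.content` attribute on results and the test-set shape are assumptions, and the substring grader is a stand-in for a real scorer (exact match or an LLM judge):

```python
# Hypothetical evaluation loop over a {"question", "answer"} test set.
async def evaluate(mem, test_set, top_k=30):
    hits = 0
    for item in test_set:
        results = await mem.search(item["question"], top_k=top_k)
        retrieved = " ".join(r.content for r in results)
        if item["answer"].lower() in retrieved.lower():
            hits += 1
    return hits / len(test_set)
```

Run it once per candidate configuration (and once per top_k value) so that every change is judged on the same aggregate number.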

  • retrieval_score_exponent and direct_boost interact: higher alpha makes boosting more impactful
  • consolidation_retention_threshold and cold_migration_days together control when memories leave hot storage
  • core_access_threshold and core_stability_threshold create a compound requirement: lowering one doesn’t help if the other is too high
  • extraction_mode="hybrid" with deep_recall=True gives maximum recall at the cost of more noise — pair with higher top_k and re-ranking