
Configuration Guide

Every parameter has a sensible default. You don’t need to change anything to get started. This guide helps you tune settings when you have a specific use case or want to optimize performance.

Pick the recipe closest to your use case, then fine-tune individual parameters as needed.

Long-running conversations across many sessions. Needs to remember user preferences, past events, and facts.

```python
config = CognitiveMemoryConfig(
    extraction_mode="semantic",
    retrieval_score_exponent=0.3,
    direct_boost=0.1,
    core_access_threshold=10,
    core_session_threshold=3,
)
# Search with: top_k=20, deep_recall=False
```

This is the default configuration. Semantic extraction captures structured facts across sessions. Core promotion ensures frequently-referenced information (name, preferences) becomes permanent.

Exact recall of what was said matters more than structured reasoning. Users may ask “what did X say about Y?”

```python
config = CognitiveMemoryConfig(
    extraction_mode="raw",
    retrieval_score_exponent=0.1,        # don't penalize old turns
    run_maintenance_during_ingestion=False,  # no consolidation
)
# Search with: top_k=50, deep_recall=False
```

Raw mode stores every turn verbatim. Low alpha keeps old turns retrievable. Disable maintenance to prevent consolidation from merging turns.

Need structured facts AND exact quotes. Willing to pay the storage and LLM cost.

```python
config = CognitiveMemoryConfig(
    extraction_mode="hybrid",
    retrieval_score_exponent=0.2,
    direct_boost=0.1,
)
# Search with: top_k=30, deep_recall=True
```

Hybrid stores both extracted facts and raw turns. Deep recall ensures superseded originals can still surface when needed. Good for applications where users ask both “what is the user’s job?” and “what exactly did they say on March 12?”

Long-term, high-fidelity recall. Nothing should be forgotten.

```python
config = CognitiveMemoryConfig(
    extraction_mode="semantic",
    retrieval_score_exponent=0.1,            # nearly ignore decay
    core_access_threshold=5,                 # promote to core faster
    core_stability_threshold=0.7,
    cold_storage_ttl_days=365,               # keep cold memories longer
    consolidation_retention_threshold=0.10,  # only consolidate very faded
    custom_extraction_instructions="Focus on medical conditions, medications, allergies, and treatment history.",
)
# Search with: top_k=60, deep_recall=True
```

Low alpha means old memories rank nearly as high as recent ones. Aggressive core promotion protects important information. Long cold storage TTL and conservative consolidation prevent data loss.

Information changes frequently. Old data should fade in favor of updates.

```python
config = CognitiveMemoryConfig(
    extraction_mode="semantic",
    retrieval_score_exponent=0.7,            # strong recency bias
    direct_boost=0.05,                       # less stability reinforcement
    cold_migration_days=3,                   # move to cold faster
    cold_storage_ttl_days=30,                # purge cold data quickly
    consolidation_retention_threshold=0.30,  # consolidate earlier
)
# Search with: top_k=10, deep_recall=False
```

High alpha suppresses old memories. Fast cold migration and short TTL keep the active set small and current.

Migrating historical data. Speed matters, not real-time behavior.

```python
config = CognitiveMemoryConfig(
    run_maintenance_during_ingestion=False,
)
mem = CognitiveMemory(config=config)
for conv in conversations:
    await mem.extract_and_store(conv, session_id=..., run_tick=False)

# Run maintenance once at the end
await mem.tick()
```

Disabling maintenance during ingestion avoids O(n) cold-migration checks after every batch.


extraction_mode controls how conversations become memories. See Ingestion Pipeline for a detailed comparison.

| Mode | LLM cost | Storage | Best for |
|---|---|---|---|
| `"semantic"` | Yes | Low | Most applications |
| `"raw"` | No | Medium | Exact recall, low latency |
| `"hybrid"` | Yes | High | Both structured and verbatim recall |

extraction_model sets the LLM used for fact extraction (semantic and hybrid modes only).

| Model | Quality | Cost | Speed |
|---|---|---|---|
| gpt-4o-mini | Good | Low | Fast |
| gpt-4o | Better | 10x | Slower |
| gpt-4.1-mini | Good | Low | Fast |

Upgrade to gpt-4o if extraction quality is the bottleneck (e.g., missing important facts, poor categorization). For most applications, gpt-4o-mini is sufficient.

custom_extraction_instructions prepends domain-specific guidance to the extraction prompt.

```python
# Medical assistant
config = CognitiveMemoryConfig(
    custom_extraction_instructions="Focus on medical conditions, medications, dosages, allergies, and treatment outcomes. Classify all medical facts as core."
)

# E-commerce
config = CognitiveMemoryConfig(
    custom_extraction_instructions="Track product preferences, sizes, brands, past purchases, and return history."
)
```

Use this when the default extraction misses domain-specific information or assigns wrong importance/categories.

retrieval_score_exponent controls how much memory decay affects search ranking. The scoring formula is `score = similarity * retention^alpha`.

| Value | Behavior | Use case |
|---|---|---|
| 0.1 | Nearly ignores decay. Old and new memories rank equally. | Medical records, legal, archival |
| 0.3 | Default. Balanced recency bias. | Personal assistants, chatbots |
| 0.5 | Moderate recency bias. Recent memories preferred. | News, evolving topics |
| 0.7-1.0 | Strong recency bias. Old memories effectively suppressed. | Real-time data, trending topics |

This is the single most impactful parameter for retrieval quality. Tune this first.
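To see what alpha does in practice, here is the documented formula applied directly. The numbers are illustrative, not from the library:

```python
# score = similarity * retention^alpha, as defined above.
def score(similarity: float, retention: float, alpha: float) -> float:
    return similarity * retention ** alpha

# A strongly matching but faded memory vs. a weaker but fresh one:
score(0.90, 0.2, alpha=0.3)   # ~0.55: the faded memory loses
score(0.75, 0.9, alpha=0.3)   # ~0.73: the fresh memory wins
score(0.90, 0.2, alpha=0.1)   # ~0.77: at low alpha, the faded memory wins again
```

At alpha=0.3 the decay penalty flips the ranking; at alpha=0.1 it barely matters, which is why the archival recipes above set it to 0.1.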

top_k is not a config parameter — it's passed to search(). But it's the most impactful retrieval knob after retrieval_score_exponent.

| Value | Effect | Use case |
|---|---|---|
| 5-10 | Precise, may miss relevant memories | Simple Q&A, low latency |
| 20-40 | Balanced recall and precision | Most applications |
| 50-60 | Broad recall, catches weak matches | With re-ranking, complex queries |

Our benchmarks showed +14pp accuracy improvement going from k=10 to k=60. If you’re missing relevant memories, increase k first.

```python
results = await mem.search(query, deep_recall=True)
```

When enabled, superseded memories (originals that were consolidated into summaries) are included in search results with a penalty (deep_recall_penalty, default 0.5x score).

Enable when: Users need specific dates, names, or numbers that may have been lost during consolidation.

Disable when: You want clean, deduplicated results and don’t need historical granularity.
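The penalty is straightforward to picture. A hedged sketch of the ranking math, reusing the scoring formula from the previous section (the library's exact implementation may differ):

```python
# Sketch: superseded memories compete at half score under deep recall,
# per the documented deep_recall_penalty default of 0.5x.
def ranked_score(similarity, retention, alpha=0.3,
                 superseded=False, penalty=0.5):
    base = similarity * retention ** alpha
    return base * penalty if superseded else base
```

A superseded original thus only surfaces when its similarity is high enough to overcome the 0.5x handicap against live memories.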

The spaced-repetition boost parameters control how much retrieving a memory strengthens it.

| Parameter | Default | Effect of increasing |
|---|---|---|
| `direct_boost` | 0.1 | Memories stabilize faster with fewer retrievals |
| `associative_boost` | 0.03 | Linked memories strengthen more from co-retrieval |
| `max_spaced_rep_multiplier` | 2.0 | Larger bonus for accessing long-idle memories |
| `spaced_rep_interval_days` | 7.0 | Full bonus kicks in sooner after last access |

If memories fade too fast: Increase direct_boost to 0.15-0.20.

If memories never fade: Decrease direct_boost to 0.05 and lower max_spaced_rep_multiplier to 1.5.

Core memories have a high retention floor (0.60) and are never cold-migrated. For a memory to be promoted to core, three conditions must ALL be met:

| Parameter | Default | Effect of lowering |
|---|---|---|
| `core_access_threshold` | 10 | Fewer retrievals needed for promotion |
| `core_stability_threshold` | 0.85 | Lower stability bar for promotion |
| `core_session_threshold` | 3 | Fewer distinct sessions needed |

Too few core promotions? Lower all three (e.g., 5, 0.7, 2).

Too many core promotions? Raise core_access_threshold to 15-20 and core_session_threshold to 5+. Over-promoting makes the system “sticky” — old information never fades.
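The compound requirement is a direct conjunction of the three thresholds above. A sketch (parameter names from the table; the exact internal check is an assumption):

```python
# All three gates must pass; improving one metric alone never promotes.
def eligible_for_core(access_count, stability, session_count,
                      access_threshold=10, stability_threshold=0.85,
                      session_threshold=3):
    return (access_count >= access_threshold
            and stability >= stability_threshold
            and session_count >= session_threshold)

eligible_for_core(12, 0.90, 4)   # True
eligible_for_core(12, 0.90, 2)   # False: too few distinct sessions
eligible_for_core(9, 0.99, 5)    # False: too few retrievals
```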

| Parameter | Default | Description |
|---|---|---|
| `faint_threshold` | 0.15 | Retention below this = "faint" memory |

Base decay rates are category-level constants:

| Category | Decay rate (days) | Floor | Behavior |
|---|---|---|---|
| Episodic | 45 | 0.02 | Fades relatively fast |
| Semantic | 120 | 0.02 | Fades slowly |
| Procedural | ∞ | 0.02 | Never decays |
| Core | 120 | 0.60 | Fades slowly, high floor |

You can’t change base decay rates via config — they’re set by memory category. To make memories last longer, increase direct_boost or use custom_extraction_instructions to bias toward higher importance scores (stability = 0.1 + importance * 0.3).
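For intuition about the table, here is one plausible retention curve: exponential decay toward the category floor. The library's actual curve is not specified in this guide, so treat this as a sketch of why the category constants matter, not as its implementation:

```python
import math

# Assumed form: retention decays exponentially toward the category floor.
def retention(days_since_access, decay_days, floor=0.02):
    if math.isinf(decay_days):   # procedural: never decays
        return 1.0
    return floor + (1 - floor) * math.exp(-days_since_access / decay_days)

retention(45, decay_days=45)    # episodic after 45 idle days: ~0.38
retention(45, decay_days=120)   # semantic after 45 idle days: ~0.69
```

Under this form, an episodic memory drops below the 0.15 faint_threshold after roughly two decay constants, while a semantic one takes several months.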

The association parameters control how memories link to each other and activate during retrieval.

| Parameter | Default | Effect of changing |
|---|---|---|
| `association_strengthen_amount` | 0.1 | Higher = associations build faster per co-retrieval |
| `association_retrieval_threshold` | 0.3 | Lower = more associations activate (more multi-hop, more noise) |
| `association_decay_constant_days` | 90 | Higher = associations persist longer without reinforcement |

For multi-hop reasoning: Lower association_retrieval_threshold to 0.2 and increase association_decay_constant_days to 120-180.

For precision (less noise): Raise association_retrieval_threshold to 0.4.
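One way to picture how the threshold and the decay constant interact (assumed mechanics, not the library's actual code):

```python
import math

# Sketch: an association fires during retrieval only if its strength,
# decayed since last reinforcement, still clears the threshold.
def association_active(strength, days_since_reinforced,
                       threshold=0.3, decay_constant_days=90):
    decayed = strength * math.exp(-days_since_reinforced / decay_constant_days)
    return decayed >= threshold

association_active(0.5, days_since_reinforced=30)   # True  (~0.36 >= 0.3)
association_active(0.5, days_since_reinforced=60)   # False (~0.26 < 0.3)
```

This shows why the two tuning directions above pair naturally: lowering the threshold and raising the decay constant both keep mid-strength links alive for multi-hop traversal.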

The consolidation parameters control when and how fading memories are merged into summaries.

| Parameter | Default | Effect of changing |
|---|---|---|
| `consolidation_retention_threshold` | 0.20 | Higher = consolidate earlier (less storage, but lose detail sooner) |
| `consolidation_group_size` | 5 | Lower = more frequent consolidation events |
| `consolidation_similarity_threshold` | 0.70 | Lower = larger, more diverse groups |

For archival applications: Lower threshold to 0.10 (only consolidate very faded memories).

For high-throughput applications: Raise threshold to 0.30 and lower group size to 3.
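A minimal sketch of candidate selection under these settings (assumed mechanics: memories below the retention threshold become candidates, which are then grouped by similarity up to the group size):

```python
# Only faded memories are eligible for consolidation; raising the
# threshold pulls fresher memories into the candidate pool.
def consolidation_candidates(memories, retention_threshold=0.20):
    return [m for m in memories if m["retention"] < retention_threshold]

mems = [
    {"id": 1, "retention": 0.15},
    {"id": 2, "retention": 0.45},
    {"id": 3, "retention": 0.08},
]
consolidation_candidates(mems)                            # ids 1 and 3
consolidation_candidates(mems, retention_threshold=0.10)  # id 3 only
```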

The cold-storage parameters control when memories leave hot storage and when cold memories expire.

| Parameter | Default | Effect of changing |
|---|---|---|
| `cold_migration_days` | 7 | Lower = memories leave hot store faster (less search noise, but may lose multi-hop paths) |
| `cold_storage_ttl_days` | 180 | Higher = cold memories survive longer before becoming stubs |

For long-term applications: Set cold_storage_ttl_days to 365+.

For ephemeral applications: Set cold_migration_days to 3 and cold_storage_ttl_days to 30.
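The lifecycle can be sketched as a tier decision. This is assumed mechanics: in particular, whether the TTL counts from last access (as here) or from the migration event is not specified in this guide:

```python
# Hot -> cold after cold_migration_days idle; cold -> stub after the TTL.
def storage_tier(days_idle, cold_migration_days=7,
                 cold_storage_ttl_days=180, is_core=False):
    if is_core or days_idle < cold_migration_days:
        return "hot"    # core memories are never cold-migrated
    if days_idle < cold_storage_ttl_days:
        return "cold"
    return "stub"       # cold memories expire into stubs after the TTL

storage_tier(3)                   # "hot"
storage_tier(30)                  # "cold"
storage_tier(400)                 # "stub"
storage_tier(400, is_core=True)   # "hot"
```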

| Parameter | Default | Description |
|---|---|---|
| `embedding_model` | `"text-embedding-3-small"` | OpenAI embedding model |
| `embedding_dimensions` | 1536 | Vector dimensions |

Higher dimensions (3072 with text-embedding-3-large) give better similarity discrimination at the cost of storage and search speed. For most applications, 1536 dimensions with text-embedding-3-small is the right trade-off.


The best way to measure tuning impact:

  1. Build a test set of 50-100 question-answer pairs from your domain
  2. Ingest the relevant conversations
  3. Run searches for each question
  4. Score answers against ground truth
  5. Track overall accuracy and multi-hop accuracy separately

Avoid tuning on individual examples — optimize for aggregate metrics across your test set.
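The steps above can be sketched as a small harness. `mem.search` comes from this guide, but the `.content` attribute on results and the test-set shape are assumptions, and the substring grader is a stand-in for a real scorer (exact match or an LLM judge):

```python
# Hypothetical evaluation loop over a {"question", "answer"} test set.
async def evaluate(mem, test_set, top_k=30):
    hits = 0
    for item in test_set:
        results = await mem.search(item["question"], top_k=top_k)
        retrieved = " ".join(r.content for r in results)
        if item["answer"].lower() in retrieved.lower():
            hits += 1
    return hits / len(test_set)
```

Run it once per candidate configuration (and once per top_k value) so that every change is judged on the same aggregate number.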

  • retrieval_score_exponent and direct_boost interact: higher alpha makes boosting more impactful
  • consolidation_retention_threshold and cold_migration_days together control when memories leave hot storage
  • core_access_threshold and core_stability_threshold create a compound requirement: lowering one doesn’t help if the other is too high
  • extraction_mode="hybrid" with deep_recall=True gives maximum recall at the cost of more noise — pair with higher top_k and re-ranking