Ingestion Pipeline
Overview
The ingestion pipeline converts raw conversation text into structured, embedded, and linked memories. It has five stages:
- LLM extraction (narrator prompt)
- Embedding
- Conflict detection
- Ingestion-time similarity boost
- Synaptic tagging (association creation)
Stage 1: LLM extraction
The system sends the conversation text to an LLM with a narrator-style extraction prompt. The prompt instructs the LLM to:
- Extract every distinct fact, event, and piece of information
- Narrate what happened rather than interpret (e.g., “Alex went hiking at Mount Rainier” not “Alex enjoys outdoor activities”)
- Classify each memory as `core`, `semantic`, `episodic`, or `procedural`
- Assign an importance score (0.0 to 1.0)
- Resolve relative dates using the conversation timestamp
- Prioritize user messages over assistant messages
- Never skip passing mentions
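For example, the relative-date step anchors phrases like "yesterday" to the conversation timestamp. In the real pipeline the LLM does this inside the extraction prompt; the toy resolver below is purely illustrative:

```python
from datetime import datetime, timedelta

def resolve_relative_date(phrase: str, conversation_ts: datetime):
    """Toy resolver for a few relative-date phrases. The actual resolution
    happens in the LLM prompt; this only illustrates the anchoring idea."""
    offsets = {"today": 0, "yesterday": -1, "a week ago": -7}
    if phrase in offsets:
        return (conversation_ts + timedelta(days=offsets[phrase])).date()
    return None  # phrase not recognized; leave unresolved

# "yesterday" in a conversation dated March 15, 2024 resolves to March 14
resolved = resolve_relative_date("yesterday", datetime(2024, 3, 15))
```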
The LLM returns a JSON array:
```json
[
  {"content": "Alex is a 32-year-old software engineer", "category": "core", "importance": 0.9},
  {"content": "Alex is single", "category": "core", "importance": 0.7},
  {"content": "Alex finished reading The Great Gatsby in January 2024", "category": "episodic", "importance": 0.5}
]
```
Narrator vs. summarizer
The key design choice is narrator mode. Most memory systems summarize: “User likes outdoor activities.” This is lossy and vague. The narrator approach preserves specifics: “User went hiking at Mount Rainier on March 12, 2024.”
Specific facts are harder to extract but much more useful for retrieval. “What trail did the user hike?” can only be answered from specific episodic memories, not from generalized summaries.
Custom extraction instructions
You can prepend custom instructions to the extraction prompt:
```python
config = CognitiveMemoryConfig(
    custom_extraction_instructions="Focus on the user's dietary restrictions and food preferences.",
)
```
Stage 2: Embedding
After extraction, all memory contents are batch-embedded:
```python
contents = [m.content for m in memories]
embeddings = self._embedder.embed_batch(contents)
for mem, emb in zip(memories, embeddings):
    mem.embedding = emb
```
Batch embedding is more efficient than individual calls. The default embedder uses OpenAI’s text-embedding-3-small with 1536 dimensions.
Stage 3: Conflict detection
Each new memory is checked against existing high-importance and core memories for contradictions or updates. See Conflict detection for the full mechanism.
This step involves LLM calls, so it only checks pairs with cosine similarity >= 0.6.
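The pre-filter is cheap because it reuses the embeddings from Stage 2. A minimal sketch, assuming memories are dicts with an `embedding` field (helper names are illustrative, not the library's API):

```python
import math

CONFLICT_SIM_THRESHOLD = 0.6  # pairs below this never reach the LLM

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def candidate_conflicts(new_memory, existing_memories):
    """Cheap embedding pre-filter: only pairs similar enough to plausibly
    conflict are sent to the expensive LLM contradiction check."""
    return [
        m for m in existing_memories
        if cosine_similarity(new_memory["embedding"], m["embedding"])
        >= CONFLICT_SIM_THRESHOLD
    ]
```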
Stage 4: Ingestion-time similarity boost
If a new memory is semantically similar to an existing memory (similarity > 0.75), the existing memory’s stability gets a small boost (+0.05):
```python
similar = await self._adapter.search_similar(mem.embedding, top_k=3)
for existing_mem, sim in similar:
    if sim > 0.75 and existing_mem.id != mem.id:
        existing_mem.stability = min(1.0, existing_mem.stability + 0.05)
```
This models repeated exposure: when a user mentions the same topic across conversations, the existing memory about that topic becomes more stable, even though a new memory is also being created.
Stage 5: Synaptic tagging
The final stage creates association links between memories extracted from the same conversation. For each pair of newly-stored memories:
- Compute cosine similarity between their embeddings
- If similarity >= 0.4, create a bidirectional association
- Link weight = `min(0.5, 0.2 + (sim - 0.4) * 0.5)`
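The weight formula maps similarity 0.4 to weight 0.2 and reaches the 0.5 cap at similarity 1.0. A minimal sketch of the pairwise step, assuming list-of-floats embeddings and dict-shaped memories (names are illustrative, not the library's API):

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def link_weight(sim):
    # Grows linearly from 0.2 at sim=0.4, capped at 0.5.
    return min(0.5, 0.2 + (sim - 0.4) * 0.5)

def tag_associations(memories, threshold=0.4):
    """Create one bidirectional link per pair of new memories above the threshold."""
    links = []
    for a, b in combinations(memories, 2):
        sim = cosine_similarity(a["embedding"], b["embedding"])
        if sim >= threshold:
            links.append((a["id"], b["id"], link_weight(sim)))
    return links
```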
See Associations for details on how these links are used during retrieval.
Full usage
```python
# High-level: extract + store everything from a conversation
memories = await mem.extract_and_store(
    conversation_text,
    session_id="session-42",
    timestamp=datetime(2024, 3, 15),
    run_tick=True,  # run maintenance every 5 ingestions
)

# Low-level: add a single memory directly (skips LLM extraction)
memory = await mem.add(
    "User is allergic to peanuts",
    category="core",
    importance=0.95,
    session_id="session-42",
)
```
Extraction modes
The `extraction_mode` parameter controls how conversations become memories. Choose based on your use case.
"semantic" (default)
LLM extracts structured facts from conversation text. Best for most applications.
```python
config = CognitiveMemoryConfig(extraction_mode="semantic")
```
Strengths: Deduplicates redundant information, classifies by category, assigns importance, resolves relative dates, captures multi-hop reasoning chains.
Weaknesses: May lose exact wording, timestamps can be paraphrased, LLM cost per ingestion.
Best for: Personal assistants, customer support bots, medical/legal record systems — anywhere you need structured recall over long periods.
"raw"
Each conversation turn is stored verbatim as an episodic memory. No LLM is called during ingestion.
```python
config = CognitiveMemoryConfig(extraction_mode="raw")
```
Strengths: Zero extraction cost, preserves exact quotes and timestamps, no information loss, fast ingestion.
Weaknesses: No deduplication, no importance scoring, no category classification, higher storage volume, retrieval relies entirely on embedding similarity.
Best for: Dialog systems where exact recall matters (trivia, roleplay), low-latency applications, when you want to handle extraction yourself.
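Under these assumptions, raw-mode ingestion amounts to little more than a verbatim mapping of turns to episodic memories (field names here are illustrative):

```python
def raw_ingest(turns):
    """Sketch of raw-mode ingestion: every (role, text) turn is stored
    verbatim as an episodic memory. No LLM call, no importance scoring,
    no deduplication."""
    return [
        {"content": f"{role}: {text}", "category": "episodic"}
        for role, text in turns
    ]
```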
"hybrid"
Runs both: LLM extracts structured facts AND raw turns are stored. Retrieval searches across both.
```python
config = CognitiveMemoryConfig(extraction_mode="hybrid")
```
Strengths: Combines structured reasoning with verbatim recall. Questions about facts hit extracted memories; questions about exact wording hit raw turns.
Weaknesses: ~2x storage, LLM cost for extraction, more memories to search through.
Best for: Applications that need both structured recall and exact-quote retrieval — e.g., a personal assistant that should know “the user is allergic to peanuts” (semantic) AND be able to recall “what did the user say on March 12?” (raw).
Choosing between modes
| Question | Answer | Recommended mode |
|---|---|---|
| Do I need exact quotes from past conversations? | Yes | raw or hybrid |
| Is LLM cost during ingestion acceptable? | No | raw |
| Do I need multi-hop reasoning across sessions? | Yes | semantic or hybrid |
| Are my questions about specific dates and times? | Yes | hybrid |
| Is storage cost a concern? | Yes | semantic |
| Do I need both structured facts and verbatim recall? | Yes | hybrid |
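As a sketch, the decision table can be collapsed into a small helper function (the name and flags are invented for illustration; conflicting requirements, such as needing multi-hop reasoning while ruling out LLM cost, have no clean answer in the table either):

```python
def recommend_mode(need_exact_quotes=False,
                   ingestion_llm_cost_ok=True,
                   need_multi_hop=False):
    """Hypothetical helper encoding the decision table above."""
    if need_exact_quotes and need_multi_hop:
        return "hybrid"   # both verbatim recall and structured reasoning
    if need_exact_quotes or not ingestion_llm_cost_ok:
        return "raw"      # verbatim turns, zero extraction cost
    return "semantic"     # default: structured facts
```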
Configuration
```python
config = CognitiveMemoryConfig(
    extraction_mode="semantic",                # "raw" | "semantic" | "hybrid"
    extraction_model="gpt-4o-mini",            # LLM for extraction
    embedding_model="text-embedding-3-small",  # embedding model
    embedding_dimensions=1536,                 # embedding vector size
    run_maintenance_during_ingestion=True,     # set False for batch imports
    custom_extraction_instructions=None,       # prepend to extraction prompt
)
```
Batch ingestion
For batch imports (benchmarks, migration), disable maintenance during ingestion:
```python
config = CognitiveMemoryConfig(run_maintenance_during_ingestion=False)
mem = CognitiveMemory(config=config)

for conv in conversations:
    await mem.extract_and_store(conv, session_id=..., run_tick=False)

# Run maintenance once at the end
await mem.tick()
```