Core Memory Architecture

ICE implements a three-tier storage architecture that decouples the application's session history from the LLM's physical context window. On each request, the active context window is assembled from whichever tiers hold the relevant data.

1. Storage Tiers

Tier	Component	Purpose
Hot-Cache	Redis	Active session prefix and recent tool-call state.
Semantic Ledger	PostgreSQL + pgvector	Long-term vectorized interaction history and ingested document fragments.
Cold Archive	Object storage (via webhook)	Archived sessions and audit logs after the retention window expires.
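
The lookup order implied by the table can be sketched as follows. This is a minimal illustration, not ICE's actual API: the `TieredStore` class, its `fetch` method, and the key names are hypothetical, and plain dictionaries stand in for Redis, PostgreSQL + pgvector, and object storage.

```python
from typing import Optional, Tuple


class TieredStore:
    """Hypothetical three-tier lookup: hot cache, then ledger, then archive."""

    def __init__(self, hot: dict, ledger: dict, archive: dict):
        # Plain dicts are stand-ins for the real backends (assumption).
        self.tiers = [("hot-cache", hot), ("semantic-ledger", ledger), ("cold-archive", archive)]

    def fetch(self, key: str) -> Optional[Tuple[str, str]]:
        # Check the fastest tier first and fall through to slower ones.
        for name, tier in self.tiers:
            if key in tier:
                return name, tier[key]
        return None


store = TieredStore(
    hot={"session:42:prefix": "system prompt + recent turns"},
    ledger={"doc:7": "ingested fragment"},
    archive={"session:1": "archived transcript"},
)
print(store.fetch("doc:7"))
```

Checking the cheapest tier first keeps the common case (an active session) off the slower stores. Whether ICE reads from the cold archive at request time is not stated here, so treat that last fallback as an assumption.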

2. Retrieval Mechanics

Tiered Prompt Injection

ICE assembles the final prompt from retrieved fragments ranked by relevance. The highest-ranked fragments are placed at the start and end of the prompt, where model attention tends to be strongest, while lower-ranked content is compressed or summarized.
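
One way to place fragments so that the best ones sit at the edges is to alternate between the front and the back of the prompt while walking down the ranking. The sketch below assumes each fragment already carries a relevance score; the function name `arrange_by_relevance` is hypothetical, and the compression of low-ranked content is left out.

```python
from typing import List, Tuple

Fragment = Tuple[str, float]  # (text, relevance score); shape is an assumption


def arrange_by_relevance(fragments: List[Fragment]) -> List[Fragment]:
    """Order fragments so the highest scores sit at the edges of the prompt."""
    ranked = sorted(fragments, key=lambda f: f[1], reverse=True)
    front: List[Fragment] = []
    back: List[Fragment] = []
    for i, frag in enumerate(ranked):
        # Even ranks fill the front, odd ranks fill the back.
        (front if i % 2 == 0 else back).append(frag)
    # Reverse the back half so its best fragment ends up last in the prompt.
    return front + back[::-1]


ordered = arrange_by_relevance([("C", 0.5), ("A", 0.9), ("D", 0.2), ("B", 0.8)])
print([text for text, _ in ordered])
```

With these four fragments the order is A, C, D, B: the two highest-scoring fragments open and close the prompt, and the weakest ones land in the middle.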

KV-Cache Alignment

The engine keeps prompt prefixes stable across requests in a session so that inference engines with prefix caching (vLLM, TGI) can reuse their KV-Cache. This reduces Time-To-First-Token (TTFT) for repeated session interactions, because the cached prefix does not have to be recomputed.
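
A prefix only produces cache hits if it is identical from one request to the next. The sketch below shows one way to keep it that way: put the content that never changes within a session first, append the changing parts after it, and fingerprint the prefix to detect accidental drift. The `build_prompt` and `prefix_fingerprint` helpers and the prompt layout are assumptions for illustration, not ICE's actual format.

```python
import hashlib
from typing import List, Tuple


def build_prompt(system: str, pinned: List[str], history: List[str], user_msg: str) -> Tuple[str, str]:
    """Return (stable_prefix, full_prompt) with all volatile content at the end."""
    # Stable part: must be identical on every request in the session.
    prefix = system + "\n" + "\n".join(pinned) + "\n"
    # Volatile part: grows by appending, so earlier text is never rewritten.
    suffix = "\n".join(history + [user_msg])
    return prefix, prefix + suffix


def prefix_fingerprint(prefix: str) -> str:
    # A hash makes it cheap to check that the prefix did not change between turns.
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


p1, turn1 = build_prompt("You are ICE.", ["user_id=42"], [], "Hello")
p2, turn2 = build_prompt("You are ICE.", ["user_id=42"], ["Hello", "Hi there"], "Next question")
print(prefix_fingerprint(p1) == prefix_fingerprint(p2))
```

Anything that varies per request, such as timestamps or retrieved fragments, belongs after the stable prefix; placing it earlier would invalidate the cached portion on every call.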

Recursive Completion

For outputs that exceed the model's physical output limit, ICE executes recursive re-prompting. It captures the partial output, re-injects the current state, and triggers a continuation. This repeats until the task completes or the ICE_MAX_CONTINUATIONS limit is reached.
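
The continuation loop can be sketched as below. Only the name ICE_MAX_CONTINUATIONS comes from this page; the `generate` callable, its `(text, truncated)` return shape, and the toy model used to exercise the loop are assumptions made for illustration.

```python
from typing import Callable, Tuple

# Hypothetical model interface: returns (text, truncated_by_output_limit).
Generate = Callable[[str], Tuple[str, bool]]


def complete_with_continuations(generate: Generate, prompt: str, max_continuations: int) -> str:
    """Re-prompt with the partial output until done or the limit is reached."""
    output, truncated = generate(prompt)
    continuations = 0
    while truncated and continuations < max_continuations:
        # Re-inject the original prompt plus everything produced so far.
        chunk, truncated = generate(prompt + output)
        output += chunk
        continuations += 1
    return output


def make_fake_model(target: str, limit: int, base_prompt: str) -> Generate:
    """Toy model that emits `target` at most `limit` characters per call."""
    def generate(text: str) -> Tuple[str, bool]:
        produced = len(text) - len(base_prompt)
        chunk = target[produced:produced + limit]
        return chunk, produced + len(chunk) < len(target)
    return generate


model = make_fake_model("abcdefghijkl", limit=5, base_prompt="Q:")
print(complete_with_continuations(model, "Q:", max_continuations=3))
```

The cap matters because a model that never signals completion would otherwise loop indefinitely. With the limit set to 1 instead of 3, the same toy model returns only the first ten characters.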

3. Data Integrity

Context Pinning: Explicitly pins critical session metadata or recent tool results to the active window.
Semantic Partitioning: Automatic chunking and overlap management for high-fidelity retrieval.
Prefetching: Asynchronous pre-retrieval of likely context based on session trajectory.
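
The overlap management mentioned under Semantic Partitioning can be illustrated with a fixed-size sliding window. This is a simplified stand-in: it splits on token counts rather than semantic boundaries, and the function name and parameters are hypothetical.

```python
from typing import List, Sequence


def chunk_with_overlap(tokens: Sequence, size: int, overlap: int) -> List[list]:
    """Split tokens into windows of `size` that share `overlap` items with their neighbor."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[list] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + size >= len(tokens):
            break
    return chunks


chunks = chunk_with_overlap(list(range(10)), size=4, overlap=1)
print(chunks)
```

Here each chunk repeats the last token of the previous one, so a sentence that straddles a boundary is still retrievable from at least one chunk in full or near-full form. Larger overlaps improve recall at the cost of storing more duplicated text.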
