Performance & Scalability
ICE uses a stateless kernel design with shared-nothing compute nodes backed by a centralized data layer (PostgreSQL + Redis), enabling horizontal compute scaling without session stickiness.
1. Scalability Architecture
| Dimension | Design |
|---|---|
| Compute Scaling | Stateless nodes — add/remove without downtime or session migration |
| Data Layer | Centralized PostgreSQL (Semantic Ledger) + Redis (Hot-Cache) cluster shared across nodes |
| Session Affinity | None required — any node can serve any session |
| Streaming | SSE passthrough — ICE does not buffer LLM output tokens |
| Retrieval Path | Async pgvector HNSW query, executed before prompt assembly |
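The request path described in the table can be sketched as follows: a node keeps no per-session state, awaits retrieval before assembling the prompt, and relays upstream tokens as SSE frames without buffering them. This is a minimal illustration under stated assumptions. `retrieve_context`, `assemble_prompt`, `upstream_llm`, `handle_request`, and `collect_frames` are hypothetical names, the in-memory `CORPUS` stands in for the async pgvector HNSW query, and the token generator stands in for the upstream LLM.

```python
import asyncio
from typing import AsyncIterator, List

# Hypothetical stand-in for the async pgvector HNSW query. A real node would
# query the shared PostgreSQL Semantic Ledger; this list is illustrative only.
CORPUS = [
    "ICE nodes are stateless.",
    "Redis serves as the hot cache.",
    "PostgreSQL holds the Semantic Ledger.",
]

async def retrieve_context(query: str, k: int = 2) -> List[str]:
    await asyncio.sleep(0)  # yield to the event loop, as a real network query would
    return CORPUS[:k]

def assemble_prompt(context: List[str], user_prompt: str) -> str:
    # Called only after retrieval has completed.
    return "\n".join(context + [user_prompt])

async def upstream_llm(prompt: str) -> AsyncIterator[str]:
    # Hypothetical upstream model that emits fixed tokens incrementally
    # (it ignores the prompt in this sketch).
    for token in ["Hello", ",", " world"]:
        await asyncio.sleep(0)
        yield token

async def handle_request(user_prompt: str) -> AsyncIterator[str]:
    # No per-session state is kept on the node between calls.
    context = await retrieve_context(user_prompt)
    prompt = assemble_prompt(context, user_prompt)
    # SSE passthrough: each token is framed and forwarded as soon as it arrives;
    # the full response is never accumulated. Tokens are assumed newline-free.
    async for token in upstream_llm(prompt):
        yield f"data: {token}\n\n"

def collect_frames(user_prompt: str) -> List[str]:
    """Drive handle_request to completion (for local testing only)."""
    async def _collect() -> List[str]:
        return [frame async for frame in handle_request(user_prompt)]
    return asyncio.run(_collect())
```

Because `handle_request` is an async generator, a web framework can write each frame to the client as soon as it is yielded, which is what keeps the node from buffering LLM output.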
2. Resource Governance
ICE enforces hard resource caps via environment variables. These prevent runaway processes under load.
- ICE_MEMORY_CAP_GB: Hard RAM ceiling for the ICE process. Engine terminates cleanly if exceeded.
- ICE_MAX_STITCH_CONCURRENCY: Maximum parallel context assembly operations. Prevents CPU saturation during high-concurrency bursts.
- ICE_POST_COMPRESSION_LIMIT: Maximum final prompt size (tokens) submitted to the upstream LLM. Enforced unconditionally.
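A minimal sketch of how the concurrency and prompt-size caps could be read and enforced, assuming the variables are parsed once at startup. `ResourceLimits`, `Stitcher`, and `run_burst` are hypothetical names, and the fallback values (4 GB, 8, 4096) are illustrative assumptions, not documented ICE defaults. The memory cap is only parsed here and not enforced. Plain truncation stands in for the prompt-size limit; the variable name suggests the real engine applies it after compression.

```python
from __future__ import annotations

import asyncio
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional, Tuple

@dataclass(frozen=True)
class ResourceLimits:
    memory_cap_gb: float
    max_stitch_concurrency: int
    post_compression_limit: int

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> ResourceLimits:
        env = os.environ if env is None else env
        # Fallback values are illustrative assumptions, not documented ICE defaults.
        return cls(
            memory_cap_gb=float(env.get("ICE_MEMORY_CAP_GB", "4")),
            max_stitch_concurrency=int(env.get("ICE_MAX_STITCH_CONCURRENCY", "8")),
            post_compression_limit=int(env.get("ICE_POST_COMPRESSION_LIMIT", "4096")),
        )

class Stitcher:
    """Hypothetical context assembler gated by ICE_MAX_STITCH_CONCURRENCY."""

    def __init__(self, limits: ResourceLimits) -> None:
        self.limits = limits
        # Construct inside a running event loop (see run_burst) so the semaphore
        # binds to the right loop on older Python versions.
        self._slots = asyncio.Semaphore(limits.max_stitch_concurrency)
        self.active = 0
        self.peak = 0

    async def stitch(self, tokens: List[str]) -> List[str]:
        async with self._slots:  # at most max_stitch_concurrency assemblies at once
            self.active += 1
            self.peak = max(self.peak, self.active)
            await asyncio.sleep(0.01)  # simulated assembly work
            self.active -= 1
        # Unconditional ceiling on the final prompt size (truncation as a stand-in).
        return tokens[: self.limits.post_compression_limit]

async def run_burst(limits: ResourceLimits, n_requests: int) -> Tuple[int, List[int]]:
    """Fire n_requests assemblies at once; report peak concurrency and output sizes."""
    stitcher = Stitcher(limits)
    prompts = [["tok"] * 10 for _ in range(n_requests)]
    results = await asyncio.gather(*(stitcher.stitch(p) for p in prompts))
    return stitcher.peak, [len(r) for r in results]
```

For example, with `ICE_MAX_STITCH_CONCURRENCY=3` and `ICE_POST_COMPRESSION_LIMIT=5`, a burst of ten requests never runs more than three assemblies at once, and every 10-token prompt is cut to 5 tokens.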
3. High Availability
Fallback Mode
If PostgreSQL or Redis becomes unreachable, ICE bypasses context injection and routes the raw prompt directly to the upstream LLM. API availability is maintained, but context augmentation is suspended until the backing services recover.
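The fallback decision can be sketched as a try/except around retrieval. This assumes that an unreachable backing service surfaces either as an error or as a timeout; `BackingServiceUnavailable`, `fetch_context`, `build_upstream_prompt`, the `store_up` flag, and the 0.5 s timeout are all hypothetical illustrations, not ICE's actual interfaces.

```python
import asyncio
from typing import List, Tuple

class BackingServiceUnavailable(Exception):
    """Hypothetical error for an unreachable PostgreSQL or Redis."""

async def fetch_context(prompt: str, store_up: bool) -> List[str]:
    # Hypothetical retrieval; store_up simulates backing-service health.
    if not store_up:
        raise BackingServiceUnavailable("PostgreSQL/Redis unreachable")
    return ["[ctx] relevant ledger entry"]

async def build_upstream_prompt(prompt: str, store_up: bool,
                                timeout_s: float = 0.5) -> Tuple[str, bool]:
    """Return (prompt to send upstream, whether context was injected)."""
    try:
        context = await asyncio.wait_for(fetch_context(prompt, store_up), timeout=timeout_s)
    except (BackingServiceUnavailable, ConnectionError, asyncio.TimeoutError):
        # Fallback Mode: skip context injection and forward the raw prompt.
        return prompt, False
    return "\n".join(context + [prompt]), True
```

Returning the `context_injected` flag alongside the prompt lets callers log or expose when a response was served without augmentation.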
Horizontal Scaling
Add compute nodes and point them at the shared PostgreSQL + Redis cluster. No coordination required between nodes. Load balancing is handled at the network layer (e.g., K8s Service or a reverse proxy).
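A small simulation of why no node coordination or session affinity is needed: every node reads and writes the same shared store, so a round-robin balancer can send consecutive requests from one session to different nodes. This assumes per-session data lives in the shared data layer. The plain dict stands in for PostgreSQL + Redis, and `IceNode` and `round_robin` are hypothetical names.

```python
from itertools import cycle
from typing import Dict, List, Tuple

class IceNode:
    """Hypothetical stateless node: holds no per-session data of its own."""

    def __init__(self, name: str, store: Dict[str, List[str]]) -> None:
        self.name = name
        self.store = store  # shared backing store, not node-local state

    def handle(self, session_id: str, message: str) -> str:
        history = self.store.setdefault(session_id, [])
        history.append(message)
        return f"{self.name}: {len(history)} turn(s) in {session_id}"

def round_robin(nodes: List[IceNode], requests: List[Tuple[str, str]]) -> List[str]:
    # Stand-in for a network-layer load balancer with no session affinity.
    picker = cycle(nodes)
    return [next(picker).handle(sid, msg) for sid, msg in requests]
```

With two nodes sharing one store, requests `("s1", "hi")` and `("s1", "more")` land on `node-a` and `node-b`, and the second node still sees both turns. Appending a third `IceNode` pointed at the same store requires no change to the existing nodes.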