Storage Architecture

ICE uses three distinct storage interaction patterns. Each serves a different function and requires separate configuration.


1. Active Semantic Ledger (Hot Storage)

The active memory store. All session history, context fragments, and vector embeddings live here during their active lifespan.

Technology: PostgreSQL with the pgvector extension.

Why not object storage (S3)? The retrieval path requires sub-second vector similarity queries with per-tenant row-level security (RLS) enforcement. Object storage adds HTTP round-trip and serialization overhead that is incompatible with this access pattern. PostgreSQL on NVMe is the correct substrate.
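
To make the access pattern concrete, here is a minimal sketch of a tenant-scoped similarity lookup. It only builds the SQL statements and does not connect to a database. The table `chunks`, its columns, and the setting name `app.current_tenant` are hypothetical; ICE's actual schema and RLS policy are not documented in this section. The `<=>` cosine-distance operator comes from pgvector, and `set_config` is a standard PostgreSQL function.

```python
from typing import List, Sequence, Tuple

# Hypothetical setting name that an RLS policy could read via current_setting().
TENANT_SETTING = "app.current_tenant"


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format a sequence of floats as a pgvector text literal such as '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_tenant_search(
    tenant_id: str, embedding: Sequence[float], limit: int = 5
) -> List[Tuple[str, tuple]]:
    """Return (sql, params) pairs meant to run in order inside one transaction."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [
        # The third argument (true) scopes the setting to the current transaction,
        # so the tenant filter cannot leak into another request on the same connection.
        ("SELECT set_config(%s, %s, true)", (TENANT_SETTING, tenant_id)),
        # <=> is pgvector's cosine-distance operator; smaller values are closer.
        # 'chunks', 'chunk_id', 'text', and 'embedding' are assumed names.
        (
            "SELECT chunk_id, text FROM chunks "
            "ORDER BY embedding <=> %s::vector LIMIT %s",
            (to_vector_literal(embedding), limit),
        ),
    ]
```

Both statements would be executed through a PostgreSQL driver in a single transaction. Each request is one indexed query on a live connection, which is the kind of round trip object storage cannot match.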

Configuration:

| Variable | Description | Required |
| --- | --- | --- |
| DATABASE_URL | PostgreSQL connection string. Must have pgvector installed. | Yes |
| REDIS_URL | Redis connection string for the Hot-Cache (session sliding window). | Yes |
| ICE_MEMORY_CAP_GB | Hard RAM ceiling for the ICE process. Engine terminates cleanly if exceeded. | No (default: 8) |

DATABASE_URL="postgresql://user:pass@db-host:5432/ice_db"
REDIS_URL="redis://cache-host:6379"
ICE_MEMORY_CAP_GB=16
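
A deployment script can check these variables before starting ICE. The following is a minimal sketch, not part of ICE itself: the names `HotStorageConfig` and `load_hot_storage_config` are hypothetical, and it assumes that `ICE_MEMORY_CAP_GB` is a whole number that falls back to 8 when unset.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import urlsplit


@dataclass(frozen=True)
class HotStorageConfig:
    database_url: str
    redis_url: str
    memory_cap_gb: int


def load_hot_storage_config(env: Mapping[str, str] = os.environ) -> HotStorageConfig:
    """Read and sanity-check the hot-storage variables from an environment mapping."""
    database_url = env.get("DATABASE_URL", "")
    redis_url = env.get("REDIS_URL", "")
    # Both URLs are required; an empty string has no scheme and fails here.
    if urlsplit(database_url).scheme not in ("postgresql", "postgres"):
        raise ValueError("DATABASE_URL must be a PostgreSQL connection string")
    if urlsplit(redis_url).scheme not in ("redis", "rediss"):
        raise ValueError("REDIS_URL must be a Redis connection string")
    # Optional; assumed to be an integer number of gigabytes with a default of 8.
    memory_cap_gb = int(env.get("ICE_MEMORY_CAP_GB", "8"))
    if memory_cap_gb <= 0:
        raise ValueError("ICE_MEMORY_CAP_GB must be positive")
    return HotStorageConfig(database_url, redis_url, memory_cap_gb)
```

This check only validates the shape of the values. It does not confirm that the database is reachable or that pgvector is installed.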

2. Cold Storage Delegation

For data retention and compliance archival. ICE does not write to S3 or any other object store itself. Instead, it delegates expired records to a webhook endpoint you control.

Delegation pipeline:

  1. Data lives in the active ledger for ICE_RETENTION_DAYS.
  2. Every 24 hours, the ICE retention_purge_loop identifies expired records.
  3. ICE POSTs each expired record as a JSON payload to ICE_PRE_PURGE_WEBHOOK_URL.
  4. Your webhook receives the payload and writes it to your object storage of choice (AWS S3, GCS, Azure Blob, Glacier, etc.).
  5. ICE permanently deletes the local record only after receiving a 2xx response from the webhook.

This design keeps cold storage configuration inside your infrastructure boundary. ICE has no direct knowledge of your bucket provider, credentials, or retention policy.

Configuration:

| Variable | Description | Required |
| --- | --- | --- |
| ICE_RETENTION_DAYS | Days a record stays in the active ledger before delegation. | No (default: 30) |
| ICE_PRE_PURGE_WEBHOOK_URL | Endpoint that receives expired records for archival. | No |

ICE_RETENTION_DAYS=90
ICE_PRE_PURGE_WEBHOOK_URL="https://internal.yourcompany.com/webhooks/ice-archive"

Webhook payload (JSON):

{
  "user_id": "user-alice",
  "session_id": "project-alpha",
  "expired_at": "2026-05-10T00:00:00Z",
  "chunks": [
    { "chunk_id": "c_001", "text": "...", "embedding_model": "text-embedding-3-small" }
  ]
}
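
On the receiving side, the webhook only needs to persist the payload and answer with a 2xx status once the write has succeeded. The sketch below uses Python's standard library and writes to a local directory as a stand-in for a bucket upload. The directory path, the file naming scheme, and the choice of 400 and 500 for failures are all assumptions for illustration; substitute your storage client's upload call where the file is written.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

ARCHIVE_DIR = Path("/tmp/ice-archive")  # hypothetical stand-in for a bucket
REQUIRED_KEYS = ("user_id", "session_id", "expired_at", "chunks")


def _safe(name: object) -> str:
    """Replace characters that are unsafe in file names with underscores."""
    return "".join(c if c.isalnum() or c in "-_" else "_" for c in str(name))


def archive_record(payload: dict, archive_dir: Path = ARCHIVE_DIR) -> int:
    """Persist one expired record and return the HTTP status to send back."""
    if not all(key in payload for key in REQUIRED_KEYS):
        return 400
    target = (
        archive_dir
        / _safe(payload["user_id"])
        / _safe(payload["session_id"])
        / (_safe(payload["expired_at"]) + ".json")
    )
    try:
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(payload), encoding="utf-8")
    except OSError:
        return 500  # non-2xx, so ICE keeps its local copy
    return 200  # only acknowledge after the write succeeded


class ArchiveHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            status = archive_record(payload) if isinstance(payload, dict) else 400
        except ValueError:  # malformed JSON or invalid UTF-8
            status = 400
        self.send_response(status)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted."""
    HTTPServer(("127.0.0.1", port), ArchiveHandler).serve_forever()
```

Returning 200 before the data is durable would defeat the guarantee in step 5 of the pipeline, so the acknowledgement is sent only after the write returns.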

ICE retries the webhook up to 3 times with exponential backoff before logging a permanent failure. Records are not deleted locally if all retries fail.
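
The retry and delete-after-acknowledgement behaviour can be sketched as follows. This is an illustration of the described semantics, not ICE's source code. It assumes "up to 3 times" means one initial attempt plus three retries, a 1-second base delay, and a doubling schedule; the function name and the `post`, `delete`, and `sleep` callables are hypothetical.

```python
import time
from typing import Callable


def deliver_then_delete(
    record: dict,
    post: Callable[[dict], int],
    delete: Callable[[dict], None],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the record was archived and deleted, False otherwise."""
    for attempt in range(max_retries + 1):  # first attempt plus the retries
        try:
            status = post(record)  # expected to return the HTTP status code
        except OSError:
            status = 0  # treat connection errors as a failed delivery
        if 200 <= status < 300:
            delete(record)  # local delete happens only after a 2xx response
            return True
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # waits of 1s, 2s, 4s
    # Permanent failure: the record stays in the active ledger.
    return False
```

Because the delete call sits behind the 2xx check, a failing or unreachable webhook can delay purging but cannot cause data loss.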


3. Document Ingestion (Reading from Storage)

A separate pipeline for feeding external documents into the Semantic Ledger. This is not related to how ICE stores its internal session memory.

Local File System

Files must be in the sandboxed ICE_UPLOAD_DIR. See Multimodal Ingest for format details.

ice.ingest(
    file_path="annual_report_2026.pdf",  # Relative to ICE_UPLOAD_DIR
    x_user_id="user-alice",
    x_session_id="finance-analysis"
)

| Variable | Description | Default |
| --- | --- | --- |
| ICE_UPLOAD_DIR | Sandboxed directory for local file ingestion. | /tmp/ice/uploads |
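
If you accept file names from users before calling `ice.ingest`, it can help to apply the same kind of sandbox check on your side. The sketch below shows one common way to confine a relative path to a directory. It is an assumption about what "sandboxed" implies, not a description of ICE's internal check, and `resolve_upload_path` is a hypothetical helper.

```python
from pathlib import Path


def resolve_upload_path(file_path: str, upload_dir: str = "/tmp/ice/uploads") -> Path:
    """Resolve file_path inside upload_dir, rejecting anything that escapes it."""
    root = Path(upload_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks, so the comparison
    # below is made on the real location of the file.
    candidate = (root / file_path).resolve()
    # An absolute file_path replaces root entirely and is rejected here too.
    if root not in candidate.parents:
        raise ValueError(f"{file_path!r} is outside the upload directory")
    return candidate
```

With the default directory, `annual_report_2026.pdf` resolves to a path under `/tmp/ice/uploads`, while `../secrets.txt` and `/etc/passwd` raise `ValueError`.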

Object Storage (Cloud URI)

For ingesting documents directly from cloud object storage, pass a cloud URI. ICE handles the download, parsing, chunking, and vectorization internally.

AWS S3

ice.ingest(
    uri="s3://my-enterprise-bucket/project_alpha_docs/",
    x_user_id="user-alice",
    x_session_id="project-alpha"
)

Credentials are resolved from the standard AWS credential chain (environment variables, instance profile, or ECS task role).

Google Cloud Storage

ice.ingest(
    uri="gs://my-gcp-bucket/project_alpha_docs/",
    x_user_id="user-alice",
    x_session_id="project-alpha"
)

Credentials are resolved from the standard GCP credential chain (Application Default Credentials or a service account key).

No ICE-specific variable is required for either provider. Credentials are handled by the host environment.
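
Both URI forms follow the same scheme://bucket/prefix layout, so a caller can validate them before handing them to `ice.ingest`. A minimal sketch using only the standard library follows; `CloudLocation` and `parse_cloud_uri` are hypothetical helpers, and how ICE parses URIs internally is not specified here.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class CloudLocation(NamedTuple):
    provider: str
    bucket: str
    prefix: str


# The two schemes shown in this section.
_PROVIDERS = {"s3": "AWS S3", "gs": "Google Cloud Storage"}


def parse_cloud_uri(uri: str) -> CloudLocation:
    """Split an s3:// or gs:// URI into provider, bucket, and key prefix."""
    parts = urlsplit(uri)
    if parts.scheme not in _PROVIDERS:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("URI is missing a bucket name")
    # Note: urlsplit treats '?' and '#' as delimiters, so keys containing
    # those characters would need extra handling.
    return CloudLocation(_PROVIDERS[parts.scheme], parts.netloc, parts.path.lstrip("/"))
```

For the S3 example above, this yields the bucket `my-enterprise-bucket` and the prefix `project_alpha_docs/`.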


Storage Pattern Summary

| Pattern | Technology | ICE Touches Object Storage? | Configuration |
| --- | --- | --- | --- |
| Active Semantic Ledger | PostgreSQL + Redis | No | DATABASE_URL, REDIS_URL |
| Cold Storage Delegation | Your webhook → Your bucket | No (delegates) | ICE_RETENTION_DAYS, ICE_PRE_PURGE_WEBHOOK_URL |
| Document Ingestion (local) | Local filesystem | No | ICE_UPLOAD_DIR |
| Document Ingestion (S3) | AWS S3 (read-only) | Yes (read) | uri="s3://..." in SDK call |
| Document Ingestion (GCS) | Google Cloud Storage (read-only) | Yes (read) | uri="gs://..." in SDK call |