API Reference

ICE exposes an OpenAI-compatible REST API. If you already use an OpenAI, Anthropic, or Ollama SDK, point base_url at your ICE instance and add the two memory headers — nothing else changes.


Headers

These two headers are how ICE scopes memory. Every request to /v1/chat/completions and /v1/ingest should include X-Session-Id; X-User-Id is optional and defaults to default-user.

| Header | Required | Description |
|---|---|---|
| X-Session-Id | Yes | Identifies the conversation or workspace. All context is stored and retrieved under this ID. |
| X-User-Id | No (default: default-user) | Identifies the user. Enforces per-user data isolation via Row-Level Security at the database layer. |

Without X-Session-Id, ICE operates statelessly — no memory is stored or retrieved.
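
As a minimal sketch of the header rules above, the helper below builds the header dictionary for one request. The name memory_headers is hypothetical and not part of ICE; the sketch assumes that leaving out X-User-Id lets the server apply its documented default.

```python
from typing import Dict, Optional


def memory_headers(session_id: Optional[str], user_id: Optional[str] = None) -> Dict[str, str]:
    """Build the ICE memory headers for one request (hypothetical helper)."""
    headers: Dict[str, str] = {}
    if session_id:
        # With X-Session-Id set, context is stored and retrieved under this ID.
        headers["X-Session-Id"] = session_id
    if user_id:
        # When omitted, ICE falls back to its documented default, "default-user".
        headers["X-User-Id"] = user_id
    return headers


stateful = memory_headers("project-alpha", "alice")
stateless = memory_headers(None)  # no X-Session-Id: nothing is stored or retrieved
```

The resulting dictionary can be passed as request headers by any HTTP client, or as extra_headers in the openai library.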


Endpoints

POST /v1/chat/completions

Drop-in replacement for the OpenAI chat completions endpoint. ICE retrieves relevant context from the session ledger and injects it into the prompt before forwarding to the upstream model.

Request

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Session-Id: project-alpha" \
  -H "X-User-Id: alice" \
  -d '{
    "model": "gpt-4o",
    "messages": [{ "role": "user", "content": "Summarise what we discussed yesterday." }],
    "stream": false
  }'

Parameters

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model name to route to (gpt-4o, claude-3-5-sonnet, llama3, etc.) |
| messages | array | Yes | Standard OpenAI messages array. |
| stream | boolean | No | true for SSE streaming, false for a single response object. |
| tools | array | No | Standard tool definitions. ICE persists tool call state across turns automatically. |

Response (non-streaming)

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-4o",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "Yesterday we discussed..." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 210, "completion_tokens": 48, "total_tokens": 258 }
}

Response (streaming)

Standard SSE chunks, terminated with data: [DONE]. Format is identical to the OpenAI streaming spec.
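
A rough sketch of consuming such a stream by hand, assuming each chunk follows the OpenAI chat.completion.chunk shape with text under choices[].delta.content. The function name iter_content and the sample lines are illustrative only, not captured ICE output.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Illustrative input, shaped like OpenAI streaming chunks.
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Yesterday "}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "we discussed..."}}]}',
    "data: [DONE]",
]
reply = "".join(iter_content(sample_lines))
```

In practice an SDK such as openai handles this parsing when stream=True is set; the sketch only shows what travels over the wire.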


POST /v1/ingest

Loads a document into the session's memory ledger. After ingestion, content is available for retrieval in all subsequent chat completions under the same X-Session-Id.

Request

curl -X POST "http://localhost:8000/v1/ingest?file_path=annual_report.pdf" \
  -H "X-Session-Id: finance-q3" \
  -H "X-User-Id: alice"

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| file_path | query string | Yes | Path relative to ICE_UPLOAD_DIR. |

For cloud storage ingestion, use the SDK ingest() method with a uri parameter (s3://... or gs://...). See Storage Architecture.

Response

{
  "status": "success",
  "message": "File annual_report.pdf ingested successfully.",
  "tokens_processed": 128400
}
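
Because file_path travels in the query string, paths containing spaces or slashes need URL encoding. The sketch below prepares, but does not send, the ingest request using only the Python standard library. The helper name build_ingest_request and the example path are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_ingest_request(
    base_url: str, file_path: str, session_id: str, user_id: str = "default-user"
) -> Request:
    """Prepare (but do not send) a POST /v1/ingest request."""
    # urlencode escapes the value: spaces become "+", slashes become "%2F".
    query = urlencode({"file_path": file_path})
    return Request(
        url=f"{base_url.rstrip('/')}/v1/ingest?{query}",
        method="POST",
        headers={"X-Session-Id": session_id, "X-User-Id": user_id},
    )


# Hypothetical path, chosen to show the encoding.
req = build_ingest_request(
    "http://localhost:8000", "reports/annual report.pdf", "finance-q3", "alice"
)
# Send with urllib.request.urlopen(req) once an ICE instance is running.
```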

GET /health

Returns the engine status and connectivity of backing services.

curl http://localhost:8000/health

{
  "status": "online",
  "engine": "ICE v2.7.755",
  "ledger_status": "connected",
  "cache_status": "connected"
}
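
One way to use this payload in a readiness check is sketched below. It assumes that any value other than "online" or "connected" signals a problem; the source does not list the other possible values, so "disconnected" in the second sample is a made-up placeholder.

```python
import json


def is_healthy(body: str) -> bool:
    """Return True only if the engine and both backing services report OK."""
    health = json.loads(body)
    return (
        health.get("status") == "online"
        and health.get("ledger_status") == "connected"
        and health.get("cache_status") == "connected"
    )


health_ok = (
    '{"status": "online", "engine": "ICE v2.7.755", '
    '"ledger_status": "connected", "cache_status": "connected"}'
)
# Hypothetical degraded payload; the actual failure values are an assumption.
health_degraded = (
    '{"status": "online", "ledger_status": "disconnected", "cache_status": "connected"}'
)
```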

SDK Usage

from ice.sdk import ICEClient

ice = ICEClient(api_url="http://localhost:8000")

# Chat with memory
response = ice.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What did we cover last session?"}],
    x_session_id="project-alpha",
    x_user_id="alice",
)

# Ingest a local file
ice.ingest(
    file_path="spec.pdf",
    x_session_id="project-alpha",
    x_user_id="alice",
)

# Ingest from cloud storage
ice.ingest(
    uri="s3://my-bucket/docs/",
    x_session_id="project-alpha",
    x_user_id="alice",
)

Using Existing SDKs

Because ICE is OpenAI-compatible, you can use the official openai Python or JS library with no code changes beyond base_url and the two headers.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-used",  # ICE does not use an API key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarise the Q3 report."}],
    extra_headers={
        "X-Session-Id": "finance-q3",
        "X-User-Id": "alice",
    },
)