Chat API
REST endpoints for synchronous and streaming chat.
The Chat API exposes two REST endpoints under the /chat router: one for synchronous replies and one for streaming (SSE).
POST /chat/run
Executes a single chat turn and returns the complete reply in a single response.
Authentication: JWT via Authorization: Bearer <token>.
Request body:
{
  "message": "string",
  "conversation_id": "string"
}

Behavior:
- Accepts the user message and conversation_id.
- Loads the conversation from the Redis cache (or creates a new one).
- Processes the message with the conversation/AI pipeline.
- Updates the conversation cache with both user and AI messages.
- Persists both messages to the database.
- Returns the full AI response.
Response:
{
  "response": "string",
  "conversation_id": "string"
}

Notes:
- Synchronous: the client waits for the complete reply.
- Conversation state is kept in Redis.
- Messages are stored in the database for history.
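Example: a minimal Python client sketch for this endpoint. The base URL, token placeholder, and use of the requests library are assumptions for illustration, not part of the API contract.

import requests

BASE_URL = "http://localhost:8000"  # assumed base URL; adjust for your deployment
TOKEN = "your-jwt-token"            # placeholder; obtain a real JWT from your auth flow

def chat_run(message: str, conversation_id: str) -> dict:
    """Send one synchronous chat turn via POST /chat/run."""
    resp = requests.post(
        f"{BASE_URL}/chat/run",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"message": message, "conversation_id": conversation_id},
        timeout=60,  # synchronous endpoint: the full reply is generated before returning
    )
    resp.raise_for_status()
    return resp.json()  # {"response": "...", "conversation_id": "..."}

reply = chat_run("Hello!", "conv-123")
print(reply["response"])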
POST /chat/stream
Executes a single chat turn and returns the AI reply as Server-Sent Events (SSE).
Authentication: JWT via Authorization: Bearer <token>.
Request body:
{
  "message": "string",
  "conversation_id": "string"
}

Behavior:
- Accepts the user message and conversation_id.
- Loads or creates the conversation in Redis.
- Streams the AI reply incrementally via SSE.
- On completion (is_final: true), updates the cache, saves messages to the database, and sends a final chunk (with optional metrics).
Response format (SSE):
data: {"content": "chunk of text", "is_final": false, "metrics": null}
data: {"content": "another chunk", "is_final": false, "metrics": null}
data: {"content": "last chunk", "is_final": true, "metrics": {...}}Response headers:
Content-Type: text/event-stream(or similar SSE type)Cache-Control: no-cacheConnection: keep-alive
Notes:
- Streaming improves perceived latency for long replies.
- The final event may include metrics (e.g. token count, timing).
- Conversation state and persistence behave the same as for /chat/run.
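Example: a sketch of how a client might consume this stream, parsing each data: line as JSON until is_final is true. The base URL, token placeholder, and use of the requests library are again assumptions, not mandated by the API.

import json
import requests

BASE_URL = "http://localhost:8000"  # assumed base URL; adjust for your deployment
TOKEN = "your-jwt-token"            # placeholder JWT

def chat_stream(message: str, conversation_id: str) -> None:
    """Consume POST /chat/stream, printing chunks as they arrive."""
    with requests.post(
        f"{BASE_URL}/chat/stream",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"message": message, "conversation_id": conversation_id},
        stream=True,  # read the response incrementally instead of buffering it
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            if not line or not line.startswith("data: "):
                continue  # skip blank SSE separator lines and anything that is not a data event
            event = json.loads(line[len("data: "):])
            print(event["content"], end="", flush=True)
            if event["is_final"]:
                # the final chunk may carry metrics (e.g. token count, timing)
                print("\nmetrics:", event["metrics"])
                break

chat_stream("Hello!", "conv-123")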