API Reference

Complete endpoint documentation for LLM Gateway API

Base URL

http://localhost:8000

Authentication

If the SECRET_TOKEN environment variable is set, all endpoints require authentication via the X-API-Key header:

X-API-Key: your_secret_token
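A minimal client-side sketch of attaching the header. The `build_request` helper and the use of Python's stdlib `urllib` are illustrative, not part of the gateway; the header is added only when `SECRET_TOKEN` is present, mirroring the optional authentication described above.

```python
import json
import os
import urllib.request

BASE_URL = "http://localhost:8000"  # base URL from this reference

def build_request(path, payload=None):
    """Build a request, adding X-API-Key only when SECRET_TOKEN is set."""
    headers = {"Content-Type": "application/json"}
    token = os.environ.get("SECRET_TOKEN")
    if token:
        headers["X-API-Key"] = token
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)
```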

Root Endpoint

GET /

Check API status and list available providers.

Response:

{
  "status": "success",
  "api_status": "up",
  "providers": ["OPENAI", "GROQ", "GOOGLE", "LOCAL"]
}

Provider Management

GET /llm/providers

List all supported LLM providers.

Response:

{
  "status": "success",
  "providers": ["openai", "groq", "google", "local"]
}

Inference Endpoints

Common Request Parameters

All inference endpoints share these common parameters:

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| model | string | Yes | - | Provider and model in format provider/model_name (e.g., openai/gpt-4o) |
| prompt | string | No | "" | System prompt for the LLM |
| user_prompt | string | No | null | User input or question |
| temperature | float | No | 0.2 | Sampling temperature (0.0-1.0) |
| max_tokens | int | No | 4096 | Maximum tokens in response (1-32768) |
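The defaults and bounds in the table can be sketched as a small request builder. The `build_inference_request` helper is hypothetical; its validation mirrors the documented ranges, not server behavior, which may differ.

```python
def build_inference_request(model, prompt="", user_prompt=None,
                            temperature=0.2, max_tokens=4096):
    """Assemble a request body using the documented defaults and bounds."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be in [0.0, 1.0]")
    if not 1 <= max_tokens <= 32768:
        raise ValueError("max_tokens must be in [1, 32768]")
    return {"model": model, "prompt": prompt, "user_prompt": user_prompt,
            "temperature": temperature, "max_tokens": max_tokens}
```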

POST /llm/generation

Generate text based on a prompt.

Request:

{
  "model": "openai/gpt-4o",
  "prompt": "You are a helpful assistant.",
  "user_prompt": "What is the capital of France?",
  "temperature": 0.7,
  "max_tokens": 150
}

Response:

{
  "status": "success",
  "content": "The capital of France is Paris.",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
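A sketch of unpacking the response body shown above. The `parse_generation_response` helper is illustrative; it assumes the `status` / `content` / `usage` shape documented here and in the Error Responses section.

```python
import json

def parse_generation_response(body):
    """Extract generated text and token usage from a response body."""
    data = json.loads(body)
    if data.get("status") != "success":
        raise RuntimeError(data.get("error", "unknown error"))
    return data["content"], data.get("usage")
```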

POST /llm/classification

Classify input text into predefined categories.

Additional Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| classes | array | Yes | List of classification classes with id and label |

Request:

{
  "model": "openai/gpt-4o",
  "user_prompt": "This movie was absolutely terrible. Waste of time.",
  "classes": [
    {"id": "positive", "label": "Positive sentiment"},
    {"id": "negative", "label": "Negative sentiment"},
    {"id": "neutral", "label": "Neutral sentiment"}
  ],
  "temperature": 0.2
}

Response:

{
  "status": "success",
  "content": {
    "id": "negative",
    "label": "Negative sentiment"
  },
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 5,
    "total_tokens": 50
  }
}
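Assembling the classification body can be sketched as follows. The `build_classification_request` helper is hypothetical; the check simply enforces the documented requirement that each class carries an id and a label.

```python
def build_classification_request(model, text, classes, temperature=0.2):
    """Assemble a request body for POST /llm/classification."""
    for c in classes:
        if "id" not in c or "label" not in c:
            raise ValueError("each class needs an 'id' and a 'label'")
    return {"model": model, "user_prompt": text,
            "classes": list(classes), "temperature": temperature}
```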

POST /llm/summarization

Summarize input text into a concise summary.

Request:

{
  "model": "groq/llama-3.3-70b-versatile",
  "user_prompt": "Long article text here...",
  "temperature": 0.3,
  "max_tokens": 200
}

Response:

{
  "status": "success",
  "content": {
    "summary": "Concise summary of the input text."
  },
  "usage": {
    "prompt_tokens": 250,
    "completion_tokens": 45,
    "total_tokens": 295
  }
}

Streaming

POST /llm/generation/stream

Stream text generation in real-time using Server-Sent Events (SSE).

Streaming is only supported for text generation tasks, not for classification or summarization.

Request:

Same as /llm/generation endpoint.

Response:

SSE stream with chunks in this format:

{
  "id": "unique_chunk_id",
  "text": "chunk of generated text",
  "is_final": false,
  "usage": null
}

The final chunk includes usage statistics:

{
  "id": "final_chunk_id",
  "text": "complete generated text",
  "is_final": true,
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
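A sketch of consuming the stream. This assumes standard SSE framing, where each event arrives on a `data:` line (the reference above shows only the chunk JSON, so the framing is an assumption); iteration stops once a chunk reports `is_final`.

```python
import json

def iter_chunks(lines):
    """Yield parsed chunk dicts from SSE 'data:' lines, stopping at is_final."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and non-data fields
        chunk = json.loads(line[len("data:"):].strip())
        yield chunk
        if chunk.get("is_final"):
            return
```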

Model Format

Models must be specified in the format provider/model_name:

  • openai/gpt-4o
  • openai/gpt-3.5-turbo
  • groq/llama-3.3-70b-versatile
  • groq/mixtral-8x7b-32768
  • google/gemini-pro
  • local/your-model-name
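Splitting the identifier into its two parts can be sketched as below; the `split_model` helper is illustrative, not a gateway API.

```python
def split_model(model):
    """Split 'provider/model_name' into (provider, model_name)."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(
            f"model must be in 'provider/model_name' format, got {model!r}")
    return provider, name
```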

Error Responses

All error responses follow this format:

{
  "status": "error",
  "error": "Error message describing what went wrong",
  "content": null,
  "usage": null
}

Common Error Codes:

  • 400: Invalid request parameters
  • 401: Authentication failed (invalid or missing API key)
  • 500: Internal server error or provider API error
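The error format and codes above can be handled with a small translator. Mapping the documented codes onto Python exception types is an illustrative choice, not something the gateway prescribes.

```python
import json

def raise_for_error(status_code, body):
    """Translate a gateway error response into a Python exception."""
    if status_code == 200:
        return
    try:
        message = json.loads(body).get("error", body)
    except ValueError:
        message = body
    if status_code == 401:
        raise PermissionError(message)
    if status_code == 400:
        raise ValueError(message)
    raise RuntimeError(f"{status_code}: {message}")
```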

Rate Limits

Rate limits are determined by the underlying LLM provider; refer to each provider's documentation for details.
