# API Reference

Complete documentation for all endpoints in the LLM Gateway API.
## Base URL

```
http://localhost:8000
```

## Authentication

If the `SECRET_TOKEN` environment variable is set, all endpoints require authentication via the `X-API-Key` header:

```
X-API-Key: your_secret_token
```
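As a minimal sketch (assuming the default local deployment above; `auth_headers` is a hypothetical helper, not part of the API), request headers can be built like this:

```python
from typing import Optional

BASE_URL = "http://localhost:8000"

def auth_headers(token: Optional[str]) -> dict:
    """Build request headers; include X-API-Key only when a token is configured."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["X-API-Key"] = token
    return headers

# Usage (not executed here): send an authenticated status check, e.g.
#   GET http://localhost:8000/  with  headers=auth_headers("your_secret_token")
```

When `SECRET_TOKEN` is unset on the server, the header is simply omitted.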
## Root Endpoint

### GET /

Check API status and list available providers.

**Response:**

```json
{
  "status": "success",
  "api_status": "up",
  "providers": ["OPENAI", "GROQ", "GOOGLE", "LOCAL"]
}
```

## Provider Management
### GET /llm/providers

List all supported LLM providers.

**Response:**

```json
{
  "status": "success",
  "providers": ["openai", "groq", "google", "local"]
}
```

## Inference Endpoints
### Common Request Parameters

All inference endpoints share these parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | Yes | - | Provider and model in the format `provider/model_name` (e.g., `openai/gpt-4o`) |
| `prompt` | string | No | `""` | System prompt for the LLM |
| `user_prompt` | string | No | `null` | User input or question |
| `temperature` | float | No | `0.2` | Sampling temperature (0.0-1.0) |
| `max_tokens` | int | No | `4096` | Maximum number of tokens in the response (1-32768) |
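The constraints in the table above can be enforced client-side before sending a request. A sketch (`build_payload` is a hypothetical helper; the ranges come straight from the table):

```python
from typing import Optional

def build_payload(model: str, prompt: str = "", user_prompt: Optional[str] = None,
                  temperature: float = 0.2, max_tokens: int = 4096) -> dict:
    """Validate the common parameters and build a request body."""
    provider, _, model_name = model.partition("/")
    if not provider or not model_name:
        raise ValueError("model must be in the format provider/model_name")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if not 1 <= max_tokens <= 32768:
        raise ValueError("max_tokens must be between 1 and 32768")
    return {"model": model, "prompt": prompt, "user_prompt": user_prompt,
            "temperature": temperature, "max_tokens": max_tokens}
```

Catching a malformed `model` string locally avoids a round trip that would end in a `400` error.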
### POST /llm/generation

Generate text based on a prompt.

**Request:**
```json
{
  "model": "openai/gpt-4o",
  "prompt": "You are a helpful assistant.",
  "user_prompt": "What is the capital of France?",
  "temperature": 0.7,
  "max_tokens": 150
}
```

**Response:**
```json
{
  "status": "success",
  "content": "The capital of France is Paris.",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
```
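A sketch of calling this endpoint with the standard library (assuming the default local base URL; `generate` and `extract_text` are hypothetical helpers):

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"

def generate(payload: dict, token: str = "") -> dict:
    """POST to /llm/generation and return the parsed JSON response."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["X-API-Key"] = token
    req = request.Request(BASE_URL + "/llm/generation",
                          data=json.dumps(payload).encode(),
                          headers=headers, method="POST")
    with request.urlopen(req) as resp:
        return json.load(resp)

def extract_text(response: dict) -> str:
    """Pull the generated text out of a response; content is a plain string here."""
    if response.get("status") != "success":
        raise RuntimeError(response.get("error", "unknown error"))
    return response["content"]
```

For example, `extract_text` applied to the response shown above returns the string `"The capital of France is Paris."`.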
### POST /llm/classification

Classify input text into predefined categories.
**Additional Parameters:**

| Parameter | Type | Required | Description |
|---|---|---|---|
| `classes` | array | Yes | List of classification classes, each with an `id` and a `label` |

**Request:**
```json
{
  "model": "openai/gpt-4o",
  "user_prompt": "This movie was absolutely terrible. Waste of time.",
  "classes": [
    {"id": "positive", "label": "Positive sentiment"},
    {"id": "negative", "label": "Negative sentiment"},
    {"id": "neutral", "label": "Neutral sentiment"}
  ],
  "temperature": 0.2
}
```

**Response:**
```json
{
  "status": "success",
  "content": {
    "id": "negative",
    "label": "Negative sentiment"
  },
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 5,
    "total_tokens": 50
  }
}
```
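Since `content` is an object with an `id` and `label` here, a client can check the prediction against the classes it sent. A sketch (`pick_class` is a hypothetical helper):

```python
def pick_class(response: dict, classes: list) -> str:
    """Return the id of the predicted class, verifying it is one we asked for."""
    if response.get("status") != "success":
        raise RuntimeError(response.get("error", "unknown error"))
    predicted = response["content"]["id"]
    if predicted not in {c["id"] for c in classes}:
        raise ValueError("unexpected class id: " + str(predicted))
    return predicted
```

Guarding against an out-of-set `id` is cheap insurance when downstream code branches on the class.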
### POST /llm/summarization

Summarize input text into a concise summary.

**Request:**
```json
{
  "model": "groq/llama-3.3-70b-versatile",
  "user_prompt": "Long article text here...",
  "temperature": 0.3,
  "max_tokens": 200
}
```

**Response:**
```json
{
  "status": "success",
  "content": {
    "summary": "Concise summary of the input text."
  },
  "usage": {
    "prompt_tokens": 250,
    "completion_tokens": 45,
    "total_tokens": 295
  }
}
```
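Note that, unlike `/llm/generation` where `content` is a plain string, the summary is nested under a `summary` key. A sketch of reading it (`extract_summary` is a hypothetical helper):

```python
def extract_summary(response: dict) -> str:
    """Return the summary string from a successful /llm/summarization response."""
    if response.get("status") != "success":
        raise RuntimeError(response.get("error", "unknown error"))
    return response["content"]["summary"]
```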
## Streaming

### POST /llm/generation/stream

Stream text generation in real time using Server-Sent Events (SSE). Streaming is only supported for text generation, not for classification or summarization.

**Request:**

Same as the `/llm/generation` endpoint.

**Response:**

An SSE stream of chunks in this format:
```json
{
  "id": "unique_chunk_id",
  "text": "chunk of generated text",
  "is_final": false,
  "usage": null
}
```

The final chunk includes usage statistics:
```json
{
  "id": "final_chunk_id",
  "text": "complete generated text",
  "is_final": true,
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
```
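A sketch of consuming the stream. The `data:` line framing is standard SSE and is an assumption here, not confirmed by these docs; per the chunk format above, the final chunk's `text` already contains the complete generated text, so nothing needs to be concatenated:

```python
import json

def accumulate_stream(sse_lines):
    """Parse SSE 'data:' lines; return (complete_text, usage) from the final chunk."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        chunk = json.loads(line[len("data:"):])
        if chunk["is_final"]:
            return chunk["text"], chunk["usage"]
    raise RuntimeError("stream ended without a final chunk")
```

In an interactive UI you would render each chunk's `text` as it arrives instead of waiting for the final one.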
## Model Format

Models must be specified in the format `provider/model_name`:

- `openai/gpt-4o`
- `openai/gpt-3.5-turbo`
- `groq/llama-3.3-70b-versatile`
- `groq/mixtral-8x7b-32768`
- `google/gemini-pro`
- `local/your-model-name`
## Error Responses

All error responses follow this format:

```json
{
  "status": "error",
  "error": "Error message describing what went wrong",
  "content": null,
  "usage": null
}
```

**Common Error Codes:**
- `400`: Invalid request parameters
- `401`: Authentication failed (invalid or missing API key)
- `500`: Internal server error or provider API error
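A sketch of mapping these codes to client-side exceptions (`raise_for_gateway_error` is a hypothetical helper; the messages fall back to the `error` field of the response body shown above):

```python
def raise_for_gateway_error(status_code: int, body: dict) -> None:
    """Translate an HTTP error status plus error body into a Python exception."""
    if status_code == 400:
        raise ValueError(body.get("error", "invalid request parameters"))
    if status_code == 401:
        raise PermissionError("authentication failed: check the X-API-Key header")
    if status_code >= 500:
        raise RuntimeError(body.get("error", "internal server or provider error"))
    # 2xx responses pass through untouched
```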
## Rate Limits
Rate limits are determined by the underlying LLM provider. Refer to each provider's documentation: