# API Reference

Complete documentation for all endpoints in the LLM Gateway API.
## Base URL

```
http://localhost:8000
```

## Authentication

If the `SECRET_TOKEN` environment variable is set, all endpoints require authentication via the `X-API-Key` header:

```
X-API-Key: your_secret_token
```
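As a minimal sketch (assuming the default local deployment above; `auth_headers` is a hypothetical helper, not part of the API), request headers can be built like this:

```python
from typing import Optional

BASE_URL = "http://localhost:8000"

def auth_headers(token: Optional[str]) -> dict:
    """Build request headers; include X-API-Key only when a token is configured."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["X-API-Key"] = token
    return headers

# Usage (not executed here): send an authenticated status check, e.g.
#   GET http://localhost:8000/  with  headers=auth_headers("your_secret_token")
```

When `SECRET_TOKEN` is unset on the server, the header is simply omitted.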
## Root Endpoint

### GET /

Check API status and list available providers.

**Response:**

```json
{
  "status": "success",
  "api_status": "up",
  "providers": ["OPENAI", "GROQ", "GOOGLE", "LOCAL"]
}
```

## Provider Management
### GET /llm/providers

List all supported LLM providers.

**Response:**

```json
{
  "status": "success",
  "providers": ["openai", "groq", "google", "local"]
}
```

## Inference Endpoints
### Common Request Parameters

All inference endpoints share these parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | Yes | - | Provider and model in the format `provider/model_name` (e.g., `openai/gpt-4o`) |
| `prompt` | string | No | `""` | System prompt for the LLM |
| `user_prompt` | string | No | `null` | User input or question |
| `temperature` | float | No | `0.2` | Sampling temperature (0.0-1.0) |
| `max_tokens` | int | No | `4096` | Maximum number of tokens in the response (1-32768) |
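The constraints in the table above can be enforced client-side before sending a request. A sketch (`build_payload` is a hypothetical helper; the ranges come straight from the table):

```python
from typing import Optional

def build_payload(model: str, prompt: str = "", user_prompt: Optional[str] = None,
                  temperature: float = 0.2, max_tokens: int = 4096) -> dict:
    """Validate the common parameters and build a request body."""
    provider, _, model_name = model.partition("/")
    if not provider or not model_name:
        raise ValueError("model must be in the format provider/model_name")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if not 1 <= max_tokens <= 32768:
        raise ValueError("max_tokens must be between 1 and 32768")
    return {"model": model, "prompt": prompt, "user_prompt": user_prompt,
            "temperature": temperature, "max_tokens": max_tokens}
```

Catching a malformed `model` string locally avoids a round trip that would end in a `400` error.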
### POST /llm/generation

Generate text based on a prompt.

**Request:**
```json
{
  "model": "openai/gpt-4o",
  "prompt": "You are a helpful assistant.",
  "user_prompt": "What is the capital of France?",
  "temperature": 0.7,
  "max_tokens": 150
}
```

**Response:**
```json
{
  "status": "success",
  "content": "The capital of France is Paris.",
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
```
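A sketch of calling this endpoint with the standard library (assuming the default local base URL; `generate` and `extract_text` are hypothetical helpers):

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"

def generate(payload: dict, token: str = "") -> dict:
    """POST to /llm/generation and return the parsed JSON response."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["X-API-Key"] = token
    req = request.Request(BASE_URL + "/llm/generation",
                          data=json.dumps(payload).encode(),
                          headers=headers, method="POST")
    with request.urlopen(req) as resp:
        return json.load(resp)

def extract_text(response: dict) -> str:
    """Pull the generated text out of a response; content is a plain string here."""
    if response.get("status") != "success":
        raise RuntimeError(response.get("error", "unknown error"))
    return response["content"]
```

For example, `extract_text` applied to the response shown above returns the string `"The capital of France is Paris."`.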
### POST /llm/classification

Classify input text into predefined categories.
**Additional Parameters:**

| Parameter | Type | Required | Description |
|---|---|---|---|
| `classes` | array | Yes | List of classification classes, each with an `id` and a `label` |

**Request:**
```json
{
  "model": "openai/gpt-4o",
  "user_prompt": "This movie was absolutely terrible. Waste of time.",
  "classes": [
    {"id": "positive", "label": "Positive sentiment"},
    {"id": "negative", "label": "Negative sentiment"},
    {"id": "neutral", "label": "Neutral sentiment"}
  ],
  "temperature": 0.2
}
```

**Response:**
```json
{
  "status": "success",
  "content": {
    "id": "negative",
    "label": "Negative sentiment"
  },
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 5,
    "total_tokens": 50
  }
}
```
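Since `content` is an object with an `id` and `label` here, a client can check the prediction against the classes it sent. A sketch (`pick_class` is a hypothetical helper):

```python
def pick_class(response: dict, classes: list) -> str:
    """Return the id of the predicted class, verifying it is one we asked for."""
    if response.get("status") != "success":
        raise RuntimeError(response.get("error", "unknown error"))
    predicted = response["content"]["id"]
    if predicted not in {c["id"] for c in classes}:
        raise ValueError("unexpected class id: " + str(predicted))
    return predicted
```

Guarding against an out-of-set `id` is cheap insurance when downstream code branches on the class.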
### POST /llm/summarization

Summarize input text into a concise summary.

**Request:**
```json
{
  "model": "groq/llama-3.3-70b-versatile",
  "user_prompt": "Long article text here...",
  "temperature": 0.3,
  "max_tokens": 200
}
```

**Response:**
```json
{
  "status": "success",
  "content": {
    "summary": "Concise summary of the input text."
  },
  "usage": {
    "prompt_tokens": 250,
    "completion_tokens": 45,
    "total_tokens": 295
  }
}
```
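Note that, unlike `/llm/generation` where `content` is a plain string, the summary is nested under a `summary` key. A sketch of reading it (`extract_summary` is a hypothetical helper):

```python
def extract_summary(response: dict) -> str:
    """Return the summary string from a successful /llm/summarization response."""
    if response.get("status") != "success":
        raise RuntimeError(response.get("error", "unknown error"))
    return response["content"]["summary"]
```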
## Streaming

### POST /llm/generation/stream

Stream text generation in real time using Server-Sent Events (SSE). Streaming is only supported for text generation, not for classification or summarization.

**Request:**

Same as the `/llm/generation` endpoint.

**Response:**

An SSE stream of chunks in this format:
```json
{
  "id": "unique_chunk_id",
  "text": "chunk of generated text",
  "is_final": false,
  "usage": null
}
```

The final chunk includes usage statistics:
```json
{
  "id": "final_chunk_id",
  "text": "complete generated text",
  "is_final": true,
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
```
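A sketch of consuming the stream. The `data:` line framing is standard SSE and is an assumption here, not confirmed by these docs; per the chunk format above, the final chunk's `text` already contains the complete generated text, so nothing needs to be concatenated:

```python
import json

def accumulate_stream(sse_lines):
    """Parse SSE 'data:' lines; return (complete_text, usage) from the final chunk."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        chunk = json.loads(line[len("data:"):])
        if chunk["is_final"]:
            return chunk["text"], chunk["usage"]
    raise RuntimeError("stream ended without a final chunk")
```

In an interactive UI you would render each chunk's `text` as it arrives instead of waiting for the final one.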
## Model Format

Models must be specified in the format `provider/model_name`:

- `openai/gpt-4o`
- `openai/gpt-3.5-turbo`
- `groq/llama-3.3-70b-versatile`
- `groq/mixtral-8x7b-32768`
- `google/gemini-pro`
- `local/your-model-name`
## Error Responses

All error responses follow this format:

```json
{
  "status": "error",
  "error": "Error message describing what went wrong",
  "content": null,
  "usage": null
}
```

**Common Error Codes:**
- `400`: Invalid request parameters
- `401`: Authentication failed (invalid or missing API key)
- `500`: Internal server error or provider API error
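A sketch of mapping these codes to client-side exceptions (`raise_for_gateway_error` is a hypothetical helper; the messages fall back to the `error` field of the response body shown above):

```python
def raise_for_gateway_error(status_code: int, body: dict) -> None:
    """Translate an HTTP error status plus error body into a Python exception."""
    if status_code == 400:
        raise ValueError(body.get("error", "invalid request parameters"))
    if status_code == 401:
        raise PermissionError("authentication failed: check the X-API-Key header")
    if status_code >= 500:
        raise RuntimeError(body.get("error", "internal server or provider error"))
    # 2xx responses pass through untouched
```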
## Rate Limits
Rate limits are determined by the underlying LLM provider. Refer to each provider's documentation: