LLM Gateway API
Unified API for multi-provider LLM inference
A FastAPI service that provides a unified interface for performing inference across multiple Large Language Model (LLM) providers.
Overview
LLM Gateway API simplifies LLM integration by offering a single, consistent API that supports multiple providers including OpenAI, Google, Groq, and local LLM servers. The service handles authentication, request routing, and response formatting, allowing you to switch between providers without changing your application code.
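To illustrate the "single, consistent API" idea, here is a minimal sketch of building a provider-agnostic request body. The field names (`provider`, `model`, `prompt`, `temperature`) are assumptions for illustration only; consult the API Reference for the actual schema.

```python
import json

def build_generate_request(provider: str, model: str, prompt: str,
                           temperature: float = 0.7) -> dict:
    """Build a provider-agnostic generation request body.

    Field names are hypothetical -- the real gateway schema is defined
    in its Pydantic request models (see API Reference).
    """
    return {
        "provider": provider,   # e.g. "openai", "google", "groq", "local"
        "model": model,
        "prompt": prompt,
        "temperature": temperature,
    }

# Switching providers is a one-field change; the surrounding application
# code stays the same.
openai_req = build_generate_request("openai", "gpt-4", "Hello!")
groq_req = build_generate_request("groq", "llama-3.1-8b-instant", "Hello!")
print(json.dumps(openai_req, indent=2))
```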
Key Features
- Multi-Provider Support: OpenAI, Google, Groq, and local LLM servers
- Multiple Task Types: Text generation, classification, and summarization
- Streaming Support: Real-time text generation via Server-Sent Events (SSE)
- Unified Interface: Single API format across all providers
- Token Usage Tracking: Automatic tracking of prompt and completion tokens
- Docker Support: Production-ready containerized deployment
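On the streaming feature: Server-Sent Events arrive as `data:`-prefixed lines. A client-side parsing sketch might look like the following; it follows the generic SSE convention (including a `[DONE]` sentinel), which may differ from the gateway's actual event schema.

```python
import json
from typing import Iterator

def parse_sse(lines: Iterator[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from a stream of SSE lines.

    Uses the generic SSE convention: payload lines start with "data: ",
    and a "[DONE]" sentinel ends the stream. The gateway's real event
    format may differ -- treat this as a sketch, not a spec.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments, event names, and blank keep-alives
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)

# Example with a canned stream of SSE lines:
stream = [
    'data: {"delta": "Hel"}',
    'data: {"delta": "lo"}',
    "data: [DONE]",
]
chunks = [event["delta"] for event in parse_sse(iter(stream))]
print("".join(chunks))  # -> Hello
```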
Architecture
The service is built on FastAPI and uses the following components:
- Routers (`/llm/*`): Handle HTTP endpoints for inference tasks
- Services: Manage LLM provider interactions via pydantic-ai agents
- Core: Factory pattern for creating provider-specific LLM instances
- Models: Pydantic models for request/response validation
- Authentication: Optional API key-based authentication
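The factory pattern in the Core layer can be sketched roughly as below. The class, function, and registry names are illustrative stand-ins (the real service builds pydantic-ai agents), though the base URLs shown are the providers' standard OpenAI-compatible endpoints.

```python
from dataclasses import dataclass

@dataclass
class LLMClient:
    """Illustrative stand-in for a provider-specific client instance."""
    provider: str
    model: str
    base_url: str

# Hypothetical registry of provider defaults.
_PROVIDER_URLS = {
    "openai": "https://api.openai.com/v1",
    "google": "https://generativelanguage.googleapis.com",
    "groq": "https://api.groq.com/openai/v1",
    "local": "http://localhost:8000/v1",  # any OpenAI-compatible server
}

def create_llm(provider: str, model: str) -> LLMClient:
    """Factory: map a provider name to a configured client instance."""
    try:
        base_url = _PROVIDER_URLS[provider]
    except KeyError:
        raise ValueError(f"unsupported provider: {provider}") from None
    return LLMClient(provider=provider, model=model, base_url=base_url)

client = create_llm("groq", "llama-3.1-8b-instant")
print(client.base_url)
```

Centralizing provider construction this way is what lets the routers and services stay provider-agnostic: only the factory knows which backend a name maps to.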
Supported Tasks
Text Generation
Generate text from a prompt, with control over sampling temperature and token limits.
Classification
Classify input text into predefined categories with structured output.
Summarization
Generate concise summaries of longer texts with structured output.
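The three tasks above could share a common request style with task-specific fields. The payloads and route names below are illustrative sketches only: the README states the routers live under `/llm/*`, but the concrete route names and field names are assumptions, not the gateway's confirmed schema.

```python
# Hypothetical mapping of task -> endpoint under the /llm/* router.
ROUTES = {
    "generate": "/llm/generate",
    "classify": "/llm/classify",
    "summarize": "/llm/summarize",
}

# Hypothetical request bodies for the three task types.
generate_req = {
    "prompt": "Write a haiku about the sea.",
    "temperature": 0.8,   # higher -> more varied output
    "max_tokens": 64,     # cap on completion length
}

classify_req = {
    "text": "The battery died after two days.",
    "categories": ["praise", "complaint", "question"],  # predefined labels
}

summarize_req = {
    "text": "A long article ...",
    "max_tokens": 128,    # bound the summary length
}

for task, path in ROUTES.items():
    print(task, path)
```

Classification and summarization return structured output, so their responses would be validated against Pydantic response models rather than free-form text.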
Supported Providers
- OpenAI: GPT-4, GPT-3.5, and other OpenAI models
- Google: Gemini and other Google models
- Groq: Fast inference with Llama and other models
- Local: OpenAI-compatible local LLM servers
Quick Links
- Quick Start - Get running in minutes
- API Reference - Complete endpoint documentation
- Configuration - Environment setup and options
- Changelog - Version history and updates