LLM Gateway API

Unified API for multi-provider LLM inference

A FastAPI service that provides a unified interface for performing inference across multiple Large Language Model (LLM) providers.

Overview

LLM Gateway API simplifies LLM integration by offering a single, consistent API that supports multiple providers including OpenAI, Google, Groq, and local LLM servers. The service handles authentication, request routing, and response formatting, allowing you to switch between providers without changing your application code.
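As a sketch of what "switch providers without changing application code" means in practice, the request shape stays fixed and only the provider field changes. The field names below are illustrative assumptions, not the service's actual schema:

```python
from dataclasses import dataclass

@dataclass
class InferenceRequest:
    # One request shape for every backend; only `provider` (and the
    # model name) changes when you switch. Names are illustrative.
    provider: str            # "openai" | "google" | "groq" | "local"
    model: str
    prompt: str
    temperature: float = 0.7
    max_tokens: int = 256

# The same logical request routed to two different providers:
req_openai = InferenceRequest(provider="openai", model="gpt-4", prompt="Hello")
req_groq = InferenceRequest(provider="groq", model="llama-3.1-8b-instant", prompt="Hello")
```

Because the rest of the payload is identical, application code built against this shape needs no changes when a new backend is added.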

Key Features

  • Multi-Provider Support: OpenAI, Google, Groq, and local LLM servers
  • Multiple Task Types: Text generation, classification, and summarization
  • Streaming Support: Real-time text generation via Server-Sent Events (SSE)
  • Unified Interface: Single API format across all providers
  • Token Usage Tracking: Automatic tracking of prompt and completion tokens
  • Docker Support: Production-ready containerized deployment
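On the streaming feature: SSE delivers events as `data: …` lines over a long-lived HTTP response. A minimal client-side parser might look like the following; the single-line-JSON events and the `data: [DONE]` terminator are a common convention, assumed here rather than confirmed for this service:

```python
import json

def parse_sse_lines(lines):
    """Yield decoded JSON payloads from Server-Sent Event lines.

    Assumes each event is one `data: {...}` line and the stream ends
    with `data: [DONE]` -- a widespread convention, not necessarily
    this gateway's exact wire format.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

chunks = list(parse_sse_lines([
    'data: {"text": "Hel"}',
    '',
    'data: {"text": "lo"}',
    'data: [DONE]',
]))
text = "".join(c["text"] for c in chunks)  # -> "Hello"
```

In a real client the lines would come from iterating over the streaming HTTP response body instead of a list.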

Architecture

The service is built on FastAPI and uses the following components:

  • Routers (/llm/*): Handle HTTP endpoints for inference tasks
  • Services: Manage LLM provider interactions via pydantic-ai agents
  • Core: Factory pattern for creating provider-specific LLM instances
  • Models: Pydantic models for request/response validation
  • Authentication: Optional API key-based authentication
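The factory pattern in the Core component can be sketched as a single constructor that maps a provider name to a configured client. The provider names come from this page; the class and its arguments are illustrative assumptions:

```python
class LLMClient:
    """Stand-in for a provider-specific client (illustrative only)."""
    def __init__(self, provider: str, model: str):
        self.provider = provider
        self.model = model

def create_llm(provider: str, model: str) -> LLMClient:
    """Factory: return a client for the named provider, or fail fast.

    Centralizing construction here means routers and services never
    branch on the provider themselves.
    """
    supported = {"openai", "google", "groq", "local"}
    if provider not in supported:
        raise ValueError(f"Unsupported provider: {provider}")
    return LLMClient(provider, model)

client = create_llm("groq", "llama-3.1-8b-instant")
```

Adding a new backend then only touches the factory, not the endpoints.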

Supported Tasks

Text Generation

Generate text based on a prompt with full control over temperature and token limits.
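A hypothetical request body for this task, showing the two knobs mentioned above (the field names are assumptions about the schema):

```python
# Hypothetical text-generation request body; field names are
# assumptions, not the documented schema.
generation_request = {
    "provider": "openai",
    "model": "gpt-4",
    "prompt": "Write a haiku about the sea.",
    "temperature": 0.2,   # lower = more deterministic output
    "max_tokens": 64,     # hard cap on completion length
}
```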

Classification

Classify input text into predefined categories with structured output.
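"Structured output" here means the response is validated against a fixed shape rather than returned as free text. A minimal sketch, with an assumed label set and field names:

```python
from dataclasses import dataclass

# Example label set -- an assumption for illustration.
CATEGORIES = ["billing", "technical", "general"]

@dataclass
class ClassificationResult:
    """Structured classification output (field names are assumptions)."""
    category: str
    confidence: float

    def __post_init__(self):
        # Reject anything outside the predefined categories.
        if self.category not in CATEGORIES:
            raise ValueError(f"Unknown category: {self.category}")

result = ClassificationResult(category="billing", confidence=0.93)
```

In the actual service this validation role is played by Pydantic models.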

Summarization

Generate concise summaries of longer texts with structured output.
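A hypothetical summarization response, also showing the token-usage counters the gateway tracks (all field names are assumptions):

```python
# Hypothetical response body; field names are assumptions.
summarization_response = {
    "summary": "The report outlines Q3 revenue growth.",
    "usage": {
        "prompt_tokens": 512,      # tokens in the input text
        "completion_tokens": 38,   # tokens in the summary
        "total_tokens": 550,       # prompt + completion
    },
}

usage = summarization_response["usage"]
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
```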

Supported Providers

  • OpenAI: GPT-4, GPT-3.5, and other OpenAI models
  • Google: Gemini and other Google models
  • Groq: Fast inference with Llama and other models
  • Local: OpenAI-compatible local LLM servers
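For the local provider, an OpenAI-compatible server is typically addressed by its base URL. A sketch of such a configuration, where the setting names are assumptions:

```python
# Hypothetical configuration for a local OpenAI-compatible server
# (e.g. one listening on localhost); setting names are assumptions.
local_provider_config = {
    "provider": "local",
    "base_url": "http://localhost:8000/v1",  # your local server
    "model": "my-local-model",
    "api_key": "not-needed",  # many local servers ignore the key
}
```

Pointing `base_url` at a different host is all it takes to target another compatible server.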
