LLM Gateway API

Unified API for multi-provider LLM inference

A FastAPI service that provides a unified interface for performing inference across multiple Large Language Model (LLM) providers.

Overview

LLM Gateway API simplifies LLM integration by offering a single, consistent API that supports multiple providers including OpenAI, Google, Groq, and local LLM servers. The service handles authentication, request routing, and response formatting, allowing you to switch between providers without changing your application code.
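As a sketch of what "switch providers without changing application code" means in practice, the request shape stays fixed and only the provider field changes. The field names below are illustrative assumptions, not the service's actual schema:

```python
from dataclasses import dataclass

@dataclass
class InferenceRequest:
    # One request shape for every backend; only `provider` (and the
    # model name) changes when you switch. Names are illustrative.
    provider: str            # "openai" | "google" | "groq" | "local"
    model: str
    prompt: str
    temperature: float = 0.7
    max_tokens: int = 256

# The same logical request routed to two different providers:
req_openai = InferenceRequest(provider="openai", model="gpt-4", prompt="Hello")
req_groq = InferenceRequest(provider="groq", model="llama-3.1-8b-instant", prompt="Hello")
```

Because the rest of the payload is identical, application code built against this shape needs no changes when a new backend is added.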

Key Features

  • Multi-Provider Support: OpenAI, Google, Groq, and local LLM servers
  • Multiple Task Types: Text generation, classification, and summarization
  • Streaming Support: Real-time text generation via Server-Sent Events (SSE)
  • Unified Interface: Single API format across all providers
  • Token Usage Tracking: Automatic tracking of prompt and completion tokens
  • Docker Support: Production-ready containerized deployment
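On the streaming feature: SSE delivers events as `data: …` lines over a long-lived HTTP response. A minimal client-side parser might look like the following; the single-line-JSON events and the `data: [DONE]` terminator are a common convention, assumed here rather than confirmed for this service:

```python
import json

def parse_sse_lines(lines):
    """Yield decoded JSON payloads from Server-Sent Event lines.

    Assumes each event is one `data: {...}` line and the stream ends
    with `data: [DONE]` -- a widespread convention, not necessarily
    this gateway's exact wire format.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

chunks = list(parse_sse_lines([
    'data: {"text": "Hel"}',
    '',
    'data: {"text": "lo"}',
    'data: [DONE]',
]))
text = "".join(c["text"] for c in chunks)  # -> "Hello"
```

In a real client the lines would come from iterating over the streaming HTTP response body instead of a list.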

Architecture

The service is built on FastAPI and uses the following components:

  • Routers (/llm/*): Handle HTTP endpoints for inference tasks
  • Services: Manage LLM provider interactions via pydantic-ai agents
  • Core: Factory pattern for creating provider-specific LLM instances
  • Models: Pydantic models for request/response validation
  • Authentication: Optional API key-based authentication
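The factory pattern in the Core component can be sketched as a single constructor that maps a provider name to a configured client. The provider names come from this page; the class and its arguments are illustrative assumptions:

```python
class LLMClient:
    """Stand-in for a provider-specific client (illustrative only)."""
    def __init__(self, provider: str, model: str):
        self.provider = provider
        self.model = model

def create_llm(provider: str, model: str) -> LLMClient:
    """Factory: return a client for the named provider, or fail fast.

    Centralizing construction here means routers and services never
    branch on the provider themselves.
    """
    supported = {"openai", "google", "groq", "local"}
    if provider not in supported:
        raise ValueError(f"Unsupported provider: {provider}")
    return LLMClient(provider, model)

client = create_llm("groq", "llama-3.1-8b-instant")
```

Adding a new backend then only touches the factory, not the endpoints.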

Supported Tasks

Text Generation

Generate text based on a prompt with full control over temperature and token limits.
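A hypothetical request body for this task, showing the two knobs mentioned above (the field names are assumptions about the schema):

```python
# Hypothetical text-generation request body; field names are
# assumptions, not the documented schema.
generation_request = {
    "provider": "openai",
    "model": "gpt-4",
    "prompt": "Write a haiku about the sea.",
    "temperature": 0.2,   # lower = more deterministic output
    "max_tokens": 64,     # hard cap on completion length
}
```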

Classification

Classify input text into predefined categories with structured output.
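"Structured output" here means the response is validated against a fixed shape rather than returned as free text. A minimal sketch, with an assumed label set and field names:

```python
from dataclasses import dataclass

# Example label set -- an assumption for illustration.
CATEGORIES = ["billing", "technical", "general"]

@dataclass
class ClassificationResult:
    """Structured classification output (field names are assumptions)."""
    category: str
    confidence: float

    def __post_init__(self):
        # Reject anything outside the predefined categories.
        if self.category not in CATEGORIES:
            raise ValueError(f"Unknown category: {self.category}")

result = ClassificationResult(category="billing", confidence=0.93)
```

In the actual service this validation role is played by Pydantic models.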

Summarization

Generate concise summaries of longer texts with structured output.
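A hypothetical summarization response, also showing the token-usage counters the gateway tracks (all field names are assumptions):

```python
# Hypothetical response body; field names are assumptions.
summarization_response = {
    "summary": "The report outlines Q3 revenue growth.",
    "usage": {
        "prompt_tokens": 512,      # tokens in the input text
        "completion_tokens": 38,   # tokens in the summary
        "total_tokens": 550,       # prompt + completion
    },
}

usage = summarization_response["usage"]
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
```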

Supported Providers

  • OpenAI: GPT-4, GPT-3.5, and other OpenAI models
  • Google: Gemini and other Google models
  • Groq: Fast inference with Llama and other models
  • Local: OpenAI-compatible local LLM servers
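For the local provider, an OpenAI-compatible server is typically addressed by its base URL. A sketch of such a configuration, where the setting names are assumptions:

```python
# Hypothetical configuration for a local OpenAI-compatible server
# (e.g. one listening on localhost); setting names are assumptions.
local_provider_config = {
    "provider": "local",
    "base_url": "http://localhost:8000/v1",  # your local server
    "model": "my-local-model",
    "api_key": "not-needed",  # many local servers ignore the key
}
```

Pointing `base_url` at a different host is all it takes to target another compatible server.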
