# Storyline AI Gateway - Implementation Plan
## Project Overview
A FastAPI-based gateway for e-learning modules (Articulate Storyline) to access LLM services (Gemini, OpenAI) with centralized authentication and rate limiting.
## Tech Stack
- **Framework**: FastAPI
- **Server**: Uvicorn
- **Rate Limiting**: SlowAPI (`slowapi` limiter)
- **Auth**: Header-based API Key (`X-API-Key`)
- **LLMs**: Google Gemini, OpenAI
## Directory Structure
- `app/main.py`: Application entry point and middleware configuration.
- `app/core/`: Configuration and utilities (limiter, settings).
- `app/api/deps.py`: Shared dependencies (authentication).
- `app/api/router.py`: API versioning and route aggregation.
- `app/api/endpoints/`: Route handlers.
  - `storyline.py`: Generic endpoint for Storyline modules.
  - `gemini.py`: Dedicated Gemini endpoint.
  - `openai.py`: Dedicated OpenAI endpoint.
## Configuration
Managed via `.env` file:
- `API_KEY`: Secret key for Storyline modules.
- `GOOGLE_API_KEY`: API key for Google Generative AI.
- `OPENAI_API_KEY`: API key for OpenAI.
- `PORT`: Server port (default 8000).
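As an illustration of how these variables map into application settings, here is a stdlib-only sketch (the real app may use a settings library instead; field names mirror the variables above):

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Gateway configuration, read from environment variables (as populated from .env)."""
    api_key: str = os.getenv("API_KEY", "")
    google_api_key: str = os.getenv("GOOGLE_API_KEY", "")
    openai_api_key: str = os.getenv("OPENAI_API_KEY", "")
    port: int = int(os.getenv("PORT", "8000"))  # default 8000, per the plan


settings = Settings()
```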
## API Endpoints
All endpoints are versioned under `/api/v1`.
### 1. Gemini Chat
- **URL**: `/api/v1/gemini/chat`
- **Method**: POST
- **Headers**: `X-API-Key: <your_key>`
- **Body**: `{"prompt": "string", "context": "string"}`
### 2. OpenAI Chat
- **URL**: `/api/v1/openai/chat`
- **Method**: POST
- **Headers**: `X-API-Key: <your_key>`
- **Body**: `{"prompt": "string", "context": "string"}`
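Since both chat endpoints share the same contract, a test script could assemble a request like this; `BASE_URL` and `API_KEY` are placeholder values, not part of the plan:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # placeholder gateway host
API_KEY = "change-me"               # placeholder key

def build_chat_request(provider: str, prompt: str, context: str) -> urllib.request.Request:
    """Build a POST request for /api/v1/<provider>/chat ("gemini" or "openai")."""
    body = json.dumps({"prompt": prompt, "context": context}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/{provider}/chat",
        data=body,
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("gemini", "Summarize this slide.", "Module 3, slide 2")
```

The request is only constructed here, not sent; sending it would require the gateway to be running.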
## Rate Limiting
- Applied globally and per endpoint: **20 calls per minute**.
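The gateway itself uses the `slowapi` package for this, but the "20 calls per minute" semantics can be illustrated with a self-contained sliding-window sketch (names are illustrative):

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """At most `limit` calls per `window` seconds; illustrative only."""

    def __init__(self, limit: int = 20, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.calls = deque()  # timestamps of accepted calls

    def allow(self, now=None) -> bool:
        """Return True and record the call if it fits in the window, else False."""
        now = time.monotonic() if now is None else now
        # Evict timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False
        self.calls.append(now)
        return True
```

Once a client exceeds 20 calls inside a 60-second window, further calls are rejected until older calls age out.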
## Future Steps
- Add logging (WandB or file-based).
- Implement response caching.
- Add more LLM providers (Anthropic, etc.).