LLMetrics is a comprehensive benchmarking tool designed to evaluate and compare the performance of Large Language Model (LLM) inference APIs across various providers. It measures key metrics such as Time-to-First-Token (TTFT), Time-Between-Tokens (TBT), and overall End-to-End (E2E) latency in a standardized testing environment.
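To make these metrics concrete, the sketch below times a token stream in Python; the `stream` iterator stands in for a provider's streaming response and is an illustrative assumption, not LLMetrics' actual measurement code.

```python
import time

def measure_stream(stream):
    """Compute TTFT, mean TBT, and E2E latency (in seconds) for a token iterator.

    `stream` is any iterable that yields tokens as they arrive; it is a
    stand-in for a provider's streaming response, not the real LLMetrics API.
    """
    start = time.perf_counter()
    arrival_times = [time.perf_counter() for _ in stream]
    if not arrival_times:
        raise ValueError("stream produced no tokens")

    ttft = arrival_times[0] - start                      # Time-to-First-Token
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    tbt = sum(gaps) / len(gaps) if gaps else 0.0         # mean Time-Between-Tokens
    e2e = arrival_times[-1] - start                      # End-to-End latency
    return ttft, tbt, e2e
```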
- Standardized Testing: Uses fixed prompts, input tokens, and output tokens for consistent performance evaluation
- Provider Comparison: Benchmarks multiple LLM service providers, including cloud APIs (e.g., OpenAI, Anthropic, Cloudflare) and local servers (e.g., vLLM)
- Configurable Experiments: Define experiments via a JSON configuration file that specifies providers, models, number of requests, token sizes, and streaming mode
- Data Persistence: Stores benchmarking results in DynamoDB for scalable, historical tracking
- Visualization: Generates latency and CDF plots, which are integrated into an interactive dashboard for actionable insights
- Automated Workflows: Scheduled experiments (e.g., weekly runs on an AWS VM via GitHub Actions) ensure continuous performance monitoring
# Using HTTPS
git clone https://github.com/your-username/LLMetrics.git
# Using SSH
git clone [email protected]:your-username/LLMetrics.git
cd LLMetrics
pip install -r requirements.txt
Create a .env file in the repository root with your API keys and credentials:
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
CLOUDFLARE_ACCOUNT_ID="your-cloudflare-account-id"
CLOUDFLARE_AI_TOKEN="your-cloudflare-ai-token"
TOGETHER_AI_API="your-together-ai-api-key"
OPEN_AI_API="your-openai-api-key"
ANTHROPIC_API="your-anthropic-api-key"
PERPLEXITY_AI_API="your-perplexity-ai-api-key"
HYPERBOLIC_API="your-hyperbolic-api-key"
GROQ_API_KEY="your-groq-api-key"
GEMINI_API_KEY="your-gemini-api-key"
AZURE_LLAMA_8B_API="your-azure-llama-8b-api-key"
AZURE_LLAMA_3.1_70B_API="your-azure-llama-70b-api-key"
MISTRAL_LARGE_API="your-mistral-large-api-key"
AWS_BEDROCK_ACCESS_KEY_ID="your-aws-bedrock-access-key-id"
AWS_BEDROCK_SECRET_ACCESS_KEY="your-aws-bedrock-secret-key"
AWS_BEDROCK_REGION="your-aws-bedrock-region"
DYNAMODB_ENDPOINT_URL="your-dynamodb-endpoint-url"
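These variables are read from the environment at runtime; the snippet below shows one way to load them in Python with python-dotenv, which is a common pattern but an assumption about this project's dependencies.

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is available

# Populate os.environ from the .env file in the repository root.
load_dotenv()

openai_key = os.environ["OPEN_AI_API"]
anthropic_key = os.environ["ANTHROPIC_API"]
dynamodb_endpoint = os.getenv("DYNAMODB_ENDPOINT_URL")  # may be unset for the default endpoint
```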
Create a config.json file to define your benchmarking experiment:
{ "providers":
[ "TogetherAI", "Cloudflare", "OpenAI", "Anthropic", "vLLM" \],
"models":
[ "common-model" ],
"num_requests": 100,
"input_tokens": 10,
"streaming": true,
"max_output": 100,
"verbose": true
}
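For orientation, a minimal sketch of consuming such a configuration is shown below; the field names match the example above, while the loop body is a placeholder rather than LLMetrics' actual benchmarking logic.

```python
import json

with open("config.json") as f:
    config = json.load(f)

for provider in config["providers"]:
    for model in config["models"]:
        # Placeholder for the real benchmark call: send config["num_requests"]
        # requests with config["input_tokens"] prompt tokens, capped at
        # config["max_output"] output tokens, streaming if config["streaming"].
        print(f"Benchmarking {model} on {provider} "
              f"({config['num_requests']} requests, streaming={config['streaming']})")
```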
python main.py -c config.json
or, when benchmarking a local vLLM server, pass its host IP:
python main.py -c config.json --vllm_ip <host ip>
LLMetrics saves the generated plots (latency graphs and CDF plots) to the benchmark_graph output directory.
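An empirical CDF of per-request latencies can be drawn with standard tooling; the sketch below uses matplotlib and NumPy, and the output file name under benchmark_graph is only an assumption.

```python
import os

import matplotlib.pyplot as plt
import numpy as np

def plot_latency_cdf(latencies_ms, out_dir="benchmark_graph", name="latency_cdf.png"):
    """Plot an empirical CDF of request latencies (milliseconds) and save it as a PNG."""
    samples = np.sort(np.asarray(latencies_ms, dtype=float))
    cdf = np.arange(1, len(samples) + 1) / len(samples)

    plt.figure()
    plt.plot(samples, cdf)
    plt.xlabel("Latency (ms)")
    plt.ylabel("Fraction of requests")
    plt.title("Latency CDF")

    os.makedirs(out_dir, exist_ok=True)
    plt.savefig(os.path.join(out_dir, name))  # file name is hypothetical
    plt.close()
```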
LLMetrics integrates with a CI/CD pipeline to run weekly experiments on an AWS VM:
- GitHub Actions: Automates weekly benchmarking runs and CI tests
- AWS VM: Provides a stable network environment for consistent benchmarking
- DynamoDB: Stores benchmark data securely for historical analysis (see the sketch after this list)
- Dashboard: An interactive dashboard deployed on GitHub Pages visualizes the results
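As an illustration of the persistence step, the sketch below writes one benchmark record to DynamoDB with boto3; the table name, key schema, and attribute names are assumptions, not the project's actual schema.

```python
import time
from decimal import Decimal

import boto3

# Table name and item layout are hypothetical; adjust them to the deployed schema.
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("llmetrics-benchmarks")

table.put_item(
    Item={
        "provider": "OpenAI",           # partition key (assumed)
        "timestamp": int(time.time()),  # sort key (assumed)
        "model": "common-model",
        "ttft_ms": Decimal("412.5"),    # DynamoDB numbers must be Decimal, not float
        "e2e_ms": Decimal("2180.0"),
    }
)
```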