An interactive dashboard for exploring and visualizing merged data from LLM leaderboards, built with Gradio. Check it out our deployed HuggingFace Space: Link
This application provides an interactive interface to view, filter, and compare Large Language Models (LLMs) based on aggregated data from prominent leaderboard sources:
- LiveBench: Features performance metrics like Global Average, Reasoning, Coding, Mathematics, Data Analysis, Language, and Instruction Following scores.
- LMSYS Chatbot Arena: Includes community-based Elo ratings (Arena Score), rankings, and voting data.
The dashboard allows users to easily navigate and compare models across various metrics and categories.
- Interactive Data Tables: View LLM data organized into tabs:
- Performance Metrics: Core benchmark scores from LiveBench.
- Model Details: Information like Organization, License, Knowledge Cutoff, and links.
- Community Stats: Data from the Chatbot Arena Leaderboard (Ranks, Score, Votes).
- Model Mapping: Shows the unified model name alongside original names from LiveBench and Arena.
- Filtering: Dynamically filter the displayed models by:
- Search term (searches Model Name and Organization).
- Organization.
- Minimum Global Average score.
- Detailed Model Card: Click on any row in the data tables to view a comprehensive card summarizing all metrics for that specific model.
- Visualizations Tab:
- Bar Chart: Compare the top 15 models based on a user-selected metric (e.g., Global Average, Arena Score, Coding Average).
- Radar Chart: Select multiple models (up to 5) to compare their performance profile across key metrics (Reasoning, Coding, Math, Data Analysis, Language, IF Average, and scaled Arena Score).
The application uses a pre-merged CSV file (data/merged_leaderboards.csv
) containing data aggregated from the sources mentioned above.
- Python 3.9+
- pip (Python package installer)
-
Clone the repository (Optional):
# If you have the code in a git repository git clone <your-repo-url> cd <your-repo-directory>
If you just have the files, navigate to the project directory in your terminal.
-
Install Dependencies: Create a
requirements.txt
file with the following content:gradio==4.9.0 pandas plotly numpy
Then, install the requirements:
pip install -r requirements.txt
To run the application locally:
python app.py
The application will typically be available at http://127.0.0.1:7860 in your web browser.
GTLLMZoo2
├─ app.py # Main Gradio application entry point
├─ requirements.txt # Python dependencies
├─ data
│ └─ merged_leaderboards.csv # Merged leaderboard data
└─ src
├─ data_processing.py # Data loading and filtering logic
└─ ui.py # Gradio UI definition and logic
Contributions are welcome! Please feel free to submit a Pull Request if you have improvements or bug fixes.
MIT License