

Compare price, performance, and speed across the entire AI ecosystem.
Updated daily with the latest benchmarks.
Four ways into WhatLLM. Pick based on what you're trying to do.
Browse 90+ models with filters. Sort by quality, price, speed, or context window.
Pick 2–4 models, compare benchmarks side by side. See where each one wins.
LLM Selector & Provider Finder. Quick questions, ranked shortlist.
Rankings, deep-dives, and market analysis. Updated with every release.
Top recommendations based on what matters most to you
Quality meets affordability
Speed when it matters
Top benchmark performance
Ranked by Quality Index across all benchmarks
Jump straight to models optimized for your specific needs
The AI landscape moves fast. Every few weeks a new model launches, sometimes multiple in a single day, each claiming state-of-the-art results on different benchmarks. For developers choosing an LLM for production, for researchers evaluating the field, or for teams deciding where to invest their API budget, keeping track of it all is exhausting.
WhatLLM.org was built to solve that problem. We aggregate benchmark data, real-world pricing, and throughput metrics for 263+ large language models from 46+ providers into one place. Instead of opening dozens of tabs to compare OpenAI, Anthropic, Google, Meta, DeepSeek, and Mistral side by side, you get a unified interface where models can be filtered, sorted, and compared on the dimensions that actually matter to your use case.
Every model on WhatLLM.org is evaluated across four core dimensions: quality, speed, price, and context length. Quality is measured using the Artificial Analysis Intelligence Index, a composite benchmark score that synthesizes results from GPQA Diamond (PhD-level reasoning), AIME 2025 (advanced mathematics), LiveCodeBench (real-world coding), SWE-Bench Verified (software engineering), MMLU-Pro (broad knowledge), and Humanity's Last Exam (frontier reasoning) into a single 0–100 score.
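To make the composite idea concrete, here is a minimal sketch of rolling several per-benchmark scores into a single 0–100 figure. The equal weighting, the benchmark keys, and the example scores are illustrative assumptions; the actual Intelligence Index weighting is Artificial Analysis's own methodology, not what this sketch implements.

```python
# Illustrative sketch: combining per-benchmark scores into one 0-100 index.
# Equal weighting is an assumption for illustration only; the real
# Artificial Analysis Intelligence Index uses its own methodology.

BENCHMARKS = [
    "gpqa_diamond",         # PhD-level reasoning
    "aime_2025",            # advanced mathematics
    "livecodebench",        # real-world coding
    "swe_bench_verified",   # software engineering
    "mmlu_pro",             # broad knowledge
    "humanitys_last_exam",  # frontier reasoning
]

def composite_index(scores: dict[str, float]) -> float:
    """Average the per-benchmark scores (each already on a 0-100 scale)
    into a single composite. Benchmarks with no score are skipped."""
    available = [scores[b] for b in BENCHMARKS if b in scores]
    if not available:
        raise ValueError("no benchmark scores provided")
    return sum(available) / len(available)

# Hypothetical example scores, not real benchmark results:
print(composite_index({
    "gpqa_diamond": 71.0,
    "aime_2025": 88.0,
    "mmlu_pro": 84.0,
}))  # -> 81.0
```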
Speed is measured in output tokens per second, reflecting real-world throughput under typical load. Price is tracked per million tokens for both input and output, with blended cost calculations. Context length reflects the maximum number of tokens a model can process in a single request, ranging from 8K tokens on older models to 10M tokens on the latest architectures.
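The blended cost is the one figure here with a simple formula behind it. A minimal sketch follows, assuming a 3:1 input-to-output token ratio, which is a common convention for blended pricing but not necessarily the exact ratio used on the site:

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_ratio: float = 3.0) -> float:
    """Blend per-million-token input and output prices into one figure,
    weighted by an assumed input:output token ratio (default 3:1)."""
    total_weight = input_ratio + 1.0
    return (input_per_m * input_ratio + output_per_m) / total_weight

# Hypothetical pricing in $ per million tokens, not a real model's rates:
print(blended_price(input_per_m=3.00, output_per_m=15.00))  # -> 6.0
```

Under that assumed ratio, a hypothetical $3.00 input / $15.00 output model works out to a blended $6.00 per million tokens.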
There is no single "best" LLM. The right choice depends on your priorities. If you need the highest reasoning quality for complex tasks, frontier models like GPT-5, Gemini 3 Pro, or Claude Opus 4.5 lead the benchmarks. If cost efficiency matters most, open-source models like DeepSeek V3, Qwen3, or Kimi K2 deliver strong performance at a fraction of the price. For latency-sensitive applications, speed-optimized endpoints on providers like Groq, Fireworks, or Cerebras can deliver hundreds of tokens per second.
Our LLM Selector tool walks you through a few quick questions about your use case (coding, analysis, creative writing, agentic workflows) and recommends a shortlist of models ranked by fit. The Compare page lets you pick any 2–4 models and see their benchmarks, pricing, and speed side by side in a detailed breakdown.
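To illustrate the kind of ranking a selector like this performs, here is a hedged sketch that scores models against user-weighted priorities. The `Model` fields, the priority weights, and the normalizers are all hypothetical; this is a sketch of the general technique, not the tool's actual logic.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    quality: float         # 0-100 composite quality index
    tokens_per_sec: float  # output throughput
    blended_price: float   # $ per million tokens (lower is better)

def fit_score(m: Model, weights: dict[str, float]) -> float:
    """Weighted fit: map each dimension to roughly 0-1, then combine
    by user priorities. The normalizers here are illustrative choices."""
    return (
        weights.get("quality", 0) * (m.quality / 100)
        + weights.get("speed", 0) * min(m.tokens_per_sec / 300, 1.0)
        + weights.get("cost", 0) * (1.0 / (1.0 + m.blended_price))
    )

# Hypothetical catalog entries, not real benchmark or pricing data:
catalog = [
    Model("frontier-a", quality=92, tokens_per_sec=60, blended_price=12.0),
    Model("open-b", quality=80, tokens_per_sec=110, blended_price=0.9),
]
coding_weights = {"quality": 0.5, "speed": 0.2, "cost": 0.3}
shortlist = sorted(catalog, key=lambda m: fit_score(m, coding_weights),
                   reverse=True)
print([m.name for m in shortlist])  # cheap-and-fast model ranks first here
```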
Beyond the comparison tools, WhatLLM.org publishes original analysis on model releases, benchmark trends, and the economics of AI deployment. Our blog covers topics from detailed model face-offs (like Kimi K2 Thinking vs. ChatGPT 5.1) to broader industry analysis (the open-source vs. proprietary cost curve, the rise of agentic coding models, and whether benchmark saturation is making traditional evaluation frameworks obsolete). Each article is written with original commentary grounded in the data we track daily.
Benchmark and quality data is sourced from Artificial Analysis, an independent research organization. Pricing data is verified against official provider documentation and updated daily. We are transparent about what we measure, what we aggregate, and what we add on top; our full methodology is documented publicly. WhatLLM.org does not run its own benchmarks; we focus on making existing high-quality data accessible, interactive, and actionable.