Benchmark
ベンチマーク(ベンチマーク)
Intermediate | Core Concepts
A standardized test or dataset used to measure and compare the performance of different AI models on specific tasks.
Why It Matters
Because every model is scored on the same tasks, benchmarks let you compare accuracy, speed, and capability directly and choose the right model for your use case.
Example in Practice
MMLU (Massive Multitask Language Understanding) tests how well models answer multiple-choice questions across 57 subjects.
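
To make the idea of scoring concrete, here is a minimal sketch of how accuracy on a multiple-choice benchmark might be computed. Everything in it is an assumption for illustration: the four toy questions are invented and are not MMLU data, and `always_first` and `always_third` are hypothetical stand-ins for real models. A real evaluation would load the published dataset and call an actual model.

```python
from typing import Callable, Dict, List

# Hypothetical toy items in a multiple-choice format; NOT real MMLU data.
# "answer" is the index of the correct choice.
ITEMS: List[Dict] = [
    {"question": "What is 2 + 2?",
     "choices": ["3", "4", "5", "6"], "answer": 1},
    {"question": "Which planet is closest to the Sun?",
     "choices": ["Venus", "Earth", "Mercury", "Mars"], "answer": 2},
    {"question": "What is the chemical formula for water?",
     "choices": ["H2O", "CO2", "O2", "NaCl"], "answer": 0},
    {"question": "How many days are in a week?",
     "choices": ["5", "6", "7", "8"], "answer": 2},
]

# A "model" here is any function that takes a question and its choices
# and returns the index of the choice it picks.
Model = Callable[[str, List[str]], int]


def accuracy(model: Model, items: List[Dict]) -> float:
    """Return the fraction of items the model answers correctly."""
    if not items:
        raise ValueError("benchmark has no items")
    correct = sum(
        1 for item in items
        if model(item["question"], item["choices"]) == item["answer"]
    )
    return correct / len(items)


def always_first(question: str, choices: List[str]) -> int:
    """Hypothetical stand-in model: always picks the first choice."""
    return 0


def always_third(question: str, choices: List[str]) -> int:
    """Hypothetical stand-in model: always picks the third choice."""
    return 2


if __name__ == "__main__":
    # Score both stand-in models on the same items so results are comparable.
    for name, model in [("always_first", always_first),
                        ("always_third", always_third)]:
        print(f"{name}: {accuracy(model, ITEMS):.2f}")
```

Because both stand-ins are scored on the same items with the same metric, their numbers can be compared directly, which is the property that makes a benchmark useful.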