Quantization
量子化 (ryōshika)
Advanced · Models & Architecture
A technique that reduces AI model size and speeds up inference by storing weights (and sometimes activations) in lower-precision numbers, typically with minimal quality loss.
Why It Matters
Quantization makes it possible to run large language models on phones, laptops, and other consumer hardware.
Example in Practice
Running a 4-bit quantized Llama model on a MacBook instead of needing a $10,000 GPU server.
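The core idea can be shown in a few lines. Below is a minimal sketch of symmetric int8 weight quantization using NumPy; the function names and the toy weight values are illustrative, not from any particular library:

```python
import numpy as np

def quantize_int8(weights):
    # Symmetric quantization: map floats into the int8 range [-127, 127]
    # using a single per-tensor scale factor.
    scale = np.max(np.abs(weights)) / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values; the rounding error is bounded
    # by half the scale factor.
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.5, 0.33, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

Each int8 weight takes 1 byte instead of 4 for float32, a 4x reduction; 4-bit schemes like those used for quantized Llama models push this further with per-group scales.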