GGML

C/C++ tensor library powering llama.cpp; runs GGUF.

GGML is a C/C++ tensor library, written by Georgi Gerganov, built primarily for running machine-learning models (inference) rather than training them. It deliberately focuses on getting good performance out of ordinary CPUs and consumer hardware, though it also supports GPU backends such as CUDA, Metal, and Vulkan. It’s the engine under llama.cpp and whisper.cpp: it defines the math operations, the order they run in, and (crucially) the quantization schemes that shrink the model’s numbers to fewer bits so large models fit in limited memory.

One confusing wrinkle: “GGML” was also the name of an early single-file model format from the same project. That format was retired and replaced by GGUF, but the library lives on and is what actually crunches the numbers when you run a .gguf model. So today: GGML = the engine, GGUF = the file it loads.
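To make the quantization idea concrete, here is a minimal Python sketch of block-wise 8-bit quantization. It assumes a block size of 32 and one absolute-max scale per block, roughly in the spirit of GGML's 8-bit block formats. The function names (`quantize_row`, `dequantize_row`, and so on) are hypothetical. This is not GGML's implementation: GGML's real kernels are written in C and also include many lower-bit variants.

```python
import math

# Assumption: 32 weights share one scale, loosely modeled on GGML's 8-bit
# block formats. Illustration only, not GGML's actual code.
BLOCK_SIZE = 32


def round_half_away(x):
    """Round to the nearest integer, ties away from zero (like C's roundf)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_block(values):
    """Map a block of floats to one float scale plus signed 8-bit codes."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0  # the largest magnitude maps to +/-127
    inv = 1.0 / scale if scale != 0.0 else 0.0  # all-zero block -> all-zero codes
    codes = [round_half_away(v * inv) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats: each value is its code times the block scale."""
    return [scale * q for q in codes]


def quantize_row(row):
    """Split a row into BLOCK_SIZE chunks and quantize each chunk independently."""
    return [quantize_block(row[i:i + BLOCK_SIZE])
            for i in range(0, len(row), BLOCK_SIZE)]


def dequantize_row(blocks):
    """Concatenate the dequantized blocks back into one row of floats."""
    out = []
    for scale, codes in blocks:
        out.extend(dequantize_block(scale, codes))
    return out


if __name__ == "__main__":
    row = [math.sin(0.1 * i) * (1 + i / 16) for i in range(64)]
    blocks = quantize_row(row)
    restored = dequantize_row(blocks)
    worst = max(abs(a - b) for a, b in zip(row, restored))
    print(f"{len(blocks)} blocks, worst abs error {worst:.4f}")
```

Each block of 32 values becomes 32 one-byte codes plus one scale instead of 32 four-byte floats, so (assuming the scale is stored compactly) memory drops to roughly a quarter, at the cost of a rounding error of at most half a scale step per value. Giving each small block its own scale, rather than one scale for the whole tensor, keeps a single large outlier from wiping out the precision of every other weight.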