GGML
C/C++ tensor library powering llama.cpp; runs GGUF.
GGML is a C/C++ tensor library written by Georgi Gerganov for machine-learning
inference, with a deliberate bias toward running models efficiently on CPUs and consumer
hardware (though it also supports GPU backends like CUDA, Metal, and Vulkan). It’s the
engine underneath llama.cpp and whisper.cpp: it defines the tensor operations, the
computation graph, and, crucially, the block-wise integer quantization schemes
that let large models fit in limited RAM/VRAM. Confusingly, “GGML” was also the name of
an early single-file model format from the same project. That format was deprecated
and replaced by GGUF, but the library lives on and is what actually executes the
math when you run a .gguf model. So today: GGML = the inference library, GGUF = the
file format it loads.
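
To make the quantization idea concrete, here is a minimal Python sketch of block-wise 8-bit quantization: weights are split into fixed-size blocks, and each block stores one scale plus one small integer per weight. It is only loosely in the spirit of GGML's 8-bit block formats. The block size of 32, the 2-byte scale, and every function name below are assumptions made for illustration; none of this is GGML's API or its exact on-disk layout.

```python
import math

BLOCK_SIZE = 32  # assumed block length for this sketch


def quantize_block(values):
    """Quantize one block of floats to (scale, int8-range codes)."""
    amax = max(abs(v) for v in values)
    # One scale per block maps the largest magnitude onto 127.
    scale = amax / 127.0 if amax > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from one quantized block."""
    return [scale * q for q in codes]


def quantize(values):
    """Quantize a flat list whose length is a multiple of BLOCK_SIZE."""
    if len(values) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of BLOCK_SIZE")
    return [
        quantize_block(values[i:i + BLOCK_SIZE])
        for i in range(0, len(values), BLOCK_SIZE)
    ]


def dequantize(blocks):
    """Concatenate the dequantized blocks back into one flat list."""
    out = []
    for scale, codes in blocks:
        out.extend(dequantize_block(scale, codes))
    return out


def packed_size_bytes(n_values):
    """Bytes needed if each block stores a 2-byte scale plus 1 byte per value."""
    return (n_values // BLOCK_SIZE) * (2 + BLOCK_SIZE)


if __name__ == "__main__":
    weights = [math.sin(0.1 * i) for i in range(64)]
    blocks = quantize(weights)
    restored = dequantize(blocks)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print("max abs error:", max_err)
    print("packed bytes:", packed_size_bytes(len(weights)),
          "vs float32 bytes:", 4 * len(weights))
```

Under these assumptions each block of 32 weights takes 34 bytes instead of 128 as float32, which is where the memory savings come from. Keeping a separate scale per block means one outlier weight only degrades precision within its own block rather than across the whole tensor.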