GGML

C/C++ tensor library powering llama.cpp; loads and runs GGUF models.

GGML is a C/C++ tensor library written by Georgi Gerganov for machine-learning inference, with a deliberate bias toward running models efficiently on CPUs and consumer hardware (though it also supports GPU backends such as CUDA, Metal, and Vulkan). It’s the engine underneath llama.cpp and whisper.cpp: it defines the tensor operations, the computation graph, and, crucially, the integer quantization schemes that let large models fit in limited RAM/VRAM. Confusingly, “GGML” was also the name of an early single-file model format from the same project. That format was deprecated and replaced by GGUF, but the library lives on and is what actually executes the math when you run a .gguf model. So today: GGML = the inference library, GGUF = the file format it loads.
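
To make the quantization point concrete, here is a minimal Python sketch of block-wise 8-bit quantization, loosely modeled on GGML's Q8_0 layout (blocks of 32 values sharing one fp16 scale). It is an illustration under stated assumptions, not GGML's actual C implementation: the function names quantize_block, dequantize_block, and pack_block are hypothetical, and details such as the rounding mode are simplified.

```python
import struct

BLOCK_SIZE = 32  # values per block, matching the Q8_0 block size


def quantize_block(values):
    """Quantize one block of floats to (scale, list of int8 values).

    Hypothetical helper for illustration; not a GGML API.
    """
    assert len(values) == BLOCK_SIZE
    amax = max(abs(v) for v in values)
    # One shared scale per block, chosen so the largest value maps near +/-127.
    scale = amax / 127.0
    # Round-trip through fp16, since the scale is stored in half precision.
    scale = struct.unpack("<e", struct.pack("<e", scale))[0]
    if scale == 0.0:
        return 0.0, [0] * BLOCK_SIZE
    # Clamp because the fp16-rounded scale may be slightly smaller than amax / 127.
    quants = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, quants


def dequantize_block(scale, quants):
    """Recover approximate floats from a quantized block."""
    return [scale * q for q in quants]


def pack_block(scale, quants):
    """Serialize a block as a 2-byte fp16 scale followed by 32 signed bytes."""
    return struct.pack("<e32b", scale, *quants)


if __name__ == "__main__":
    weights = [(i - 16) / 8.0 for i in range(BLOCK_SIZE)]
    scale, quants = quantize_block(weights)
    restored = dequantize_block(scale, quants)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("bytes per block:", len(pack_block(scale, quants)))
    print("max abs error:", worst)
```

Each packed block takes 34 bytes, compared with 128 bytes for the same 32 weights in fp32, which works out to about 8.5 bits per weight. Lower-bit schemes apply the same idea with fewer bits per value and more elaborate scaling, trading accuracy for a smaller footprint.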