Metal
Apple’s graphics and GPU-compute API; powers MLX, PyTorch’s mps device, and llama.cpp’s Metal backend.
Metal is Apple’s low-level graphics and GPU-compute API: its counterpart to Vulkan for
graphics and to CUDA for compute, but exclusive to Apple hardware (Mac, iPhone, iPad). For ML it matters because
Apple-silicon chips share memory between CPU and GPU (unified memory), so Metal lets models
use the GPU without copying weights across a PCIe bus. Compute work runs as kernels written in
the Metal Shading Language; Apple also ships MPS (Metal Performance Shaders), a library of tuned
kernels, and the higher-level MPSGraph. PyTorch’s mps device is built on MPS and MPSGraph, while
MLX runs its own Metal kernels. In local inference, GGML ships a Metal backend, which
is how llama.cpp and Ollama accelerate .gguf models on a Mac instead of falling back to
the CPU.
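
As a rough illustration of the PyTorch path mentioned above, the sketch below picks the mps device when it is usable and falls back to the CPU otherwise. The helper name pick_device is hypothetical, chosen for this example; the torch.backends.mps checks are PyTorch's own. It assumes nothing about the machine: if PyTorch is missing or was built without MPS support, it simply returns "cpu".

```python
def pick_device() -> str:
    """Return "mps" when PyTorch can use the Metal backend, otherwise "cpu".

    pick_device is a hypothetical helper for this example, not a PyTorch API.
    """
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is nothing to accelerate

    # Very old PyTorch releases have no torch.backends.mps module at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is None:
        return "cpu"

    # is_built(): this PyTorch binary was compiled with MPS support.
    # is_available(): this machine and macOS version can actually use it.
    if mps.is_built() and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    device = pick_device()
    print(f"selected device: {device}")

    if device == "mps":
        import torch

        # Tensors created on "mps" live in unified memory that the GPU reads directly.
        x = torch.randn(1024, 1024, device=device)
        y = x @ x  # matrix multiply dispatched through the Metal backend
        print(y.device)
```

On a Mac where the check succeeds, passing the returned string to .to(device) on a model or tensor is enough to move the work onto the GPU; everywhere else the same code keeps running on the CPU.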