CUDA

NVIDIA’s GPU-compute platform; the default ML backend.

CUDA (Compute Unified Device Architecture) is NVIDIA’s proprietary parallel-computing platform and API for running general-purpose work on NVIDIA GPUs. It’s the dominant backend for ML: you write kernels (or, far more often, use libraries like cuDNN and cuBLAS (see cuDNN / cuBLAS) and the CUDA-backed builds of PyTorch/TensorFlow) that fan computation out across the GPU’s thousands of cores. Its lock on the field comes less from the hardware than from the software moat — the mature library ecosystem and tooling that everyone targets first. The catch is it’s NVIDIA-only: it won’t run on Apple, AMD, or Intel GPUs, which is exactly the gap that Metal and Vulkan fill on other hardware. In the local-inference world, GGML ships a CUDA backend so .gguf models can offload layers to an NVIDIA GPU instead of grinding on the CPU.