MLX

Apple-silicon ML framework; the Mac answer to the GGUF / llama.cpp stack.

MLX is Apple’s open-source array framework for machine learning, built for Apple silicon (M-series chips). Its headline feature is unified memory: the CPU and GPU share the same memory pool, so arrays don’t have to be copied back and forth between “host” and “device” the way they typically are on a discrete-GPU CUDA setup. Under the hood it builds on Apple’s Metal GPU stack. The API is deliberately NumPy-like (with PyTorch-style neural-net modules), and it uses lazy evaluation: operations only record a computation graph, and values are materialized when you actually need the result (for example, when you print an array or explicitly evaluate it). It also supports automatic differentiation and on-device quantization. For local LLM work on a Mac, the companion library mlx-lm loads and runs models in MLX format, making it the native Apple-silicon alternative to the GGUF / llama.cpp ecosystem. It is often reported to be noticeably faster on a Mac, though the gap depends on the model, the quantization, and the hardware, and unlike llama.cpp it is aimed primarily at Apple hardware.
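To make the lazy evaluation and automatic differentiation described above concrete, here is a minimal sketch using the mlx.core Python module. The function name lazy_and_grad_demo and the sample values are made up for illustration, and the guarded import is an assumption of this sketch so that the file still loads on a machine where MLX is not installed.

```python
try:
    import mlx.core as mx  # typically installed with `pip install mlx`
except ImportError:
    mx = None  # assumption for this sketch: degrade gracefully without MLX


def lazy_and_grad_demo():
    """Hypothetical helper: shows lazy evaluation and autodiff in MLX."""
    if mx is None:
        return None  # MLX is not available on this machine

    a = mx.array([1.0, 2.0, 3.0])
    b = mx.array([4.0, 5.0, 6.0])

    # This line only records the computation; nothing is computed yet.
    c = a * b + 1.0

    # Force the computation. Printing or converting the array would
    # trigger evaluation implicitly as well.
    mx.eval(c)

    # Automatic differentiation: mx.grad turns a scalar-valued function
    # into a function that returns its gradient w.r.t. the first argument.
    def loss(x):
        return mx.sum(x * x)

    grad = mx.grad(loss)(a)  # analytically, the gradient is 2 * x

    # No host/device transfer calls appear anywhere: the arrays live in
    # the shared memory pool.
    return c.tolist(), grad.tolist()


if __name__ == "__main__":
    print(lazy_and_grad_demo())
```

Notice that the sketch has no equivalent of a `.to(device)` or `.cuda()` call, which is the practical consequence of the unified memory model.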