Epsilon gate
A convergence threshold (eps) used as a hard pass/fail gate, and the silent failure it causes when the same code runs on different hardware.
An epsilon gate is a pass/fail check built around a number that’s supposed to be almost
zero. Many iterative algorithms stop when some error measure drops below a tiny threshold called
epsilon (often written eps, a value like 0.002): close enough, declare success, hand back
the answer. The trouble starts when the code treats that threshold as a hard gate, returning a
result only if the error makes it under epsilon. Anything that lands a hair short gets thrown
away, even when it’s a perfectly good answer.
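As a rough illustration of the hard-gate pattern described above, here is a minimal Python sketch. The function name `gated_result` and the constant `EPS` are hypothetical, made up for this example; the point is only that a near-miss comes back as nothing at all.

```python
from typing import Optional

EPS = 0.002  # hypothetical threshold, tuned on one machine


def gated_result(answer: float, error: float, eps: float = EPS) -> Optional[float]:
    """Anti-pattern: hand back the answer only if the error clears the gate."""
    if error < eps:
        return answer
    # A result that lands a hair short is discarded, with no warning.
    return None


just_under = gated_result(1.0, error=0.0019)  # clears the gate
just_over = gated_result(1.0, error=0.0021)   # misses it and is thrown away
print(just_under, just_over)
```

The caller has no way to tell "the search failed badly" from "the search missed by a rounding error", because both come back as `None`.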
That’s fine until the arithmetic shifts underneath it. The same computation run on a different
backend, CUDA versus MPS versus the CPU, will not produce bit-identical numbers, because
floating-point precision and the order of operations differ from one machine to the next. A search
that used gradient descent to reach 0.0019 on the GPU it was tuned for might settle at 0.0021
somewhere else: the same quality of answer, missing the gate by a rounding error. If epsilon was
hardcoded for one machine, the program can silently produce nothing on another. No crash, no
warning, just an empty output folder and a confusing afternoon.
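The order-of-operations effect is easy to see even without a GPU. The sketch below is a deliberately contrived, CPU-only example in plain Python: it adds the same three numbers in two different orders and checks each sum against a threshold. The values and the threshold are chosen for illustration, not taken from any real workload.

```python
values = [0.1, 0.2, 0.3]
threshold = 0.6  # illustrative gate, chosen to sit right at the boundary

# Same three numbers, two summation orders (as different backends might use).
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# Floating-point addition is not associative, so the sums differ slightly.
print(left_to_right == right_to_left)  # False

# The tiny difference is enough to flip a hard pass/fail check.
print(left_to_right <= threshold)
print(right_to_left <= threshold)
```

On real backends the differences come from things like parallel reduction order and reduced-precision math rather than three hand-picked numbers, but the mechanism is the same: a result sitting near the threshold can fall on either side of it.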
The fix is to stop treating epsilon as a publication gate and treat it as a stopping hint: keep going until the error is under the threshold or you run out of steps, then always return the best result the run actually found. Epsilon decides when to stop looking, not whether the answer is allowed to exist. A hardcoded convergence threshold is a close cousin of any magic number tuned on one setup, the kind of buried assumption that turns into a silent failure the moment the ground moves: a new GPU, a different floating-point precision, a fresh library version.
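The stopping-hint approach might look something like the sketch below. It is a minimal, assumed implementation in plain Python using one-dimensional gradient descent. The function `minimize`, its parameters (`lr`, `max_steps`), and the toy objective are all hypothetical, picked only to show the shape of the loop.

```python
from typing import Callable, Tuple


def minimize(
    grad: Callable[[float], float],
    error: Callable[[float], float],
    x0: float,
    eps: float = 0.002,
    lr: float = 0.1,
    max_steps: int = 1000,
) -> Tuple[float, float, bool]:
    """Gradient descent where eps is a stopping hint, not a gate.

    Always returns (best_x, best_error, converged), even on a near-miss.
    """
    x = x0
    best_x, best_err = x, error(x)
    for _ in range(max_steps):
        if best_err < eps:
            break  # close enough: stop looking
        x = x - lr * grad(x)
        err = error(x)
        if err < best_err:
            best_x, best_err = x, err  # remember the best point seen so far
    return best_x, best_err, best_err < eps


# Toy objective (an assumption for this example): f(x) = (x - 3)^2.
def toy_error(x: float) -> float:
    return (x - 3.0) ** 2


def toy_grad(x: float) -> float:
    return 2.0 * (x - 3.0)


x, err, converged = minimize(toy_grad, toy_error, x0=0.0)
print(x, err, converged)

# With too few steps the run does not reach eps, but still returns its best.
x_short, err_short, converged_short = minimize(toy_grad, toy_error, x0=0.0, max_steps=5)
print(x_short, err_short, converged_short)
```

The `converged` flag keeps the information a hard gate would have given, so a caller can still log or warn on a miss. The difference is that the answer exists either way.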