Use atomic ops to never miss a nonce on opencl kernels, including nonce==0, also allowing us to make the output buffer smaller.