Add one more instruction to the cl return code to remove one branch point from the common path. This increases the total number of ALU instructions and branch points, but the common path keeps the same number of ALU instructions and has one fewer jmp, and jmps are the more expensive of the two.