[ClassDef2] Use a faster algorithm in subset()

Speedup across the board; up to 40% for MPlus1 at small sizes.