Make poclbm use 4 vectors and decrease worksize to keep pipelines fullish. Make it possible to have 0 CPU threads and update docs. Fix counter with no cpu threads.