Linux + x86_64 optimisations. Add likely() macro. Optimise a few obvious code paths with likely/unlikely. Change algo to sse2_amd64 by default. Move priority change to worker threads only. Detect number of CPUs and set default number of threads == CPUs. Add scheduling policy change to worker threads to SCHED_IDLE first and fallback to SCHED_BATCH on linux. Don't error when failing to set priority. Add CPU affinity and bind worker threads to CPUs when number of threads is a multiple of number of CPUs. Update NEWS with changes.